Based Multi-class Chinese Text Automatic Classification

Posted on:2003-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Lu

Full Text:PDF

GTID:2208360092499097

Subject:Information and Communication Engineering

Abstract/Summary:

With the spread of computers and Internet technology, the volume of data and information obtained through various channels is growing at a remarkable rate, and the contradiction between abundant data and scarce usable information has become prominent. How to locate useful information quickly, effectively, and accurately, while discarding useless and irrelevant content from such a large volume, has become a bottleneck in knowledge acquisition and information filtering, a mainstream technology in the field of information development and processing.

This thesis discusses methods for the automatic classification of Chinese texts based on machine learning. The basic idea of machine learning here is to load human knowledge and methods, together with knowledge about the objects to be classified, into the computer, which then derives the recognition rules and analysis procedures. Automatic text classification applies those rules and procedures to unclassified texts in order to assign them to classes. The classifier is the core of the classification system and can be improved through machine learning whenever necessary.

By examining core technologies in the automatic processing of Chinese information, namely automatic word segmentation, feature selection, and automatic text representation, the thesis improves current methods for automatic word segmentation and term space reduction of Chinese texts, raising their efficiency and effectiveness. As for classification methods, the thesis introduces two supervised multi-class automatic classification methods for Chinese texts, fuzzy clustering and boosting, which address the problem of low recall. Based on a comparison of experimental results for the two methods, a multi-class automatic text classification system is built on the boosting method; it performs well in application and offers a good solution to the problem of real-time information classification.
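
The abstract names three preprocessing steps (word segmentation, term space reduction, and text representation) without stating which algorithms the thesis uses. As a minimal sketch only, the Python below illustrates one common choice for each: greedy forward maximum matching for segmentation, a document-frequency threshold for term space reduction, and a bag-of-words count vector for representation. The function names, the toy dictionary, and the `min_df` threshold are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching (an assumed segmenter, not the thesis's).

    At each position, take the longest dictionary word; if none matches,
    emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def reduce_term_space(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (assumed criterion)."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count >= min_df}


def to_vector(tokens, terms):
    """Bag-of-words count vector over the retained terms, in sorted order."""
    counts = Counter(tokens)
    return [counts[t] for t in sorted(terms)]


# Toy dictionary and corpus, for illustration only.
vocab = {"机器", "学习", "机器学习", "文本", "分类", "自动"}
texts = ["机器学习文本分类", "自动文本分类", "机器学习"]

docs = [forward_max_match(t, vocab) for t in texts]
terms = reduce_term_space(docs, min_df=2)
vectors = [to_vector(d, terms) for d in docs]
```

The resulting vectors could then be passed to any multi-class learner, such as a boosting classifier. A document-frequency cutoff is only one of several possible reduction criteria; it is used here because it needs no class labels.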

Keywords/Search Tags:

multi-classification, machine learning, word segmentation, term space reduction (TSR), text representation, fuzzy clustering, classifier

Related items

1	Cluster-based Image Segmentation And Classifier Design
2	Research On The Term Weighting Scheme And Text Representation Strategy For Text Categorization
3	Research On Multi-View Fuzzy Clustering Methods Based On Representation Learning
4	Research Of Manifold Learning In Data Dimension Reduction And Classification
5	Research On New Hierarchical Fuzzy Classification Learning Method
6	Design And Implementation Of Multi-classifier Based On Information Classification System
7	Integrating information theory measures and a novel rule-set-reduction technique to improve fuzzy decision tree induction algorithms
8	Research On Fuzzy Clustering Algorithm In Pattern Classification
9	Spectral Clustering And Dimension Reduction Algorithms With Their Applications
10	Research On Chinese Word Segmentation Based On Text And Audio