Research And Design Of Classification Algorithm Based On Massive Multi-label Text

Posted on: 2018-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Fang
Full Text: PDF
GTID: 2348330518994697
Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a core text-processing technique and an important research area in natural language processing. With the arrival of the information age and the rise of the mobile Internet, the volume of text has grown explosively. The growth in both the scale and the complexity of this information poses great challenges for automatic classification algorithms and for computing capacity. In addition, text produced by network users often carries multiple labels, and the classification of such text has not been studied in much depth. This thesis combines cluster analysis with classification algorithms to study the relationship between multiple labels and classification criteria, with the aim of improving multi-label classification, and uses distributed technology to handle computation over massive text. The end result is a deployable experimental multi-label classification system. The main work of this thesis is as follows:

1. It analyzes the data characteristics of multi-label text and studies the classification algorithms and data-processing techniques used in traditional classification. Text-processing techniques such as text segmentation, text feature extraction, and text clustering are described in detail. Applications of multi-label classification are analyzed, and the technical difficulties of multi-label classification are identified.

2. It designs a multi-label classification model for massive text. Building on traditional classification techniques, the model uses cluster analysis to reduce the degrees of freedom of the multi-label data. It also uses keyword extraction to make the classification criteria explicit, which improves the accuracy of multi-label classification. In addition, the thesis designs a data-processing framework based on Map / Reduce, which greatly improves the run-time efficiency of multi-label classification.

3. It implements the multi-label classification system on a distributed platform, running label clustering and text classification within the distributed framework. The system was applied successfully to a text categorization project at an organization, where it performed well in operation.
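The abstract does not say how labels are clustered or how the Map / Reduce jobs are organized, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes labels are grouped by the Jaccard overlap of the documents they appear on, and it imitates a map step and a reduce step with plain Python functions rather than a real Hadoop job. The function names, the 0.5 threshold, and the toy corpus are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(doc_id, labels):
    """Map step: emit one (label, doc_id) pair per label on a document."""
    return [(label, doc_id) for label in labels]


def reduce_phase(pairs):
    """Reduce step: group document ids by label."""
    docs_by_label = defaultdict(set)
    for label, doc_id in pairs:
        docs_by_label[label].add(doc_id)
    return docs_by_label


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets of document ids."""
    return len(a & b) / len(a | b)


def cluster_labels(docs_by_label, threshold=0.5):
    """Merge labels whose document sets overlap at or above the threshold.

    Assumption: single-link merging with union-find; the source does not
    specify a clustering algorithm.
    """
    parent = {label: label for label in docs_by_label}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(docs_by_label), 2):
        if jaccard(docs_by_label[a], docs_by_label[b]) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the alphabetically smaller root as the cluster name.
                parent[max(ra, rb)] = min(ra, rb)
    return {label: find(label) for label in docs_by_label}


# Toy corpus (made up for illustration): document id -> set of labels.
corpus = {
    1: {"football", "soccer"},
    2: {"football", "soccer"},
    3: {"soccer", "football", "finance"},
    4: {"finance", "stocks"},
    5: {"stocks", "finance"},
}

pairs = [p for doc_id, labels in corpus.items() for p in map_phase(doc_id, labels)]
docs_by_label = reduce_phase(pairs)
label_to_cluster = cluster_labels(docs_by_label, threshold=0.5)

# Rewrite each document's labels as cluster ids, shrinking the label space.
reduced = {
    doc_id: {label_to_cluster[label] for label in labels}
    for doc_id, labels in corpus.items()
}
```

On this toy corpus the four original labels collapse into two clusters, so a downstream classifier would have fewer targets to predict per document. In a real distributed setting the map and reduce functions would run as separate tasks over partitions of the corpus instead of in one process.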
Keywords/Search Tags: multi-label classification, text feature, label clustering, Hadoop