Research On Eigenvector Mapping Algorithm Based On Multi-label

Posted on:2019-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:T Wang

Full Text:PDF

GTID:2348330542498863

Subject:Information and Communication Engineering

Abstract/Summary:

Multi-label text classification is a long-standing difficulty in natural language processing (NLP), and one of the hardest problems in text classification, because the number of class labels per document is not fixed. Current multi-label text classification algorithms focus mainly on the output space of the classifier, while text vectorization in the input space has received comparatively little attention. Because annotating text for classification is labor-intensive, a good text representation algorithm is crucial for improving classification performance when the annotated sample is relatively small. This thesis proposes an eigenvector mapping algorithm based on multi-label information and, building on it, an improved multi-view semi-supervised learning algorithm that further improves classification performance, in order to provide data support for public opinion analysis on tobacco control. The work is divided into three parts.

First, a large number of news reports related to tobacco control were crawled from several major news search engine websites, and part of the data was manually annotated with multiple labels and preprocessed.

Second, the state of the art in text vectorization and multi-label classification is analyzed, and concrete improvements are proposed to address current shortcomings. The text representation in this thesis is based on word embeddings, which avoids the uncontrolled vector dimensionality and the lack of semantic information found in traditional multi-label text classification. Because the labels are highly correlated in the input space of the classifier, the eigenvectors (feature vectors) of each text are mapped using the feature information of the positive and negative samples for each label, so that the same news item receives a different vector representation under each label. The effectiveness of the algorithm is verified on a tobacco control dataset.

Finally, to make full use of unlabeled news data, avoid wasting resources, and further improve classification performance, the thesis improves semi-supervised learning on top of the mapped feature-vector representation. Exploiting the structure of news data, it constructs a multi-view structure by using different classifiers for news headlines and body text, takes concrete measures against sample imbalance, and borrows from ensemble learning in the final discrimination stage, which improves the generalization ability of the model.
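
The abstract does not give the mapping itself, so the following is only a rough sketch of the general idea of label-specific mapping, not the thesis's algorithm. It assumes each document is already a fixed-length vector (for example, averaged word embeddings) and uses a hypothetical weighting: for each label, dimensions where the positive and negative samples differ most are emphasised. The function name `label_specific_views` and the weighting rule are assumptions made for illustration.

```python
import numpy as np

def label_specific_views(X, Y):
    """Return one re-weighted copy of X per label (illustrative only).

    X: (n_docs, dim) document vectors, e.g. averaged word embeddings.
    Y: (n_docs, n_labels) binary label matrix.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    views = []
    for j in range(Y.shape[1]):
        pos, neg = X[Y[:, j]], X[~Y[:, j]]
        if len(pos) == 0 or len(neg) == 0:
            # No positive/negative contrast for this label: keep X unchanged.
            views.append(X.copy())
            continue
        # Assumed rule: per-dimension gap between positive and negative means.
        gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
        if gap.sum() == 0:
            # The two groups have identical means: keep X unchanged.
            views.append(X.copy())
            continue
        # Normalise so the weights average to 1 across dimensions.
        weights = gap / gap.sum() * X.shape[1]
        views.append(X * weights)
    return views
```

Each returned array could then be fed to a separate binary classifier for its label, so that the same news item is represented differently under different labels, as the abstract describes.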

Keywords/Search Tags:

Multi-label text classification, word embeddings, text representation, eigenvector mapping, semi-supervised learning

Related items

1	Research On Text Classification Algorithms Based On Machine Learning
2	Research On Multi-label Text Classification Based On Semi-Supervised Learning
3	Research On Text Classification Algorithms Based On Semi-supervised Learning
4	Research On The Essential Technology Of Multi-Label Chinese Text Classification
5	Research And Implementation On Text Classification In Vertical Domain
6	Multi-label Text Classification Model Based On Correlation-guided Representation
7	A Research On Text Vector Representation Based On Semantics
8	Research On Multi-Label Text Classification Based On Deep Learning
9	Research On Key Techniques Of Short-text Representation And Classification Based On Hybrid Semantic
10	Research On Semi-supervised News Text Classification Method Based On Deep Learning