Research Based On FastText And Unbalanced Data

Posted on: 2019-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: J B Du
GTID: 2417330566993787
Subject: Applied statistics
Abstract/Summary:
Unbalanced data has long been a major research topic in statistics, machine learning, and computer science. Applying statistical methods designed for balanced data directly, without accounting for the imbalance, yields poor model performance. Many scholars have studied this issue in depth and achieved remarkable results. The FastText algorithm, a single-layer neural network for text classification, is widely used for that task. It classifies balanced data quickly and accurately, but it handles unbalanced data poorly. To address this problem, this paper trains individual FastText classifiers on data undersampled in unequal proportions and then combines several such weak classifiers through Bootstrap. This makes full use of the information in the majority-class data and improves classifier performance. Applied to a real text sentiment classification scenario, the proposed method trains a better-fitting model on unbalanced sentiment data and effectively improves the accuracy, recall, and F-score of sentiment classification.
Keywords/Search Tags:Text classification, Unbalanced data, FastText, Undersampling, Bootstrap, Unproportional
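
The abstract gives no implementation details, so the following is only a minimal sketch of one way the described scheme could look, under several assumptions. "Unequal proportion" is read here as using a different majority-to-minority ratio in each round, "Bootstrap" as resampling the minority class with replacement, and the weak classifiers are combined by majority vote. The names `train_word_vote`, `undersampled_ensemble`, and `ratios` are hypothetical. The base learner is a simple word-frequency stand-in, not FastText; a real FastText model (for example one trained with the `fasttext` package's `train_supervised`) could be wrapped to match the same `train` interface.

```python
import random
from collections import Counter


def train_word_vote(samples):
    """Stand-in weak learner (NOT FastText): per-class word frequencies."""
    counts = {}
    totals = Counter()
    for text, label in samples:
        for word in text.split():
            counts.setdefault(label, Counter())[word] += 1
            totals[label] += 1

    def predict(text):
        scores = {}
        for label, word_counts in counts.items():
            # Sum of add-one smoothed relative frequencies of the words.
            scores[label] = sum(
                (word_counts[w] + 1) / (totals[label] + 1) for w in text.split()
            )
        return max(scores, key=scores.get)

    return predict


def undersampled_ensemble(majority, minority, ratios, train=train_word_vote, seed=0):
    """Train one weak classifier per ratio and combine them by majority vote.

    Assumption: each ratio r keeps about r * len(minority) majority samples.
    """
    rng = random.Random(seed)
    models = []
    for r in ratios:
        # Bootstrap the minority class (sampling with replacement).
        minority_draw = [rng.choice(minority) for _ in minority]
        # Undersample the majority class (sampling without replacement).
        k = min(len(majority), max(1, round(r * len(minority))))
        majority_draw = rng.sample(majority, k)
        models.append(train(minority_draw + majority_draw))

    def predict(text):
        votes = Counter(model(text) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy unbalanced sentiment data: 40 negative reviews, 5 positive ones.
majority = [(f"bad awful slow item {i}", "neg") for i in range(40)]
minority = [
    ("good great fast", "pos"),
    ("great value good", "pos"),
    ("good fast great", "pos"),
    ("great good nice", "pos"),
    ("nice good great", "pos"),
]
classify = undersampled_ensemble(majority, minority, ratios=[1.0, 1.5, 2.0])
print(classify("great good"), classify("awful bad"))
```

Because every round draws a different subset of the majority class, the ensemble as a whole sees more of the majority data than any single undersampled classifier would, which is the benefit the abstract attributes to the method.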