Research On Automatic Classification Of Chinese Bibliographies Based On Machine Learning

Posted on:2019-11-29

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Chen

Full Text:PDF

GTID:2428330545990151

Subject:Computer technology

Abstract/Summary:


Accurately and efficiently assigning a classification number to a book has long been a key research problem in book classification. To date, classification numbers have mainly been assigned manually. Manual classification, however, can no longer keep pace with the growing number of books being published, so replacing it with machine classification has become an urgent task. The simplest and most direct way for a machine to classify books automatically is to apply results from machine learning research on text classification. Many scholars have already pursued this approach with some success, but current research on automatic book classification still leaves room for improvement. First, when extracting book features, these studies ignore the structural characteristics of bibliographic records and treat all of a book's information as a single text, so the extracted features fail to capture what is distinctive about the book. Second, whether these studies use a support vector machine (SVM) or a BP (back-propagation) neural network, tuning parameters during training takes considerable time and effort, and training (particularly for BP networks) easily becomes trapped in local optima. This directly increases classifier training time and lowers classification performance. This thesis therefore addresses the following two problems:

(1) To address feature extraction results that are insufficiently specific and inaccurate, a hybrid book topic model combining the LDA (Latent Dirichlet Allocation) topic model with the TF-IDF (term frequency-inverse document frequency) algorithm is proposed. The model uses an improved LDA topic model to extract latent topics as feature information, uses TF-IDF to extract features from the book title, and combines the two in a set proportion. Experiments show that classification accuracy with the mixed features reaches 80% to 85%, an improvement of 2% to 5% over traditional feature models.

(2) Existing automatic book classification methods, such as SVM and the BP feedforward neural network, are time-consuming and achieve relatively low accuracy. This thesis therefore introduces the Extreme Learning Machine (ELM) to book classification and proposes an ELM-based automatic classification model for Chinese books. The model is trained on the mixed features of each book, and the ELM algorithm computes the hidden-layer output matrix and the output weights of the neural network to obtain the classification model. Experiments show that, compared with SVM- or BP-based models, the ELM-based model improves classification accuracy by 3% to 7% while taking only about one third of their time.
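The abstract describes the mixed-feature model and the ELM classifier only at a high level. The following is a minimal sketch of how the two pieces could fit together, under several assumptions not taken from the thesis: input text is already segmented into space-separated Chinese words; scikit-learn's standard `LatentDirichletAllocation` stands in for the thesis's improved LDA; the number of topics (4), the mixing weight (0.6 for topics, 0.4 for title TF-IDF) and the hidden-layer size (50, sigmoid activation) are illustrative choices; and the class names `HybridFeaturizer` and `ELMClassifier`, the toy records and the labels are hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


class HybridFeaturizer:
    """Hypothetical mixed features: LDA topic proportions of the descriptive
    text plus TF-IDF weights of the title, combined with a fixed weight."""

    def __init__(self, n_topics=4, weight=0.6, seed=0):
        self.weight = weight  # assumed mixing proportion, not taken from the thesis
        # Inputs are pre-segmented Chinese text, so tokens are split on whitespace.
        self.tfidf = TfidfVectorizer(analyzer=str.split)
        self.counts = CountVectorizer(analyzer=str.split)
        # Plain LDA, standing in for the thesis's improved LDA variant.
        self.lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)

    def _mix(self, topics, title_tfidf):
        return np.hstack([self.weight * topics, (1.0 - self.weight) * title_tfidf])

    def fit_transform(self, titles, texts):
        title_tfidf = self.tfidf.fit_transform(titles).toarray()
        topics = self.lda.fit_transform(self.counts.fit_transform(texts))
        return self._mix(topics, title_tfidf)

    def transform(self, titles, texts):
        title_tfidf = self.tfidf.transform(titles).toarray()
        topics = self.lda.transform(self.counts.transform(texts))
        return self._mix(topics, title_tfidf)


class ELMClassifier:
    """Single-hidden-layer ELM: random, untrained input weights and a
    closed-form least-squares solution for the output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H = sigmoid(X W + b)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # Output weights beta = pinv(H) @ T: no back-propagation, no learning rate.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


# Toy, pre-segmented records (title, descriptive text) with made-up class labels.
titles = ["机器 学习 导论", "深度 学习 实践", "数据 挖掘 技术",
          "唐诗 鉴赏", "红楼梦 研究", "现代 诗歌 选"]
texts = ["算法 模型 训练 数据 分类", "神经 网络 训练 模型", "数据 算法 聚类 分类",
         "诗人 唐代 诗歌 意境", "小说 人物 曹雪芹 清代", "诗歌 诗人 现代 意象"]
labels = ["computing"] * 3 + ["literature"] * 3

featurizer = HybridFeaturizer()
X = featurizer.fit_transform(titles, texts)
clf = ELMClassifier().fit(X, labels)
pred = clf.predict(X)
print(pred)
print(clf.predict(featurizer.transform(["数据 分类 算法"], ["模型 训练 数据"])))
```

Because the output weights come from a single pseudo-inverse rather than iterative gradient descent, there is no learning rate or epoch count to tune, which is the property the abstract credits for ELM's shorter training time. With at least as many hidden units as training samples, the sketch typically fits its training set exactly, so accuracy should be judged on held-out records.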

Keywords/Search Tags:

Book Classification, LDA, mixed feature, Extreme Learning Machine


Related items

1	Application Research On Feature Extraction And Classification Of EEG Signal With The Method Of ELM
2	Research On The Text Classification Method Based On Extreme Machine Learning
3	Research On Classification Methods Based On Extreme Learning Machine
4	Improved Extreme Learning Machine For Indentification And Classification Research
5	Research And Application Of Classification Method Based On Extreme Learning Machine
6	Research On Text Classification Methods Based On Extreme Learning Machine
7	Evolutionary Extreme Learning Machine Based Feature Weighted Nearest Neighbor Classification Algorithm
8	Research On Meta-heuristic Optimized Extreme Learning Machine Based Classification Algorithms And Application
9	The Research On Sar Image Target Recognition Technology Based On Feature Fusion And Extreme Learning Machine
10	Research On The Classification Of Stroke TCD Data Based On Extreme Learning Machine