Research On Multi-Label Classification And Automatic Indexing Of Agricultural Text

Posted on: 2023-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: H M Xiang
Full Text: PDF
GTID: 2543307022990699
Subject: Agricultural Engineering
Abstract/Summary:
With the rapid development of Internet technology, the problem of information overload has become increasingly prominent. In agricultural informatization in particular, the text data are massive, poorly standardized and uneven in quality, and because the user group is relatively weak at obtaining and using information, the contradiction between "data flooding" and "knowledge poverty" is even sharper. To give people more efficient and accurate access to valuable agricultural information and to meet users’ personalized needs, online agricultural information resources need to be organized effectively and used efficiently through multi-label text classification and text-based automatic indexing of agricultural web pages. This is one of the key problems to be solved in agricultural information services and has high research value. This thesis carries out research in the following three areas:

(1) To address information overload and users’ weak ability to retrieve agricultural information resources, a multi-dimensional classification and annotation system for online agricultural text is developed, and a corresponding coding system is designed. The system has five dimensions: time, region, agricultural variety, industrial chain and agricultural information category.

(2) Based on this five-dimensional classification and annotation system, the thesis studies multi-label classification of agricultural text along the agricultural information category dimension, using an ALBERT-Seq2Seq model. The model first uses ALBERT to extract dynamic word vectors from the text, then uses a sequence-to-sequence model to classify the agricultural text: a Bi-LSTM encoder and a decoder with an attention mechanism predict the labels of each article, yielding the multi-label classification result. On the agricultural text multi-label dataset the model reaches an F1 value of 89.5% and a loss value of 0.0469. Compared with ALBERT, ALBERT-TextCNN and ALBERT-DenseNet, the F1 value is higher by 0.2%, 1.3% and 1.5% respectively, and the loss value is lower by 4.7%, 14.3% and 10.3% respectively. The experimental results show that the proposed ALBERT-Seq2Seq multi-label classification model can effectively improve agricultural text classification.

(3) Based on the same five-dimensional system, the time and region dimensions are labeled by matching keywords in the text with regular-expression matching and literal matching. For the agricultural variety and industrial chain dimensions, automatic indexing of agricultural text is adopted. First, TF-IDF fused with word vectors is used to filter and merge the words produced by word segmentation. Then keywords are indexed automatically by combining the TF-IDIWF algorithm with features for word position, part of speech and word span. On the unbalanced agricultural text dataset the algorithm reaches an F1 value of 57.08%, which is 9.12% and 1.24% higher than the TF-IDF and TF-IWF algorithms respectively. Compared with the TF-IDF, TF-IWF, TextRank, LSI and LDA algorithms, its metrics on the balanced dataset are significantly improved. Compared with the TF-IDIWF algorithm, the F1 value of the TFIDIWF algorithm is higher by 0.8% on the unbalanced dataset and by 0.12% on the balanced dataset. The experimental results show that the TF-IDIWF automatic indexing algorithm combined with automatic word assignment indexing can effectively improve the accuracy of agricultural text indexing.
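The rule-based labeling of the time and region dimensions, and the TF-IDF baseline that the indexing algorithm is compared against, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the date pattern, the three-entry region list, and the function names are hypothetical, and the scoring is plain TF-IDF, not the thesis's TF-IDIWF algorithm, whose formula the abstract does not give.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: matches dates such as 2023-08-21, 2023/8/21 or 2023年8月21日.
DATE_PATTERN = re.compile(r"(\d{4})[-/年](\d{1,2})[-/月](\d{1,2})日?")

# Hypothetical gazetteer; a real system would load a full list of place names.
REGIONS = ["Sichuan", "Henan", "Shandong"]


def tag_time(text):
    """Time dimension: return normalized dates found by regular-expression matching."""
    return [
        f"{year}-{int(month):02d}-{int(day):02d}"
        for year, month, day in DATE_PATTERN.findall(text)
    ]


def tag_region(text):
    """Region dimension: return gazetteer entries that occur literally in the text."""
    return [region for region in REGIONS if region in text]


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the tokens of one tokenized document by plain TF-IDF (baseline only).

    docs is a list of token lists; word segmentation is assumed to be done already.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    term_freq = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        tok: (count / total) * math.log(n_docs / doc_freq[tok])
        for tok, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda tok: (-scores[tok], tok))[:top_k]


if __name__ == "__main__":
    print(tag_time("Posted 2023-08-21, updated 2023年8月1日"))
    print(tag_region("Wheat prices rose in Henan and Shandong"))
    corpus = [
        ["wheat", "price", "wheat", "rust"],
        ["rice", "price"],
        ["corn", "price", "rust"],
    ]
    print(tfidf_keywords(corpus, 0, top_k=2))
```

In this toy corpus, "price" appears in every document, so its inverse document frequency is zero and it drops out of the ranking. The position, part-of-speech and word-span features described in the abstract would enter as additional weights on each token's score, which this baseline leaves out.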
Keywords/Search Tags: Classification and labeling system, Multi-label classification, Automatic indexing, Agricultural text, Multidimensional labeling