| In the era of big data, forestry-related departments generate a large amount of valuable data in the course of their daily activities and information system construction. However, the massive body of forestry texts distributed across the network has two main characteristics: first, the texts are numerous, highly dispersed, and poorly differentiated by category; second, the larger texts lack a unified descriptive structure, which makes information extraction difficult. Therefore, to improve the usability of forestry texts, this paper focuses on how to extract valuable information from them accurately. Information extraction techniques can be divided into supervised and unsupervised algorithms. Because supervised algorithms have high labeling costs and are prone to overfitting, unsupervised algorithms have gradually become the focus of research in recent years. Existing unsupervised algorithms have two deficiencies in information extraction: first, they extract text information mainly from the perspective of keywords, ignoring the information type of words, and their keyword selection does not adequately combine multiple word features; second, the degree of text categorization is low, and a unified key information extraction method for a given type of text is lacking. This paper addresses these problems in three ways: 1) five types of word features, including term frequency-inverse document frequency (TF-IDF), length, and word span features, are fused to optimize the keyword extraction formula; 2) a capsule network text classification model based on an attention mechanism is proposed to classify forestry texts, together with a method of constructing text category label vectors from text content to improve the classification effect; 3) for forestry texts that belong to the same clearly defined category, from the perspective of "keywords + information types", a
complete key information extraction process is proposed. In this paper, 10000 forestry texts (5 categories in total, 400 texts in each category) are used as experimental data to train the text classification model and construct word information collections; 400 forestry texts annotated with keywords are used as experimental data to evaluate the keyword extraction formula. The experimental results show that: 1) the keyword extraction formula proposed in this paper, which combines multiple word features, outperforms the other extraction algorithms: it achieves the best results on four indicators (precision, F-measure, MRR, and Bpref) and ranks second in recall; 2) the proposed classification model outperforms the other model combinations, with a precision of 95.07%, a recall of 92.96%, and an F-measure of 94.00%; 3) the results of the complete key information extraction process are highly representative in content, and the parameters involved were determined through exploration: the threshold is set to 0.4 when constructing the graph structure of a single text and to 0.5 when the graph structures of the texts are merged and clustered, and λ1 is set to 0.7 and λ2 to 0.3 for cluster filtering. In summary, the research in this paper can have a positive impact on the extraction of key information from forestry texts. |
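
The three classification figures reported above can be cross-checked against each other. The sketch below assumes that the 95.07% figure is precision in the usual sense and that the F-measure is the balanced F1 score (the harmonic mean of precision and recall); neither assumption is stated explicitly in the abstract. The function name `f_measure` and the variable names are illustrative only and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract for the classification model, in percent.
reported_precision = 95.07
reported_recall = 92.96
reported_f = 94.00

# Assumption: the reported F-measure is the balanced F1 score.
computed_f = f_measure(reported_precision, reported_recall)
print(f"computed F1 = {computed_f:.2f}%, reported = {reported_f:.2f}%")
```

Under these assumptions the computed value rounds to the reported 94.00%, which suggests the three figures are mutually consistent.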