| As informatization advances, aquaculture information is increasingly recorded in electronic form rather than on paper. Text classification can pick out the texts with higher potential value from these massive electronic documents. It is also a preliminary step for entity recognition and relation extraction, and so improves the efficiency of knowledge graph construction. Because texts in the aquatic field contain long technical terms and degree adverbs, traditional machine learning text classification is relatively inefficient, so most current research uses deep learning instead. Existing methods, however, still extract features insufficiently comprehensively. This work therefore studies and implements a deep learning-based text classification system for the aquatic field. The specific research work is as follows: (1) A multi-feature fusion text classification method that incorporates an attention mechanism is proposed for aquatic disease prevention and control texts, to address the sparse, high-dimensional text representation caused by the many overlong domain-specific terms in the aquatic disease corpus. The method first uses ERNIE for character-level language modeling of the corpus and adds position features to improve text representation. Next, BiLSTM captures bidirectional semantic dependencies, and an attention mechanism mitigates information loss, yielding global text features. Then, the feature fusion completion module fuses the original ERNIE features with TextCNN to extract local text features. Finally, the global and local features are concatenated, and the Softmax function produces the classification results. Experimental results show that the accuracy, recall, and F1 values of text classification using this method all exceed
96%, effectively addressing the problem of insufficiently comprehensive feature extraction. (2) To address the waste of resources caused by one-to-many classification of texts in the aquatic field, a cross-embedded attention BiLSTM multi-label text classification model is designed. The RoBERTa model is first used to mix character-level and word-level representations of the aquatic disease corpus and to dynamically update the masking pattern in the text, enhancing semantic representation. Then, a BiLSTM with the cross-embedded attention mechanism learns long-sequence semantic information and handles long-distance dependencies, emphasizing high-influence features. TextCNN is then used to extract features and reduce the dimensionality of the feature vector. Finally, Softmax is used to compute the classification. Experimental results show that the accuracy, recall, and F1 values of this model reach 97.18%, 98.06%, and 97.51%, respectively, making it an effective multi-label classification method for aquatic disease texts. (3) An aquatic disease text processing system is implemented, and an aquatic disease knowledge graph is constructed. The text classification module selects texts with potential value, and the knowledge extraction module extracts triples from them and stores them in Excel. The backend data processing module is built with Spring Boot and the frontend display interface with Vue, completing the aquatic disease text processing system. Finally, the aquatic disease knowledge graph is constructed from the triple data in Excel. |
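
As a rough illustration of the final steps of method (1), the sketch below pools a sequence of hidden states with attention to get a global feature, applies max-over-time pooling to get local features, concatenates the two, and applies Softmax. Plain Python lists stand in for real ERNIE, BiLSTM, and TextCNN outputs. The function names, the dot-product attention scoring, the single linear layer, and every numeric value are assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the steps.

    hidden_states: list of T vectors (stand-ins for BiLSTM outputs).
    query: hypothetical learned attention vector of the same width.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def max_over_time(feature_maps):
    """TextCNN-style pooling: keep the largest activation of each filter."""
    return [max(fm) for fm in feature_maps]


def classify(global_feat, local_feat, weight_rows, biases):
    """Concatenate global and local features, apply a linear layer, then softmax."""
    fused = global_feat + local_feat  # list concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weight_rows, biases)]
    return softmax(logits)


# Toy inputs; all numbers are made up for illustration.
hidden_states = [[0.1, 0.3], [0.7, 0.2], [0.4, 0.9]]          # 3 time steps, width 2
query = [1.0, 0.5]
feature_maps = [[0.2, 0.8, 0.1], [0.5, 0.3, 0.6]]              # 2 convolution filters
weight_rows = [[0.5, -0.2, 0.3, 0.1], [-0.4, 0.6, 0.2, 0.7]]   # 2 classes x 4 features
biases = [0.0, 0.1]

global_feat, attn = attention_pool(hidden_states, query)
local_feat = max_over_time(feature_maps)
probs = classify(global_feat, local_feat, weight_rows, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Here `attn` holds one weight per time step and `probs` one probability per class. In a trained model the attention vector and the linear layer would be learned parameters, and the inputs would come from the upstream encoders rather than hand-written lists.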