| Named entity recognition is an active research topic in natural language processing, and Chinese named entity recognition for chemical text resources is of broad significance for domestic medicine, chemical engineering, education, and other fields. Chinese chemical named entities follow no strict word-formation rules, and the entities to be recognized contain letters, numbers, special symbols, and other forms, so traditional word vector models cannot effectively distinguish nested entities and ambiguous entities in chemical terminology. In the context of educational informatization, entity recognition on high school chemistry test questions can improve the effectiveness of test question retrieval systems, which is of great practical value in supporting teachers' precision teaching and students' personalized learning. The main work of this thesis is as follows: 1. Because Chinese chemical named entities follow no strict word-formation rules and contain letters, numbers, special symbols, and other forms, this thesis divides the named entities in high school chemistry test resources into four categories (substances, properties, quantities, and experiments) and constructs a chemistry vocabulary list to assist manual annotation. Building on the classical BiLSTM-CRF model, a lightweight ALBERT-BiGRU-CRF model is constructed to recognize named entities in high school chemistry test papers; its precision, recall, and F1 score reach 94.23%, 93.56%, and 93.89%, respectively. 2. This thesis constructs a domain knowledge graph of high school chemistry, expands the search keywords by traversing their synonyms and hypernyms (hyponyms) in the knowledge graph, uses an improved weighted TF-IDF algorithm to score the keyword combination against each question, and realizes keyword-based 
question recommendation according to the ranked search results. In addition, a trained Word2vec model is used to map each word of the segmented long query text to its word vector, and sentence vectors are constructed by averaging these word vectors; the cosine similarity between the query text and each question is then computed and sorted to realize question recommendation based on long question text. 3. A prototype high school chemistry test question recommendation system is designed and implemented. It matches the most similar question resources to the keywords entered by teachers and automatically annotates and stores the test questions uploaded by teachers and other users, providing resource acquisition, classification, and retrieval services that support teachers' lesson preparation. |
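
The keyword-based retrieval step in item 2 can be illustrated with a minimal sketch. The abstract does not specify the improved TF-IDF weighting, so the code below uses a plain smoothed TF-IDF and simply down-weights expanded terms; the `RELATED` dictionary, the `expansion_weight` value, and all function names are hypothetical stand-ins for the knowledge graph and the thesis's actual algorithm.

```python
import math
from collections import Counter

# Hypothetical toy "knowledge graph": keyword -> related terms
# (synonyms, hypernyms, hyponyms). A real system would query a graph store.
RELATED = {"acid": ["HCl"]}


def expand_keywords(keywords, related, expansion_weight=0.5):
    """Return {term: weight}. Original keywords get weight 1.0;
    terms reached through the graph get the (assumed) lower weight."""
    weights = {}
    for kw in keywords:
        weights[kw] = 1.0
        for term in related.get(kw, []):
            weights.setdefault(term, expansion_weight)
    return weights


def tf_idf_scores(term_weights, questions):
    """Score each tokenized question as a weighted sum of TF-IDF values."""
    n = len(questions)
    df = Counter()
    for tokens in questions:
        df.update(set(tokens))  # document frequency counts each term once per question
    scores = []
    for tokens in questions:
        tf = Counter(tokens)
        score = 0.0
        for term, weight in term_weights.items():
            if tf[term] == 0:
                continue
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
            score += weight * (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


def rank_questions(keywords, questions, related):
    """Return question indices sorted from highest to lowest score."""
    scores = tf_idf_scores(expand_keywords(keywords, related), questions)
    return sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)


# Toy, already-segmented questions (illustrative data only).
QUESTIONS = [
    ["acid", "reacts", "with", "zinc"],
    ["sodium", "chloride", "solution"],
    ["HCl", "is", "a", "strong", "electrolyte"],
]
ranking = rank_questions(["acid"], QUESTIONS, RELATED)
```

In this toy setup a question that contains only the expanded term `HCl` still receives a nonzero score, which is the purpose of keyword enhancement; how strongly expanded terms should count is a design choice the abstract leaves open.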
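
The long-text retrieval step in item 2 (averaging word vectors into a sentence vector, then ranking by cosine similarity) can be sketched in the same spirit. The three-dimensional `TOY_VECTORS` table is a hypothetical stand-in for vectors from a trained Word2vec model, and the function names are illustrative rather than taken from the thesis.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for trained Word2vec vectors.
TOY_VECTORS = {
    "acid": [1.0, 0.0, 0.0],
    "base": [0.9, 0.1, 0.0],
    "zinc": [0.0, 1.0, 0.0],
    "electrolysis": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_tokens, questions, vectors):
    """Return (question index, similarity) pairs, most similar first."""
    query_vec = sentence_vector(query_tokens, vectors)
    scored = []
    for idx, tokens in enumerate(questions):
        question_vec = sentence_vector(tokens, vectors)
        if query_vec is None or question_vec is None:
            sim = 0.0  # no known words: treat as unrelated
        else:
            sim = cosine_similarity(query_vec, question_vec)
        scored.append((idx, sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, already-segmented texts (illustrative data only).
SEGMENTED_QUESTIONS = [["base", "zinc"], ["electrolysis"], ["acid"]]
results = rank_by_similarity(["acid", "zinc"], SEGMENTED_QUESTIONS, TOY_VECTORS)
```

Averaging ignores word order and treats every word equally, which keeps the method simple; out-of-vocabulary words are skipped here, which is one of several reasonable ways to handle them.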