A Study On Term Extraction For Middle School Mathematics Based On Sequence Labeling Model

Posted on: 2022-04-17   Degree: Master   Type: Thesis
Country: China   Candidate: X Hua   Full Text: PDF
GTID: 2517306497452014   Subject: Master of Engineering
Abstract/Summary:
Terminology carries the most informative content in domain-specific text. It has direct applications of its own, such as dictionary compilation, document indexing, and ontology construction, and it also supplies key information for many downstream natural language processing tasks, such as machine translation and text classification. Automatic term extraction has therefore long been a research hotspot. A middle school mathematics knowledge graph is of great significance for personalized recommendation in online smart education, and middle school mathematics terms make up the main part of that graph, so extracting them automatically is the foundation of knowledge graph construction. Existing term extraction methods based on statistical information such as word frequency cannot accurately identify terms in their specific context: they ignore the context in which a term occurs and cannot locate the specific text of an extracted term, so their precision is insufficient. This thesis therefore recasts middle school mathematics term extraction as a sequence labeling task. It builds on Lattice-CRF, a sequence labeling model that performs well in named entity recognition, and improves it to extract middle school mathematics terms more accurately. The research work is as follows:

(1) Construction of a middle school mathematics data set. To address the lack of annotated corpora for middle school mathematics research, the original corpus was collected from standard textbooks, teachers' teaching plans, and related documents. With reference to existing open standard data sets, programmatic automatic annotation combined with manual correction was used to build the middle school mathematics term data set for the term extraction task.

(2) A model for extracting middle school mathematics terms, named Lattice-Label Attention-CRF, is proposed. Observation and analysis of the middle school mathematics corpus show that a term's part of speech plays a certain role in identifying and extracting terms, especially polysemous term words. In addition, an attention mechanism lets the model pay more attention to the target. Building on the Lattice-CRF model, this thesis embeds the labels used to identify term words, combines sequence features with part-of-speech features, and introduces a label attention mechanism, yielding the Lattice-Label Attention-CRF model for middle school mathematics term extraction.

(3) An improved model that incorporates BERT, named BERT-LLA-CRF, is proposed for middle school mathematics term extraction. Fixed, pre-trained character and word vectors cannot resolve polysemy and represent polysemous items poorly. To solve this problem, this thesis uses the pre-trained BERT model as the feature extractor. BERT dynamically extracts character features from the input sentence as character vectors, and word vectors are then computed from the corresponding character vectors, so the same character or word receives different representations in different contexts. This improves the extraction of polysemous terms and also improves term extraction overall. The pre-trained BERT model is further fine-tuned on middle school mathematics text so that the semantic representations of the extracted character and word vectors better suit the domain, which further improves the model's term extraction performance.
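The programmatic automatic annotation described in (1), together with the sequence labeling formulation, can be pictured with a small sketch. It is an illustration only, not the thesis's procedure: the abstract does not state the tag scheme or the matching rule, so the `B-TERM`/`I-TERM`/`O` tags, the forward maximum matching strategy, the function name `bio_tag`, and the sample lexicon are all assumptions.

```python
def bio_tag(sentence: str, lexicon: set[str]) -> list[tuple[str, str]]:
    """Tag each character with B-TERM / I-TERM / O by longest-match lookup.

    Hypothetical helper: the tag names and the forward maximum matching
    rule are assumptions, not taken from the thesis.
    """
    max_len = max((len(term) for term in lexicon), default=0)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        matched = 0
        # Try the longest candidate first so nested terms resolve to the longer one.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in lexicon:
                matched = length
                break
        if matched:
            tags[i] = "B-TERM"
            for j in range(i + 1, i + matched):
                tags[j] = "I-TERM"
            i += matched
        else:
            i += 1
    return list(zip(sentence, tags))


# Illustrative lexicon and sentence (assumed, not from the thesis data set).
lexicon = {"三角形", "等腰三角形", "底角"}
tagged = bio_tag("等腰三角形的底角相等", lexicon)
```

Output of this kind would only be a first pass. As the abstract notes, manual correction follows, since dictionary matching alone cannot tell whether a polysemous word is being used as a term in a given context.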
Keywords/Search Tags: Middle school mathematics terminology, Knowledge graph, Term extraction, Attention mechanism, BERT
Related items