
Research On Hybrid Retrieval Model Towards Formula And Its Context

Posted on: 2023-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2568306782966829
Subject: Software engineering
Abstract/Summary:
Mathematical formula retrieval is an active research topic and is essential to scientific literature retrieval. However, the complex structure of mathematical formulas makes it hard for text-based retrieval methods to use structural information effectively. Tree-based formula retrieval methods represent the structural features of formulas better, but their retrieval efficiency is low because matching tree structures is computationally expensive. Representing the structural features of mathematical formulas while keeping retrieval efficient therefore remains a difficult problem. In addition, mathematical formulas are highly abstract, and two formulas with similar structures may appear in different research fields, which reduces the relevance of formula retrieval results. In response to these challenges, this dissertation proposes a mathematical formula feature representation method based on word embedding technology and constructs a hybrid retrieval model oriented to formulas and their contexts. The work of this dissertation is summarized as follows:
1) To address the complex structure of mathematical formulas, we propose a representation method based on word embedding technology. We adopt the N-ary operation tree to represent the structure of a mathematical formula and perform word segmentation and serialization according to its operation form. We then learn feature vectors of the operation structures using word embedding technology. Finally, we design a weighting strategy based on the level of each operation structure and on word frequency to obtain more accurate formula features. Our method matches formulas by their vectors, which expresses the structural features of the formula and improves retrieval efficiency.
2) To address the low domain relevance of formula retrieval results, we construct a hybrid retrieval model that combines mathematical formulas and their contexts, based on the above method. We extract keyword information from the context of a mathematical formula, obtain its semantic vector through the word embedding model, calculate a matching score based on the similarity of both the mathematical formula and the context, and produce the retrieval list. The context of the formula supplements its semantic features and improves the domain relevance of the retrieval results.
3) We evaluate and analyze the above method and model on two public datasets. First, we evaluate the effect of the mathematical formula feature representation method on retrieval tasks through comparative experiments, and we verify the effectiveness of the weighting strategy, specification, and processing adopted by the representation method through ablation experiments. Then, by comparing the domain relevance of the retrieval results, we show the importance of context semantics as a supplement to formula retrieval. Finally, we analyze the impact of the keyword extraction technique on the hybrid retrieval model.
Keywords/Search Tags: Mathematical Information Retrieval, N-ary Operation Tree, Word Embedding, Formula Embedding, Mathematical Formula Similarity
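
The pipeline summarized in the abstract (operation tree, serialization, weighted embedding, hybrid matching score) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and nearly every detail is an assumption made for illustration: the nested-tuple tree encoding, the `operator/arity` token format, the hash-based `embed` function standing in for a trained word-embedding model, the specific depth and frequency weights, and the mixing parameter `alpha` in `hybrid_score`. The abstract does not specify any of these.

```python
import hashlib
import math
from collections import Counter

DIM = 16  # embedding size, chosen arbitrarily for this sketch


def embed(token):
    """Hypothetical stand-in for a trained word-embedding lookup.

    Returns a deterministic pseudo-random vector derived from a hash,
    so it carries no learned semantics.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def serialize(tree, depth=0):
    """Flatten an operation tree into (token, depth) pairs, root first."""
    if isinstance(tree, str):  # leaf node: an operand
        return [(tree, depth)]
    op, *children = tree
    pairs = [(f"{op}/{len(children)}", depth)]  # assumed token: operator/arity
    for child in children:
        pairs.extend(serialize(child, depth + 1))
    return pairs


def formula_vector(tree, token_counts):
    """Weighted sum of token vectors.

    Assumed weighting: tokens nearer the root and rarer in the corpus
    contribute more.
    """
    vec = [0.0] * DIM
    for token, depth in serialize(tree):
        weight = 1.0 / (1 + depth) / (1 + math.log(1 + token_counts[token]))
        for i, x in enumerate(embed(token)):
            vec[i] += weight * x
    return vec


def text_vector(keywords):
    """Unweighted sum of the keyword vectors of a formula's context."""
    vec = [0.0] * DIM
    for word in keywords:
        for i, x in enumerate(embed(word)):
            vec[i] += x
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_score(query, candidate, token_counts, alpha=0.7):
    """Assumed combination: convex mix of formula and context similarity."""
    f_sim = cosine(formula_vector(query["formula"], token_counts),
                   formula_vector(candidate["formula"], token_counts))
    c_sim = cosine(text_vector(query["keywords"]),
                   text_vector(candidate["keywords"]))
    return alpha * f_sim + (1 - alpha) * c_sim


# Toy corpus: two structurally identical formulas from different contexts.
corpus = [
    {"formula": ("+", ("^", "a", "2"), ("^", "b", "2")),
     "keywords": ["triangle", "hypotenuse"]},
    {"formula": ("+", ("^", "x", "2"), ("^", "y", "2")),
     "keywords": ["circle", "radius"]},
]
token_counts = Counter(t for doc in corpus for t, _ in serialize(doc["formula"]))
query = {"formula": ("+", ("^", "a", "2"), ("^", "b", "2")),
         "keywords": ["triangle", "hypotenuse"]}
ranked = sorted(corpus,
                key=lambda doc: hybrid_score(query, doc, token_counts),
                reverse=True)
```

Because every formula is reduced to a fixed-length vector, matching a query against a candidate costs one cosine computation rather than a tree comparison, which is the efficiency argument the abstract makes. With a real embedding model in place of `embed`, the context term would let two structurally similar formulas from different fields receive different scores.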