A mathematical formula is a special form of symbolic expression: a description built from symbols arranged in a nonlinear structure. In scientific and technical documents, formulas make the logical relationships in an article clearer. Making mathematical formulas as retrievable as ordinary text is one of the current research topics in the field of information retrieval.

This paper studies LaTeX-based retrieval methods for mathematical formulas, built on the Lucene framework. First, LaTeX is chosen as the description language for mathematical formulas. After a detailed analysis of the LaTeX language, an analyzer for mathematical formulas is constructed. It handles the common character set and the special character set of formulas separately; the special character set includes functions, operators, synonymous formula symbols, and so on. Following a conventional word segmentation algorithm, the analyzer splits a formula into terms and thus fully parses a LaTeX mathematical formula. Next, three Lucene-based modules are built for LaTeX formulas: a preprocessing module, an indexing module, and a search module. In principle, the preprocessing module can convert files of different types into a form that Lucene can process. The indexing module builds the index with a level classification method that takes the operations in a formula as its main thread. The search module uses Lucene's fuzzy query, which improves the recall of the retrieval system. Finally, the prototype retrieval system for LaTeX mathematical formulas achieves relatively satisfactory experimental results.
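The abstract does not give the analyzer's code. The sketch below illustrates the splitting idea in plain Java, without Lucene, and is not the thesis's implementation. It treats LaTeX commands, such as `\frac` and `\sin`, as special-character-set terms. Letters, numbers, and single-character operators or delimiters are treated as common-character-set terms. A small synonym table maps commands that print the same symbol to one canonical form. The class name `LatexFormulaTokenizer`, the splitting rules, and the contents of the synonym table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a LaTeX formula tokenizer (not the thesis's analyzer).
 * Splits a formula into terms: commands (special set), numbers, single letters,
 * and single-character operators/delimiters (common set), and normalizes
 * synonymous commands to one canonical form.
 */
final class LatexFormulaTokenizer {

    // Assumed synonym table: pairs of LaTeX commands that print the same symbol.
    private static final Map<String, String> SYNONYMS = Map.of(
            "\\leq", "\\le",
            "\\geq", "\\ge",
            "\\neq", "\\ne",
            "\\to", "\\rightarrow",
            "\\gets", "\\leftarrow",
            "\\lnot", "\\neg",
            "\\land", "\\wedge",
            "\\lor", "\\vee");

    private LatexFormulaTokenizer() {}

    static List<String> tokenize(String formula) {
        List<String> terms = new ArrayList<>();
        int i = 0;
        int n = formula.length();
        while (i < n) {
            char c = formula.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace carries no meaning in math mode
            } else if (c == '\\') {
                int j = i + 1;
                if (j < n && Character.isLetter(formula.charAt(j))) {
                    // Control word: backslash followed by a run of letters, e.g. \frac
                    while (j < n && Character.isLetter(formula.charAt(j))) j++;
                } else if (j < n) {
                    // Control symbol: backslash plus one non-letter, e.g. \{ or \,
                    j++;
                }
                String cmd = formula.substring(i, j);
                terms.add(SYNONYMS.getOrDefault(cmd, cmd)); // map synonyms to one form
                i = j;
            } else if (Character.isDigit(c)) {
                // Number: a run of digits and decimal points, e.g. 3.14
                int j = i;
                while (j < n && (Character.isDigit(formula.charAt(j)) || formula.charAt(j) == '.')) j++;
                terms.add(formula.substring(i, j));
                i = j;
            } else {
                // Letters and operators/delimiters (+ - = ^ _ { } ( ) ...) are one term each;
                // letters are not merged because "ab" in math mode means a times b.
                terms.add(String.valueOf(c));
                i++;
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [x, ^, {, 2, }, \le, \frac, {, a, }, {, b, }, +, 3.14]
        System.out.println(tokenize("x^{2} \\leq \\frac{a}{b} + 3.14"));
    }
}
```

The sketch normalizes synonyms during analysis, so a query written with `\neq` produces the same terms as an indexed formula written with `\ne`. In a real Lucene system, this logic would presumably live in a custom `Tokenizer` or `TokenFilter` inside an `Analyzer`, and the same analyzer would be applied at both indexing time and query time.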