Research And Implementation Of Space Information System Based On Chinese Retrieval Error Correction

Posted on: 2020-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: S Cui
Full Text: PDF
GTID: 2392330602952129
Subject: Engineering
Abstract/Summary:
The rapid development of space technology has produced a large amount of space intelligence information, which places higher demands on information management. Traditional manual management methods can no longer meet these demands. To manage space intelligence more scientifically and to make full use of the potential value of the information, this thesis designs and develops a space intelligence system. In addition to efficient information management and information retrieval, the system associates data across the various kinds of information and provides personalized statistics and comprehensive data visualization services.

To meet user needs and improve the user experience, this thesis also studies error correction for Chinese search engines in detail. During retrieval, users may fail to find the desired results because of input errors. Retrieval error correction corrects the erroneous input and then retrieves with the corrected query, so that the correct content is presented to users as far as possible and the fault tolerance of retrieval improves. At present, Chinese error correction methods are divided into dictionary-based methods and statistics-based methods. Dictionary-based methods adapt poorly to a rapidly changing information environment, and a single error correction method can hardly handle all the kinds of input errors seen in search engines. Therefore, this thesis summarizes several common error types by analyzing log information and, targeting these errors, designs a statistics-based error correction method for Chinese retrieval.

The main idea of the proposed method is to use a hidden Markov model and the edit distance method together to determine the candidate set, then to solve the candidate selection problem with a multiple evaluation model, and finally to select the optimal correction result. The candidate set is determined in two parts. One part is based on the hidden Markov model, decoded with the Viterbi algorithm, and targets homophone errors. The other part is based on the edit distance method and targets extra-character, missing-character, wrong-character, and character-transposition errors. In this way, the candidate set covers many common types of input errors. The multiple evaluation model proposed in this thesis takes into account three features of each candidate: string quality, string probability, and pointwise mutual information. String quality is closely related to the number of searches for a keyword and its click-through rate. String probability measures whether the candidate conforms to Chinese language habits, and pointwise mutual information reflects the degree of correlation between the keywords in the candidate. This thesis assigns weights to these evaluation elements through experiments to build the multiple evaluation model. The corpus used in this method is extracted from search engine logs, which ensures the authenticity of the training set and the test set.

In addition, the error correction method designed in this thesis refines the correction process for the aerospace information system. For example, Chinese characters and numbers are processed separately to reduce the difficulty of error correction; a fast error correction step shortens the average correction time and improves correction accuracy; and an error correction judgment step reduces the correction workload and improves retrieval efficiency. The proposed method achieves good results. In the experiments, its accuracy and F1 value are significantly higher than those of several traditional error correction methods, which verifies the feasibility of the method.
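
As an illustration only, the edit distance part of candidate generation and the weighted combination of evaluation elements might be sketched as follows. This is a minimal sketch and not the thesis's implementation: the function names (`osa_distance`, `edit_candidates`, `pmi`, `combined_score`), the distance threshold, the sample vocabulary, and the weights are all hypothetical, and the optimal string alignment variant of edit distance is chosen here because it counts an exchange of adjacent characters as a single edit. The hidden Markov model part for homophone errors is not shown.

```python
import math


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "信天" typed for "天信".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def edit_candidates(query: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return vocabulary entries within max_distance edits of the query."""
    return [w for w in vocabulary if osa_distance(query, w) <= max_distance]


def pmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) from counts."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))


def combined_score(quality: float, probability: float, pmi_value: float,
                   weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three evaluation elements (weights are hypothetical)."""
    w_q, w_p, w_m = weights
    return w_q * quality + w_p * probability + w_m * pmi_value


if __name__ == "__main__":
    vocab = ["航天信息", "航天系统", "卫星"]  # hypothetical log-derived vocabulary
    print(edit_candidates("航信天息", vocab))
```

In a fuller pipeline, each candidate returned by `edit_candidates` would be scored with `combined_score` and the highest-scoring candidate kept; how the three inputs are normalized before weighting is left open here.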
Keywords/Search Tags: Error Correction in Chinese Retrieval, N-gram Model, Multi-strategy Evaluation Model, Information Management