Research And Implementation Of Space Information System Based On Chinese Retrieval Error Correction

Posted on:2020-06-30

Degree:Master

Type:Thesis

Country:China

Candidate:S Cui

Full Text:PDF

GTID:2392330602952129

Subject:Engineering

Abstract/Summary:


The rapid development of space technology has produced a large volume of space intelligence information, which places higher demands on information management. Traditional manual management can no longer meet these needs. To manage space intelligence more scientifically and to make full use of its potential value, this thesis designs and develops a space intelligence system. In addition to efficient information management and retrieval, the system links related data across the different kinds of information and provides personalized statistics and comprehensive data visualization services.

To meet user needs and improve the user experience, this thesis also studies error correction for Chinese search engines in detail. During retrieval, input errors may prevent users from finding the results they want. Retrieval error correction corrects the erroneous input and then retrieves with the corrected query, so that the correct content is presented to users as far as possible and the error tolerance of retrieval improves. At present, Chinese error correction methods are divided into dictionary-based methods and statistics-based methods. Dictionary-based methods adapt poorly to a rapidly changing information environment, and a single error correction method cannot handle all the kinds of input errors that occur in search engines. This thesis therefore summarizes several common error types through an analysis of log information and designs a set of statistics-based error correction methods for Chinese retrieval that target them.

The main idea of the proposed method is to use a hidden Markov model and the edit distance method together to determine the candidate set, then to solve the candidate selection problem with a multiple evaluation model, and finally to select the optimal correction. Candidate generation has two parts. The first is based on a hidden Markov model decoded with the Viterbi algorithm and targets homophone errors. The second is based on edit distance and targets extra-character, missing-character, wrong-character, and transposition errors. In this way, the candidate set covers many common types of input errors.

The multiple evaluation model takes into account three features of each candidate: string quality, string probability, and pointwise mutual information. Each of these is a useful reference for judging a candidate. String quality is closely related to how often the keywords are searched and to their click-through rate. String probability measures whether the candidate conforms to Chinese language habits, and pointwise mutual information reflects the degree of correlation between the keywords in the candidate. This thesis assigns weights to these features through experiments to complete the multiple evaluation model. The corpus is extracted from search engine logs, which ensures that the training and test sets reflect real usage.

The error correction method also improves the processing flow of the aerospace information system. For example, Chinese characters and numbers are processed separately to reduce the difficulty of error correction; a fast error correction step shortens the average correction time and improves accuracy; and an error correction judgment step reduces the correction workload and improves retrieval efficiency. The proposed method achieves good results. In experiments, it significantly improves accuracy and F1 score compared with several traditional error correction methods, which verifies the feasibility of the method.
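
The edit-distance half of candidate generation and the weighted multiple evaluation step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the choice of the optimal string alignment variant of edit distance (which counts an adjacent swap as one edit), the distance threshold, and the weights are all assumptions made for illustration; the thesis states only that its weights were set through experiments.

```python
import math


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ba" -> "ab".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def candidates(query: str, vocabulary: list, max_dist: int = 1) -> list:
    """Keep vocabulary entries within max_dist edits of the query.

    max_dist=1 is an assumed threshold, not a value from the thesis.
    """
    return [w for w in vocabulary if osa_distance(query, w) <= max_dist]


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information: log(p(x, y) / (p(x) * p(y)))."""
    return math.log(p_xy / (p_x * p_y))


def best_candidate(features: dict, weights=(0.4, 0.3, 0.3)) -> str:
    """Return the candidate with the highest weighted score.

    features maps candidate -> (string quality, log probability, PMI).
    The weights are placeholders chosen for this sketch.
    """
    w_q, w_p, w_m = weights

    def score(item):
        quality, log_prob, pmi_value = item[1]
        return w_q * quality + w_p * log_prob + w_m * pmi_value

    return max(features.items(), key=score)[0]
```

For a query such as "星卫轨道", `candidates` with a vocabulary containing "卫星轨道" would keep that entry, because swapping the first two characters is a single edit. In practice the three feature values passed to `best_candidate` would need to be put on comparable scales before weighting; the sketch leaves that step out.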

Keywords/Search Tags:

Error Correction in Chinese Retrieval, N-gram Model, Multi-strategy Evaluation Model, Information Management


Related items

1	Research On Error Model Building And Error Correcting Technique Of Coordinate Measuring Machines
2	Research On Multi Scale Information Retrieval In BIM Model Based On Natural Language Processing
3	Research Of Chinese Text Information Extraction In The Information Platform For Development Strategy Study Of Equipments
4	The Investigation And Application Of Hybrid Wind Speed Forecasting Model Based On Phase Space Reconstruction Theory And Error Correction Model
5	Research On Wind Speed Forecasting Based On Error Correction And Fuzzy Evaluation
6	Research And Application Of Water Environment Parameter Retrieval Algorithm Based On Multi-source Remote Sensing Data
7	Study On Threat And Error Management Model Of MPL Training
8	The Static Correction Method Based On Probability Model Of The Beam Structure
9	Design And Research Of Parallel Retrieval Compact Automated Storage And Retrieval System
10	Research On Retrieval And Analysis Of MEP Information In Facility Management Based On Domain Ontology