Research On Statistical Language Model N-best Reranking Algorithm

Posted on:2014-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Guo

GTID:2298330422990602

Subject:Computer Science and Technology

Abstract/Summary:

Language modeling is an important area of natural language processing research. A statistical language model uses statistical methods to estimate the probability distribution of words in a language, and this distribution is then used to calculate the probability of a sentence. With the launch of Apple's Siri, the maturing of online translation systems, and the popularity of intelligent Pinyin input methods, the statistical language model has attracted attention as an important component of these applications. However, a statistical language model is ultimately based only on statistics. Using other techniques to rerank the N-best results of the language model has therefore begun to receive attention.

Current reranking algorithms follow one of three approaches: align the N-best results word by word, reconstruct the output, recalculate the scores, and select the highest-scoring result as the new best candidate; add other information to the language model outputs, rerank them, and select the highest-scoring result; or feed the test data to several different systems and integrate their results. However, none of these methods tries to improve the N-best results from a linguistic perspective.

We carry out a series of work on language model N-best reranking. First, to improve the accuracy of the language model and thereby make reranking more effective, we build a large corpus with wide coverage, which lays the foundation for a well-performing statistical language model. Second, when training on the corpus, we build a dictionary and select its most representative entries for language model training. Third, because the reranking algorithms described above do not make use of linguistic analysis, we propose a part-of-speech (POS) N-gram model, a POS-word co-occurrence model, and a new reranking algorithm. The algorithm introduces sub-models that reflect other linguistic features of the N-best results from multiple perspectives, combines these sub-models with the N-best results through linear interpolation, trains a set of sub-model weights with minimum error rate training, and then re-scores the N-best results to select the optimal candidate. Fourth, we apply the language model and the reranking algorithm to speech recognition and test them on the 863 data. The experimental results show that a language model trained on a large-scale, wide-coverage corpus performs better, and that the proposed reranking algorithm finds a better optimal candidate for speech recognition.
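
The re-scoring step described in the abstract (sub-model scores combined by weighted linear interpolation, with the weights tuned on held-out data) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The feature names `lm`, `pos_ngram` and `pos_word`, the candidate layout, the example scores, and the function names are all assumptions made for illustration. The weight search is a crude grid search standing in for minimum error rate training, not MERT itself.

```python
from itertools import product

# Hypothetical sub-model names; each candidate carries one log-score per name.
FEATURES = ("lm", "pos_ngram", "pos_word")


def combined_score(cand, weights):
    """Weighted linear combination of the sub-model log-scores."""
    return sum(weights[name] * cand["scores"][name] for name in FEATURES)


def rerank(nbest, weights):
    """Return the N-best list sorted by combined score, best first."""
    return sorted(nbest, key=lambda c: combined_score(c, weights), reverse=True)


def tune_weights(dev_set, grid=(0.0, 0.5, 1.0)):
    """Grid search used here as a simple stand-in for MERT (assumption).

    dev_set is a list of (nbest, reference_text) pairs. The language model
    weight is fixed at 1.0; the other weights are chosen to minimise the
    number of lists whose top candidate differs from the reference.
    """
    best_weights, best_errors = None, None
    for values in product(grid, repeat=len(FEATURES) - 1):
        weights = {"lm": 1.0, **dict(zip(FEATURES[1:], values))}
        errors = sum(
            rerank(nbest, weights)[0]["text"] != ref for nbest, ref in dev_set
        )
        if best_errors is None or errors < best_errors:
            best_weights, best_errors = weights, errors
    return best_weights


# Made-up N-best list, ordered by language model score alone.
nbest = [
    {"text": "wreck a nice beach",
     "scores": {"lm": -3.5, "pos_ngram": -3.0, "pos_word": -3.0}},
    {"text": "recognize speech",
     "scores": {"lm": -4.0, "pos_ngram": -1.0, "pos_word": -1.5}},
]
dev_set = [(nbest, "recognize speech")]

lm_only = {"lm": 1.0, "pos_ngram": 0.0, "pos_word": 0.0}
tuned = tune_weights(dev_set)
best_before = rerank(nbest, lm_only)[0]["text"]
best_after = rerank(nbest, tuned)[0]["text"]
```

In this toy example the language model alone prefers the first candidate, while the tuned combination promotes the second. A real system would tune the weights on a separate development set and evaluate on held-out test data.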

Keywords/Search Tags:

statistical language model, re-ranking, minimum error rate training, speech recognition

Related items

1	Integration of multiple knowledge sources in speech recognition using minimum error training
2	Application Research On Statistical Language Model Of Large Vocabulary Continuous Speech Recognition System
3	Study On Several Key Problems In The Training Process Of Phrase-based Statistical Machine Translation
4	A generalization of the minimum classification error (MCE) training method for speech recognition and detection
5	Researching Of The Mongolian Language Model Based On Speech Recognition
6	Research On Multi-group Parameter Tuning And Decoding In Statistical Machine Translation
7	Research On Statistical Language Model Of Large-Vocabulary Continuous Speech Recognition System
8	Research On Low-bit-rate Wideband Speech Coding Algorithms Based On The Sinusoidal Speech Model
9	Post-Processing Technique For Speech Recognition
10	Discriminative Training Based On TANDEM For Speech Assessment And Evaluation System