
Research on a Statistical Language Model N-best Reranking Algorithm

Posted on: 2014-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Guo
GTID: 2298330422990602
Subject: Computer Science and Technology
Abstract/Summary:
Language modeling is an important area of natural language processing research. A statistical language model uses statistical methods to estimate the probability distribution of words in a language, and this distribution is then used to compute the probability of a sentence. With the launch of Apple's Siri, the maturing of online translation systems, and the popularity of intelligent Pinyin input methods, statistical language models, as an important component of these applications, have also attracted attention. However, a statistical language model is, after all, based only on statistics. Using other techniques to rerank the N-best results of the language model has therefore begun to receive attention.

Current reranking algorithms follow one of these approaches: align the N-best results word by word, reconstruct the output, recalculate the scores, and select the highest-scoring result as the new best candidate; add other information to the language model outputs, reorder them, and select the highest-scoring result as the best result; or feed the test data to several different systems and combine their results. However, none of these methods improves the N-best list from a linguistic perspective.

For language model N-best reranking, we carry out a series of work. First, to improve the accuracy of the language model and so make reranking more effective, we build a large corpus with wide coverage, laying the foundation for a statistical language model with good performance. Second, when using the corpus for training, we build a dictionary and select its most representative entries for language model training. Third, because the reranking algorithms described above do not incorporate linguistic analysis, we propose a POS N-gram model, a POS-word co-occurrence model, and a new reranking algorithm. The algorithm adds sub-models that reflect other linguistic features of the N-best candidates from multiple perspectives, combines these sub-models with the N-best results through linear interpolation, uses minimum error rate training to learn a set of weights for the sub-models, and then rescores the N-best results and selects the best candidate. Fourth, we apply the language model and reranking algorithm to speech recognition and test them on the 863 data. The experimental results show that a language model trained on a large-scale, wide-coverage corpus performs better, and that the proposed reranking algorithm finds a better best candidate for speech recognition.
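
The rescoring step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the toy POS bigram table, the floor value for unseen tag pairs, and the hand-set weight are all assumptions made for illustration. In the thesis the weights are learned with minimum error rate training, which this sketch does not implement, and only one sub-model (a POS bigram model) is shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a token is a (word, POS tag) pair, and an N-best
# candidate is a token list plus its baseline language model log score.
Tokens = List[Tuple[str, str]]
Candidate = Tuple[Tokens, float]
SubModel = Callable[[Tokens], float]


def make_pos_bigram_model(bigram_logprob: Dict[Tuple[str, str], float],
                          floor: float = -10.0) -> SubModel:
    """Build a toy POS bigram sub-model; unseen tag pairs get the floor score."""
    def score(tokens: Tokens) -> float:
        tags = ["<s>"] + [tag for _, tag in tokens] + ["</s>"]
        return sum(bigram_logprob.get(pair, floor) for pair in zip(tags, tags[1:]))
    return score


def rerank(nbest: Sequence[Candidate],
           sub_models: Dict[str, SubModel],
           weights: Dict[str, float],
           base_weight: float = 1.0) -> List[Tuple[float, Tokens]]:
    """Rescore candidates as a weighted sum of baseline and sub-model scores."""
    rescored = []
    for tokens, base_score in nbest:
        total = base_weight * base_score
        for name, model in sub_models.items():
            total += weights[name] * model(tokens)
        rescored.append((total, tokens))
    # Highest combined score first.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored


# Toy data (invented): the baseline model slightly prefers the wrong candidate.
nbest = [
    ([("I", "PRP"), ("red", "JJ"), ("books", "NNS")], -4.0),
    ([("I", "PRP"), ("read", "VBP"), ("books", "NNS")], -4.5),
]
pos_logprob = {
    ("<s>", "PRP"): -0.5, ("PRP", "VBP"): -0.7, ("VBP", "NNS"): -1.0,
    ("NNS", "</s>"): -0.5, ("PRP", "JJ"): -4.0, ("JJ", "NNS"): -0.8,
}
sub_models = {"pos_bigram": make_pos_bigram_model(pos_logprob)}
weights = {"pos_bigram": 0.5}  # set by hand here; the thesis tunes weights with MERT

ranked = rerank(nbest, sub_models, weights)
best_words = [word for word, _ in ranked[0][1]]
print(best_words)
```

In this toy case the POS sub-model penalizes the unlikely tag sequence of the first candidate, so the second candidate is ranked first after rescoring even though its baseline score is lower.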
Keywords/Search Tags: statistical language model, re-ranking, minimum error rate training, speech recognition