
The Algorithm Research Of Chinese Information Extraction Based On The Hidden Markov Model

Posted on: 2015-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2298330431993047
Subject: Computer application technology
Abstract/Summary:
With the development of networks, the number of text documents on the web is huge and growing fast. How to obtain the information a user needs from this mass of online information is a research topic in both artificial intelligence and networking. To obtain information at different levels and granularities from different sources, people have invented various information access technologies. Strictly speaking, information retrieval comprises document retrieval, text classification, text filtering, text clustering and so on, which find the relevant documents a user needs in a large document collection. Information extraction (IE) technology can then extract finer-grained relations or events from those relevant documents, to satisfy users' deeper and more fine-grained information needs. In this sense, information extraction is a useful complement to document-level information processing technology. Information extraction is a means of transforming unformatted information into a structured format, laying the foundation for later processing such as database queries, data mining, and text mining.
In addition, information extraction supports, or improves the performance of, information retrieval, knowledge question answering, and personalized information services.

In the traditional first-order hidden Markov model (HMM), the output probability of an observation depends only on the current state of the model. In the improved first-order hidden Markov model, the output probability of an observation depends not only on the current state but also on the state that precedes it. In the second-order hidden Markov model, both the transition probability and the observed value at a given time depend on the model's historical states.

Because the hidden Markov model uses an emission probability matrix, it can be trained statistically on the vocabulary of specific texts, which improves the precision and recall of information extraction. However, it does not take into account the contextual features of the text or the lexical features of the text itself, although this information is very useful for text information extraction. The maximum entropy Markov model (MEMM), which starts from the maximum entropy principle, does take these contextual and lexical features into account, and this greatly improves information extraction performance. But it gathers no statistics on the specific vocabulary of the text and considers only abstract features, so MEMM performs worse than HMM in some cases.

This thesis proposes an improved first-order hidden Markov model and a second-order hidden Markov model, analyzes the ML algorithm and the Viterbi algorithm, and uses these two algorithms to compare the precision of the three models in information extraction. Experiments show that the improved first-order hidden Markov model and the second-order hidden Markov model are more precise than the traditional first-order hidden Markov model.
At the same time, this thesis studies information extraction algorithms for Chinese papers based on HMM and MEMM, and analyzes their precision in information extraction.
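As an illustration of the Viterbi algorithm mentioned above, the following is a minimal sketch for the traditional first-order HMM only, where each observation depends solely on the current state. It is not the thesis's implementation: the function name `viterbi`, the `floor` smoothing constant, the two states, and every probability in the toy model are hypothetical values chosen for the example. It labels each token of a short header with the field it most likely belongs to.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely state sequence under a first-order HMM (log-space Viterbi).

    `floor` is an assumed smoothing constant that stands in for zero
    probabilities, so unseen words do not produce log(0).
    """
    if not observations:
        return []

    def log(p):
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            # First-order emission: depends on the current state s only.
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: two header fields and made-up probabilities.
states = ["title", "author"]
start_p = {"title": 0.9, "author": 0.1}
trans_p = {
    "title": {"title": 0.7, "author": 0.3},
    "author": {"title": 0.1, "author": 0.9},
}
emit_p = {
    "title": {"hidden": 0.4, "markov": 0.4, "model": 0.2},
    "author": {"zhang": 0.9, "model": 0.1},
}
tokens = ["hidden", "markov", "model", "zhang"]
labels = viterbi(tokens, states, start_p, trans_p, emit_p)
print(list(zip(tokens, labels)))
```

With these toy numbers the first three tokens are labeled `title` and the last one `author`. Under the improved first-order model described in the abstract, the emission term would be indexed by the previous state as well as the current one; the sketch does not implement that variant.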
Keywords/Search Tags: information extraction, hidden Markov model, maximum entropy, precision