The Algorithm Research Of Chinese Information Extraction Based On The Hidden Markov Model

Posted on:2015-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2298330431993047

Subject:Computer application technology

Abstract/Summary:


With the development of the network, the number of text documents on the web is huge and growing fast. How to obtain the information users need from this huge volume of network information is a research subject in both artificial intelligence and networking. To obtain information at different levels and granularities from different sources for users, people have invented many kinds of information access technology. Strictly speaking, information retrieval refers to technologies such as document retrieval, text classification, text filtering, and text clustering, which find the relevant documents users need within a large document collection. Information extraction technology, in turn, extracts finer-grained relations or events from those relevant documents to satisfy users' deeper and more fine-grained information needs. In this sense, information extraction (IE) is a useful complement to document-level information processing technology. Information extraction transforms unstructured information into a structured format, laying the foundation for further processing such as database queries, data mining, and text mining.
In addition, information extraction supports, or improves the performance of, applications such as information retrieval, knowledge question answering, and personalized information services.

In the traditional first-order hidden Markov model (HMM), the output (emission) probability of an observation depends only on the current state of the model. In the improved first-order HMM, the emission probability depends not only on the current state but also on the state immediately preceding it. In the second-order HMM, the transition probability and the observation at each time step depend on the model's history of states rather than on the current state alone.

Because the HMM uses an emission probability matrix, it can be trained statistically on the specific vocabulary of the text, which improves the precision and recall of information extraction. However, it does not take into account the contextual features of the text or the lexical features of the words themselves, even though this information is very useful for text information extraction. The maximum entropy Markov model (MEMM), which is built on the maximum entropy principle, does take contextual and lexical features into account, and this has greatly improved information extraction performance. However, the MEMM does not collect statistics on the specific vocabulary; it considers only abstract features. As a result, the MEMM performs worse than the HMM in some cases.

In this paper, an improved first-order HMM and a second-order HMM are proposed. The maximum likelihood (ML) estimation algorithm and the Viterbi algorithm are analyzed, and these two algorithms are used to compare the information extraction precision of the three models. Experiments show that the improved first-order HMM and the second-order HMM are more precise than the traditional first-order HMM.
At the same time, this paper studies information extraction algorithms for Chinese academic papers based on the HMM and the MEMM, and analyzes their precision in information extraction.
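
The HMM variants and the MEMM discussed above differ mainly in how they factorize the probability of a state sequence S = s_1 ... s_T and an observation sequence O = o_1 ... o_T. The following is a sketch in standard textbook notation, not the thesis's own equations. It assumes s_0 (and s_{-1} for the second-order model) is a fixed start state, and it writes the second-order model with the plain emission term P(o_t | s_t), although the thesis's version may also condition the observation on earlier states.

```latex
% Traditional first-order HMM: emission depends on the current state only
P(O, S) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)

% Improved first-order HMM: emission also depends on the previous state
P(O, S) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}) \, P(o_t \mid s_{t-1}, s_t)

% Second-order HMM (assumed emission term P(o_t \mid s_t))
P(O, S) = \prod_{t=1}^{T} P(s_t \mid s_{t-2}, s_{t-1}) \, P(o_t \mid s_t)

% MEMM: a locally normalized maximum-entropy model at each position
P(S \mid O) = \prod_{t=1}^{T}
  \frac{\exp\big(\sum_k \lambda_k f_k(s_{t-1}, s_t, O, t)\big)}{Z(s_{t-1}, O, t)},
\qquad
Z(s_{t-1}, O, t) = \sum_{s} \exp\big(\sum_k \lambda_k f_k(s_{t-1}, s, O, t)\big)

% ML estimate from labeled data (relative frequency of state bigrams C(i, j))
\hat{P}(s_t = j \mid s_{t-1} = i) = \frac{C(i, j)}{\sum_{j'} C(i, j')}

% Viterbi recursion, traditional first-order case
\delta_t(j) = \max_i \big[\delta_{t-1}(i) \, P(s_t = j \mid s_{t-1} = i)\big] \, P(o_t \mid s_t = j)
```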
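
ML training and Viterbi decoding can be sketched for both first-order variants. This is a minimal illustration, not the thesis's implementation. It assumes fully labeled training sentences, so ML estimation reduces to relative-frequency counting. It also assumes a hypothetical start state `<s>` and a crude probability floor for unseen events in place of proper smoothing. The `improved` flag switches the emission term from P(o_t | s_t) to P(o_t | s_{t-1}, s_t).

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical start state standing in for s_0


def normalize(table):
    """Turn nested count tables into relative frequencies (ML estimates)."""
    probs = {}
    for key, row in table.items():
        total = sum(row.values())
        probs[key] = {x: c / total for x, c in row.items()}
    return probs


def train_ml(sequences, improved=False):
    """Estimate transition and emission probabilities by counting.

    sequences: list of labeled sentences, each a list of (observation, state).
    improved=False -> traditional emission P(o_t | s_t)
    improved=True  -> improved emission P(o_t | s_{t-1}, s_t)
    """
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        prev = START
        for obs, state in seq:
            trans[prev][state] += 1
            emit[(prev, state) if improved else state][obs] += 1
            prev = state
    return normalize(trans), normalize(emit)


def viterbi(obs, states, trans, emit, improved=False, floor=1e-6):
    """Most probable state sequence for obs (log space).

    Unseen transitions/emissions get the assumed probability `floor`.
    """
    def logp(table, key, x):
        return math.log(table.get(key, {}).get(x, floor))

    def emit_key(prev, state):
        return (prev, state) if improved else state

    # best[s] = (log score, path) of the best partial path ending in state s
    best = {
        s: (logp(trans, START, s) + logp(emit, emit_key(START, s), obs[0]), [s])
        for s in states
    }
    for o in obs[1:]:
        new_best = {}
        for s in states:
            candidates = [
                (best[p][0] + logp(trans, p, s) + logp(emit, emit_key(p, s), o),
                 best[p][1] + [s])
                for p in states
            ]
            new_best[s] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]
```

Because the improved model keys its emissions on state pairs, it needs more labeled data to cover the same vocabulary. The floor value is arbitrary and would normally be replaced by a real smoothing scheme.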
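
For the second-order HMM, Viterbi decoding has to track pairs of states (s_{t-1}, s_t), because each transition depends on the two preceding states. This sketch assumes hand-supplied probability tables, which could come from the same counting procedure as above. It also assumes a hypothetical start state `<s>` that pads the first two positions, and the emission term P(o_t | s_t). Each step costs O(N^3) for N states instead of O(N^2).

```python
import math

START = "<s>"  # hypothetical padding state for s_{-1} and s_0


def viterbi2(obs, states, trans2, emit, floor=1e-6):
    """Second-order Viterbi with P(s_t | s_{t-2}, s_{t-1}) and P(o_t | s_t).

    trans2[(u, v)][w] = P(w | u, v); emit[s][o] = P(o | s).
    Unseen events get the assumed probability `floor`.
    """
    def logp(table, key, x):
        return math.log(table.get(key, {}).get(x, floor))

    # delta[(u, v)] = (best log score, path) whose last two states are u, v
    delta = {
        (START, v): (logp(trans2, (START, START), v) + logp(emit, v, obs[0]), [v])
        for v in states
    }
    for o in obs[1:]:
        new = {}
        for (u, v), (score, path) in delta.items():
            for w in states:
                cand = score + logp(trans2, (u, v), w) + logp(emit, w, o)
                # keep only the best path into each state pair (v, w)
                if (v, w) not in new or cand > new[(v, w)][0]:
                    new[(v, w)] = (cand, path + [w])
        delta = new
    return max(delta.values(), key=lambda sp: sp[0])[1]
```

A first-order model cannot express a rule such as "after A, A comes B, but after B, A comes A". The pair-based search can, because the transition table is indexed by the two previous states.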
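
As a rough illustration of how an MEMM uses contextual and lexical features, this sketch computes the local maximum-entropy distribution P(s_t | s_{t-1}, features) as a softmax over weighted binary features. The feature templates (`word=`, `prev=`, `next=`, `is_digit`), the state names, and the weights are all hypothetical. Weight training (for example by iterative scaling or gradient-based optimization) is omitted. Decoding would run Viterbi over these local distributions in place of the HMM's transition-times-emission product.

```python
import math


def token_features(tokens, i):
    """Hypothetical lexical and context features for token i (binary, by name)."""
    feats = ["word=" + tokens[i]]
    feats.append("prev=" + (tokens[i - 1] if i > 0 else "<s>"))
    feats.append("next=" + (tokens[i + 1] if i + 1 < len(tokens) else "</s>"))
    if tokens[i].isdigit():
        feats.append("is_digit")
    return feats


def memm_local(prev_state, features, states, weights):
    """Maximum-entropy local distribution P(s_t | s_{t-1}, context).

    weights[(prev_state, feature, state)] plays the role of lambda_k;
    missing entries count as 0.
    """
    scores = {
        s: sum(weights.get((prev_state, f, s), 0.0) for f in features)
        for s in states
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}
```

For example, in a paper header the token after 作者 ("author") is likely an author name. A positive weight on the feature `prev=作者` for the state `AUTHOR` captures that context, which an HMM emission table cannot.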
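
Precision and recall, the measures used to compare the models, can be computed by treating each extracted item as a (field, value) pair and matching it exactly against a gold-standard annotation. Exact-match scoring and the field names in the test are assumptions for illustration; the thesis may use a different matching criterion.

```python
def precision_recall(extracted, gold):
    """Exact-match precision and recall over (field, value) pairs."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```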

Keywords/Search Tags:

information extraction, hidden Markov model, maximum entropy, precision


Related items

1	Research And Implementation Of Web Information Extraction Based On Improved Hidden Markov Model
2	Algorithm Research For Text Information Extraction Based On Hidden Markov Model
3	Research On Domain Entity Attribute And Event Extraction Technology
4	The Application Of Information Entropy In Machine Learning Algorithm
5	Application Research Of Hidden Markov Model In Information Extraction
6	Speaker Recognition Based On Continuous Hidden Markov Model
7	Research On Extracting Chinese Entity-relationship Based On Maximum Entropy Model
8	Web Text Information Extraction And Classification
9	Research On WLAN Indoor Location Alogrithm Based On Information Entropy
10	Web Free Text Information Extraction Based On TABLE Layout And Hidden Markov Model