Disambiguation of Biomedical Abbreviations

Posted on: 2012-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
Full Text: PDF
GTID: 2214330368492448
Subject: Computer application technology
Abstract/Summary:
With advances in computing technology and biotechnology, the volume of biomedical literature is growing explosively. This literature contains the latest research progress and rich biomedical knowledge, and extracting information from it has become an important research area in bioinformatics. The disambiguation of biomedical abbreviations is of particular significance to the biomedical field and to natural language processing, and is essential for applications such as machine translation and information retrieval. A biomedical abbreviation appears in the literature in one of two forms: either the abbreviation and its full form both appear in the text (a local abbreviation), or only the abbreviation appears (a global abbreviation). Correspondingly, there are two main approaches to abbreviation disambiguation. The first relies on heuristic methods, which require constructing a set of rules. The second uses techniques from statistics and machine learning to induce models of language usage from large samples.
Due to the complexity and variability of biomedical texts and the different forms abbreviations take, the disambiguation of biomedical abbreviations is a difficult task. Depending on the form of the abbreviation, this thesis uses rule-based and machine learning methods to disambiguate abbreviations. The input to the disambiguation system is text. First, the system applies an identification method to find the abbreviations. Second, it uses rule-based and co-occurrence methods to disambiguate local abbreviations. It then checks whether the correct full form of the abbreviation has been found; if not, it proceeds to the next step, global abbreviation disambiguation. The Vector Space Model is the best-performing method for disambiguating global abbreviations.
Currently, there is no unified corpus for the disambiguation of biomedical abbreviations, and past research has focused on only one form of abbreviation, so the resulting systems are not complete disambiguation systems. This thesis shows that our system handles all forms of abbreviation and achieves higher performance. These results offer useful reference value for future research on abbreviation disambiguation.
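The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the actual rules, features, or corpus, so everything here is an assumption made for illustration. The function names (`find_local_full_form`, `disambiguate`), the "full form (ABBR)" initial-letter rule, and the hand-written `sense_profiles` dictionary are all hypothetical. The sketch assumes the abbreviation has already been identified, tries a local rule first, and falls back to a bag-of-words vector space comparison only when no full form is found in the text.

```python
import math
import re
from collections import Counter

# Assumed pattern (not from the thesis): a short parenthesised token such as "(AML)".
SHORT_FORM = re.compile(r"\(([A-Z][A-Za-z0-9-]{1,9})\)")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_local_full_form(text, abbr):
    """Local step: look for 'full form (ABBR)' in the same text.

    Illustrative rule only: the initials of the words immediately
    before the parenthesis must spell the abbreviation.
    """
    for match in SHORT_FORM.finditer(text):
        if match.group(1) != abbr:
            continue
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, abbr)
        ):
            return " ".join(candidate)
    return None


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(text, abbr, sense_profiles):
    """Try the local rule first, then fall back to the vector space step."""
    local = find_local_full_form(text, abbr)
    if local is not None:
        return local
    candidates = sense_profiles.get(abbr, {})
    if not candidates:
        return None
    context = Counter(tokenize(text))
    # Global step: pick the sense whose context profile is closest to the text.
    return max(
        candidates,
        key=lambda sense: cosine(context, Counter(tokenize(candidates[sense]))),
    )


# Hypothetical sense inventory; a real system would learn profiles from a corpus.
sense_profiles = {
    "PCR": {
        "polymerase chain reaction": "dna amplification primers cycles template",
        "pathologic complete response": "tumor chemotherapy surgery patients breast",
    }
}

local_text = "Patients with acute myeloid leukemia (AML) were enrolled."
global_text = "DNA template was amplified by PCR with specific primers."

local_result = disambiguate(local_text, "AML", sense_profiles)
global_result = disambiguate(global_text, "PCR", sense_profiles)
```

In this sketch the first call is resolved by the local rule because the full form sits next to the abbreviation, while the second has no definition in the text and is resolved by cosine similarity against the assumed sense profiles.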
Keywords/Search Tags: Biomedical Abbreviation, Abbreviation Identification, Disambiguation of Abbreviations, Heuristic Method, Machine Learning