Extraction Of Geographic Information Elements And Spatial Positioning Method For The Webpage Text

Posted on: 2016-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Wang
GTID: 2180330461454173
Subject: Photogrammetry and Remote Sensing
Abstract/Summary:
The rapid development of the internet and computer technology has greatly expanded the volume of online information. The geographic information it contains keeps growing and is gradually showing exponential growth, and the web has become an important channel for acquiring and updating geographic information. Like other types of web content, this geographic information also exists in the form of text. Because it is unstructured, it is hard for machines to recognize automatically, so it cannot readily serve as a GIS data source for further statistics and analysis. Meanwhile, the geographic information industry increasingly requires data that reflect the current situation. The large volume of geographic information on the internet therefore needs to be automatically extracted and converted into spatial data with geographic coordinates.
Two steps address these problems: extracting the text-form geographic information found online, and spatially locating it. The first step applies semantic analysis and processing to text acquired by a web crawler in order to extract the geographic information it contains. The second assigns spatial coordinates to the extracted information through standardization and geographic information matching, so that GIS analysis tools can better analyze and process it.
To address the problems in recognizing, extracting, and spatially locating geographic information in multi-source text, this thesis contains the following parts.
Firstly, building on previous studies at home and abroad on geographic information recognition, extraction, and spatial location, this thesis uses a machine learning method to recognize geographic information: part-of-speech tagging is implemented with a Hidden Markov Model, so that geographic information can be recognized from part-of-speech tags by supervised classification. Candidate addresses are pre-extracted by combining word segmentation with a prefix and suffix lexicon of toponyms and addresses. Toponyms and addresses are then matched and filtered using the established rules and the characteristic characters of administrative districts. Geographic entities are extracted by combining labeling with geographic entity recognition windows, which ultimately yields accurate extraction of geographic entities and address text. Finally, the effectiveness of the proposed approach is verified by experiments.
Secondly, the spatial location of the extracted geographic entities, toponyms, and addresses is based on a spatial location reference library. Extracted toponyms and addresses are first standardized according to the model rules, their general area is then determined from contextual information, and they are finally matched against the addresses in the spatial location reference library. If the match succeeds, the spatial coordinates of the corresponding address are assigned. If the match fails, no coordinates are assigned directly and the address is instead processed by a spatial location fuzzy algorithm.
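The standardize, match, and fall-back flow described for addresses can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the reference entries and coordinates are made-up placeholders, `standardize` and `locate` are hypothetical names, and string similarity from the standard-library `difflib` module stands in for the thesis's spatial location fuzzy algorithm, whose details the abstract does not give. The step that narrows the search area from contextual information is omitted.

```python
from difflib import get_close_matches

# Hypothetical reference library: standardized address -> (x, y).
# The entries and coordinates are placeholders, not real data.
REFERENCE = {
    "12 example road sample district": (1.0, 2.0),
    "7 placeholder avenue test town": (3.0, 4.0),
}


def standardize(address):
    """Toy standardization: lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(cleaned.split())


def locate(address, reference=REFERENCE, cutoff=0.8):
    """Return (coordinates, method) for an extracted address string.

    Exact lookup is tried first. If it fails, the closest reference key
    by difflib similarity is used, provided it reaches `cutoff`.
    """
    key = standardize(address)
    if key in reference:
        return reference[key], "exact"
    close = get_close_matches(key, list(reference), n=1, cutoff=cutoff)
    if close:
        return reference[close[0]], "fuzzy"
    return None, "unmatched"


if __name__ == "__main__":
    for text in ("12 Example Road, Sample District",
                 "12 Exampel Road Sample District",
                 "somewhere else entirely"):
        print(text, "->", locate(text))
```

Returning the match method alongside the coordinates lets a downstream step treat fuzzy matches with less confidence than exact ones. The `cutoff` value of 0.8 is an arbitrary choice for this sketch.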
As for the geographic entities, the extracted entities are matched with those in the spatial location reference library, according to which the spatial location of toponyms and addresses is achieved. Finally, the spatial location of geographic elements is realized.
Thirdly, based on the geographic information extraction and spatial location method, this thesis carries out geographic information recognition and extraction on multiple internet sites and provides a visual display in the front-end interface of the prototype system.
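The Hidden Markov Model tagging named in the first part can be illustrated with a small Viterbi decoder in Python. This is a sketch under stated assumptions, not the thesis's implementation: the two tags (`LOC` for place-name words, `O` for everything else), the tokens, and all probabilities are made up for the example, whereas the thesis obtains its model by supervised training on labeled text.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `observations` under an HMM.

    Works in log space. Probabilities below `floor` (including unseen
    words, which get 0.0) are clamped so that log() stays defined.
    """
    if not observations:
        return []

    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log score of the best path ending in s at step t,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(first, 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + lg(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lg(emit_p[s].get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


# Toy model with made-up probabilities (illustration only).
STATES = ("LOC", "O")
START_P = {"LOC": 0.3, "O": 0.7}
TRANS_P = {"LOC": {"LOC": 0.6, "O": 0.4}, "O": {"LOC": 0.3, "O": 0.7}}
EMIT_P = {
    "LOC": {"main": 0.4, "street": 0.5, "shop": 0.05, "near": 0.05},
    "O": {"shop": 0.5, "near": 0.4, "main": 0.05, "street": 0.05},
}

if __name__ == "__main__":
    tokens = ["shop", "near", "main", "street"]
    print(list(zip(tokens, viterbi(tokens, STATES, START_P, TRANS_P, EMIT_P))))
```

With this toy model, the tokens `shop near main street` decode to `O O LOC LOC`, so the run of `LOC` tags marks the candidate place name.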
Keywords/Search Tags: Geographic information, Extraction, Spatial location, Hidden Markov Model, Prefix and suffix, Recognition window