
Application Of OCR Text Recognition Technology In Real Estate Data Integration

Posted on: 2019-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Ma
Full Text: PDF
GTID: 2370330566969981
Subject: Cartography and Geographic Information System
Abstract/Summary:
Digit recognition is an important branch of text recognition. Nearly a hundred years of exploration have produced mature techniques, along with a large number of outstanding algorithms and research results, and the technology is now widely used across industries. In the surveying and mapping industry, however, the application of text recognition is still in its infancy, so recognition techniques tailored to specific surveying and mapping tasks retain considerable research value and room for development. Digit recognition here refers only to the automatic recognition of Arabic numerals by computer. In real estate data integration projects it can greatly reduce workload and work intensity, and it therefore has significant value for wider adoption.

This thesis addresses how to generate a digitized map directly from scanned paper files of a survey area that contain boundary points. It aims to solve two technical difficulties: recognizing the boundary points in the paper files, and using secondary development on Arc Engine to produce maps in batches. The result is a system that automatically recognizes boundary point information and automatically generates a digitized map. The main work is as follows:

(1) Study the image processing algorithms used for digit recognition, covering three steps (image graying, image binarization, and image denoising), and apply them in the designed system.
(2) Select a character recognition method. Two mature third-party options, the Tesseract character recognition engine and the Baidu OCR character recognition service, are compared on correct rate, error rate, rejection rate, the location information returned after recognition, and time taken, and the one best suited to the project is chosen. When the chosen method is applied, the image preprocessing from step (1) is used to improve image quality as far as possible and so raise recognition accuracy.
(3) Use Arc Engine to develop software that generates maps directly in batches.
(4) Combine the work of the first three steps into a system that recognizes the boundary point information in a scanned document and generates a digital map by computer.
(5) Apply the system to the real estate data integration project in Qing Xin District to verify whether it meets the project's needs.

Using the selected character recognition method together with preprocessing of the scanned images, the recognition results are passed to the batch conversion software. This completes the system workflow. In the Qing Xin District real estate data integration project, the system met the need to digitize files that had no digitized images and achieved the desired results.
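The three preprocessing steps named in item (1) can be illustrated with a minimal sketch. The abstract does not say which algorithms the thesis uses, so the choices here are assumptions made only for illustration: weighted-luminance graying, Otsu's method for binarization, and a 3x3 median filter for denoising. All function names are hypothetical, and the image is represented as plain nested lists rather than any particular imaging library's type.

```python
def to_gray(rgb):
    """Graying: weighted luminance of each (r, g, b) pixel (assumed weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Otsu's method: the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold):
    """Binarization: values above the threshold become 255, the rest 0."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


def median_filter3(img):
    """Denoising: 3x3 median filter, replicating pixels at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sorted(window)[4])  # middle of the 9 sorted values
        out.append(row)
    return out


def preprocess(rgb):
    """Run graying, binarization, and denoising in the order listed above."""
    gray = to_gray(rgb)
    binary = binarize(gray, otsu_threshold(gray))
    return median_filter3(binary)
```

The output of `preprocess` is a black-on-white image (0 for ink, 255 for background) that could then be handed to a recognition engine. A median filter removes isolated specks but can also erase strokes thinner than its window, so the window size would need to be checked against the actual scan resolution.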
Keywords/Search Tags: Digit Recognition, Tesseract, Baidu OCR, Arc Engine, Real estate data integration