Recently, with the development of optical character recognition (OCR), the demand for document recognition and its applications has been growing steadily. Automatically recognizing documents to mine key information is of practical value. As important objects in document pages, tables provide an intuitive and natural way to present data. However, because of sophisticated table structures and text layouts, it is difficult to recognize tables accurately, and current recognition accuracy cannot satisfy the common needs of users and developers. Therefore, designing and implementing a table recognition system has practical value. The main contributions of this thesis are threefold.

First, the thesis establishes a large-scale table recognition dataset and proposes a method for generating table data in HTML. Built from table images collected from scientific documents, the dataset provides both the text corpus and the cell structure of each table. With large-scale training and test data, deep-learning-based table recognition models can be compared fairly. The methods for generating the dataset and the evaluation criteria are also introduced.

Second, the thesis proposes a three-stage table recognition algorithm consisting of text detection, text recognition, and table structure recognition. Text detection is based on a network that combines corner detection and region segmentation; text recognition is based on a sequence recognition algorithm that handles text of arbitrary length; and table structure recognition is based on a graph network model. To address practical problems, the algorithms are further improved with a variety of engineering techniques. Comparative experiments on the datasets and on sampled images verify the performance and speed of the algorithm, and visualizing the erroneous results exposes its defects. By combining detection, recognition, and table structure reconstruction, the algorithm recognizes an input image accurately and provides algorithmic support for the system design.

Finally, the thesis designs and implements an online table recognition system. Built on the Flask framework, the system is a web application consisting of a front end, a database, and back-end programs. A modular design supports image uploading, recognition, real-time display, and result correction. The thesis further describes the system in detail, covering functional analysis, function implementation, performance evaluation, the design of the related database tables, and an analysis of the system's advantages and limitations.
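The abstract does not say how the HTML data is generated. As a minimal sketch, assuming a hypothetical annotation format in which each logical cell records its top-left grid position, its row and column spans, and its text (the `Cell` class and `cells_to_html` function are illustrative names, not the thesis's code), the cell structure can be serialized into an HTML table like this:

```python
from __future__ import annotations

from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    # Hypothetical annotation: top-left grid position, spans, and cell text.
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1
    text: str = ""


def cells_to_html(cells: list[Cell]) -> str:
    """Serialize logical cells into an HTML <table> string."""
    rows: dict[int, list[Cell]] = {}
    for cell in cells:
        rows.setdefault(cell.row, []).append(cell)
    # The table height is the lowest row any cell reaches, counting row spans.
    n_rows = max(c.row + c.rowspan for c in cells) if cells else 0
    parts = ["<table>"]
    for r in range(n_rows):
        parts.append("<tr>")
        # Grid positions covered by a span have no Cell of their own,
        # so they are skipped simply by emitting only the annotated cells.
        for cell in sorted(rows.get(r, []), key=lambda c: c.col):
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            # Escape text so characters like & and < stay valid HTML.
            parts.append(f"<td{attrs}>{escape(cell.text)}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


# A header cell spanning two columns above two body cells.
example = [Cell(0, 0, 1, 2, "Model"), Cell(1, 0, text="A"), Cell(1, 1, text="B & C")]
print(cells_to_html(example))
# <table><tr><td colspan="2">Model</td></tr><tr><td>A</td><td>B &amp; C</td></tr></table>
```

Encoding spans as attributes rather than duplicating text keeps the HTML faithful to the logical cell structure, which is what a structure-recognition model would be trained to reproduce.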
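The three-stage design described in the abstract (text detection, then text recognition, then table structure recognition) can be pictured as a simple chain. The sketch below is an assumption-laden illustration, not the thesis's implementation: each stage is injected as a hypothetical callable (`detect`, `recognize`, `structure`) so that the trained models could be swapped in, and the `(x0, y0, x1, y1)` box format is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence

# (x0, y0, x1, y1) in pixels; this box format is an assumption of the sketch.
Box = tuple[int, int, int, int]


@dataclass
class TableCell:
    row: int
    col: int
    box: Box
    text: str


def recognize_table(
    image: Any,
    detect: Callable[[Any], Sequence[Box]],
    recognize: Callable[[Any, Box], str],
    structure: Callable[[Sequence[Box], Sequence[str]], Sequence[tuple[int, int]]],
) -> list[TableCell]:
    """Chain the three stages; each model is passed in as a callable."""
    boxes = list(detect(image))                   # stage 1: text detection
    texts = [recognize(image, b) for b in boxes]  # stage 2: text recognition per region
    positions = structure(boxes, texts)           # stage 3: (row, col) for each region
    if len(positions) != len(boxes):
        raise ValueError("structure stage must return one (row, col) per box")
    return [TableCell(r, c, b, t) for b, t, (r, c) in zip(boxes, texts, positions)]


# Usage with stub stages standing in for the trained networks.
cells = recognize_table(
    image=None,
    detect=lambda img: [(0, 0, 40, 10), (50, 0, 90, 10)],
    recognize=lambda img, box: f"text@{box[0]}",
    structure=lambda boxes, texts: [(0, i) for i in range(len(boxes))],
)
print([(c.row, c.col, c.text) for c in cells])
# [(0, 0, 'text@0'), (0, 1, 'text@50')]
```

Keeping the stages behind plain callables makes each one independently testable and replaceable, which matches the modular system design the abstract describes.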
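The abstract states only that table structure recognition is based on a graph network. One common way to turn such a network's output into a grid, assumed here rather than taken from the thesis, is to treat text regions as nodes, let the model predict "same row" and "same column" edges, and read row and column indices off the connected components. The sketch covers only that post-processing step; `assign_grid` and its edge-list inputs are hypothetical, and in practice the edges would come from the trained model.

```python
from __future__ import annotations


def _groups(n: int, pairs: list[tuple[int, int]]) -> list[int]:
    """Union-find: return a component label for each of n nodes."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in pairs:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def assign_grid(
    centers: list[tuple[float, float]],
    same_row: list[tuple[int, int]],
    same_col: list[tuple[int, int]],
) -> list[tuple[int, int]]:
    """Map each text region (by its (x, y) center) to a (row, col) index."""
    n = len(centers)

    def index(labels: list[int], axis: int) -> list[int]:
        # Order components by the mean coordinate of their members:
        # y (axis=1) for rows, x (axis=0) for columns.
        members: dict[int, list[int]] = {}
        for node, label in enumerate(labels):
            members.setdefault(label, []).append(node)
        order = sorted(
            members,
            key=lambda g: sum(centers[k][axis] for k in members[g]) / len(members[g]),
        )
        rank = {g: r for r, g in enumerate(order)}
        return [rank[label] for label in labels]

    rows = index(_groups(n, same_row), axis=1)
    cols = index(_groups(n, same_col), axis=0)
    return list(zip(rows, cols))


# 2x2 grid: regions 0,1 and 2,3 share rows; regions 0,2 and 1,3 share columns.
print(assign_grid([(10, 10), (50, 10), (10, 40), (50, 40)],
                  same_row=[(0, 1), (2, 3)], same_col=[(0, 2), (1, 3)]))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Because connected components merge transitively, a spanning cell linked to two rows would merge them into one; this simple version therefore assumes a table without spanning cells.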