| With the development of information technology, the business processes of many organizations have gone paperless. However, cross-organization transactions are still often carried out with paper forms or scanned files, which are then entered manually into internal information systems. This manual input is time-consuming, inconvenient, and inefficient. As the demand for document input grows, research on automatic input becomes increasingly important. Tables are the most common information carrier, and their automatic input comprises text recognition and table structure recognition. This paper focuses on the latter. Our approach to table structure recognition relies on table lines: we first detect the table lines, then use them to build the table and obtain all of its cells. A semantic segmentation model segments the table line regions; it outputs a category for each pixel, marking the regions where table lines are located. The semantic segmentation model is U-Net, which converges easily and fuses features richly. Next, we use directed simple connected chains to extract the table line regions from the segmentation map and obtain clear table line information. Chains that belong to the same line are merged, which yields all the lines of the table. Finally, we propose a tabular line matrix to restore the table structure; the cell information is obtained by traversing the matrix. Our model is evaluated on ICDAR2013, the most widely used dataset. Compared with DeepDeSRT, the state-of-the-art model for image-based table structure recognition, our model is significantly better on every metric: precision is higher by 30.8% and recall is higher by 16.9%. To improve the model further, we add the location information of the table text. We obtain the table structure with the former model and fill each text block into its corresponding cell, then analyze the layout of cells that contain multiple text blocks to update the table lines. Finally, the improved model is evaluated on the ICDAR2013 dataset: its precision is 1.1% higher than that of the former model, and its recall is 2.2% higher. |
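
The idea of traversing a line matrix to recover cells can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the tabular line matrix, so the two boolean grids `h_lines` and `v_lines`, the function name `extract_cells`, and the flood-fill traversal are all assumptions made for this example. The sketch assumes the detected lines have already been snapped to a regular grid, and it treats grid squares that are not separated by a line segment as parts of the same (possibly merged) cell.

```python
from collections import deque


def extract_cells(h_lines, v_lines):
    """Recover cells from a hypothetical line-matrix representation.

    h_lines[r][c] is True if a horizontal segment exists on row boundary r
    above grid square (r, c); its shape is (rows + 1) x cols.
    v_lines[r][c] is True if a vertical segment exists on column boundary c
    to the left of grid square (r, c); its shape is rows x (cols + 1).
    Returns a list of (row_start, row_end, col_start, col_end), inclusive.
    """
    rows, cols = len(v_lines), len(h_lines[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Flood-fill across missing segments to collect one cell.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                neighbours = []
                if y > 0 and not h_lines[y][x]:
                    neighbours.append((y - 1, x))      # no line above
                if y < rows - 1 and not h_lines[y + 1][x]:
                    neighbours.append((y + 1, x))      # no line below
                if x > 0 and not v_lines[y][x]:
                    neighbours.append((y, x - 1))      # no line on the left
                if x < cols - 1 and not v_lines[y][x + 1]:
                    neighbours.append((y, x + 1))      # no line on the right
                for ny, nx in neighbours:
                    if not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cells.append((r0, r1, c0, c1))
    return cells


# A 2 x 2 grid whose top row is merged (the middle vertical segment is missing).
h = [[True, True], [True, True], [True, True]]
v = [[True, False, True], [True, True, True]]
print(extract_cells(h, v))  # [(0, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
```

Each returned tuple gives the row and column span of one cell, so a merged cell appears as a single entry covering several grid squares. The bounding-box summary is only adequate for rectangular cells, which is a further assumption of this sketch.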