Research On Key Technologies Of Cross-document Table Fusion Based On Deep Learning

Posted on:2024-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:X B Hu

Full Text:PDF

GTID:2568307106482074

Subject:Electronic information

Abstract/Summary:

Owing to their high relevance, tables are widely used in scientific literature to record data, providing a valuable reference for subsequent research. Aggregating table data from different publications into a single, standardised collection provides more comprehensive, accurate and systematic scientific and technical information, which supplies the data needed to solve increasingly complex scientific problems. Deep learning-based cross-document table fusion extracts and fuses the key-value pairs in tables by extracting the tables' semantic features, making the integration of scientific and technical data more efficient. However, text boundaries cannot be detected accurately because the text within tables is dense. In addition, the same key is described differently in different publications, which makes it difficult to extract table semantics in a small-sample setting. Table fusion therefore still faces challenges, which this thesis addresses in two parts:

(1) To avoid non-maximum suppression (NMS) operations and thereby improve the efficiency of table structure extraction, this thesis proposes an image processing-based method for extracting table information. First, RetinaNet is used to extract and fuse multi-scale features from document screenshots to obtain the table location. Second, a text recognition network, str-PG-Net, is proposed: the text skeleton is detected by morphological methods, and a fully convolutional network extracts text centreline and text border features; furthermore, a binary classification neural network obtains text orientation features, which are jointly decoded with the skeleton centroids to detect the text position accurately without NMS, improving the efficiency of table information extraction. Finally, a heuristic algorithm based on text spacing detects the cell positions, yielding the table structure. The experimental results show that the proposed method improves the efficiency of table information extraction and optimises cell detection in the small-sample case.

(2) To address the sparsity of table semantic data in a small-sample environment, this thesis proposes a semantic model-based table fusion method. Specifically, character embeddings are trained to alleviate data sparsity, a bidirectional long short-term memory (Bi-LSTM) network extracts the semantic features of table keys, and softmax is used to identify the table keys. In addition, a table classification method based on title semantic features is proposed, which combines title-specific tables with table key semantic features, merges similar tables based on table key semantics, and stores the fused tables in a graph database. Experimental results show that the proposed method can identify unknown keys and improves the accuracy of table key classification.
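The abstract does not spell out the text-spacing heuristic used for cell detection in part (1). As a rough illustration only, the sketch below shows one way such a heuristic might work: detected text boxes are grouped into row and column bands wherever the gap between neighbouring boxes is smaller than a threshold, and each box is then assigned to the cell at the intersection of its bands. The function names, the box format `(x0, y0, x1, y1, text)`, and the threshold values are all assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1, text), with the origin at the top-left.
Box = Tuple[float, float, float, float, str]


def cluster_intervals(intervals: List[Tuple[float, float]],
                      min_gap: float) -> List[Tuple[float, float]]:
    """Merge 1-D intervals separated by less than min_gap into bands."""
    bands: List[List[float]] = []
    for start, end in sorted(intervals):
        if bands and start - bands[-1][1] < min_gap:
            # Close enough to the previous band: extend it.
            bands[-1][1] = max(bands[-1][1], end)
        else:
            # Gap is large enough: start a new band.
            bands.append([start, end])
    return [(s, e) for s, e in bands]


def band_index(bands: List[Tuple[float, float]], lo: float, hi: float) -> int:
    """Return the index of the band containing the centre of [lo, hi]."""
    centre = (lo + hi) / 2
    for i, (start, end) in enumerate(bands):
        if start <= centre <= end:
            return i
    raise ValueError("interval does not fall inside any band")


def boxes_to_grid(boxes: List[Box],
                  row_gap: float = 2.0,
                  col_gap: float = 10.0) -> List[List[str]]:
    """Arrange text boxes into a row-by-column grid of cell strings."""
    rows = cluster_intervals([(b[1], b[3]) for b in boxes], row_gap)
    cols = cluster_intervals([(b[0], b[2]) for b in boxes], col_gap)
    grid = [["" for _ in cols] for _ in rows]
    # Visit boxes top-to-bottom, left-to-right so text in a shared cell
    # is concatenated in reading order.
    for x0, y0, x1, y1, text in sorted(boxes, key=lambda b: (b[1], b[0])):
        r = band_index(rows, y0, y1)
        c = band_index(cols, x0, x1)
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


# Made-up example: a 2 x 2 table given as four detected text boxes.
EXAMPLE_BOXES: List[Box] = [
    (0, 0, 60, 10, "Material"),
    (80, 0, 120, 10, "Yield"),
    (0, 14, 40, 24, "Steel"),
    (80, 14, 130, 24, "250 MPa"),
]
EXAMPLE_GRID = boxes_to_grid(EXAMPLE_BOXES)
```

With the made-up boxes above, the header and data rows fall into separate row bands and the two columns into separate column bands, so `EXAMPLE_GRID` is a 2 x 2 list of cell strings. Fixed gap thresholds are the simplest choice for a sketch; a real system would more plausibly derive them from the text height or the distribution of observed gaps, and would need extra handling for cells that span several rows or columns.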

Keywords/Search Tags:

Table information extraction, Table fusion, Image processing, Semantic models

Related items

1	The Research And Implementation Of Table Recognition System Based On Deep Learning
2	Table Recognition Based On Digital Image Processing
3	Research And Implementation Of The Web Page Table Structure Recognition
4	TableSeer: Automatic table extraction, search, and understanding
5	Design And Implementation Of PDF Format Based Table Extraction Method
6	The Design And Implementation Of Packet Classification Based On Rule Table
7	Research On Technology Of Table Information Extraction In Semi-Structured Texts
8	Middle And Small Restaurant Management Information System Design And Development
9	Design And Implementation Of Web Data Table Detection System Based On Visual, Lexical And Semantic Features
10	Method Of Entity Table Information Extraction In Web Page