| Interactive tasks over multi-modal data demand the integration of knowledge from different modalities, and multi-modal knowledge graphs have emerged to meet this need. However, constructing a multi-modal knowledge graph for the textile industry still faces several challenges. First, table documents contain a vast amount of reusable knowledge, but current knowledge graph construction methods rarely extract structured data from them automatically, which leaves the resulting knowledge graph with relatively sparse content. Second, most existing entity alignment methods are designed for traditional knowledge graphs; they do not fully consider image information and require a large amount of manually annotated data, so aligning cross-modal entities remains a major challenge. Finally, constructing the ontology and data layer for the textile industry and visualizing the multi-modal knowledge graph are urgent problems. To address these problems, this paper proposes a series of methods for constructing and applying a multi-modal knowledge graph for the textile industry. The main research work is as follows: (1) A Table Document Knowledge Extraction (TDKE) model based on an improved LayoutXLM is proposed. Existing table document knowledge extraction models are affected by OCR errors and suffer from misaligned text bounding boxes, which leads to inaccurate entity relationship modeling. The proposed TDKE model addresses this issue with a BBox row alignment algorithm for text bounding boxes in table documents. The algorithm calculates and corrects the positions of the bounding boxes so that boxes in the same row are aligned, preventing misalignment and the resulting inaccuracies in entity relationship modeling. Then, the LayoutXLM pre-trained model is used to extract key-value pairs from the aligned
table document. The experimental results show that this approach significantly improves the accuracy of entity and relationship extraction in table documents. Moreover, the extracted structured data can serve as a dependable source for building a domain multi-modal knowledge graph. (2) A Chinese Cross-Modal Entity Alignment (CCMEA) pre-trained language model is proposed. The model is based on unsupervised learning and first extracts single-modal features with dual-stream visual and text encoders. Then, because features from different modalities are difficult to make interact, a cross-encoder is designed to guide learning across the modal features, refining the single-modal features. Finally, contrastive learning is used to strengthen both the matching and the distinctions between image and text entities, highlighting the connections and differences between cross-modal entities. Experimental results show that the method generalizes well to small downstream datasets without requiring complex multi-modal data annotation, and that it accomplishes the cross-modal entity alignment task for the domain multi-modal knowledge graph. (3) A multi-modal knowledge graph for the textile industry is constructed. Based on the designed textile industry knowledge ontology, the data layer of the textile industry multi-modal knowledge graph is built by combining the table document knowledge extraction and Chinese cross-modal entity alignment methods proposed in this paper with BERT-BiLSTM-CRF named entity recognition, web crawling, and other technologies. Finally, the KGBuilder tool is used to construct and visualize the textile industry multi-modal knowledge graph, providing data support for downstream applications. (4) A multi-modal knowledge repository management system for the textile industry is developed. By integrating technologies such as Spring Boot and Vue, a multi-modal knowledge repository management system is developed for the textile
industry. This system provides users with functions such as knowledge retrieval, knowledge import, and expert review. |
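
The abstract describes the BBox row alignment step only at a high level, so the following is a minimal sketch of one plausible way to do it, not the TDKE algorithm itself. It assumes OCR boxes are given as `(x0, y0, x1, y1)` tuples, groups boxes whose vertical centers lie close together into a row, and snaps each row to a shared top and bottom edge. The function name `align_rows` and the tolerance parameter `tol` are hypothetical.

```python
from statistics import median


def align_rows(boxes, tol=0.5):
    """Snap OCR boxes that belong to the same table row to common y-edges.

    boxes: list of (x0, y0, x1, y1) tuples (assumed input format).
    tol:   hypothetical tolerance, as a fraction of the row's median height.
    Returns a new list in the original order; x-coordinates are unchanged.
    """
    def center(i):
        return (boxes[i][1] + boxes[i][3]) / 2

    # Visit boxes from top to bottom by vertical center.
    order = sorted(range(len(boxes)), key=center)

    rows = []  # each row is a list of box indices
    for i in order:
        if rows:
            last = rows[-1]
            row_center = median(center(j) for j in last)
            row_height = median(boxes[j][3] - boxes[j][1] for j in last)
            # Same row if the center is within tol * height of the row center.
            if abs(center(i) - row_center) <= tol * row_height:
                last.append(i)
                continue
        rows.append([i])

    aligned = list(boxes)
    for row in rows:
        top = median(boxes[j][1] for j in row)
        bottom = median(boxes[j][3] for j in row)
        for j in row:
            aligned[j] = (boxes[j][0], top, boxes[j][2], bottom)
    return aligned
```

With aligned boxes, cells in the same row share identical vertical coordinates, which is the property a layout-aware model such as LayoutXLM can exploit when pairing keys with values. The median is used here because it is robust to a single badly recognized box; a real implementation might also need to handle merged or multi-line cells.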
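
The contrastive learning step in contribution (2) can be illustrated with a symmetric InfoNCE objective of the kind commonly used for image-text matching. The abstract does not state which contrastive loss CCMEA uses, so this is an assumption, and the function names and the temperature value are hypothetical. The sketch is written in plain Python for readability; it takes one embedding per image entity and one per text entity, where the pair at the same index is the matching pair.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy losses.

    image_embs[k] and text_embs[k] are assumed to describe the same entity;
    every other combination in the batch acts as a negative pair.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]

    # Cosine similarity matrix scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]

    def cross_entropy(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[k]  # the positive pair sits on the diagonal
        return total / len(rows)

    text_to_image = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(text_to_image))
```

Minimizing this loss pulls matching image and text embeddings together and pushes non-matching ones apart, which corresponds to strengthening both the matching and the distinctions between cross-modal entities. In practice the embeddings would come from the visual, text, and cross encoders, and the loss would be computed with a tensor library rather than nested lists.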