Design And Implementation Of Enterprise Portrait System Based On Text Classification

Posted on: 2023-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhou
GTID: 2558307058497204
Subject: Software engineering
Abstract/Summary:
As part of industry-university-research collaboration, enterprises are where academic value is transformed into real productivity. As they develop, enterprises face challenges such as scientific and technological innovation, and they need university scholars to cooperate with them to achieve it. However, corporate information on the Internet is currently scattered and disordered, so it is difficult for scholars to select suitable partners from a large number of companies. Collecting, sorting and analyzing the enterprise information distributed across the Internet, and building a complete, multi-angle enterprise portrait system, can help university scholars select suitable enterprises for cooperation and help other users understand an enterprise as a whole, which has important application value.

This thesis designs and implements an enterprise portrait system. It uses crawler technology to obtain information on different dimensions of enterprises from different web pages, then analyzes, sorts and aggregates that information to form enterprise portraits. The system describes an enterprise's industry at a fine granularity and covers enterprises that span multiple industries. In addition, enterprise demand is treated as a dimension of the portrait, which makes the system better suited to industry-university-research cooperation. The main work of this thesis is as follows:

(1) A labeling system for enterprise portraits suited to the industry-university-research setting is proposed. The enterprise portrait in this thesis covers three dimensions of information: enterprise business information, enterprise patent information, and enterprise demand information. Business information describes the basic situation of the enterprise, while patent and demand information serve as the basis for decision-making in industry-university-research cooperation. Compared with existing research on enterprise portrait labeling systems, a new dimension, enterprise demand, is added. This dimension expresses the needs of the company directly and helps scholars find companies that match their research directions more quickly.

(2) A method that combines label attention and self-attention mechanisms is proposed for multi-label classification of enterprise industries. Experiments on the public datasets AAPD and RCV1-V2 show that, on the multi-label Micro-F1 metric, this method outperforms traditional deep-learning-based multi-label classification methods, improving the label F1 value. For multi-class classification of enterprise demands, a classifier combining CNN and LSTM is constructed, which extracts local features of the text while also taking global features into account. On the test set, this method classifies enterprise demands with an accuracy of 93%.

(3) An efficient and stable enterprise portrait system is designed and implemented. It consists of a data acquisition part and an enterprise portrait application part. The Scrapy framework is used to obtain enterprise data, and the Airflow workflow framework automates data cleaning. Using the multi-label and multi-class classification methods proposed in this thesis, enterprises are re-labeled with multiple industry labels and a demand label. The application part provides enterprise retrieval and enterprise data visualization.
Keywords/Search Tags: Enterprise Portrait, Text Classification, Web Crawler, Attention Mechanism
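
The Micro-F1 metric named in contribution (2) pools true positives, false positives and false negatives over all labels and all samples before computing F1. The sketch below shows that standard definition in plain Python. It is only an illustration: the function name `micro_f1` and the industry labels in the usage example are hypothetical, and the abstract does not say how the thesis itself computes the metric.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Each sample is represented by the set of labels assigned to it.
    Counts are pooled over every (sample, label) pair before F1 is computed.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed by the prediction

    denominator = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical example: three enterprises, one of which spans two industries.
truth = [{"software", "manufacturing"}, {"retail"}, {"software"}]
pred = [{"software"}, {"retail", "logistics"}, {"software"}]
score = micro_f1(truth, pred)  # tp=3, fp=1, fn=1, so F1 = 6 / 8
```

Because counts are pooled before averaging, frequent labels weigh more heavily than rare ones, which is one reason Micro-F1 is commonly reported on datasets with skewed label distributions.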