
A Preliminary Inquiry Into The Method Of Corporate Portrait Based On Web Public Data

Posted on: 2020-05-04  Degree: Master  Type: Thesis
Country: China  Candidate: D Y Guan  Full Text: PDF
GTID: 2417330575985948  Subject: Statistics
Abstract/Summary:
This paper makes a preliminary exploration of methods for building corporate portraits from public web data. A corporate portrait is a type of user portrait in which the object of analysis shifts from the individual user to the enterprise, making it an extended application of the individual user portrait. User portraits are a relatively new data analysis method that describes and analyzes multidimensional data and plays an important role in precision marketing and personalized services. A corporate portrait can systematically analyze an enterprise's fragmented information and present it in a concise, intuitive way by means of labels. However, because enterprises, especially small and medium-sized enterprises (SMEs), disclose little public information, it is difficult to describe their characteristics accurately when building a portrait. It is therefore of practical significance to study how corporate portraits should be constructed.

Taking SMEs as an example, this paper discusses the methods and applications of corporate portraits based on their public web data. First, an indicator system is established and the relevant data of 2242 SMEs are crawled from the web. The data are cleaned, and two-step clustering mean-value estimation and SMOTE oversampling are used to address the missing-value and class-imbalance problems in the data. Second, tags are extracted and modeled, and the analysis proceeds at two levels: the fact label layer and the model label layer. Word cloud maps are used to explore the factual label characteristics (basic information, activity, and innovation ability) of enterprises in different risk states. Four regression models and a correlation analysis are then established, combining statistical methods with machine-learning-based data science methods to explore the correlations between tags.

The empirical results show that the regression models fit the relationship between enterprise risk status, activity, and innovation ability poorly. The Apriori association rule mining algorithm, by contrast, yields strong and effective association rules linking the nine enterprise risk situations to the enterprise's activity level and innovation ability. The rules show that when a company's activity level and innovation ability both develop well and at a consistent pace, the enterprise's risk is low; when they develop slowly, or fall out of step with each other, the enterprise's risk is high.

This paper discusses in depth the common methods for handling missing values and unbalanced data, and addresses the problem of processing incomplete information. The labels are analyzed through visualization and model building: the label characteristics of enterprises in different risk states are displayed visually at the fact level, and the correlations between labels are further analyzed after applying classification algorithms. An association rule algorithm from data mining is introduced to portray the enterprise and to mine the correlations between tags. The work enriches the algorithmic analysis and demonstration of enterprise portraits in existing research and combines data science methods with statistical methods, giving it some theoretical value and strong practical significance.
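The Apriori step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the label strings, the six toy enterprise records, the thresholds, and the function names `apriori` and `association_rules` are all hypothetical, chosen only to show how rules of the form "activity and innovation both high implies low risk" could be mined from labeled enterprises.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every distinct single label.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, pruning any
        # candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                # Every subset of a frequent itemset is itself frequent,
                # so this lookup always succeeds.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical toy records, one set of labels per enterprise (not thesis data).
TRANSACTIONS = [
    {"activity=high", "innovation=high", "risk=low"},
    {"activity=high", "innovation=high", "risk=low"},
    {"activity=high", "innovation=high", "risk=low"},
    {"activity=low", "innovation=low", "risk=high"},
    {"activity=low", "innovation=low", "risk=high"},
    {"activity=high", "innovation=low", "risk=high"},
]

frequent = apriori(TRANSACTIONS, min_support=0.3)
found = association_rules(frequent, min_confidence=0.8)

for antecedent, consequent, support, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy data the last record, where activity is high but innovation is low, stands in for the "out of step" case; the support and confidence thresholds here are arbitrary and would need tuning on real data.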
Keywords/Search Tags:network public data, corporate portrait, label, relevance analysis