Job Salary Forecast Based On Text Similarity And Collaborative Filtering

Posted on: 2019-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2417330545452657
Subject: Master of Applied Statistics
Abstract/Summary:
With the rapid development of Internet technology, online recruitment has replaced traditional recruitment as the mainstream hiring channel in China, and a large volume of job postings now appears on recruitment websites. Owing to technical and management shortcomings of these websites, however, some postings list unreasonable salaries or deliberately hide the salary. Salary is a key factor for job seekers, so how to understand actual salary levels accurately and judge whether a posted salary is reasonable has become a problem that needs to be solved. This thesis takes the postings on the Lagou recruitment website as an example and collects the data with a purpose-written crawler. Because a posting contains both structured data and text data, the thesis adopts a collaborative filtering algorithm that incorporates text similarity and predicts the salary of a target job from its nearest-neighbor jobs, giving job seekers a reference for judging salary levels.

First, a descriptive statistical analysis is performed on the structured data in the postings, and a contingency table is built between each indicator and the job salary. Independence tests on these contingency tables reveal the dependence between salary and each factor, and this dependence is then used in calculating the structured-data similarity.

Second, Chinese word segmentation and stop-word removal are applied to the text data in the postings. The LDA algorithm and the Doc2vec algorithm are trained on the processed text to obtain vector representations, and the cosine of the angle between two text vectors is used as the measure of similarity between texts. Because the LDA algorithm ignores the order of words in a text, and the Doc2vec algorithm pays no attention to the influence of a single word on the whole document, the thesis proposes an improved text similarity calculation: a weight coefficient is introduced, and the weighted average of the similarities produced by the two methods is taken as the final text similarity.

Finally, a second weight coefficient is introduced for the overall job similarity: the weighted average of the structured-data similarity and the text similarity is taken as the final similarity between jobs, and the neighbor set with the highest similarity to the target job is selected. Using the similarities as weights, the weighted average of the salaries in the neighbor set is taken as the salary forecast for the target job.

The model is run repeatedly on the training set and checked on the validation set to obtain its optimal parameters. With these parameters, the prediction error (MAE) is 1.27 for the lower limit of the job salary and 1.95 for the upper limit, so the model predicts job salary reasonably well. To verify the validity of the combined text similarity, the predictions obtained with the LDA algorithm alone and with the Doc2vec algorithm alone are compared with the predictions of the combined method. The experimental results show that the combined method has better predictive power.
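
The independence test on a contingency table described in the abstract can be illustrated with a minimal sketch. The abstract does not name the test, so the Pearson chi-square statistic is an assumption here, and the function name `chi_square_statistic` and the table layout (rows for the levels of one indicator, columns for salary bands) are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts. Assumes every row total
    and every column total is positive (otherwise an expected count is 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests that salary depends on the indicator. How the thesis turns that dependence into a structured-data similarity is not stated in the abstract and is not modeled here.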
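
The similarity blending and the neighbor-weighted forecast described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the names `alpha`, `beta`, and `k`, and the choice of which term each weight multiplies, are hypothetical, and the LDA and Doc2vec vectors are assumed to be already computed.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_similarity(lda_u, lda_v, d2v_u, d2v_v, alpha):
    """Weighted average of the LDA and Doc2vec cosine similarities.

    alpha (assumed name, 0 <= alpha <= 1) is the weight on the LDA term.
    """
    return alpha * cosine(lda_u, lda_v) + (1 - alpha) * cosine(d2v_u, d2v_v)


def job_similarity(struct_sim, text_sim, beta):
    """Weighted average of structured-data and text similarity.

    beta (assumed name, 0 <= beta <= 1) is the weight on the structured term.
    """
    return beta * struct_sim + (1 - beta) * text_sim


def predict_salary(similarities, salaries, k):
    """Similarity-weighted mean salary of the k most similar postings."""
    neighbors = sorted(zip(similarities, salaries), reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total <= 0:
        raise ValueError("neighbor similarities must sum to a positive value")
    return sum(sim * salary for sim, salary in neighbors) / total


def mae(predicted, actual):
    """Mean absolute error between predicted and actual salaries."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

In a setup like this, `alpha`, `beta`, and `k` would be the parameters tuned against the validation set, with `mae` computed separately for the lower and upper salary limits.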
Keywords/Search Tags: Salary forecast, Collaborative filtering, Text similarity, LDA topic model, Doc2vec algorithm