| With the rapid development of big data, enterprise recruitment has in recent years shifted from the traditional offline mode to the online mode. Online recruitment has become the mainstream mode of enterprise recruitment thanks to its low cost, ease of operation, and the ability to submit resumes without leaving home. In addition, owing to the impact of this year’s epidemic, many domestic enterprises have seen their business performance decline and have correspondingly reduced hiring, while the annual number of college graduates keeps rising, making this year’s employment situation more severe. The number of resumes received by enterprises is also far greater than in previous years, which poses a greater challenge for matching and screening online resumes. To address the inability of recruitment websites to parse resume information automatically and match it intelligently, this thesis proposes an algorithm for automatically parsing Chinese resumes, together with a matching and screening algorithm for resume requirements in the recruitment field. The thesis combines personalized matching with enterprise recruitment scenarios: based on automatic information extraction, a personalized matching algorithm, and an evaluation algorithm, resumes are matched automatically against an enterprise’s recruitment needs. A second screening can then be carried out according to the employer’s individual needs, so as to achieve a precise match between the enterprise and the job seeker. The main research contents and innovations of this thesis are as follows: (1) Text information is extracted according to the hierarchical structure of Chinese resumes. First, resumes in different file formats are converted to a unified TXT format. Second, because resumes and recruitment requirements have a relatively uniform format, attribute keywords are identified through a preliminary study and used to split the text into blocks. Text blocks are divided into the following two types: text 
blocks that contain attribute keywords and text blocks that do not. Text blocks containing attribute keywords can be extracted in turn according to those keywords. Text blocks without attribute keywords are extracted with custom rules derived empirically from a study of resume data. (2) Information in the recruitment text and the resume is matched according to the semi-structured nature of resumes: resume information is divided into structured and unstructured information, and a "divide and conquer" approach is adopted to match each part separately. Different algorithms are used for the different types of text in structured information: discrete numerical matching is used for numerical text; an ontology-based domain knowledge algorithm is used for domain-knowledge text; and a character-level text similarity algorithm is used for job titles. The matching degree of structured text is obtained as a weighted sum of the attribute scores, with weights reflecting the enterprise’s preferences for different attributes of applicants’ resumes. (3) Based on the pre-trained model ELMo and the SIF sentence vector, keywords are extracted from the resume text to support keyword retrieval and matching. Keywords are extracted from work experience, self-evaluation, and other content; after industry keywords with rich semantics are extracted, a keyword set is generated for the resume. This set is then matched against the recruitment keywords provided by the enterprise. (4) To address the simplistic, template-based information matching in current online recruitment, the TOPSIS algorithm is used for a secondary screening of resumes. This enriches the dimensions along which resumes are selected: according to a company’s personalized recruitment needs, such as educational background, work experience, school honors, and other personalized content, more suitable candidates are selected. Compared with traditional 0-1 
rule matching, the F1 value of the structured information matching algorithm proposed in this thesis is improved by 3%. The Doc2vec vector model is adopted for unstructured text; compared with the traditional text similarity algorithm, its F1 value is more than 10% higher. Compared with mainstream algorithms, the proposed keyword extraction algorithm extracts a more complete set of keywords. The secondary screening of resumes makes the information matching algorithm more practical. |
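
The secondary screening step in (4) can be illustrated with a minimal sketch of the standard TOPSIS procedure. This is not the thesis's implementation: the function name `topsis`, the three criteria (education level, years of experience, number of school honors), the numeric scales, and the weights are all hypothetical and chosen only to show how candidates might be ranked against an enterprise's preferences.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with the standard TOPSIS procedure.

    matrix  : list of rows, one row per candidate, one column per criterion
    weights : criterion weights (rescaled internally to sum to 1)
    benefit : per-criterion flags; True if larger is better, False if smaller is better
    Returns one closeness score per row in [0, 1]; higher is closer to the ideal.
    """
    n_crit = len(weights)
    total_w = sum(weights)
    w = [x / total_w for x in weights]

    # Step 1: vector-normalize each column, then apply the weights.
    # A column of all zeros gets norm 1.0 to avoid division by zero.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[w[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    # Step 2: ideal and anti-ideal points, taken column by column.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(cols, benefit)]

    # Step 3: Euclidean distance to both points, then relative closeness.
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, worst)
        denom = d_pos + d_neg
        # If all candidates are identical, both distances are 0; return 0.0.
        scores.append(d_neg / denom if denom else 0.0)
    return scores


# Hypothetical candidates: [education level, years of experience, school honors]
candidates = {
    "A": [3, 5, 2],
    "B": [2, 8, 0],
    "C": [3, 1, 1],
}
weights = [0.4, 0.4, 0.2]      # assumed enterprise preferences
benefit = [True, True, True]   # larger is better for all three criteria

names = list(candidates)
scores = topsis([candidates[n] for n in names], weights, benefit)
scores_by_name = dict(zip(names, scores))
ranking = sorted(names, key=lambda n: scores_by_name[n], reverse=True)
print(ranking)
```

Changing `weights` is how an enterprise's personalized preferences would enter such a screening: the same candidate pool can yield a different ranking when, for example, work experience is weighted more heavily than educational background.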