Research On Retrieval Technology Of Unstructured Text Data In Two-Ticket Training System

Posted on: 2021-06-11; Degree: Master; Type: Thesis
Country: China; Candidate: R Yang
GTID: 2492306560496324; Subject: Control theory and control engineering
Abstract/Summary:
The development of the national economy depends on the power industry, and China's power generation is still dominated by thermal power. To keep thermal power generation running smoothly, power generation enterprises need to strengthen personnel management, actively train the professional skills of their staff, and reinforce safety awareness. To meet these management and training needs, a "two-ticket training system" has been developed. The system lets power production personnel study two-ticket training material anytime and anywhere and sit two-ticket training exams flexibly, and it helps management staff arrange and assess two-ticket training. Starting from the problems found in two-ticket training for power generation personnel, and building on the engineering implementation of the two-ticket assessment system, this thesis studies retrieval technology for unstructured text data.

The thesis first applies Jieba to segment the Chinese text of the two tickets, uses a hidden Markov model with the Viterbi algorithm to identify out-of-vocabulary words, and removes noise from the segmentation results. The TF-IDF algorithm is then used to extract keywords from the two-ticket content, and a keyword storage table is designed around these keywords to support storing and querying them. For text search, the KMP algorithm and the inverted index algorithm are each used to implement the search function of the two-ticket training system, and their engineering results are compared. The two algorithms return comparable query results, but the inverted index has the shorter query time, which makes it suitable when the volume of two-ticket data is large.

For ranking query results, the BM25 model and the vector space model are each used to implement the ranking function of the system, and their rankings are compared against expert review. The vector space model produces the more accurate ordering, which makes employee study more targeted, while the BM25 ordering is more comprehensive and easier to manage. Combining the inverted index with the vector space model retrieves two-ticket content that is both more relevant to the user's query and more comprehensive, which makes it easier for power production personnel to find two-ticket learning material and improves the training effect.
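The retrieval pipeline described in the abstract (inverted index lookup followed by BM25 or vector space model ranking) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy corpus, the document IDs, the function names, the smoothed IDF formula, and the BM25 parameters `k1=1.5` and `b=0.75` are all assumptions chosen for illustration, and the documents are assumed to be already segmented into tokens (in the thesis that step is done with Jieba).

```python
import math
from collections import Counter, defaultdict

# Hypothetical, pre-segmented toy corpus; real input would be Jieba output.
DOCS = {
    "ticket_1": ["boiler", "feed", "pump", "isolation", "work", "ticket"],
    "ticket_2": ["turbine", "startup", "operation", "ticket"],
    "ticket_3": ["boiler", "pump", "pump", "maintenance", "work", "ticket"],
}


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def candidates(index, query):
    """Return the documents that contain at least one query term."""
    found = set()
    for term in query:
        found.update(index.get(term, {}))
    return found


def idf(index, n_docs, term):
    """Smoothed inverse document frequency (assumed variant, always positive)."""
    df = len(index.get(term, {}))
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_score(index, docs, query, doc_id, k1=1.5, b=0.75):
    """BM25 score of one document for a tokenized query."""
    avg_len = sum(len(tokens) for tokens in docs.values()) / len(docs)
    doc_len = len(docs[doc_id])
    score = 0.0
    for term in query:
        tf = index.get(term, {}).get(doc_id, 0)
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        score += idf(index, len(docs), term) * tf * (k1 + 1) / norm
    return score


def vsm_score(index, docs, query, doc_id):
    """Cosine similarity between TF-IDF vectors of the query and the document."""
    n = len(docs)
    doc_vec = {t: tf * idf(index, n, t) for t, tf in Counter(docs[doc_id]).items()}
    query_vec = {t: tf * idf(index, n, t) for t, tf in Counter(query).items()}
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0


def search(docs, query, scorer):
    """Look up candidates in the inverted index, then rank them with `scorer`."""
    index = build_inverted_index(docs)
    hits = candidates(index, query)
    return sorted(hits, key=lambda d: scorer(index, docs, query, d), reverse=True)
```

Calling `search(DOCS, ["boiler", "pump"], vsm_score)` or passing `bm25_score` instead ranks `ticket_3` ahead of `ticket_1` on this toy data, because `pump` occurs twice in `ticket_3`. The index is rebuilt on every call only to keep the sketch short; a real system would build it once and reuse it, which is where the query-time advantage over a KMP scan of every document comes from.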
Keywords/Search Tags: KMP, inverted index, BM25, vector space model, Chinese word segmentation, TF-IDF