
Research And Implementation Of Campus Search Engine With Entity Analysis And Short Text Clustering

Posted on: 2015-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: P J Wang
GTID: 2298330467962278
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the arrival of the big data era, helping users obtain the information they want quickly and accurately from ever-growing data has become an increasingly important problem. Two directions are likely to shape future development: vertical search, which provides authoritative information through deep mining of industry knowledge, and entity search, which integrates information around entities and returns answers directly. Based on these two ideas, we build a campus entity search engine that not only provides a search service for teachers and students but also visualizes the results of data mining on campus information. The main work of this thesis is as follows.

First, we analyze the research problems of the platform according to its tasks and goals, and then give an overall design that includes characteristic functions such as "relationship map", "teacher card", "events calendar", "character bus" and "social topics". We also study the technical implementation plan and divide the whole system into modules.

Second, we complete the framework design of the vertical search in our system and develop its key modules. The work has three parts: data collection, data processing and data retrieval. We study application strategies for practical scenarios and use open-source tools to build the offline part of the system.

Third, we implement a template algorithm based on trigger-word rules; the method performs well in the "events calendar" and "teacher card" functions.
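To make the idea of a trigger-word template concrete, the following is a minimal Python sketch. It is not the thesis's implementation: the trigger words, the date pattern, and the function name `extract_event` are all hypothetical, chosen only to show how a rule of the form "trigger word plus a matching slot" could feed a feature such as "events calendar".

```python
import re
from typing import Optional

# Hypothetical trigger words; the thesis does not list its actual rules.
EVENT_TRIGGERS = ("lecture", "seminar", "workshop")

# Hypothetical slot pattern: an ISO-style date such as 2015-05-20.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_event(sentence: str) -> Optional[dict]:
    """Return an event record if the sentence has a trigger word and a date."""
    lowered = sentence.lower()
    # Step 1: the rule fires only if some trigger word occurs in the sentence.
    trigger = next((t for t in EVENT_TRIGGERS if t in lowered), None)
    if trigger is None:
        return None
    # Step 2: the template then requires a date slot to be filled.
    match = DATE_PATTERN.search(sentence)
    if match is None:
        return None
    return {"type": trigger, "date": match.group(0), "text": sentence.strip()}
```

A sentence that contains a trigger word but no date, or a date but no trigger word, yields `None`, so only sentences matching the whole template produce a record.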
In addition, we implement a method for calculating user authority based on the idea of the PageRank algorithm, and the method works well in the "relationship map" and "character bus" functions.

Finally, for the short text clustering required by the "social topics" feature, we propose an algorithm for topic word detection and similar word mining. We also implement a topic module based on LDA, and the experimental results demonstrate the effectiveness of both methods. We further analyze the two methods in combination with practical scenarios.
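The user-authority calculation mentioned above is described only as following the idea of PageRank. As an illustration, here is a minimal sketch of standard PageRank by power iteration over a directed user graph. The graph representation, the function name `user_authority`, the damping factor of 0.85, and the iteration count are assumptions for this example, not details taken from the thesis.

```python
from typing import Dict, List


def user_authority(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score users with PageRank; links[u] lists the users that u points to."""
    # Collect every user that appears as a source or as a target.
    users = set(links)
    for targets in links.values():
        users.update(targets)
    n = len(users)
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user receives the same baseline share of the score.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = links.get(u, [])
            if targets:
                # Split this user's score equally among the users it points to.
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A user with no outgoing links spreads its score over everyone.
                share = damping * rank[u] / n
                for t in users:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

In a graph where `alice` and `bob` both point to `carol` and `carol` points back to `alice`, `carol` ends up with the highest score and `bob`, who has no incoming links, with the lowest.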
Keywords/Search Tags:vertical search in campus, entity information extraction, authority of users, short text clustering, LDA model