Web User Interest Mining Based On Ontology

Posted on: 2015-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Su
Full Text: PDF
GTID: 2268330428997990
Subject: Computer software and theory
Abstract/Summary:
Internet services are becoming distributed, proactive, and personalized. In particular, as search engines evolve, people expect immediate, accurate, and personalized search. Personalized service requires the search engine to understand users’ web behavior patterns well and to apply learning and recommendation algorithms suited to individual needs.

Detecting users’ interests, intents, and behavior habits during search is very important for providing personalized search results, improving search engine performance, and improving satisfaction with classification. The interest pattern of search engine users is the basis of personalized search service, but many problems remain. In representing the user interest model, tracking interest drift, and optimizing results, there is still much room to explore. Modeling users’ interests lays an important foundation for deeper detection of user intent and search context, so that query recommendation and result ranking can be carried out.

Current research on mining users’ interests involves two aspects. The first is the discovery of users’ interests. Based on users’ search histories and visit behaviors, computational criteria are defined to judge interests. Because different kinds of interest exist, such as long-term interest and very short-term interest, the criteria differ too. A clear, comprehensive, and enforceable criterion is the basis of interest discovery. The second is the representation of users’ interests. Current methods often use an open directory or a domain ontology: they examine the relationships between ontology tags or directory semantic tags, then extract categories and build a user interest tree or a personal ontology. The problem is how to deal with the huge number of tags.
User personalization results in too many redundant tags, which require tedious semantic computation. The problems in these two aspects are the core of this paper.

Starting from the theory of user context, this paper discusses a novel approach to user interest mining and proposes a method of web user interest mining based on ontology. A novel interest score computation is proposed that uses the collection of broader concepts of web page description vectors as the interest description items. The broader concepts form a sequence following the time order of the search history log, so that different interest patterns can be identified and users’ short-term interests derived. With reference to existing research, this paper proposes four interest patterns.

The internal relationships within the interest concept collections, based on the interest scores, are examined with the ontology, so that the resulting interest collection and ontology fragment graphs can be produced to describe short-term interest more clearly. Finally, long-term interest is calculated by incremental accumulation of short-term interests. The incremental process can be interpreted as overlaying and pruning the fragment graphs.

The whole process of obtaining the interest description avoids the problem of merging similar tags through similarity calculation and document clustering, as done by previous interest mining methods. Comprehensive and authoritative ontology concept tags also solve the problem of large-scale redundant tags caused by semantically similar tags in previous research. This paper offers a new way of thinking about interest discovery.
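
The accumulation step described above can be illustrated with a minimal sketch. The abstract does not give the scoring formula or the pruning rule, so everything here is an assumption made for illustration: the function names, the recency weighting, the decay factor, and the pruning threshold are hypothetical, and flat score maps stand in for the ontology fragment graphs.

```python
from collections import defaultdict


def short_term_scores(concept_sequence, recency_weight=0.5):
    """Score concepts in one time-ordered sequence of broader concepts.

    Hypothetical scoring (not the thesis formula): every occurrence adds 1,
    plus a bonus that grows linearly with its position in the sequence,
    so more recent visits count more.
    """
    scores = defaultdict(float)
    n = len(concept_sequence)
    for position, concept in enumerate(concept_sequence):
        scores[concept] += 1.0 + recency_weight * (position + 1) / n
    return dict(scores)


def accumulate_long_term(long_term, short_term, decay=0.8, prune_below=0.5):
    """Overlay a short-term score map onto the long-term map, then prune.

    Overlay: older long-term scores are decayed and the new short-term
    scores are added on top. Pruning: concepts whose accumulated score
    falls below `prune_below` are dropped. Both rules are assumptions.
    """
    merged = {concept: score * decay for concept, score in long_term.items()}
    for concept, score in short_term.items():
        merged[concept] = merged.get(concept, 0.0) + score
    return {c: s for c, s in merged.items() if s >= prune_below}


# Usage: fold two search sessions (hypothetical concepts) into one profile.
sessions = [["Sports", "Football", "Sports"], ["Music", "Sports"]]
profile = {}
for session in sessions:
    profile = accumulate_long_term(profile, short_term_scores(session))
```

Keeping the short-term and long-term steps as separate functions mirrors the two-stage description in the abstract; a fuller version would carry ontology edges between concepts rather than scores alone.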
Compared with Bayesian methods and support vector machine methods, this method describes users’ interests more specifically and obtains better combined results.

In the experiments, we choose WordNet and Wikipedia as the reference ontologies. Existing methods are used to extract the features of a common ontology from Wikipedia, mapping Wikipedia entries and categories to the concepts and relationships of a common ontology, so that Wikipedia can be used for tag extraction and word sense disambiguation. The experiments use open data from a commercial search engine as the source of the search log, which contains 83566 records. Of these, 64273 valid visit behavior records that can be used in text processing are taken as the objects of computation. We select data from 50 different user IDs for the experiment: 48263 records from 40 users as the training set for determining the parameter values in the formulas, and 13216 records from 10 users as the test set.
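
A split by user ID, as in the 40-user training set and 10-user test set above, might be sketched as follows. The abstract does not say how the users were assigned to each set, so the random assignment, the `(user_id, payload)` record layout, and the function name `split_by_user` are assumptions for illustration only.

```python
import random


def split_by_user(records, n_train_users, seed=0):
    """Split log records so that no user ID appears in both sets.

    `records` is an iterable of (user_id, payload) pairs; this layout is
    an assumption. Users are shuffled with a fixed seed (also an
    assumption) and the first `n_train_users` go to the training set.
    """
    records = list(records)
    user_ids = sorted({user_id for user_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(user_ids)
    train_users = set(user_ids[:n_train_users])
    train = [r for r in records if r[0] in train_users]
    test = [r for r in records if r[0] not in train_users]
    return train, test


# Usage with synthetic data: 50 users, 3 records each.
records = [(user, i) for user in range(50) for i in range(3)]
train, test = split_by_user(records, n_train_users=40)
```

Splitting on user IDs rather than on individual records keeps each user's whole history on one side, so parameters tuned on the training users are evaluated on users the method has not seen.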
Keywords/Search Tags: search engine, users’ interest, common ontology, interest pattern