
Research On Techniques Of Text Retrieval Model Based On Semantic Analysis

Posted on: 2017-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: F Yuan
Full Text: PDF
GTID: 2308330488985676
Subject: Computer application technology
Abstract/Summary:
With the advent of "Internet+" under Innovation 2.0, every field of social life has become inseparable from the Internet, which has led to explosive growth of all kinds of data on the Internet, including unstructured, structured, and semi-structured data. Internet companies can analyze the habits behind users’ behavior from this complex data and then design products and services more in line with users’ tastes; for most casual users, however, it is increasingly challenging to retrieve useful information automatically from such massive data through a computer.
Topic models, represented by the LDA model, are commonly used semantic mining tools in information retrieval systems. Probabilistic topic models of this kind can identify latent topic information in a corpus and eventually obtain a term frequency vector for each document from multiple probability distribution matrices. The uncertainty of natural language consists mainly of randomness and fuzziness; topic models based on probability statistics address only the randomness and ignore both the fuzziness of natural language and the semantic association between words in a document. As network data grows, topic models used as information retrieval tools can therefore return information related to user needs but cannot capture user intent or return accurate information, so users are often dissatisfied with the retrieval results. To address this, the research work is as follows:
Firstly, this paper introduces the Cloud Model as a bridge to integrate semantic knowledge information into the tag topic model and proposes the Cloud-based Semantic Tag Topic Model. This model merges the tag matrix based on semantic knowledge and the tag matrix based on probability statistics into a new semantic tag matrix through a series of cloud space transformations, and then uses it for topic modeling.
The Cloud Model can not only convert uncertainty between quantitative representation and qualitative expression but can also reflect the correlation between fuzziness and randomness. Hence it overcomes the shortcoming of traditional topic models, which do not consider the fuzziness of language. The new model assigns semantically close words to one topic to enhance topic coherence.
Secondly, in order to enhance the effect of the tag topic model, this paper proposes a feature acquisition method based on semantic association to acquire better document tags. On the basis of traditional feature selection, this method introduces Optimal Membership Degrees and Comparison Probability to account for fuzziness and semantic association between words. Comparison Probability measures the fuzziness of a word and the semantic correlation between two words; Optimal Membership Degrees measures how close words are to a topic. These two indicators reflect the semantic association between words, and between words and documents, and are used to acquire document features and tags and improve their quality.
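The conversion between qualitative expression and quantitative representation attributed to the Cloud Model above can be illustrated with the standard forward normal cloud generator. The abstract does not say which generator or parameters the thesis uses, so the following is only a sketch under assumptions: the function name `forward_normal_cloud` and the parameters `ex` (expectation), `en` (entropy), and `he` (hyper-entropy) are illustrative, and this is not the thesis's cloud space transformation.

```python
import math
import random


def forward_normal_cloud(ex, en, he, n, seed=None):
    """Generate n cloud drops (x, membership) for a normal cloud (ex, en, he).

    Assumed, illustrative parameters (not taken from the thesis):
      ex: expectation, the most representative value of the concept
      en: entropy, the spread (fuzziness) of the concept
      he: hyper-entropy, the randomness of that spread
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        # Randomness: the spread itself is drawn around en with deviation he.
        en_prime = rng.gauss(en, he)
        if en_prime == 0.0:
            # Degenerate spread: the drop sits exactly on the expectation.
            drops.append((ex, 1.0))
            continue
        # Quantitative sample of the qualitative concept.
        x = rng.gauss(ex, abs(en_prime))
        # Fuzziness: membership degree of x in the concept, in [0, 1].
        mu = math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
        drops.append((x, mu))
    return drops


if __name__ == "__main__":
    for x, mu in forward_normal_cloud(ex=0.0, en=1.0, he=0.1, n=5, seed=1):
        print(f"x={x:.3f}  membership={mu:.3f}")
```

Each drop pairs a sampled value with a membership degree, so a single concept carries both randomness (through `he`) and fuzziness (through the membership function), which is the correlation the abstract refers to. Setting `he` to zero reduces the sketch to an ordinary Gaussian membership function.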
Keywords/Search Tags: Information Retrieval, Topic Model, Cloud Space Transformation, Semantic Association, Feature Extraction