The Research On Several Problems In Recommender Systems And Crowdsourcing

Posted on: 2018-08-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Zheng
Full Text: PDF
GTID: 1318330518496816
Subject: Computer Science and Technology
Abstract/Summary:
Given the phenomenal growth of computer networks and the mobile internet, large volumes of data are generated around the world each year. By 2020, the global data volume is expected to rise to 40 ZB. This data contains enormous value, but its value density is becoming lower and lower. Big data helps people overcome information scarcity, but it also leads them into an era of information overload. Effective and efficient data mining is the key to providing intelligent services and improving customer satisfaction. Generally, internet users obtain information in either a passive or an active manner. In the traditional passive manner, people receive information that the internet publishes to them in either a uniform or a personalized form. In the active manner, people seek information or help through a search engine or a crowdsourcing platform. To address the problem of information overload, this work studies two internet applications: recommender systems and crowdsourcing. The major work and contributions are as follows:

1. For users' direct feedback (rating data) in recommender systems, this work proposes a contextual modeling probabilistic tensor factorization scheme that matches information supply with demand to address information overload. Contextual information has been proven to be a valuable factor in building personalized recommender systems. However, most existing solutions based on probabilistic matrix factorization do not provide a straightforward way of integrating ratings, social relationships, item contents and contexts into one model simultaneously, and so ignore some of the influence between them.
This work treats the given data as a User-Item-Context-Rating tensor and introduces a high-dimensional collaborative filtering method named probabilistic tensor factorization (PTF), which is a generalization of probabilistic matrix factorization. We then extend PTF to a new model named Contextual Modeling Probabilistic Tensor Factorization (CMPTF), which systematically integrates topic modeling, social relationships and contexts in a contextual modeling manner to further improve recommendation quality. Experiments on two datasets show the effectiveness and robustness of CMPTF.

2. For users' indirect feedback (check-in frequency) in Point-of-Interest (POI) recommendation, this work proposes two temporal-geographical topic models (TGTM) that match information supply with demand to address information overload. Rating data has a fixed numeric range, normally 1-5, and a high rating directly indicates that the user likes the product very much. However, when users browse web sites or check in at POIs, the related count data accumulates gradually through the users' behavior, and its numeric value reflects users' interests only indirectly. Moreover, the numeric range of count data is not fixed. Check-in data generally consists of user IDs, textual content, posting timestamps, geographic information and so on. This work proposes two novel time-location-content-aware POI recommendation models that jointly integrate auxiliary temporal, textual and spatial information to improve the performance of POI recommendation. Specifically, TGTM uses the LDA model to analyze the degree of topic match between POIs and users.
TGTM then uses spatial coordinates to measure the geographical attractiveness of POIs to users. After that, the model uses temporal information to partition the original user-POI check-in data into sub-groups, so that behavior in similar temporal scenarios is grouped together. Lastly, TGTM combines the above information under a unified probabilistic matrix factorization framework to infer the POIs that may attract users. This work sets different priors on the latent matrices to test their influence. Comprehensive experiments on a real-world dataset demonstrate the superiority of our approach.

3. To return high-quality information to users and thereby address information overload, this work proposes an answer integration scheme for open crowdsourcing tasks. A key challenge of crowdsourcing markets is ensuring the quality of answers from workers with different abilities and reliabilities. The collected answers are often consolidated using either a simple majority voting strategy or more sophisticated solutions. However, the majority of tasks in crowdsourcing markets are open questions with unstructured answer formats, for which no possible answers are suggested in advance. This work holds that data quality depends on both workers and tasks. It uses the Chinese Restaurant Process to model the procedure of collecting answers and lets the concentration parameter denote the difficulty of the task. Considering task difficulty, workers' reliabilities and their answers, this work proposes a scheme for consolidating answers and designs an EM algorithm to learn the related latent variables. To save time, this work also uses entropy to measure the chaos of the answer space; the EM algorithm is activated once the chaos of the answer space becomes stable. A comparative study using live experiments demonstrates the superiority of our crowdsourcing scheme for open questions.
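
The entropy-based trigger mentioned in the third contribution can be illustrated with a minimal sketch. The abstract does not state which entropy is used or how stability is judged, so everything here is an assumption made for illustration: Shannon entropy over the empirical distribution of distinct answers, a sliding window of recent entropy values, and a tolerance threshold. The function names (`answer_entropy`, `is_stable`, `first_stable_index`) and the parameters `window` and `tol` are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical distribution of answers.

    Assumption: identical strings count as the same answer; the dissertation
    may group free-text answers differently.
    """
    counts = Counter(answers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1 / p) summed over the distinct answers
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def is_stable(entropies, window=3, tol=0.05):
    """True when the last `window` entropy values differ by at most `tol`.

    Both `window` and `tol` are illustrative defaults, not values from the source.
    """
    if len(entropies) < window:
        return False
    recent = entropies[-window:]
    return max(recent) - min(recent) <= tol


def first_stable_index(answers, window=3, tol=0.05):
    """Number of answers collected when the entropy first looks stable.

    Returns None if the entropy never stabilizes; in this sketch, this is the
    point at which a consolidation step such as EM would be started.
    """
    entropies = []
    for n in range(1, len(answers) + 1):
        entropies.append(answer_entropy(answers[:n]))
        if is_stable(entropies, window, tol):
            return n
    return None
```

Calling `first_stable_index` on the stream of answers collected so far gives a point at which an expensive consolidation step could be started, rather than re-running it after every new answer. Recomputing the entropy from scratch at each step keeps the sketch short; an incremental update of the counts would be the obvious optimization.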
Keywords/Search Tags: information overload, recommender systems, POI recommendation, crowdsourcing