Researcher expertise search, homepage finding and metadata annotation

Posted on: 2014-03-19
Degree: Ph.D
Type: Dissertation
University: The Pennsylvania State University
Candidate: Gollapalli, Sujatha Das
GTID: 1458390005496528
Subject: Information Science

Abstract/Summary:

Expert Search, the problem of retrieving people with expertise on a queried topic, has important applications. For instance, conference organizers can use expert search capability while forming a panel for reviewing papers. Similarly, recruiters can use expert search to track potential employees for their companies. Although significant progress has been made on this problem, existing models for expert search are not tailored for academic domains. In academic domains, expert search and similar expert finding involve ranking researchers in response to topic and name queries based on academic documents. Academic or research documents are different from webpages in terms of their type (e.g., homepages, publications, grant proposals), structure (e.g., abstract, sections), associated metadata (e.g., venue, authors) and connections (e.g., citations).

Enabling expert search in an open-access digital library such as CiteSeer is challenging since academic documents are not directly available for estimating expertise. Instead, CiteSeer acquires freely-available publications and other relevant academic documents by crawling the Web. Previous studies indicate that researchers list their publication information online using their homepages since this substantially increases the impact of their work. It becomes imperative, therefore, to periodically track researcher homepage URLs in CiteSeer for obtaining up-to-date collections of academic documents.
In addition to their use as a resource for academic documents, professional homepages of researchers also typically include descriptions of research interests and other metadata that is crucial in tasks such as author disambiguation and profile extraction.

Despite several studies on homepage finding in the context of the general web, academic homepage finding is not fully addressed in existing research. The first question we address in this dissertation is: how can we acquire an accurate homepage collection? We study this question in two settings. First, we study academic homepage finding on the Web. Given the results of web search for a researcher name query, our goal is to identify the correct homepage in the set of pages retrieved from the Web. We design features based on insights from content analysis of known academic homepages to learn a ranking function for academic homepage retrieval. In the second setting, we address homepage finding on university department websites where academic homepages need to be discriminated from other kinds of academic webpages. Despite training the classifier on "outdated" webpage instances, we show that unlabeled data and multiple views of webpages can be used to adapt our classifier to current-day academic websites.

In the second part of this dissertation, we address expert search for academic domains. We study ranking models for researchers in response to topic and name queries. We use the content of research documents and the structural connections among documents to build query-dependent graphs for scoring researchers. We propose a simple extension to PageRank for combining evidence from multiple types of documents. This model scores researchers based on the structural connections among the documents and the importance of each document-type. In addition, we propose Author-Document-Topic graphs for scoring researchers based on the topical content of documents generated by them.
Our models handle name and topic queries uniformly and show state-of-the-art retrieval performance on expert finding tasks.

Keywords/Search Tags: Expert, Search, Finding, Topic, Academic, Metadata, Name