Methods for semi-automated index generation for high precision information retrieval

Posted on: 2002-01-10
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Berrios, Daniel Charles
GTID: 1468390011497288
Subject: Health Sciences

Abstract/Summary:

This dissertation presents new methods for performing knowledge-intensive indexing of documents in a semi-automated fashion for high-precision information retrieval systems. Current methods for indexing medical information are clearly limited. Manual indexing is inconsistent, time-consuming, and limited by the representational abilities of the indexing language. Word-statistical indexing is easy to automate, but retrieving documents then requires sophisticated linguistic support for query formulation and search terms that are highly discriminant. Knowledge-based information retrieval systems have improved search precision without novel indexing methods, which suggests that query and context models can help to bridge the gap in knowledge between user-formulated queries and document indexes. Information retrieval systems that rely on complex, knowledge-based indexing schemes could theoretically provide perfect precision and recall, but they require time-consuming manual indexing by domain experts.

I devised a semi-automated indexing system, ISAID, by leveraging the knowledge represented in query and context models and domain-specific knowledge bases, and by using existing natural-language-processing tools and methods. I developed three methods for automated index proposal by this system, using information extraction tools, vector-space models, and query graphs. In addition to preparing files for indexing and managing indexed files and indexing ontologies, the system allows users to create indexes directly in electronic form.
Most importantly, the system can use one or more of these three methods to propose document indexes.

I evaluated the ability of ISAID to improve the speed and accuracy with which subjects indexed documents. Compared to manual indexing methods, the indexing system increased the speed of indexing severalfold, and the indexes proposed using the vector-space method were consistent with choices made by human indexers. I also tested the impact of disabling index proposal by the system on indexing speed and accuracy. Subjects consistently indexed documents faster and more accurately with index proposal by ISAID than without, although I lacked the statistical power to detect significant differences at every level of inter-subject agreement. While some subjects required at least 2 hours of training to feel comfortable using ISAID, most felt that, once trained, they were able to index a majority of the clinically significant medical knowledge in documents.

Keywords/Search Tags: Index, Methods, Information retrieval, Documents, Precision, Semi-automated, ISAID, Using