Visual Analysis For Fast Understanding Of Document Collection

Posted on: 2020-08-02; Degree: Master; Type: Thesis
Country: China; Candidate: X H Liang; Full Text: PDF
GTID: 2518306131961969; Subject: Software engineering
Abstract/Summary:
Understanding a collection of documents is not easy, especially when the documents are specialized, such as research papers. It is nevertheless often important: reading a collection of papers is a way to learn about a field, and a collection can only be organized once it is understood. A tool that helps people understand a collection is therefore useful. To meet this need, this thesis designs a visual method, supported by data mining, that helps users understand a document collection faster and better. The method clusters documents by content twice: once over the whole collection, and once within each category produced by the first clustering. After clustering, the thesis uses feature selection to assign a label to every category, whether it comes from the first or the second clustering. The labels convey the general content of each category. Scatter plots are the main visualization technique. Each category produced by clustering is displayed as a scatter plot in which each point represents one document. Word clouds and other views help users examine the details of a category or a document. Machine learning results inevitably contain errors, so the thesis provides interactions that modify the clustering results dynamically. Users can split a category if it is messy, merge two categories if they do not differ much, or move a small number of documents to another category. Searching the collection is also supported. By combining the algorithm with manual operations, the algorithm's errors are corrected, which gives users a reasonable understanding of the document collection.
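The labeling and correction steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm or the feature selection technique, so the example starts from a hand-written cluster assignment and scores terms by how much more often they appear in a category's documents than elsewhere. The function names (`label_cluster`, `move`, `merge`, `split`) and the sample documents are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def label_cluster(docs, clusters, cluster_id, top_n=2):
    """Return the terms most over-represented in one cluster versus the rest.

    Assumed scoring (a stand-in for feature selection): document frequency
    inside the cluster minus document frequency outside it.
    """
    inside, outside = Counter(), Counter()
    for doc_id, text in docs.items():
        target = inside if clusters[doc_id] == cluster_id else outside
        target.update(set(tokenize(text)))  # count each term once per document
    n_in = sum(1 for c in clusters.values() if c == cluster_id)
    n_out = len(clusters) - n_in

    def score(term):
        out_rate = outside[term] / n_out if n_out else 0.0
        return inside[term] / n_in - out_rate

    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(inside, key=lambda t: (-score(t), t))[:top_n]


def move(clusters, doc_ids, target):
    """Move a few documents to another category."""
    for doc_id in doc_ids:
        clusters[doc_id] = target


def merge(clusters, keep, absorb):
    """Merge category `absorb` into category `keep`."""
    for doc_id, cluster_id in clusters.items():
        if cluster_id == absorb:
            clusters[doc_id] = keep


def split(clusters, cluster_id, doc_ids, new_id):
    """Split the given documents out of `cluster_id` into a new category."""
    for doc_id in doc_ids:
        if clusters[doc_id] == cluster_id:
            clusters[doc_id] = new_id


# Hypothetical sample data; d5 is deliberately placed in the wrong category.
docs = {
    "d1": "kmeans clustering of text documents",
    "d2": "hierarchical clustering for documents",
    "d3": "scatter plot visualization of points",
    "d4": "word cloud visualization design",
    "d5": "clustering documents with topic models",
}
clusters = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "B"}

move(clusters, ["d5"], "A")  # the user corrects the algorithm's mistake
labels = {c: label_cluster(docs, clusters, c, top_n=1)
          for c in sorted(set(clusters.values()))}
print(labels)
```

Because labels are recomputed from the current assignment, every split, merge, or move is reflected the next time a label is requested, which matches the interactive loop the abstract describes.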
Keywords/Search Tags: Document Collection, Natural Language Processing, Clustering, Visualization