Research On Biography Generation Based On Events Of Character Roles

Posted on: 2016-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhao
Full Text: PDF
GTID: 2308330461978005
Subject: Signal and Information Processing
Abstract/Summary:
Character information, one of the types of information people care about most, has important social value. Traditional biographies offer a wealth of character information, but because source material and manual effort are limited, they cannot keep pace with the demand for character information in the age of big data. The Internet is gradually replacing traditional media as the main channel for obtaining character information. Search engines help people filter and locate information, but most of the data they return is scattered and incomplete, and it is mixed with noise such as advertisements and duplicate web pages, so readers spend a great deal of time reading and sorting it to find what is useful. It is therefore valuable to filter this information, give structure to unstructured web data, and build biographies from web sources. To address this problem, this thesis constructs an offline database of character information and carries out the research on it. The main work is as follows:
(1) This thesis analyzes methods for generating biographies automatically. Drawing on the plate type and double polyphonic mode of traditional biographies, it proposes a biography model based on the events of character roles. The model divides events into categories according to the identity of the character, and the events in each category form a main thread that demonstrates the character's traits. As a result, the ordering of events becomes clearer.
(2) The system analyzes the materials a biography requires and extracts them from the Internet. Given that such material is concise and simply structured, this thesis designs a duplicate-page removal method based on word fingerprints to clean the material. High-frequency words are cut and grouped with a sliding window, and every resulting word fragment is hashed. The set of hash codes is the word fingerprint of a page's text. Duplicate pages are removed according to how well their word fingerprints match. Experiments verify the feasibility of the proposed method.
(3) This thesis presents an event extraction method based on the features of event description words. Because events of the same type are usually described with the same words, such words are given higher weights when building the weight matrix, from which the event description features are obtained. Role events are then clustered using a clustering method with adaptive neighbors, and the system finally extracts the different types of events through event timeline summarization. In the experiments, using the features of event description words gives a clear improvement: precision, recall, and F-score reach about 93%, 89%, and 89% on average, respectively.
(4) Using visualization tools, the system displays a character's timeline summary visually, selecting an appropriate timeline summary and a network model linked to the event description words. The system constructs a correlation matrix between the two in order to visualize events that happened at different times and in which people played different roles, and to analyze how the event description words participate and what they reveal about the characteristics of a person's roles.
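The word-fingerprint step in (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the `top_k` and `window` parameters, the choice of BLAKE2b as the hash, the Jaccard overlap used to compare fingerprints, and the 0.8 threshold are all hypothetical, since the abstract does not specify them.

```python
import hashlib
import re
from collections import Counter


def word_fingerprint(text, top_k=20, window=3):
    """Hash sliding windows over a page's high-frequency words.

    top_k and window are illustrative values, not taken from the thesis.
    """
    words = re.findall(r"\w+", text.lower())
    frequent = {w for w, _ in Counter(words).most_common(top_k)}
    # Keep only the high-frequency words, in their original order.
    kept = [w for w in words if w in frequent]
    if not kept:
        return set()
    # Slide a window over the kept words; a short page yields one fragment.
    fragments = (
        " ".join(kept[i:i + window])
        for i in range(max(len(kept) - window + 1, 1))
    )
    # The set of hash codes is the page's word fingerprint.
    return {
        hashlib.blake2b(f.encode("utf-8"), digest_size=8).hexdigest()
        for f in fragments
    }


def similarity(fp_a, fp_b):
    """Jaccard overlap of two fingerprints (an assumed matching rule)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def remove_duplicates(pages, threshold=0.8):
    """Keep a page only if it does not match any page kept so far."""
    kept_pages, kept_fps = [], []
    for page in pages:
        fp = word_fingerprint(page)
        if all(similarity(fp, other) < threshold for other in kept_fps):
            kept_pages.append(page)
            kept_fps.append(fp)
    return kept_pages
```

Comparing sets of fragment hashes rather than whole-page hashes lets pages that differ only in small details, such as an inserted advertisement, still be treated as duplicates once the threshold is tuned.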
Keywords/Search Tags: biography, duplicate webpage removal, event clustering, timeline summary, visualization