On The Knowledge Organization Of Ancient Local Chronicle From The Perspective Of Social Network Analysis

Posted on: 2018-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Li
GTID: 1365330602468622
Subject: History of science and technology
Abstract/Summary:
During the 1950s, Wan Guoding, a famous agricultural historian in China, organized the collection and editing of a set of local agricultural records called Local Chronicle: Produce (<方志物产>). It was copied from the holdings of more than 100 literary and historical institutions located in more than 40 large and medium-sized cities, and contains 431 volumes and more than 30,000,000 words. Local Chronicle: Produce covers a long time span and a wide geographic range, is large in volume and rich in content, and is of high value and significance. It could provide an important source of information for research on agricultural history and regional history.

At present, digital research based on the content of Local Chronicle: Produce is still at an initial stage. Whether in the choice of geographical scope, the application of research methods, or the exploration of the text content, it requires continued attention and development. To better serve scientific research and social development, traditional research methods should be combined with modern information technologies, moving from a single province to several provinces and then to the whole country, from part to whole and from a single object to multiple objects, in order to further explore and use the value of Local Chronicle: Produce.

Set in the environment of the information society and against the background of the digital humanities, this study takes the digital texts of Local Chronicle: Produce of Shanxi as its research corpus. It automatically identifies the named entities in the texts and extracts the relationships between them to construct a social network data source, then visualizes the relationships between the entities by means of Social Network Analysis methods. According to actual needs, the study analyzes the network from different perspectives in order to carry out knowledge discovery. The main contents are as follows.

(1) Construction of the full-text database of Local Chronicle: Produce of Shanxi

By sorting out the textual features of Local Chronicle: Produce of Shanxi, this paper designs a set of normalized text standards on the basis of previous studies and formats all the texts using text processing software. At the same time, the database tables and fields are designed and the texts are imported in batches to complete the database. The database has three tables: the book table, the produce classification information table, and the produce basic information table. The book number is the primary key of the book table and a foreign key of the produce classification information table; the classification number is the primary key of the produce classification information table and a foreign key of the produce basic information table. This design both ensures the integrity of the information and reduces redundancy. Besides supporting basic operations such as adding, deleting, modifying, and querying information, it also makes joint queries across the three tables easy.

(2) Research on the produce information recorded in Local Chronicle: Produce of Shanxi

On the basis of a systematic review of the development of produce classification systems in China, this paper designs and constructs a produce classification system that fits the features of the Shanxi volume of Local Chronicle: Produce. Using database technology and other information technologies, the paper automatically standardizes the original produce classification information, fills in the classification information that was originally missing, and measures how well this automatic processing performs. Building on the standardized classification, the paper takes the place name contained in the book title as the produce's place of origin and normalizes it to the name of its prefecture. According to the relationship between produce name and place name, and between produce name and classification information, it introduces geographic information system technology to display on maps the overall distribution of the produce, the distribution of different types of produce, the distribution of category information, and so on.

(3) Entity recognition in Local Chronicle: Produce of Shanxi based on Conditional Random Fields

The research takes all produce records whose annotation information is not empty as its corpus. The produce aliases, cited documents, mentioned persons, place names, and produce uses contained in the annotation information are marked by manual annotation. On the basis of this manual labeling, the corpus is divided into ten equal parts; each time, nine of them are selected as the training corpus and the remaining one is used as the test corpus. A conditional random field model is trained on the training corpus. The internal and external features of the marked parts are analyzed to form a feature template, which completes the construction of the recognition model. The test corpora are then used to evaluate the recognition model. The evaluation indicators are precision, recall, and F-measure (the harmonic mean of the two).

The results show that the recognition performance of the conditional random field model is closely related to two factors. The first is the size of the corpus. The conditional random field works better with large amounts of data. The overall quantity of Local Chronicle: Produce is large, but the Shanxi volume is small, which limits the variety of what the model can learn. As a result, the model is not perfect and the test results still need to be improved. The second is the quality of the manual annotation of the corpus. The fewer the missing or wrong annotations, the higher the quality. The more comprehensively the model learns, the better the feature template matches the test corpus, and the better the recognition performance.

(4) Research on the knowledge organization of Local Chronicle: Produce of Shanxi based on Social Network Analysis

Based on the conditional random field model, and according to the correspondence between each produce name and the recognition results, pairs of produce name and alias, produce name and place, produce name and person, produce name and use, produce name and time, and other related data are extracted to form the data source required by social network analysis. The research uses social network analysis software to display the data graphically, and analyzes the network from different perspectives according to different features and requirements. There are three perspectives: whole-network analysis at the macro level, local network analysis at the meso level, and individual network analysis at the micro level.

The network analysis between produce names and produce aliases. Through vertex degree, this paper analyzes how many aliases a produce has and how many produces share a given alias. Through line value (edge weight), it analyzes whether an alias is a common one for a produce. Through the ego network, it shows the aliases of a produce or the produces associated with an alias. Through the connected network, it finds that different types of produce hold the same alias. From a historical perspective, the paper analyzes the origin of the produce aliases and the phenomena found in the produce alias network.

The network analysis between produce names and cited persons. Through vertex degree, this paper analyzes the number of persons cited under a produce and the number of produces that cite a person. The line values give the number of times a person is cited under a produce. The ego network shows which produces cite a person and how many persons a produce cites. By converting the two-mode network into a one-mode network, a co-citation network containing only persons can be extracted. In this network, degree centrality identifies the celebrities, betweenness centrality identifies the intermediaries, and closeness centrality identifies the best communicators of information.

The network analysis between produce names and their uses. The paper chooses medicinal value as the research object, and first takes the phrases describing medicinal value as the research unit. Through vertex degree, it analyzes which medicinal values a produce has and which produces have the same medicinal value. Through line value, it analyzes which medicinal value of a produce is recorded most often. By converting the network to a one-mode network in which produces are linked by shared medicinal value, it extracts a network of produce names only and explores the clusters and intermediaries among produce names through betweenness centrality. It then takes the individual words describing medicinal value as the research unit to analyze the relationships formed by similar medicinal values.

Research on the change of produce in time and space. First, the paper studies the changes of produce over time, dividing the time periods according to different standards. From the local network perspective, the produce recorded in the first time period is used to find produce names that later disappeared, and the produce recorded in the last time period is used to find newly added produce names. The paper then studies the changes of produce across space. Through vertex degree, it analyzes which area has the most abundant produce, which area has the scarcest produce, which produce is most widely distributed, and which produce is most narrowly distributed. Furthermore, it takes cotton as a case study to analyze the introduction and spread of cotton in Shanxi Province.

Although the research described above has been completed, there are still shortcomings to be addressed. First, data formatting and corpus annotation involve manual work, which will inevitably lead to omissions, so the results need to be constantly checked and improved. Second, the formatting process is only partly automated. When the produce classification information is completed automatically, some of it remains unresolved and can only be fixed by manual identification. More effective ways to achieve fully automated operation still need to be explored. Third, the research scope and corpus scope of this study are small, so the feasibility of the method still needs to be tested on a larger corpus. Finally, the results of this study are objective reflections of the corpus content, with no subjective factors mixed in. The purpose is to provide information and research ideas for agricultural researchers. The interpretation and use of the results still require professional analysis and study.

In short, this paper applies the techniques and methods of philology, information science, computer science, and related fields to the digitalization of Local Chronicle: Produce. Through named entity recognition and social network analysis, it automatically extracts entities and visualizes their associations, in order to find related information such as the distribution and transition of produce. The research provides new methods and perspectives for the knowledge organization of ancient local chronicles, and broadens the range of application of modern information technologies.
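
The produce-alias network measures described in part (4) can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the abstract does not name the software used, the function names are hypothetical, and the records are placeholders rather than data from the corpus. It shows line value (edge weight), vertex degree, the conversion of the two-mode produce-alias network into a one-mode produce network, and the connected groups of produces that share aliases.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (produce, alias) mentions. Placeholders only, not corpus data.
mentions = [
    ("produce_A", "alias_x"),
    ("produce_A", "alias_x"),
    ("produce_A", "alias_y"),
    ("produce_B", "alias_y"),
    ("produce_C", "alias_z"),
]

def edge_weights(pairs):
    """Line value: how many times each produce-alias pair is recorded."""
    return Counter(pairs)

def vertex_degrees(weights):
    """Vertex degree: number of distinct neighbours of every node.

    Assumes produce names and alias names never collide as strings.
    """
    degree = Counter()
    for produce, alias in weights:
        degree[produce] += 1
        degree[alias] += 1
    return degree

def project_on_produce(weights):
    """One-mode network: two produces are linked if they share an alias."""
    by_alias = defaultdict(set)
    for produce, alias in weights:
        by_alias[alias].add(produce)
    links = Counter()
    for produces in by_alias.values():
        for a, b in combinations(sorted(produces), 2):
            links[(a, b)] += 1  # count of aliases the pair has in common
    return links

def components(nodes, links):
    """Connected groups of the one-mode network, found by graph search."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        result.append(sorted(group))
    return result

weights = edge_weights(mentions)
degrees = vertex_degrees(weights)
shared = project_on_produce(weights)
groups = components({produce for produce, _ in weights}, shared)
```

In this toy input, produce_A and produce_B end up in the same group because both carry alias_y, which is the kind of shared-alias pattern the connected-network view is meant to surface. The same projection step applies to the person and medicinal-value networks, with persons or uses taking the place of aliases.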
Keywords/Search Tags:Ancient Books Collation, Local Chronicle:Produce, Conditional Random Field, Social Network Analysis, Knowledge Organization, Shanxi