
The Citation Content Analysis Of Scientific Papers And Its Application

Posted on: 2015-04-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S B Liu
GTID: 1109330467486877 | Subject: Science and technology management
Abstract/Summary:
Citation analysis comprises the analysis of citation bibliographic information and the analysis of citation content. Citation content has been studied far less than bibliographic information, and most existing work on citation content focuses on the subject matter of citing papers; few studies reach the level of full-text content. The text that surrounds a cited reference in a paper provides additional information that helps us understand the role and value of that reference and uncover the intention and tendency of the citer. Advances in electronic information technology and the continued expansion and improvement of large databases, especially full-text databases, now make a multi-angle, systematic analysis of citation content possible. The main research work is reflected in the following five aspects.
(1) Constructing the basic theoretical framework of citation content analysis systematically. Guided by scientometrics, bibliometrics, and content analysis theory, we proposed the concept, steps, and main research content of citation content analysis. The scope of citation content includes the subject, time, topic, position, motivation, tendency, and strength of a citation. We also discussed the differences and relations between citation content analysis and traditional bibliographic citation analysis. Both analyze citations, so the methods used in traditional citation analysis also apply to citation content analysis. The difference is that citation content analysis relies on natural language processing, and it can reveal deeper, more detailed relationships of inheritance and innovation between citing papers and their references.
We analyzed the functions of citation content analysis and showed its application value in three areas: the evaluation of scientific papers, the evolution of knowledge structure, and information retrieval.
(2) We proposed the conditions for carrying out citation content analysis in terms of data and method. The data conditions are accessibility, identifiability, structure, integrity, and continuity. The implementation methods cover citation content extraction, database application, and citation content analysis. In this work, we used the full-text data in the PubMed Central database as the data source to acquire citation content and store it in a database. We also built a citation content retrieval system on this database, which provides a data platform for citation content retrieval and its applications.
(3) Citation position was analyzed from three angles: the location of a single citation, the position of co-citations, and the networks of co-citation positions. First, citation analysis and natural language processing were combined to analyze the location of single citations, and we found the distribution patterns and content characteristics of citations in each section. Second, in the location analysis of co-citation, the co-citation relationship was divided into four levels according to where the co-citation occurs: sentence level, paragraph level, section level, and article level. Statistical analysis of the co-citation relationships at each level showed that the distribution of co-citations was basically the same across journals. The sentence level accounts for the smallest share of co-citations and the article level for the largest. The average shares of the four levels are 3.16%, 7.29%, 18.16%, and 71.39%.
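The four co-citation levels can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the input format (one tuple of reference id, section index, paragraph index, and sentence index per in-text citation) and the rule that a pair is assigned to the tightest level at which it co-occurs are assumptions made for illustration, and the function name `cocitation_levels` is hypothetical.

```python
from itertools import combinations

# Levels ordered from tightest (sentence) to loosest (article).
LEVELS = ["sentence", "paragraph", "section", "article"]

def cocitation_levels(mentions):
    """Assign each pair of references in one article to a co-citation level.

    mentions: list of (ref_id, section, paragraph, sentence) tuples, one per
    in-text citation. Assumption: paragraph indices are local to a section
    and sentence indices are local to a paragraph.
    Returns {(ref_a, ref_b): level_name} with ref_a < ref_b.
    """
    best = {}
    for (r1, s1, p1, t1), (r2, s2, p2, t2) in combinations(mentions, 2):
        if r1 == r2:
            continue  # the same reference cited twice is not a co-citation
        if s1 != s2:
            level = 3  # only shares the article
        elif p1 != p2:
            level = 2  # same section, different paragraphs
        elif t1 != t2:
            level = 1  # same paragraph, different sentences
        else:
            level = 0  # same sentence
        pair = tuple(sorted((r1, r2)))
        # Assumption: keep the tightest level seen for this pair.
        best[pair] = min(level, best.get(pair, 3))
    return {pair: LEVELS[i] for pair, i in best.items()}

mentions = [
    ("A", 0, 0, 0), ("B", 0, 0, 0),  # A and B in the same sentence
    ("C", 0, 0, 1),                  # C in the next sentence
    ("D", 0, 1, 0),                  # D in the next paragraph
    ("E", 1, 0, 0),                  # E in another section
]
levels = cocitation_levels(mentions)
```

Counting the pairs at each level over a corpus and dividing by the total number of pairs would give level shares comparable in kind to the percentages reported above.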
We further found that the distribution of co-citation relationships is correlated with co-citation frequency: the higher the co-citation frequency of a pair, the more of its co-citation relationships occur at the sentence level and the fewer at the article level. Third, we analyzed the features of the network at each co-citation level. CiteSpace and social network analysis were used to identify the structural features of each co-citation level network within the traditional co-citation network. Results showed that the article level co-citation network covered 38.58% of the traditional co-citation network, whereas the sentence level co-citation network covered 5.64%. The sentence level network occupied the core position of the traditional co-citation network and constituted a sub-network of it. The coverage of the sentence level co-citation network was higher than that of the paragraph level and section level co-citation networks.
(4) In citation topic analysis, tag clouds and topic models were introduced. A tag cloud shows the topic words of citation content intuitively, and a topic model classifies citation content into topics. An LDA topic model was applied to compare the topics of citation content with the topics of cited papers and of citing papers. Results showed that the topics of citation content covered a wider range than those of the cited papers and also differed considerably from the topics of the citing papers, which means that citation content has its own attributes and value in the evolution of cited knowledge. Using information entropy theory, we compared how broad or narrow the topic terms are in citation content and in citing papers. Results showed that the meaning of topic terms in citation content was narrower than in citing papers.
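The abstract does not state how entropy was computed, so the following is only one plausible reading, sketched in Python: a term whose occurrences are spread evenly over many texts has high Shannon entropy (broad sense), while a term concentrated in a few texts has low entropy (narrow sense). The function name `term_entropy` and the token-list input are assumptions for illustration.

```python
import math

def term_entropy(term, documents):
    """Shannon entropy (in bits) of a term's distribution over documents.

    documents: list of token lists. Assumption: breadth of meaning is
    approximated by how evenly the term's occurrences are spread.
    """
    counts = [doc.count(term) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0  # the term never occurs
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

docs = [
    ["lda", "model"],
    ["lda"],
    ["lda", "topic"],
    ["blast", "blast"],
]
broad = term_entropy("lda", docs)     # spread evenly over three documents
narrow = term_entropy("blast", docs)  # concentrated in one document
```

Under this reading, the finding that topic terms in citation content are narrower would correspond to lower entropy values for those terms than for the terms of citing papers.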
Topic terms in citation content were more inclined to express a specific method or theory in a field.
(5) Based on the above theories and methods of citation content analysis, we explored the application value of citation content in information retrieval, co-citation analysis, and paper evaluation. First, a citation search and recommendation system was built on citation content. Evaluation showed that the system performed well in retrieving and recommending highly cited and classic literature. The average accuracy was 56.5%, which is 12.5% higher than Google Scholar and 43.5% higher than PubMed. Second, different weights were assigned to each co-citation level based on the similarity of the citation content at that level. The average similarities of the four levels were 1, 0.77, 0.64, and 0.56. The average similarity at the article level was much higher than the weight assigned subjectively. The performance of co-citation analysis improved once each co-citation level was weighted. Finally, citation motivation was identified from citation content, and a citation property evaluation index was put forward that distinguishes positive, negative, and neutral citations. A cue word method was used to classify citation property automatically, with an average accuracy above 95%. Taking BMC_Bioinformatics as the data sample, neutral citations accounted for 62.88% and negative citations for 3.53%. A citation quality evaluation index and an improved H index were put forward based on the number of times a reference is actually cited within a paper.
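The level weighting of co-citations mentioned in aspect (5) can be sketched as follows. The weights are the average similarities reported above (1, 0.77, 0.64, 0.56); using them directly as weights and summing them across citing articles are assumptions, since the abstract does not say how the weights are combined. The function name `weighted_cocitation` and the input format are hypothetical.

```python
from collections import defaultdict

# Average content similarities reported in the abstract, used here as
# level weights (an assumption about how they are applied).
LEVEL_WEIGHTS = {
    "sentence": 1.0,
    "paragraph": 0.77,
    "section": 0.64,
    "article": 0.56,
}

def weighted_cocitation(per_article_levels):
    """Sum level weights for each reference pair across citing articles.

    per_article_levels: one dict per citing article, mapping a pair of
    reference ids to the level at which the pair is co-cited there.
    Traditional co-citation counting would add 1 per article instead.
    """
    strength = defaultdict(float)
    for levels in per_article_levels:
        for pair, level in levels.items():
            strength[pair] += LEVEL_WEIGHTS[level]
    return dict(strength)

articles = [
    {("A", "B"): "sentence"},
    {("A", "B"): "article", ("A", "C"): "paragraph"},
]
strength = weighted_cocitation(articles)
```

In this toy input the pair A and B is co-cited in two articles, but its weighted strength is below 2 because one of the co-citations occurs only at the article level.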
Keywords/Search Tags: Citation content analysis, Citation position, Co-citation analysis, Citation retrieval, Citation evaluation