Scientific Paper Discrimination Method Research Based on Word Co-Occurrence Network and Support Vector Machine

Posted on: 2011-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J Du
GTID: 2178330338480487 | Subject: Management Science and Engineering
Abstract/Summary:
Scientific papers always have standard format requirements, but form can be deceptive: strict syntax and good formatting cannot guarantee that the content is meaningful and valuable. To save the time of journal and conference reviewers and to improve the efficiency and quality of paper review, this study proposes a method for discriminating scientific papers and analyzes the structural features of the human knowledge system, which is mainly represented by natural language.
In language, words interact with one another within a sentence, and this interaction is not random but follows certain rules. These rules can be studied through language networks. The word co-occurrence network is one form of human language network: each word in a sentence corresponds to a node in the network, and if two words are adjacent in a sentence, the corresponding two nodes are connected. By constructing the word co-occurrence networks of papers, we identify the differences between authentic and inauthentic papers from the perspective of network analysis and detect inauthentic papers with a certain confidence, so that the papers reviewers read are all meaningful. This can raise the efficiency of society and keep human knowledge pure.
Drawing an analogy reveals obvious similarities: the growth mechanism of complex networks resembles the way authentic papers are composed, while the growth mechanism of random networks resembles the way papers are produced by text generators or the way low-quality papers are composed. We therefore propose the hypothesis that the word co-occurrence networks of authentic and inauthentic papers differ essentially in structure. This study represents each paper by various parameters of its language complex network: it calculates these network parameters, outputs them as a feature vector, and finally trains a model on the samples using a support vector machine toolkit.
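The network construction and parameter extraction described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the sentence splitter (on `.`, `!`, `?`), the lowercase `[a-z]+` tokenizer, the choice of parameters (node count, edge count, average degree, average clustering coefficient, average shortest path length) and all function names are assumptions. Edges are undirected and unweighted, linking words that are adjacent within one sentence.

```python
import re
from collections import deque
from itertools import combinations

Graph = dict[str, set[str]]


def build_cooccurrence_network(text: str) -> Graph:
    """Undirected, unweighted word co-occurrence network.

    Each distinct word is a node; two nodes are linked when the words are
    adjacent in the same sentence. Sentence splitting on . ! ? and
    lowercase [a-z]+ tokens are simplifying assumptions.
    """
    graph: Graph = {}
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        for word in words:
            graph.setdefault(word, set())
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops from an immediately repeated word
                graph[a].add(b)
                graph[b].add(a)
    return graph


def clustering_coefficient(graph: Graph, node: str) -> float:
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_shortest_path(graph: Graph) -> float:
    """Mean BFS hop distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def network_feature_vector(text: str) -> list[float]:
    """[nodes, edges, average degree, average clustering, average path length]."""
    graph = build_cooccurrence_network(text)
    n = len(graph)
    if n == 0:
        return [0.0] * 5
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    avg_clustering = sum(clustering_coefficient(graph, v) for v in graph) / n
    return [float(n), float(m), 2.0 * m / n, avg_clustering,
            average_shortest_path(graph)]


print(network_feature_vector("a b c. c a."))  # a triangle: [3.0, 3.0, 2.0, 1.0, 1.0]
```

The thesis does not name its SVM toolkit; vectors of this shape could, for example, be passed as rows of a feature matrix to scikit-learn's `sklearn.svm.SVC`, with one label per paper (authentic or inauthentic). In practice the features would usually be scaled first, since node and edge counts grow with paper length while clustering stays between 0 and 1.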
Finally, we collect samples and design experiments to verify the hypothesis. The experimental results show that the word co-occurrence networks of inauthentic papers exhibit some small-world characteristics; there are essential differences in network structure between high-quality papers and inauthentic papers produced by a text generator, whereas papers that do not differ significantly from one another also show no significant difference in network structure; at the same time, papers from different domains can be distinguished easily. These results show that the proposed discrimination method can identify inauthentic papers to some extent, but it still has certain deficiencies and room for improvement, which is the direction of our future work.
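A small-world finding like the one above is commonly checked by comparing a network's average clustering coefficient C and average path length L with those of a random graph of the same size. The sketch below uses one widely used criterion, which may not be the one applied in the thesis: the coefficient σ = (C / C_rand) / (L / L_rand), with the standard Erdős–Rényi approximations C_rand ≈ k / n and L_rand ≈ ln n / ln k for n nodes and average degree k. The function names and the threshold of 1 are assumptions.

```python
import math


def small_world_sigma(c: float, l: float, n: int, k: float) -> float:
    """Small-world coefficient sigma = (C / C_rand) / (L / L_rand).

    C_rand ~ k / n and L_rand ~ ln(n) / ln(k) are the usual approximations
    for an Erdos-Renyi random graph with n nodes and average degree k.
    """
    if n < 2 or k <= 1 or l <= 0:
        raise ValueError("need n >= 2, average degree k > 1 and path length l > 0")
    c_rand = k / n
    l_rand = math.log(n) / math.log(k)
    return (c / c_rand) / (l / l_rand)


def looks_small_world(c: float, l: float, n: int, k: float,
                      threshold: float = 1.0) -> bool:
    """sigma > threshold is taken as the small-world signal (a convention)."""
    return small_world_sigma(c, l, n, k) > threshold


# Illustrative numbers only, not measurements from the thesis.
sigma = small_world_sigma(c=0.3, l=3.5, n=1000, k=10)  # sigma ≈ 25.7
```

A network with random-graph clustering and path length gives σ ≈ 1, so a value well above 1 means clustering is much higher than chance while paths stay about as short as in a random graph.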
Keywords/Search Tags: language network analysis, word co-occurrence network, paper discrimination, small-world network