Scientific Paper Discrimination Method Research Based on Word Co-Occurrence Network And Support Vector Machine

Posted on:2011-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:J Du

Full Text:PDF

GTID:2178330338480487

Subject:Management Science and Engineering

Abstract/Summary:

Scientific papers follow standard format requirements, but form can be deceptive: strict syntax and good formatting do not guarantee that the content is meaningful or valuable. To save the time of journal and conference reviewers and to improve the efficiency and quality of paper review, this study proposes a method for discriminating scientific papers and analyzes the structural features of the human knowledge system, which is represented mainly by natural language.

In language, words interact within a sentence not at random but according to certain rules. These rules can be studied through language networks. The word co-occurrence network is one form of human language network: each word in a sentence corresponds to a node in the network, and if two words are adjacent in a sentence, the corresponding two nodes are considered connected. By constructing the word co-occurrence networks of papers, we identify the differences between authentic and inauthentic papers from the perspective of network analysis and flag inauthentic papers with a certain level of confidence, so that reviewers spend their time only on meaningful papers. This should raise the efficiency of society and help keep human knowledge pure.

There are obvious analogies between the growth mechanism of complex networks and the way authentic papers are composed, and between the growth mechanism of random networks and the way papers are produced by a text generator or low-quality papers are composed. We therefore propose the hypothesis that there are essential differences in word co-occurrence network structure between authentic and inauthentic papers. This study represents each paper by the parameters of its language complex network: it calculates various network parameters, outputs them as a vector, and finally trains a model on the sample vectors using a support vector machine toolkit.

Finally, we collect samples and design experiments to verify the hypothesis. The experimental results show that the word co-occurrence networks of inauthentic papers exhibit some small-world characteristics; that there are essential differences in network structure between high-quality papers and inauthentic papers generated by a text generator, whereas papers that do not differ significantly from one another also show no significant difference in network structure; and that papers from different domains can be distinguished easily. The results show that the proposed discrimination method can detect inauthentic papers to some extent, but it still has deficiencies and areas for improvement, which are the direction of our future work.
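
The network construction described in the abstract can be sketched roughly as follows. This is only an illustrative Python sketch, not the thesis's implementation: the tokenization rule, the sentence splitter, the choice of four parameters (node count, average degree, average clustering coefficient, average shortest path length), and all function names are assumptions made for the example. The abstract does not say which network parameters or which support vector machine toolkit were used, so the training step is left out.

```python
import re
from collections import deque


def build_cooccurrence_network(text):
    """Return an undirected adjacency map: word -> set of adjacent words."""
    graph = {}
    # Naive sentence split on '.', '!' and '?' (an assumption for this sketch).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        for word in words:
            graph.setdefault(word, set())
        # Connect words that are neighbours within the same sentence.
        for left, right in zip(words, words[1:]):
            if left != right:  # skip self-loops from repeated words
                graph[left].add(right)
                graph[right].add(left)
    return graph


def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist among this node's neighbours.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in graph[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean breadth-first-search distance over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def feature_vector(text):
    """Hypothetical per-paper vector of network parameters."""
    graph = build_cooccurrence_network(text)
    if not graph:
        return [0.0, 0.0, 0.0, 0.0]
    return [
        float(len(graph)),
        average_degree(graph),
        average_clustering(graph),
        average_path_length(graph),
    ]
```

Under these assumptions, one such vector per paper, paired with an authentic or inauthentic label, would form the training set passed to whichever support vector machine toolkit is chosen. A high clustering coefficient together with a short average path length is the usual signature of a small-world network, which is why those two parameters are included here.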

Keywords/Search Tags:

language network analysis, word co-occurrence network, paper discrimination, small-world network

Related items

1	Study Of The BAM Network Based On The Small Word Architecture
2	Small World Network Research And An Analysis Of It's Epidemic Spreading
3	Study Of Small-world Network Theory In The Transport Network
4	Research On Small World Topology Based Neural Network
5	Research Of Small World Effect In World Wide Web
6	Stability Of The Delayed Neural Network Models With Small World Connections
7	Researches Of Network Topology Construction Based On Cayley Graph And Small-world Phenomena
8	Research On Trim Of Multilayer FeedForward Small World Network Based On E-exponential Information Entropy
9	Research And Application Of Convolutional Neural Network Model With Small-world Features
10	Particle Swarm Optimization Based On Dynamic NW Small World Network