Authorship analysis: Discovering the author of a software document

Posted on:2007-04-05

Degree:Ph.D

Type:Dissertation

University:University of Louisiana at Lafayette

Candidate:Fanguy, Philip J

GTID:1445390005968699

Subject:Computer Science

Abstract/Summary:

The purpose of this research is to study and develop methods of determining the author of software programs. The authorship techniques developed analyze a collection of the author's past programs to extract information to choose the likeliest author of a suspect program.

Two approaches were attempted to accomplish this task. First, traditional software complexity measures were examined, using analysis of variance, to identify differences between authors; however, no significant differences were detected using these measures alone (with the exception of measures on the comments of the programs). The second approach created lists of characterizing terms (words and phrases within the programs) for each author that can be used to identify the author in any future programs written by the author. The focus of this dissertation is on these term selection techniques and their ability to choose the most influential terms for author identification. The techniques were developed and tested on programs using the C++ programming language (obtained from intermediate level programming classes).

Five term selection techniques were attempted. The Probability Technique selects terms that are used more often by one programmer than the group of programmers. The Rank Technique selects terms that are ranked relatively higher for a programmer than for the group of programmers. The Quintile Technique groups the terms into six bins according to rank and selects the terms in a bin that few programmers use. The Probability Deviation Technique selects terms for each author with probabilities a number of deviations above the mean of probabilities for all authors. The final technique, the Bayesian Inference Ratio Technique, uses a ratio that compares the probability that the term is used by the programmer to the probability that the term is used by any programmer to determine if a term is selected. Of these term selection techniques, the Bayesian Technique produces the best results in terms of Authorship Accuracy (percentage of correctly identified suspect programs) and terms selected. The Bayesian Technique was validated in further studies on additional sets of programs. Term selection techniques, in general, and the Bayesian Technique specifically, are considered to be valid author identification methods.
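
The ratio-based selection described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the ratio is taken to be P(term | author) / P(term | any author), probabilities are estimated as the fraction of programs containing a term, the threshold of 1.5 is arbitrary, and the suspect program is attributed to the author whose selected terms it shares most. The function names are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def term_probabilities(programs):
    """Estimate P(term) as the fraction of programs containing the term."""
    counts = Counter()
    for terms in programs:
        counts.update(set(terms))  # count each term once per program
    return {term: n / len(programs) for term, n in counts.items()}


def select_terms(corpus, threshold=1.5):
    """Pick characterizing terms for each author.

    corpus maps an author name to a list of programs, each program being
    a list of terms. A term is kept for an author when the assumed ratio
    P(term | author) / P(term | any author) reaches the threshold.
    """
    all_programs = [prog for progs in corpus.values() for prog in progs]
    p_any = term_probabilities(all_programs)
    selected = {}
    for author, progs in corpus.items():
        p_author = term_probabilities(progs)
        selected[author] = {
            term for term, p in p_author.items() if p / p_any[term] >= threshold
        }
    return selected


def likeliest_author(selected, suspect_terms):
    """Return the author whose selected terms overlap the suspect most."""
    suspect = set(suspect_terms)
    # sorted() makes tie-breaking deterministic (alphabetical order)
    return max(sorted(selected), key=lambda a: len(selected[a] & suspect))


# Toy corpus: terms are identifiers pulled from hypothetical C++ programs.
corpus = {
    "alice": [["idx", "tmp", "cout"], ["idx", "cout"]],
    "bob": [["counter", "cout"], ["counter", "tmp", "cout"]],
}
selected = select_terms(corpus)
guess = likeliest_author(selected, ["counter", "cout", "n"])
```

In this toy corpus, `cout` appears in every program and `tmp` is shared evenly, so both have a ratio of 1.0 and are discarded; only `idx` (for alice) and `counter` (for bob) have a ratio of 2.0 and survive, which is the intuition behind selecting terms an author uses more often than programmers in general.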

Keywords/Search Tags:

Author, Programs, Term selection techniques, Software

Related items

1	Comparing computer software programs: Determining the most efficient system for teaching English language learners
2	The incorporation of communicative language teaching into the elaboration of interactive software for ESL/EFL learning
3	Programming for success: A study of repertoire selection practices by undergraduate-focused, religiously-affiliated, collegiate choral programs nationally recognized for performance excellence
4	Theory And Practice Of Teacher Professional Development Based On Collaboration
5	The Influence Of College Students’ Subjective Social Status On The Mate Selection Criteria
6	Musical accompaniments in the preparation of marimba concerti: A survey of selective interactive music software programs
7	The Analysis Of The Flux About Western Critical Theory Of Author
8	Dynamic state alteration techniques for automatically locating software errors
9	A Study On Intercultural Communication Conflicts And Countermeasures In The Short-term Overseas Study Tour Programs
10	An Interpretation Of The Ethical Selections Of Human Beings And Artificial Intelligence In The Lifecycle Of Software Objects