| The Internet has become a primary medium for information access and commerce in today’s globalized world, and information on the Internet is available either in the user’s native language or in a non-native language. As a result, it has become easier to use another author’s content from the Internet without proper citation or reference, and this tendency is increasing day by day. Such use of another author’s content, thoughts, ideas, or expressions, presented as one’s own original work, is known as plagiarism. Although plagiarism can be found in almost every field, it is a major problem in academia, because it destroys an individual’s creativity and originality and defeats the purpose of education. At present, some commercial and noncommercial plagiarism detection tools are available. However, most of them are monolingual, and none of them checks Bangla documents for plagiarism. Following the task definition of PAN @ CLEF 2012, plagiarism detection can be divided into two tasks: source retrieval and text alignment. The text alignment task can in turn be divided into two sub-phases: seeding and merging. This paper studies cross-language plagiarism detection, covering both the source retrieval task and the text alignment task. The specific contributions are as follows. First, source retrieval for plagiarism detection lacks an effective keyword extraction method. This paper implements a keyword extraction method based on small passages that takes the characteristics of the text into account. Second, the seeding phase of cross-language plagiarism detection does not fully consider the features of translated and bilingual text. The seeding algorithm in this paper combines translation features with bilingual features. Finally, even the first-place plagiarism detection method at PAN @ CLEF 2012 leaves room for improvement in the merging phase. 
This paper proposes a merging method based on dynamic programming, which applies the idea of dynamic programming during merging to reduce the running time of the text merging algorithm. Experiments show that the small-passage keyword extraction method proposed in this paper is not only applicable to cross-language plagiarism detection data but also improves the performance of the cross-language source retrieval task; that the proposed seeding method, which integrates translation and bilingual features, significantly improves the overall evaluation measures for cross-language plagiarism detection; and that the proposed dynamic-programming merging algorithm reduces the running time not only of the merging phase but also of the cross-language plagiarism detection system as a whole. These results provide new strategies, concrete methods, and supporting techniques for cross-language plagiarism detection that improve the quality of text plagiarism detection, address deficiencies of cross-language plagiarism detection systems, and improve the speed, recall, and precision of cross-language plagiarism detection. They also offer a new idea and a useful reference for other research problems that involve text similarity computation. |