
Research And Application On Automatic Comparison Of Text

Posted on: 2012-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: L X Wang
Full Text: PDF
GTID: 2178330335977743
Subject: Computer applications and technology
Abstract/Summary:
With the development of the Internet and computer technology, new resources emerge every day and information is shared ever more widely, which greatly simplifies people's work and daily life. At the same time, this causes problems: a high rate of duplicated web pages, infringement of intellectual property, information leaks, and the like. The ability to detect similar content quickly and accurately has therefore become a concern. Text similarity comparison, a foundation of natural language processing, is widely used in areas such as text classification, clustering, information retrieval, and text copy detection, and has attracted the attention of many scholars. It is thus one of the effective ways to address these problems, in both theory and practice.

The research in this dissertation is summarized as follows.

First, text similarity technology is introduced to the detection of encrypted text, providing a new and effective alternative to the current manual detection. The main achievements are:

1) An automatic text leak detection technique based on natural language processing is presented. The method is built on the VSM (vector space model) similarity model and combines word segmentation, text encryption, Web information extraction, and other natural language processing technologies. On the premise that the secret content is not disclosed, the text is encrypted with an irreversible algorithm, web content is gathered with Web information extraction, and the similarity of the ciphertexts is compared to detect whether secret text exists on a specific website and to what extent it has been disclosed.

2) Access to the content of specific web pages is studied. Drawing on the advantages of vision-based block segmentation in Web information extraction, a regular-expression-based method for deep web page crawling is presented, which serves as the data source.

Second, current text similarity detection technology is combined with other natural language processing technologies to implement a text similarity detection system based on natural language processing. The system supports multi-level detection, from the chapter down to the sentence. It broadens the comparison methods to cover both plaintext and secret-text detection. Its data sources include both local text and text from the Internet. Similar passages are located and marked automatically.

The main features and innovations are:

1) Text similarity comparison is applied to the detection of leaked classified text for the first time, which effectively addresses the problem of manual detection and helps ensure the security of secret documents.

2) A text similarity detection system based on natural language processing is designed and implemented. The system features multiple levels, multiple data sources, multiple comparison methods, multiple functions, and multi-threaded computation.
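
The idea of comparing ciphertexts with a vector space model can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the encryption algorithm or the term weighting, so the sketch assumes salted SHA-256 as the irreversible transform and raw term counts as weights, and it uses a simple regular-expression tokenizer as a stand-in for word segmentation. All function names and the salt value are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    # Stand-in for word segmentation (assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_token(token, salt=b"demo-salt"):
    # Irreversible transform (assumption): salted SHA-256 of each token.
    return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()


def encrypted_vector(text):
    # Term-frequency vector keyed by hashed tokens, so no plaintext is stored.
    return Counter(hash_token(t) for t in tokenize(text))


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse term-frequency vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    secret = encrypted_vector("the launch date is in march")
    page = encrypted_vector("rumour says the launch date moved")
    print(cosine_similarity(secret, page))
```

Because identical tokens hash to identical digests, the similarity of two hashed vectors equals the similarity of the underlying plaintext vectors, which is what lets the comparison run without revealing the secret text. A real system would need to use the same segmentation and salt on both sides.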
Keywords/Search Tags:Natural Language Processing, Similarity Comparison, Text Leaks, Web Information Extraction