
Improving Illegal Online Information Processing With Multi-dimensional Semantics

Posted on: 2013-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liao  Full Text: PDF
GTID: 2248330395950365  Subject: Computer software and theory
Abstract/Summary:
In recent years, the Internet market has expanded rapidly, but the hidden dangers that come with it have become increasingly serious. Counterfeit and shoddy products are advertised and even sold online, causing great harm to legitimate businesses and consumers. To address this problem, the relevant government departments of the e-business demonstration cities, the State Food and Drug Administration, and the State Department of Commerce are developing the "Ecommerce Service Platform for Trusted Trading" to trace illegal online information and crack down on illegal behavior in the Internet market. This research is one of the key technologies of the platform; it aims at tracing illegal online information and obtaining evidence automatically.
This research builds on earlier-stage work. Compared with that work, it combines different illegal online information processing technologies, improves the decision-tree-based technology, and adds combined search technology and illegal image information processing technology. With these improvements, the monitoring system's technical indicators have improved.
According to the requirements, the platform needs to process the following types of illegal online information: product information that is not recorded or that differs from the recorded information, missing important product properties, the sale of contraband or counterfeit and shoddy products, bait advertising, and comparisons with competitors' products. Based on these requirements, we divide the solution into two steps. The first step is to find potential websites using meta-search and combined search technology, and to retrieve HTML documents and images with a web crawler. The second step is to use decision tree technology and web product information extraction based on multi-dimensional semantics to analyze the text information, and to use optical character recognition (OCR) and image retrieval technology to analyze images.
To verify the effectiveness of this solution, we carried out experiments on the key technologies. According to the results, web product information extraction based on multi-dimensional semantics reaches precisions of 75% and 92% and recalls of 76% and 75% for veterinary drugs and pesticides, respectively; illegal information processing based on decision trees reaches a precision of 87% and a recall of 82%; illegal image processing based on OCR reaches a precision of 69% and a recall of 59%; and illegal image processing based on image retrieval technology reaches a precision of 86% and a recall of 68%.
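The abstract reports precision and recall for each technique but does not say how they were computed. The sketch below assumes the standard definitions: precision is the share of flagged items that are truly illegal, and recall is the share of truly illegal items that were flagged. The function name and the page identifiers are hypothetical and are not taken from the thesis.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of flagged items.

    predicted: items the system flagged as illegal (hypothetical input)
    relevant:  items that are actually illegal (hypothetical ground truth)
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example data, for illustration only.
flagged = {"page1", "page2", "page3", "page4"}
illegal = {"page2", "page3", "page4", "page5", "page6"}

p, r = precision_recall(flagged, illegal)
print(f"precision={p:.2f}, recall={r:.2f}")
```

In this made-up example, three of the four flagged pages are truly illegal and three of the five illegal pages were found. A system can therefore score high on one measure and lower on the other, as with the OCR and image retrieval results reported above.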
Keywords/Search Tags: Multi-dimensional Semantics, Illegal Information Tracing, Sophisticated Search