Research On Topic Analysis And Matching Method Of Case-related News Based On Crime Classificatio

Posted on: 2023-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: G W Wang
GTID: 2556306797482594
Subject: Software engineering
Abstract/Summary:
As high-profile cases emerge and the public-opinion news surrounding them spreads rapidly, topic analysis and matching of case-related news have become particularly important, and they form a key part of public opinion monitoring. However, case-related news comes from a wide range of data sources in diverse formats; the texts crawled from the web are unevenly distributed, differ in structure and semantics, and contain much redundant information. Conventional topic analysis methods struggle to extract case-oriented topic information (topic information with a preference for case-related content), which in turn leads to poor matching performance on case-related news texts. This thesis first filters case-related news out of the news data. Second, it analyzes charges from standardized legal documents and integrates the classification of charges to assist topic analysis of case-related news, aiming to generate case-oriented topic information. Finally, it integrates the case-oriented topic information with entity knowledge for matching, providing technical support for case-related public opinion monitoring. The main research work is as follows:

(1) A method for constructing corpora for case-related news filtering, topic analysis, and matching is proposed. Based on the Scrapy web crawler framework, this thesis uses XPath page parsing to crawl news texts and legal documents from popular sites. Parsing rules are constructed to parse, annotate, and clean the crawled legal documents and the news data related to 13 hot cases in recent years, yielding separate data sets for case-related news filtering, case-related news topic analysis, and case-related news matching.

(2) A positive-unlabeled (PU) learning method for case-related news filtering that integrates topic information is proposed. Because case-related news originates from various fields and has different writing styles, it is difficult to establish complete filtering rules or keywords for data collection. This thesis therefore proposes a PU learning method for case-related news that incorporates topic information. First, a topic model based on a variational autoencoder is trained to obtain topic information that guides the selection of positive and negative samples. Second, the topic information is enhanced during the iterative PU learning process, aiming to improve the accuracy of case-related news filtering. Experimental results show that the proposed method leads in F1 value by 1.8% on the case-related news filtering task.

(3) A topic analysis method for case-related news that integrates the classification of charges is proposed. Topic analysis of case-related news aims to extract case-oriented topic information from the news, but conventional topic models have difficulty extracting topic information with this preference. This thesis therefore proposes a topic analysis method that integrates the classification of charges, aiming to generate case-oriented topic information. According to the experimental analysis, the topic interpretability of the proposed method improves by 5% over the benchmark model.

(4) A matching method for case-related news that combines charge topics and entity knowledge is proposed. Because case-related news comes from a wide range of data sources with diverse modes of expression, the texts differ considerably in semantics and structure and contain much redundant information, so conventional text matching methods cannot achieve good performance. At the same time, this thesis finds that news items involved in the same case share similar or identical charge topics and entity knowledge, and therefore proposes a matching method for case-related news that integrates charge topics and entity knowledge. Experimental results show that the F1 value of the proposed method is 5.5% higher than that of the baseline model.

(5) A prototype system for topic analysis and matching of case-related news is designed and built. When a new news text arrives, the case-related news filtering model, the topic analysis model, and the matching model are applied in turn to judge whether the text is case-related and to match it to a particular hot case. The results are displayed to users, providing technical and platform support for public opinion monitoring.
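
The intuition behind item (4), that news about the same case shares similar charge topics and overlapping entities, can be illustrated with a small sketch. This is not the thesis's matching model, which the abstract does not describe in detail; it is a toy scoring function under stated assumptions. The function names, the `alpha` weight, the use of cosine similarity over topic distributions, and Jaccard overlap over entity sets are all hypothetical choices made for illustration.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two collections of entity strings."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(topics_a, topics_b, entities_a, entities_b, alpha=0.5):
    """Hypothetical combined score: alpha weights topic similarity,
    (1 - alpha) weights entity overlap. alpha is an assumed parameter."""
    topic_sim = cosine(topics_a, topics_b)
    entity_sim = jaccard(entities_a, entities_b)
    return alpha * topic_sim + (1 - alpha) * entity_sim


if __name__ == "__main__":
    # Two illustrative news items: topic vectors over three made-up charge topics
    # and sets of extracted entity names (all values are invented for the demo).
    news_a = ([0.7, 0.2, 0.1], {"Court X", "Defendant Y"})
    news_b = ([0.6, 0.3, 0.1], {"Court X", "Defendant Y", "City Z"})
    print(match_score(news_a[0], news_b[0], news_a[1], news_b[1]))
```

A pair of news items would be treated as matching when the score exceeds some threshold; in practice both the weight and the threshold would need to be tuned on labelled matching data such as the data set described in item (1).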
Keywords/Search Tags:Case-related news filtering, Topic analysis, Case-related news match, Positive-Unlabeled Learning, Classification of charges