| Since the NPC Standing Committee authorized the Supreme People's Procuratorate to launch pilot public interest litigation, the field has steadily matured in both theory and practice. Discovering clues is the prerequisite for handling a public interest litigation case, and the quantity and quality of clues determine the final outcome of the case. Text mining makes it possible to broaden the sources of public interest litigation clues by monitoring network data, which also promotes the development of "Internet + justice". Online text is an important carrier of data and therefore a natural entry point for finding such clues. This paper takes the perspective of the procuratorial organs and uses the platform construction project of a provincial public interest litigation case command center as its background. It studies three problems: identifying online public interest litigation clues, determining the party responsible for each clue, and recommending the relevant legal provisions, with the aim of broadening the channels for clues and improving the work efficiency of the procuratorate. Natural language processing and text mining methods are used to identify online public interest litigation clues. First, the potential case-source clue data obtained by a web crawler are preprocessed, including data cleaning and manual labeling. Second, Chinese word segmentation, keyword extraction, and text vectorization are applied to the text data, and the texts labeled as clues are visualized to show the current state of online public interest litigation clues. Third, two text classification models, fastText and TextCNN, are used to identify clues within the potential case-source clue data. The fastText model has an advantage on short clue-title text, while the TextCNN model performs better on clue-detail text. The paper therefore combines the strengths of the two models in a hybrid classification model based on fastText and TextCNN, which improves the accuracy of clue identification. To prioritize important clues, the paper designs and constructs a case-filing index for public interest litigation clues and verifies it on the clue data. Finally, for case-source data judged to be clues, the similarity between each clue, the rights-and-responsibilities list data, and the laws and regulations data is calculated in order to determine the institution responsible for the clue and to recommend related laws and regulations. |
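
The final step, matching an identified clue against the rights-and-responsibilities list, can be sketched as follows. This is a minimal illustration only: the abstract does not say which similarity measure or text representation is used, so cosine similarity over term-frequency vectors is an assumption here, and the function names, institution names, and token lists are hypothetical, hand-made examples rather than data from the paper. The texts are assumed to be already segmented into words.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Count term frequencies for an already-segmented text."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(clue_tokens, candidates):
    """Return (name, score) pairs sorted by descending similarity to the clue."""
    clue_vec = term_vector(clue_tokens)
    scored = [
        (name, cosine_similarity(clue_vec, term_vector(tokens)))
        for name, tokens in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical clue and responsibility-list entries (illustrative only).
clue = ["factory", "discharges", "wastewater", "river", "pollution"]
responsibilities = {
    "environmental protection bureau": [
        "supervise", "wastewater", "pollution", "discharge", "river",
    ],
    "market supervision bureau": [
        "supervise", "food", "safety", "market", "licence",
    ],
}

ranking = rank_candidates(clue, responsibilities)
print(ranking[0][0])  # the best-matching institution for this toy clue
```

The same ranking function could be pointed at a collection of legal provisions instead of responsibility entries to produce the recommendation list; in practice a weighted representation such as TF-IDF or embeddings would likely replace the raw term counts used in this sketch.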