| With the rapid development of Internet technology, information on the network is growing explosively, and both its scale and its variety keep expanding. At the same time, the successful application of large amounts of data in many fields has marked the arrival of the era of big data. Big data now plays an increasingly important role in social development, and its value is widely recognized. In recent years, along with the construction of the legal system in our country, the trial of judicial cases has become increasingly transparent; the public release of judicial documents on the Internet is a typical example. Because they record both the process and the outcome of court cases, judicial documents provide rich judicial information, including the trial court, the case number, the parties' litigation requests, the case name, the judgment, the applicable laws, and so on. All of these are core elements of the courts' "big data". By mining this information in depth, we can summarize patterns in case trials, predict trial trends, and improve judicial credibility, and the mined information can support the realization of judicial justice and the construction of a law-based society. However, judicial documents are semi-structured and domain-specific, mixing formalized legal language with natural language. Moreover, their writing style depends largely on the individual judge. As a result, judicial documents exhibit polymorphism, heterogeneity, and randomness, which makes extracting valuable information from them a problem of great significance. Against this background, we apply reinforcement learning to information extraction from judicial documents. Specifically, we propose an information extraction method based on reinforcement learning (IEM-HRL). The main research contents of this work are as follows: (1) According to the characteristics of judicial documents, we analyze the
process of information extraction from them and divide it into two steps: first, the target data in dynamic judicial documents is located; second, the needed information is extracted according to the learned rules. (2) For the data location problem, in view of the heterogeneity and polymorphism of judicial documents, we adopt the idea of reinforcement learning: the text is treated as the environment, and an agent interacts with it continuously so as to learn the optimal policy by trial and error. At the same time, exploiting the consistency between the goals of individual agents and the long-term goal of the system in a multi-agent setting, we introduce a strategy coordination mechanism in which agents exchange information to discover trend information. The dynamic knowledge obtained online in this way is then used, through reward shaping, to guide the agents heuristically and thus speed up their learning. (3) For the rule extraction problem, we first use the length of the target document, the parts of speech of its words, and the number of stop words as conditional attributes. Then, decision attribute values are calculated from combinations of the conditional attributes together with prior knowledge, and decision information tables are constructed. Finally, rough set theory is used to reduce the decision information tables and extract decision rules, which together form a rule base. (4) Based on the above methods, we design and develop an information extraction system for domain-specific documents, in which an intelligent agent locates the target data in a document efficiently and accurately, and the decision rules are then applied at that location to extract the target information. Experimental results show that the proposed method can effectively extract information from judicial documents, with better accuracy, higher efficiency, and satisfactory robustness. |
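The data-location idea in (2), treating the text as the environment and learning by trial and error with a shaped reward, can be illustrated with a minimal sketch. It assumes tabular Q-learning (the abstract does not name a specific algorithm), a hypothetical toy document split into lines, a hypothetical action set (move down, move up, stop), hypothetical reward values, and a potential-based shaping term F = γΦ(s') − Φ(s) whose potential Φ encodes a hypothetical cue word as "prior knowledge". Only a single agent is shown; the multi-agent coordination mechanism is not modeled.

```python
import random

# Hypothetical toy "judicial document": one string per line (illustrative only).
DOC = [
    "Court: Example District Court",
    "Case number: Example Civil No. 1",
    "Plaintiff: A; Defendant: B",
    "The plaintiff requests: repayment of the loan and interest",
    "The court finds that the loan contract is valid",
    "Judgment: the defendant shall repay the loan",
]
CUE = "requests"   # hypothetical cue word used as heuristic knowledge for shaping
TARGET = 3         # hypothetical index of the line the agent should stop on
DOWN, UP, STOP = 0, 1, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2


def potential(state):
    """Heuristic potential Phi(s): 1.0 on lines containing the cue word, else 0.0."""
    return 1.0 if CUE in DOC[state] else 0.0


def step(state, action):
    """Environment dynamics: return (next_state, env_reward, done)."""
    if action == STOP:
        return None, (10.0 if state == TARGET else -1.0), True
    if action == DOWN:
        nxt = min(state + 1, len(DOC) - 1)
    else:
        nxt = max(state - 1, 0)
    return nxt, -0.1, False  # small cost per move encourages short searches


def train(episodes=500, max_steps=20, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and potential-based shaping."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in DOC]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.randrange(3)
            else:
                action = max(range(3), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Shaping F = gamma * Phi(s') - Phi(s), with Phi(terminal) = 0.
            phi_next = 0.0 if done else potential(nxt)
            reward += GAMMA * phi_next - potential(state)
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = nxt
    return q


def locate(q, max_steps=20):
    """Follow the greedy policy from line 0; return the line where the agent stops."""
    state = 0
    for _ in range(max_steps):
        action = max(range(3), key=lambda a: q[state][a])
        if action == STOP:
            return state
        state, _, _ = step(state, action)
    return None  # no stop decision within the step budget


if __name__ == "__main__":
    q_table = train()
    print(locate(q_table), DOC[locate(q_table)])
```

Because the shaping term is potential-based, it can change how quickly the agent learns but not which policy is optimal, so heuristic knowledge of this kind accelerates learning without redirecting it.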
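The rule-extraction step in (3) can be sketched with the standard rough set notions of indiscernibility classes, positive region, and dependency degree. The decision table below is hypothetical: its conditional attributes mirror the three named in the abstract (length, part of speech, number of stop words) but are already discretized, and the decision attribute simply marks whether a segment is target information. The greedy QuickReduct heuristic is used here as one common reduction strategy; it is an assumption, not necessarily the reduction algorithm used in this work, and it does not guarantee a minimal reduct in general.

```python
from collections import defaultdict

CONDITIONS = ["length", "pos", "stopwords"]   # conditional attributes (discretized)
DECISION = "target"                           # hypothetical decision attribute

# Hypothetical decision information table.
TABLE = [
    {"length": "short", "pos": "noun", "stopwords": "few",  "target": "yes"},
    {"length": "short", "pos": "noun", "stopwords": "many", "target": "yes"},
    {"length": "long",  "pos": "noun", "stopwords": "few",  "target": "no"},
    {"length": "long",  "pos": "verb", "stopwords": "many", "target": "no"},
    {"length": "short", "pos": "verb", "stopwords": "few",  "target": "no"},
    {"length": "short", "pos": "verb", "stopwords": "many", "target": "no"},
]


def partition(rows, attrs):
    """Equivalence classes (lists of row indices) of the indiscernibility relation IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows, attrs, decision=DECISION):
    """Dependency degree gamma = |POS_attrs(decision)| / |U|."""
    positive = 0
    for cls in partition(rows, attrs):
        if len({rows[i][decision] for i in cls}) == 1:  # class is decision-consistent
            positive += len(cls)
    return positive / len(rows)


def quick_reduct(rows, conditions=CONDITIONS):
    """Greedily add the attribute that raises gamma most until it equals gamma(all conditions)."""
    full = dependency(rows, conditions)
    reduct = []
    while dependency(rows, reduct) < full:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a]))
        reduct.append(best)
    return [a for a in conditions if a in reduct]  # keep the original attribute order


def extract_rules(rows, attrs, decision=DECISION):
    """One IF-THEN rule per consistent equivalence class under the reduced attribute set."""
    rules = []
    for cls in partition(rows, attrs):
        decisions = {rows[i][decision] for i in cls}
        if len(decisions) == 1:
            condition = {a: rows[cls[0]][a] for a in attrs}
            rules.append((condition, decisions.pop()))
    return rules


if __name__ == "__main__":
    reduct = quick_reduct(TABLE)  # ['length', 'pos']
    for condition, outcome in extract_rules(TABLE, reduct):
        print(condition, "->", outcome)
```

In this toy table the stop-word attribute turns out to be dispensable, so each extracted rule tests only length and part of speech; rules of this form would then populate the rule base described in (3).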