With the rapid development of Internet technology, the volume of information of all kinds has increased dramatically. Every day, a large amount of information is generated, disseminated, and stored on the Internet, and people face an unprecedented information explosion. More and more people read international news in English, and many of these readers want to find reports about China in particular. Faced with such a huge volume of information, readers often cannot quickly locate the Chinese elements in English-language news. Current researchers are concerned with how to design a system that extracts Chinese element information from international news scientifically and effectively, so as to save users' reading time.

This paper first defines the requirements of Chinese element extraction according to practical needs, and designs the system architecture and functional modules in detail. Second, it studies the technical scheme of Chinese element extraction in depth. A backtracking strategy for Chinese element extraction is proposed, and a method based on a Chinese element dictionary library is used to perform a second extraction pass over the output of the conditional random field (CRF) model. Finally, a Chinese element extraction system is implemented using related technologies such as web page collection, named entity recognition, and text retrieval. Given a URL entered by the user, the system automatically collects the web page, extracts the Chinese elements from the original text with the trained model, and displays the results to the user as a web page. Users can thus quickly and easily view Chinese person names, place names, food, culture, institutions, and other information in English-language international news. In addition, to support batch extraction by information consulting companies, the system also provides a service for extracting Chinese elements from local English texts.

Based on the above requirements and the implementation goals of the system, the main research work of this paper covers the following aspects:
(1) Data acquisition: Python's Beautiful Soup library is used to collect information from English-language international news websites and to construct the Chinese element feature library.
(2) Model construction: The performance of the hidden Markov model (HMM), the maximum entropy model, and the CRF model on Chinese element extraction is compared and analyzed. A model combining the CRF with the Chinese element feature library is designed. The feature library is used to run a second extraction pass over the CRF results to achieve better extraction performance.
(3) System implementation: The system is designed and implemented on the LNMP architecture, with the Bootstrap framework on the front end, the Django framework on the back end, and MySQL and Elasticsearch for data storage and retrieval.
(4) System testing: Each module of the system is tested and evaluated. The functions and performance of the system meet the requirements of the design scheme.
Experimental results show that the Chinese element extraction system designed in this paper achieves satisfactory extraction results. In an open test on the annotated Global Times corpus, the system achieved a precision, recall, and F-value of 0.952, 0.887, and 0.913, respectively.
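The data-acquisition step downloads a page from a user-supplied URL and keeps only its article text. Beautiful Soup is a third-party package. So that this sketch runs with the standard library alone, it uses Python's built-in `html.parser.HTMLParser` as a stand-in. The class and function names are hypothetical. Treating `<p>` elements as the article body is an assumption about the target pages, not the paper's documented rule.

```python
from __future__ import annotations

import urllib.request
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect the visible text of <p> elements, ignoring <script> and <style>.

    Hypothetical helper; the paper itself uses Beautiful Soup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace and keep only non-empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the server-declared charset (UTF-8 fallback)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

In use, `extract_paragraphs(fetch_html(url))` returns the cleaned paragraphs of the page at the user's URL. Inline markup such as `<b>` inside a paragraph is flattened into plain text.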
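The models compared in the model-construction step all label a sentence token by token. HMM and linear-chain CRF taggers both recover the best label sequence with the Viterbi algorithm. Below is a generic Viterbi decoder over log-scores. For an HMM the scores are log probabilities; for a CRF they would be learned feature weights. The function names and the toy probabilities in the check are invented for illustration only.

```python
import math


def log(p):
    """Log that maps probability 0 to -inf (a forbidden transition or emission)."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(start, trans, emit):
    """Best label-index path under additive log-scores.

    start[k]    : score of starting in label k
    trans[j][k] : score of moving from label j to label k
    emit[t][k]  : score of label k at position t
    """
    n, num_labels = len(emit), len(start)
    score = [start[k] + emit[0][k] for k in range(num_labels)]
    back = []  # back[t-1][k] = best previous label for label k at position t
    for t in range(1, n):
        new_score, ptr = [], []
        for k in range(num_labels):
            best_j = max(range(num_labels), key=lambda j: score[j] + trans[j][k])
            new_score.append(score[best_j] + trans[best_j][k] + emit[t][k])
            ptr.append(best_j)
        score = new_score
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda k: score[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy HMM with labels 0 = O, 1 = B-LOC, 2 = I-LOC; I-LOC may not follow O.
start = [log(0.8), log(0.2), log(0.0)]
trans = [[log(0.7), log(0.3), log(0.0)],
         [log(0.2), log(0.1), log(0.7)],
         [log(0.6), log(0.1), log(0.3)]]
emit = [[log(0.9), log(0.05), log(0.05)],   # "in"
        [log(0.2), log(0.7), log(0.1)],     # "Hong"
        [log(0.3), log(0.2), log(0.5)]]     # "Kong"
```

With these toy numbers, `viterbi(start, trans, emit)` tags "in Hong Kong" as O, B-LOC, I-LOC. Working in log space turns products of probabilities into sums and avoids numeric underflow on long sentences.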
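The abstract says the feature library is used for a second extraction pass over the CRF results, but it does not give the exact rule. One plausible reading is a longest-match dictionary lookup restricted to tokens the CRF left unlabeled (`O`). Under that reading, CRF entities take precedence and dictionary hits fill the gaps. This sketch assumes a BIO tag scheme and lowercase token-tuple keys. The tag types (`PER`, `LOC`, `FOOD`) and all function names are hypothetical.

```python
def dictionary_second_pass(tokens, crf_labels, lexicon, max_len=4):
    """Relabel unlabeled spans that match the lexicon; CRF entities are kept as-is.

    lexicon maps a tuple of lowercase tokens to an entity type (hypothetical format).
    """
    labels = list(crf_labels)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so "Peking duck" beats "Peking".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(lowered[i:i + n])
            if window in lexicon and all(lab == "O" for lab in labels[i:i + n]):
                match = (n, lexicon[window])
                break
        if match:
            n, etype = match
            labels[i] = f"B-{etype}"
            for j in range(i + 1, i + n):
                labels[j] = f"I-{etype}"
            i += n
        else:
            i += 1
    return labels


def bio_to_entities(tokens, labels):
    """Group BIO labels into (text, type) entities; a stray I- tag is treated as O."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == etype:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities
```

Suppose the CRF recognizes "Li Na" and "Wuhan" but misses "Peking duck". If the lexicon contains `("peking", "duck")`, the second pass adds that span as a `FOOD` entity. It leaves the CRF's own labels untouched.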
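Elasticsearch, which the system uses for text retrieval, is built on an inverted index that maps each term to the documents containing it. The toy index below illustrates only that idea, using boolean AND queries. It does not reproduce Elasticsearch's analyzers, relevance scoring, or query DSL, and the class and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric terms; a crude stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty postings for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

A query only touches the postings of its own terms, so its cost depends on how many documents contain those terms rather than on scanning every stored text. That property lets the backing store answer lookups over many extracted articles quickly.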
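The evaluation reports precision (P), recall (R), and an F-value. The standard F-measure is F = (1 + β²)·P·R / (β²·P + R), which reduces to the harmonic mean 2PR / (P + R) when β = 1. The sketch below computes entity-level scores under exact (text, type) matching. That matching rule is an assumption; the abstract does not state its criterion. The harmonic mean of the reported P = 0.952 and R = 0.887 is about 0.918. A reported F-value of 0.913 may reflect a different aggregation, for example averaging per-category F-values, which the abstract does not specify.

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) if (p or r) else 0.0


def prf(gold, predicted):
    """Entity-level precision, recall and F1 with exact (text, type) matching."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # correctly extracted entities
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f_measure(p, r)
```

For example, if three entities are predicted, two of them correct, against four gold entities, then P = 2/3, R = 1/2, and F1 = 4/7 ≈ 0.571. A real evaluation would normally match character offsets as well, so that repeated mentions are counted separately.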