| Cyberspace contains a large amount of public opinion information, which reflects people’s opinions, attitudes, and emotions. To capture the overall opinion and attitude of a particular group, or of society as a whole, towards an issue, event, or policy, we need public opinion analysis technology to extract that information. This requires classifying, organizing, and summarizing massive amounts of data, which in turn calls for text classification and information extraction. With text classification and named entity recognition technology, we can keep pace with the development of public opinion. In public opinion analysis, a single text can involve multiple public opinion events at the same time. A single-label model can analyze only one public opinion event and cannot make use of multi-label information or the latent knowledge shared between several classification tasks. Traditional named entity recognition methods based on tagging schemes have difficulty identifying nested and discontinuous entities. Span-based methods can recognize nested entities, but they handle discontinuous entities awkwardly and ignore the relationships between words within an entity. Deep-learning-based named entity recognition methods usually require large-scale manually labeled data sets. This thesis studies the above problems. The main work is as follows: 1. This thesis studies multi-label text classification models in the context of public opinion. Compared with a model trained on a single-label task, a multi-label classification model can learn richer semantic knowledge, and in general it performs better than a single-label text classification model. When several trained models are available, predicting by voting among them is generally better than using a single model. To speed up the training of
the model, we sort the training set by text sequence length so that the length difference between texts in each batch is minimized, which avoids a considerable amount of padding. This reduces training time under the same hardware and software conditions, while the performance of the model remains stable. 2. This thesis uses a graph represented by an adjacency matrix to model the connections between words, which lends itself naturally to identifying discontinuous and nested entities. To address the lack of data resources, this thesis designs the final decoding layer of the named entity recognition model to be independent of the type and number of entities, which makes the model easy to transfer; data sets from similar fields and scenarios can then be used for training. Experiments show that the model designed in this thesis is competent at the named entity recognition task and is capable of transfer learning. |
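
The length-sorted batching described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `make_batches`, `padding_cells`, and `PAD_ID`, and the choice to pad each batch to its own longest sequence, are assumptions made for illustration only.

```python
from typing import List, Sequence

PAD_ID = 0  # hypothetical padding token id, assumed for this sketch


def make_batches(seqs: Sequence[Sequence[int]], batch_size: int,
                 sort_by_length: bool) -> List[List[List[int]]]:
    """Group sequences into batches, padding each batch to its longest member."""
    order = list(range(len(seqs)))
    if sort_by_length:
        # Neighbouring sequences now have similar lengths, so each batch
        # needs little padding.
        order.sort(key=lambda i: len(seqs[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        max_len = max(len(seqs[i]) for i in idx)
        batches.append(
            [list(seqs[i]) + [PAD_ID] * (max_len - len(seqs[i])) for i in idx]
        )
    return batches


def padding_cells(seqs: Sequence[Sequence[int]],
                  batches: List[List[List[int]]]) -> int:
    """Number of padded positions: total batch cells minus real tokens."""
    total_cells = sum(len(row) for batch in batches for row in batch)
    real_tokens = sum(len(s) for s in seqs)
    return total_cells - real_tokens


# Toy corpus: token ids are all 1; only the lengths matter here.
corpus = [[1] * n for n in (2, 9, 3, 8, 1, 10)]
unsorted_pad = padding_cells(corpus, make_batches(corpus, 2, sort_by_length=False))
sorted_pad = padding_cells(corpus, make_batches(corpus, 2, sort_by_length=True))
print(unsorted_pad, sorted_pad)
```

On this toy corpus, sorting cuts the number of padded positions from 21 to 7 at a batch size of 2, which is the effect the abstract credits for the shorter training time.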