As a basic task in natural language processing, named entity recognition plays an important role in text understanding and translation. Because languages differ from one another, named entity recognition techniques developed for Chinese and English are difficult to transfer to Cambodian. To enrich the theory and application of Cambodian natural language processing, this paper studies Cambodian named entity recognition with a BiLSTM-CRF model and uses topic word vectors based on the HDP topic model as the input features of the BiLSTM network. The main work of this paper is as follows.

(1) To address the polysemy and synonymy problems of a single word vector, a method of constructing topic word vectors based on the HDP topic model is proposed. The method incorporates topic information into the word vector. First, the HDP topic model assigns a topic label to each word. The topic label is then treated as a pseudo-word and introduced into the Skip-gram model, which trains topic vectors and word vectors together. Finally, the topic vector, which carries the topic information of the text, is concatenated with the word vector learned by Skip-gram to obtain a topic word vector for each word in the text. Compared with word vector models without topic information, this method achieves better results on word similarity and text classification, indicating that the resulting topic word vectors carry more semantic information.

(2) To address the heavy reliance of traditional named entity recognition methods on manual feature engineering, a Cambodian named entity recognition method based on a BiLSTM-CRF neural network is proposed. On the one hand, the input features of the BiLSTM model are topic word vectors that carry both topic and word information. On the other hand, the BiLSTM output does not take the ordering constraints between output tags into account, which weakens entity recognition. The BiLSTM outputs, together with Cambodian entity features, are therefore used as input features of a CRF model, which performs the final Cambodian named entity recognition. Experimental results show that the method improves Cambodian named entity recognition performance.

(3) A prototype Cambodian named entity recognition system based on the BiLSTM-CRF neural network is built. Using the collected corpus and the experimental results, a prototype system for Cambodian named entity recognition based on a multi-feature neural network is designed and developed. The tools and system framework used to build it are introduced, the design process is described in detail, and the Cambodian named entity recognition results are displayed.
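The topic word vector construction in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the per-token HDP topic ids are taken as given, the pseudo-word naming scheme `<topic_k>` and the helper names are hypothetical, and "introducing the topic label into Skip-gram" is read as letting the topic pseudo-word predict the same context words as the word itself. Actual embedding training (e.g. with a Skip-gram library) is omitted; the sketch only builds the training pairs and performs the final concatenation.

```python
from typing import Dict, List, Sequence, Tuple


def topic_token(topic_id: int) -> str:
    """Hypothetical naming scheme: an HDP topic label becomes a pseudo-word."""
    return f"<topic_{topic_id}>"


def skipgram_pairs(
    words: Sequence[str], topics: Sequence[int], window: int = 2
) -> List[Tuple[str, str]]:
    """Build (center, context) Skip-gram training pairs.

    Assumption: at each position both the word and its topic pseudo-word
    act as centers that predict the surrounding context words.
    """
    if len(words) != len(topics):
        raise ValueError("words and topics must be aligned token by token")
    pairs: List[Tuple[str, str]] = []
    for i, (word, topic) in enumerate(zip(words, topics)):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            pairs.append((word, words[j]))
            pairs.append((topic_token(topic), words[j]))
    return pairs


def topic_word_vector(
    word: str,
    topic_id: int,
    word_vecs: Dict[str, List[float]],
    topic_vecs: Dict[str, List[float]],
) -> List[float]:
    """Concatenate the word vector with its topic vector (order assumed)."""
    return list(word_vecs[word]) + list(topic_vecs[topic_token(topic_id)])
```

For example, `topic_word_vector("a", 0, {"a": [1.0, 2.0]}, {"<topic_0>": [3.0]})` yields `[1.0, 2.0, 3.0]`. Concatenation keeps the word and topic components in separate dimensions, so the downstream BiLSTM can weight them independently.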
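The role of the CRF layer in (2) can be illustrated with Viterbi decoding over a linear-chain CRF. The sketch below is an assumption-based illustration, not the thesis code: it treats the BiLSTM output as a matrix of per-position tag scores (`emissions`), assumes learned `transitions` and `start` scores, and omits the additional Cambodian entity features. Because transition scores penalize invalid tag orders (such as `I-PER` directly after `O`), the decoder can reject sequences that a per-position argmax would produce.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sequence with tag j.
    """
    n, k = len(emissions), len(start)
    if n == 0:
        return []
    # Best score of any path ending in each tag at position 0.
    score = [start[j] + emissions[0][j] for j in range(k)]
    backptrs: List[List[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final tag.
    best = max(range(k), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path
```

With tags `O=0, B-PER=1, I-PER=2`, emissions `[[1.0, 0.8, 0.0], [0.0, 0.0, 1.0]]` give the invalid greedy sequence `[0, 2]` (`O, I-PER`). Adding a transition penalty of `-10` for `O -> I-PER` and a start penalty of `-10` for `I-PER` makes the decoder return `[1, 2]` (`B-PER, I-PER`), which is the ordering constraint a BiLSTM alone does not enforce.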