Research And Application Of Named Entity Recognition In Tourism Domain Based On Lexical Enhancement And Feature Fusion

Posted on: 2024-05-12; Degree: Master; Type: Thesis
Country: China; Candidate: J X Liu
GTID: 2557307145454614; Subject: Applied Statistics
Abstract/Summary:
With the marked improvement in material living standards and growing spiritual and cultural needs, travel has gradually become a preferred choice for holiday leisure. Intelligent software, such as attraction recommendation and question-answering (Q&A) systems, gives tourists a convenient way to solve problems encountered while travelling. Extracting key information of practical value from tourism texts is therefore of great significance for the intelligent development of the tourism industry, and named entity recognition, as a key subtask of information extraction, plays a central role. Research on named entity recognition in the tourism domain is still at an early stage, and mature public datasets are lacking. Moreover, tourism texts are specialized: words are often polysemous, the boundaries of some entities are blurred, and entities are closely tied to their context. This dissertation therefore carries out the following work on named entity recognition in the tourism domain:

(1) A named entity recognition dataset for the tourism domain is constructed. To address the lack of public tourism-domain datasets for named entity recognition research, crawler tools are used to collect information on national tourist attractions from tourism web pages. The raw corpus is preprocessed, entity categories are defined, and entities are annotated, yielding the self-annotated tourism text dataset TRAVEL. The dataset contains 4349 samples and 10843 entities in 4 entity categories, and provides a reusable data resource for subsequent named entity recognition studies in the tourism domain.

(2) Based on the characteristics of tourism text, a named entity recognition model based on lexical enhancement and feature fusion is proposed. To address polysemy and blurred entity boundaries in tourism texts, the embedding layer uses a fixed-parameter RoBERTa (Robustly optimized BERT approach) pre-trained model to generate dynamic word vectors and reduce training time, and introduces lexical information so that the character vectors carry more entity boundary information. Because entities in tourism texts are closely linked to their context, Multi-Head Self-Attention (MHA) is added after the Bidirectional Long Short-Term Memory (BiLSTM) network to give greater weight to key words. Iterated Dilated Convolutional Neural Networks (IDCNN) are added to the encoding layer to address the weak capture of local spatial features in current tourism-domain named entity recognition models, and the features captured by the BiLSTM-MHA network and the IDCNN network are fused by assigning weights. On the TRAVEL dataset, the precision, recall and F1 values of the model were 85.38%, 83.87% and 84.62% respectively, which were 8.8%, 11.31% and 10.1% higher than the baseline model, demonstrating the performance advantage of the model for named entity recognition in the tourism domain. Experimental results on a public dataset further indicate that the model generalizes to some extent.

(3) A named entity recognition system for the tourism domain is designed and implemented. To apply the research results in the intelligent development of the tourism industry, the lightweight Streamlit library is used to deploy the model proposed in this dissertation, and the functionality of the resulting tourism-domain named entity recognition system is introduced and demonstrated.
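
The abstract says that the BiLSTM-MHA and IDCNN features are fused by assigning weights, but it does not give the fusion formula. The following is only a minimal sketch of one common way to do this, assuming a single scalar weight and a convex combination of two feature sequences of equal shape. The function name `fuse_features`, the parameter `weight`, and the toy values are hypothetical and are not taken from the dissertation.

```python
from typing import List

Vector = List[float]


def fuse_features(h_context: List[Vector], h_local: List[Vector], weight: float) -> List[Vector]:
    """Fuse two per-token feature sequences as weight * h_context + (1 - weight) * h_local.

    h_context stands in for the BiLSTM-MHA output and h_local for the IDCNN output.
    The scalar convex combination is an assumption, not the dissertation's stated formula.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(h_context) != len(h_local):
        raise ValueError("both sequences must cover the same number of tokens")

    fused: List[Vector] = []
    for ctx, loc in zip(h_context, h_local):
        if len(ctx) != len(loc):
            raise ValueError("feature vectors must have the same dimension")
        # Element-wise weighted sum for one token position.
        fused.append([weight * c + (1.0 - weight) * l for c, l in zip(ctx, loc)])
    return fused


# Toy example: two tokens, two feature dimensions each (illustrative values only).
h_context = [[1.0, 2.0], [3.0, 4.0]]
h_local = [[0.0, 0.0], [1.0, 2.0]]
fused = fuse_features(h_context, h_local, weight=0.5)
print(fused)
```

In a trained model the weight would more plausibly be a learned parameter or a per-dimension gate; a fixed scalar is used here only to keep the idea visible.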
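
As a quick consistency check on the reported TRAVEL results, F1 is the harmonic mean of precision and recall. The sketch below treats the first reported figure (85.38%) as precision and checks that it and the reported recall reproduce the reported F1. The function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
reported_precision = 85.38
reported_recall = 83.87
reported_f1 = 84.62

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # 84.62
```

The computed value matches the reported F1 to two decimal places, which supports reading the three figures as precision, recall and F1.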
Keywords/Search Tags: Named entity recognition, Tourism domain, Lexical enhancement, Feature fusion