| Word embedding is a technique that represents words in numerical form and is one of the key research directions in natural language processing. Tibetan word vector representation plays an important role in Tibetan natural language processing, and its quality strongly affects downstream Tibetan natural language processing tasks. To improve the performance of Tibetan word vector representation, this thesis applies deep learning techniques while taking the actual characteristics of the Tibetan language into account. We study deep learning-based Tibetan word vector representation from two perspectives: a performance analysis of Tibetan word vector representation technology, and a Tibetan word vector representation method based on the TWER_BERT model. The main work includes: 1. Performance analysis of Tibetan word vector representation technology. Many factors affect Tibetan word vector representation, such as the size of the corpus used to train the model and the parameters of the model itself. This thesis analyzes, through experiments, the effects of corpus size, word vector dimension, number of training iterations, and learning rate on the performance of Tibetan word vector representation, and gives some suggestions for generating high-quality Tibetan word vectors. 2. Tibetan word vector representation based on the TWER_BERT model. Most word vectors produced by traditional deep learning models are static and cannot adapt to the specific context in which a word appears. The BERT model integrates word position information and the self-attention mechanism, which allows it to represent word vectors dynamically. Based on the actual characteristics of the Tibetan language, this thesis first designs a Tibetan word vector acquisition algorithm based on the BERT model, then proposes a T-Attention mechanism suited to obtaining Tibetan word vectors and constructs the TWER_BERT model. Finally, the effectiveness of the TWER_BERT model is verified through experiments. 3. Design and implementation of a Tibetan word vector acquisition system based on TWER_BERT. Building on the TWER_BERT model, a Tibetan word vector acquisition system is designed and implemented. The system is implemented with the Tkinter framework in Python; it can easily obtain Tibetan word vectors and calculate the similarity and correlation between Tibetan words. The system has a concise interface, is easy to operate, and has certain practical value. |
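
The abstract states that the system calculates the similarity between Tibetan words from their word vectors but does not name the measure. As a minimal sketch, assuming cosine similarity (a common choice for comparing embeddings, not confirmed as the thesis's method), the computation could look like the following. The function names, the placeholder keys `word_a`, `word_b`, `word_c`, and the 4-dimensional toy vectors are all hypothetical; real vectors would come from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, vectors):
    """Rank every other word by cosine similarity to `query`, highest first."""
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for model output (assumption).
toy_vectors = {
    "word_a": [0.2, 0.7, 0.1, 0.4],
    "word_b": [0.1, 0.8, 0.0, 0.5],
    "word_c": [0.9, 0.0, 0.6, 0.1],
}

if __name__ == "__main__":
    for word, score in most_similar("word_a", toy_vectors):
        print(f"{word}: {score:.3f}")
```

With a contextual model such as BERT, the vector for a word depends on the sentence it appears in, so a similarity computed this way would apply to a particular pair of occurrences rather than to the words in isolation.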