
Tibetan Pre-Trained Model Based On ALBERT And Its Application

Posted on: 2021-03-17    Degree: Master    Type: Thesis
Country: China    Candidate: L Li    Full Text: PDF
GTID: 2415330611452116    Subject: Engineering · Software Engineering
Abstract/Summary:
In the field of natural language processing, a model can be pre-trained on unlabeled datasets and then fine-tuned on labeled datasets, which saves time and computing resources when training a neural network. With the help of pre-trained models, great breakthroughs have been made in many natural language processing tasks. Research on a Tibetan pre-trained model can not only compensate for the lack of labeled Tibetan datasets, but also promote research on Tibetan natural language processing. At present, research on Tibetan pre-trained models is still at an exploratory stage, yet it has important theoretical significance and wide application value for Tibetan natural language processing. To this end, this thesis carries out research on a Tibetan pre-trained model. The main research contents are as follows:

1. Since there is currently no public Tibetan dataset, this thesis crawls Tibetan corpus texts from the Tibet People's Website, the official website of Qinghai Tibetan Network Radio Station, and the Qinghai Provincial People's Government Website, and builds a training dataset for the pre-trained model together with the corpus provided by Professor Dora of Northwest Minzu University. It also collects data from the Chinese Tibetan Netcom to build a Tibetan text classification dataset and a Tibetan abstract extraction dataset (a crawling sketch follows this list).

2. To address the shortage of labeled Tibetan data for downstream tasks, this thesis trains a Tibetan ALBERT pre-trained model, reducing the need for labeled datasets. The pre-trained model reaches an accuracy of 74% on the masked language model task and 89% on the sentence-order prediction task (see the pre-training sketch below).

3. By comparing the ALBERT Tibetan text classification model with GBDT, Bi-LSTM, and TextCNN on text classification tasks, the effectiveness of the Tibetan ALBERT pre-trained model for text classification is verified. At the same time, to address sample imbalance, the ALBERT Tibetan text classification model is trained with the focal loss function; the results show that predictions for small-sample categories improve (see the focal loss sketch below).

4. The effectiveness of the Tibetan ALBERT pre-trained model on downstream tasks is further verified through a comparison experiment on Tibetan abstract extraction (see the extraction sketch below).
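As a rough illustration of the corpus collection in content 1, the sketch below crawls one article page and appends its paragraphs to a corpus file. The URL, the CSS selector, and the file name are placeholders for illustration only, not the actual structure of the sites used in the thesis.

    # Hypothetical crawler sketch: the URL and the CSS selector are placeholders.
    import requests
    from bs4 import BeautifulSoup

    def fetch_tibetan_paragraphs(url):
        """Download one article page and return its text paragraphs."""
        resp = requests.get(url, timeout=10)
        resp.encoding = "utf-8"                      # Tibetan pages are typically UTF-8
        soup = BeautifulSoup(resp.text, "html.parser")
        # Assumed selector for the article body; adjust per site.
        return [p.get_text(strip=True) for p in soup.select("div.article p")]

    paragraphs = fetch_tibetan_paragraphs("https://example.com/article.html")
    with open("tibetan_corpus.txt", "a", encoding="utf-8") as f:
        f.write("\n".join(paragraphs) + "\n")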
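For the pre-training in content 2, the following is a minimal sketch of how a Tibetan ALBERT can be trained jointly on the masked language model (MLM) and sentence-order prediction (SOP) objectives with the Hugging Face transformers library. The configuration, vocabulary size, mask token id, and toy batch are assumptions, not the thesis's actual setup.

    import torch
    from transformers import AlbertConfig, AlbertForPreTraining

    # Assumed ALBERT-base-style configuration; ALBERT shares parameters across
    # layers and factorizes the embedding matrix, which keeps the model small
    # enough for a low-resource language like Tibetan.
    config = AlbertConfig(vocab_size=30000, embedding_size=128,
                          hidden_size=768, num_attention_heads=12,
                          intermediate_size=3072)
    model = AlbertForPreTraining(config)

    # Toy batch: `labels` holds the original token ids at masked positions and
    # -100 elsewhere; `sentence_order_label` is 0 for the correct segment order
    # and 1 for swapped segments (the SOP task).
    input_ids = torch.randint(5, 30000, (2, 16))
    labels = torch.full_like(input_ids, -100)
    labels[:, 3] = input_ids[:, 3]        # pretend position 3 was masked
    input_ids[:, 3] = 4                   # 4 = assumed [MASK] token id
    sop = torch.tensor([0, 1])

    out = model(input_ids=input_ids, labels=labels, sentence_order_label=sop)
    out.loss.backward()                   # joint MLM + SOP loss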
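The focal loss used in content 3 down-weights well-classified examples so that rare categories contribute more to the gradient. A minimal multi-class sketch, assuming logits from the ALBERT classifier and gamma = 2; the batch size and the number of categories are made up for the example.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        """Multi-class focal loss: scales cross-entropy by (1 - p_t)^gamma."""
        log_p = F.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()                 # probability of the true class
        return (-(1.0 - pt) ** gamma * log_pt).mean()

    # Usage with classifier logits (e.g. from AlbertForSequenceClassification).
    logits = torch.randn(8, 12, requires_grad=True)   # 8 examples, 12 classes
    targets = torch.randint(0, 12, (8,))
    focal_loss(logits, targets).backward()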
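For the abstract extraction task in content 4, one simple way to use a pre-trained encoder is to embed each sentence and the whole document, then keep the sentences most similar to the document vector. The sketch below illustrates this idea only; it is an assumed approach, not necessarily the method of the thesis, and in practice the trained Tibetan checkpoint from content 2 would replace the freshly initialized model.

    import torch
    from transformers import AlbertConfig, AlbertModel

    # Freshly initialized model as a stand-in for the Tibetan checkpoint.
    config = AlbertConfig(vocab_size=30000, hidden_size=768,
                          num_attention_heads=12, intermediate_size=3072)
    model = AlbertModel(config).eval()

    def embed(input_ids):
        with torch.no_grad():
            return model(input_ids=input_ids).pooler_output

    # Toy token ids standing in for tokenized Tibetan sentences.
    sentences = [torch.randint(5, 30000, (1, 12)) for _ in range(5)]
    doc = torch.cat(sentences, dim=1)[:, :512]     # truncated full document
    scores = [torch.cosine_similarity(embed(s), embed(doc)).item()
              for s in sentences]
    top2 = sorted(range(5), key=lambda i: -scores[i])[:2]   # extracted sentences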
Keywords/Search Tags:Tibetan, pre-training, ALBERT, text classification, abstract extraction