Research On The Establishment And Application Of The Sample Database Of Tangut Script

Posted on:2019-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:W H Yang

Full Text:PDF

GTID:2405330551454403

Subject:Engineering

Abstract/Summary:

Digitizing ancient books helps protect and share them, and it has become the main channel through which such books are studied today. Tangut script is an ancient script that recorded the language of the Dangxiang people. Tangut texts in ancient books offer a detailed view of the society, history, and national culture of the Western Xia dynasty, so excavating and preserving ancient Tangut literature is an important route to studying the script. However, because of their age, very few ancient books from the Tangut period survive, and many of those suffer from damaged paper and unclear writing, which hinders the digitization of Tangut script. Optical character recognition, machine learning, and related techniques can greatly help in interpreting ancient scripts, but they depend on character databases, which provide the training samples and evaluation standards for character recognition. A standard, open, and general-purpose Tangut script sample database is therefore the prerequisite and foundation for research on Tangut character recognition. Such a database not only provides test samples and evaluation standards for intelligent Tangut recognition algorithms but also compensates for the scarcity of specialists who have mastered the Tangut language system, gives Tangutology researchers more convenient tools and more efficient methods, and strongly supports digital information retrieval for ancient books. At present, no sample database for Tangut script recognition has been established.

This thesis studies techniques for building and applying a Tangut script sample database. First, Buddhist sutras in Tangut script are selected as the data source. The scanned ancient texts are then preprocessed and the text is extracted. The extracted text images are organized into a Tangut script sample database consisting of a text sample database and a single-character sample database. The text database is organized as Excel spreadsheet files; by reading the tables, users can easily look up Tangut characters, which improves on the traditional annotation form. The single-character database is ordered by character frequency, and each single-character image file is named strictly according to fixed rules, so that Tangutology researchers can search ancient books and documents through the database and easily find the documents in which a Tangut character appears and how it has been translated.

Finally, research on intelligent recognition of Tangut script was carried out on the resulting sample database. A deep learning model based on convolutional neural networks was trained on the Xixia dataset. To address the problem of unbalanced samples, MLSD was used to expand the samples and improve the performance of the learning and recognition algorithm for Tangut script. In summary, we established a Tangut script sample database with value for both theoretical research and practical application, which greatly benefits the digitization of Tangut script.
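The organization of the single-character database can be illustrated with a minimal Python sketch. The abstract does not state the actual naming rules or define MLSD, so everything below is an assumption made for illustration: the character labels such as "T0003", the file-name pattern in `sample_filename`, and the use of a plain "match the largest class" target as a stand-in for the sample expansion step.

```python
from collections import Counter


def frequency_ranks(labels):
    """Map each character label to (rank, count); rank 1 is the most frequent."""
    counts = Counter(labels)
    # Sort by descending count, then by label so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        label: (rank, count)
        for rank, (label, count) in enumerate(ordered, start=1)
    }


def sample_filename(rank, label, index):
    """Hypothetical naming rule: <frequency rank>_<character id>_<sample no>.png"""
    return f"{rank:04d}_{label}_{index:03d}.png"


def oversampling_plan(labels):
    """Extra samples each class would need to match the largest class.

    This is a generic balancing target, not the MLSD method named in the abstract.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Toy labels standing in for extracted single-character images (hypothetical ids).
labels = ["T0003", "T0001", "T0003", "T0002", "T0003", "T0001"]
ranks = frequency_ranks(labels)
plan = oversampling_plan(labels)
```

Encoding the frequency rank at the start of the file name keeps a directory listing in frequency order, and the per-class deficit in `plan` shows which rare characters would need the most additional samples before training a classifier.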

Keywords/Search Tags:

Tangut Script, data source, character extraction, sample database, deep learning

Related items

1	Study On Key Techniques For Informatization And Digitalization Of Tangut Character
2	Research On The Recognition Of Tangut Character Based On Deep Learning
3	Research On The Generation Of Handwritten Tangut Character Samples Based On GAN
4	Research On Skeleton Extraction Algorithms Of Calligraphic Characters Based On Deep Learning
5	Optimization And Improvement Of Tangut Character Images Recognition Model
6	Research On Tangut Character Recognition Based On Improved Fuzzy Support Vector Machine
7	The Research Of Oracle Bone Script Detection And Recognition Based On Deep Learning Methods
8	Research On The Generation Of Handwritten Tangut Character Samples Based On Style Transfer
9	Research On Text Detection And Recognition Algorithms For Tangut Ancient Books Based On Deep Learning
10	The Ancient Ruins Of Hyperspectral Data Information Extraction Method Research