Research On Tangut Character Recognition Based On Improved Fuzzy Support Vector Machine

Posted on: 2020-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X C Liu
GTID: 2415330578955900
Subject: Computer technology
Abstract/Summary:
In daily life we often need to convert text on paper into digital information that electronic devices can store. Character recognition technology arose from this need. Character recognition belongs to the field of pattern recognition and is based on OCR (Optical Character Recognition): the acquired images are preprocessed, features are extracted from them, and an appropriate classifier is then chosen to recognize the different characters. Character recognition has a very wide range of applications, especially in postal services, examinations, bill processing and other settings that involve complex handwriting and demand high precision.

Tangut character recognition, the subject of this paper, is a field that has emerged only in recent years. Tangut civilization is an inseparable part of Chinese civilization and is still waiting to be explored, and because the Tangut script is the carrier of that civilization, recognizing the ancient Tangut script is particularly important. Unlike modern Chinese characters, Tangut characters, which were created with reference to Chinese characters, have a unique and very complex structure: their components are very similar to one another, and a character averages as many as 25 strokes. This makes the script extremely difficult to digitize. In addition, the Tangut materials unearthed so far are mainly manuscripts and engraved texts, and the position and layout of the same character differ from document to document, which makes recognition even harder.

To address these difficulties, and in view of the redundant preprocessed data, complex features and insufficient generalization ability of traditional character recognition techniques, this paper extracts HOG features and then proposes a Tangut character recognition technique based on an improved fuzzy support vector machine. The Fuzzy Support Vector Machine (FSVM) is a classifier proposed by Lin Chun-fu et al. for the problem of mixed and unclassified points that arises when SVM is extended to multi-class classification. This paper improves FSVM with a membership function based on a hyperplane distance measure. The new function replaces the class centers with planes through the positive and negative class centers, and is designed by comparing the distances from each sample point to the two center planes and to the class centers. The classifier is then optimized by assigning different weights to different sample points according to the sample distribution. For unbalanced data classification, this paper adds a new constraint to the mathematical model of the support vector machine, which reduces errors in assigning membership values, enhances the generalization ability of the new algorithm and further optimizes the classifier. The improved FSVM is applied to Tangut character recognition in experiments and compared with several common algorithms, and the advantages and disadvantages of each algorithm are analyzed. The experimental results show that the new method converges quickly, achieves a high recognition rate and has practical application value.

The significance of this research lies in four points. First, it benefits the preservation and revival of Tangut characters: the technique proposed in this paper digitizes Tangut characters and stores the ancient books on a computer in the form of images. Second, it improves the efficiency of collating texts, since a digitized image database greatly facilitates the work of researchers. Third, it provides a text recognition model for reference; scripts in a situation similar to that of Tangut include Khitan and Jurchen. Finally, it enables information retrieval over ancient books and documents. For a character set with high inter-character similarity such as Tangut, establishing an image database and supporting free retrieval of information is of great significance.
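
The abstract does not give the formula for the proposed hyperplane-distance membership function, so it is not reproduced here. As background only, the following minimal sketch shows the classical class-center membership that such functions are usually measured against: s_i = 1 - d_i / (r + delta), where d_i is the distance from sample i to its class center and r is the largest such distance in the class. The function names, the `delta` parameter and the toy data are hypothetical illustrations, not taken from the thesis.

```python
import math


def class_center(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(dim)]


def center_distance_memberships(points, delta=1e-6):
    """Classical class-center fuzzy membership (baseline, not the thesis's variant).

    Each sample gets s_i = 1 - d_i / (r + delta), where d_i is its Euclidean
    distance to the class center and r is the largest d_i in the class.
    delta is a small positive constant that keeps s_i above zero.
    """
    center = class_center(points)
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


# Toy 2-D feature vectors for one class (assumed data); the last is an outlier.
positive = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (4.0, 4.0)]
memberships = center_distance_memberships(positive)

for point, s in zip(positive, memberships):
    print(point, round(s, 3))
```

In a fuzzy SVM these memberships scale each sample's misclassification penalty, so the outlier contributes almost nothing to the decision boundary. One way to try this, offered as an assumption rather than the thesis's setup, is to pass the values as `sample_weight` to scikit-learn's `SVC.fit`, which rescales the penalty parameter C per sample.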
Keywords/Search Tags: Tangut Character Recognition, Feature Extraction, Fuzzy Support Vector Machine, Unbalanced Data Classification