
Research On Automatic Recognition Algorithm Of Value-added Tax Invoice

Posted on: 2019-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Liao
GTID: 2359330542489165
Subject: Information and Communication Engineering
Abstract/Summary:
The VAT (value-added tax) invoice serves as both an accounting document and a tax voucher in business activities. To store, transmit, and audit invoice information, the information printed on paper VAT invoices must be entered into a computer database. Through region location, character segmentation, and character recognition, an automatic VAT invoice recognition algorithm can extract this information and relieve the heavy workload and low efficiency of traditional manual data entry. Moreover, with the popularity of mobile devices, a VAT invoice recognition algorithm running on mobile devices can capture invoice information anytime and anywhere, making it more convenient for enterprises and individuals to obtain VAT invoice information and facilitating remote reimbursement and tax calculation. This thesis takes VAT invoice images captured by mobile devices as the algorithm input and extracts all machine-printed information on the invoice. The main work is as follows:

1) Invoice detection and region location algorithms. By analyzing the structure and content of the layout, the VAT invoice is divided into regions outside the rectangular frame and regions inside it. First, the invoice's position in the image is found by matching the rectangular frame with a multi-scale template matching algorithm. Then, information regions outside the frame are located by their positions relative to the rectangular frame, while regions inside the frame are located by a scene text detection algorithm combined with connected-domain analysis.

2) Character segmentation algorithm for VAT invoices. A comprehensive solution is given for the problems of touching characters and unconnected characters in VAT invoices. First, characters are preliminarily segmented with a traditional character segmentation algorithm. Next, the drop-falling algorithm is applied so that touching characters are over-segmented as far as possible. Finally, a segmentation algorithm based on recognition feedback merges unconnected characters and over-segmented fragments and recognizes them continuously, feeding the recognition confidence back to the merging process until the highest recognition confidence is obtained.

3) Character recognition algorithm for similar characters. Based on the structural features of similar characters, a method is given for dividing characters into blocks and for judging whether similar characters exist. From the preliminary matching result of the current character, the algorithm judges whether that character has similar characters. If it does, the character is recognized with a character-block recognition algorithm that enlarges the differences between similar characters; if not, a weighted kNN algorithm is used to optimize the recognition result.

To verify the performance of the algorithms, 56 VAT invoice images were collected in different environments, containing 13,346 regions to be located and 15,191 characters in total. Tests on these invoices show a region location accuracy of 97.53%, a character segmentation accuracy of 98.13%, and a character recognition accuracy of 96.55%. Within recognition, the accuracy is 96.25% for Chinese characters, 95.67% for digits and English letters, and 99.37% for other symbols. Considering all characters on the invoice, the global character recognition accuracy is 93.63%.
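The frame-matching step in 1) can be sketched as follows. This is a minimal pure-Python illustration, not the thesis implementation: it assumes grayscale images stored as nested lists, nearest-neighbour rescaling of the template, and zero-mean normalized cross-correlation as the similarity score, none of which the abstract specifies. The names `resize_nearest`, `ncc`, and `multiscale_match` are hypothetical; a production version would more likely call OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` once per scale.

```python
import math


def resize_nearest(img, scale):
    """Nearest-neighbour rescale of a grayscale image stored as a list of rows."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(nw)]
            for r in range(nh)]


def ncc(image, tmpl, top, left):
    """Zero-mean normalized cross-correlation between tmpl and the window at (top, left)."""
    th, tw = len(tmpl), len(tmpl[0])
    win = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    tpl = [v for row in tmpl for v in row]
    mw, mt = sum(win) / len(win), sum(tpl) / len(tpl)
    num = sum((a - mw) * (b - mt) for a, b in zip(win, tpl))
    den = math.sqrt(sum((a - mw) ** 2 for a in win) * sum((b - mt) ** 2 for b in tpl))
    return num / den if den else 0.0  # flat window or template: correlation undefined


def multiscale_match(image, tmpl, scales):
    """Slide the rescaled template over the image; return (score, top, left, scale) of the best hit."""
    H, W = len(image), len(image[0])
    best = (-2.0, -1, -1, None)
    for s in scales:
        t = resize_nearest(tmpl, s)
        th, tw = len(t), len(t[0])
        if th > H or tw > W:
            continue  # template does not fit inside the image at this scale
        for top in range(H - th + 1):
            for left in range(W - tw + 1):
                score = ncc(image, t, top, left)
                if score > best[0]:
                    best = (score, top, left, s)
    return best
```

Because the score is normalized, a window whose intensities differ from the template only by a positive gain and offset still scores 1.0, which is one reason a correlation-style score suits photos taken under uneven lighting.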
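Locating the fields outside the frame by their position relative to the detected rectangle reduces to mapping relative offsets into pixel coordinates. In this sketch the field names and every number in `OUTER_FIELDS` are hypothetical placeholders, not measurements of the real VAT invoice layout, and `locate_outer_fields` is an illustrative name.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels

# Hypothetical layout: each outer field as (dx, dy, w, h), expressed as fractions
# of the detected frame's width and height and measured from its top-left corner.
# A negative dy places the field above the frame. Placeholder values only.
OUTER_FIELDS: Dict[str, Tuple[float, float, float, float]] = {
    "invoice_code": (0.02, -0.12, 0.25, 0.05),
    "invoice_number": (0.75, -0.12, 0.20, 0.05),
}


def locate_outer_fields(frame: Box, layout=OUTER_FIELDS) -> Dict[str, Box]:
    """Map relative field offsets onto the frame found by template matching."""
    fx, fy, fw, fh = frame
    return {
        name: (round(fx + dx * fw), round(fy + dy * fh), round(w * fw), round(h * fh))
        for name, (dx, dy, w, h) in layout.items()
    }
```

Expressing offsets as fractions of the frame size means the same layout table works at any photo resolution, since the frame's detected width and height absorb the scale.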
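For the regions inside the frame, the abstract combines scene text detection with connected-domain analysis. Only the connected-domain half is sketched here, assuming an already binarized image (1 = ink): an 8-connected breadth-first labelling that returns one bounding box per component. The `min_area` noise filter and the function name are assumptions, and the scene text detector itself is not reproduced.

```python
from collections import deque


def connected_components(binary, min_area=1):
    """Label 8-connected ink pixels; return boxes (top, left, bottom, right), inclusive."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:  # drop specks smaller than min_area pixels
                boxes.append((top, left, bottom, right))
    return boxes
```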
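The abstract does not name the "traditional" algorithm used for the first segmentation pass in 2). A common choice is vertical projection, sketched below under that assumption: columns containing no ink separate candidate characters. `projection_segments` is a hypothetical name.

```python
def projection_segments(binary):
    """Split a binarized text line (1 = ink) at columns that contain no ink.

    Returns (first_col, last_col) pairs, inclusive, one per candidate character.
    """
    w = len(binary[0])
    ink = [sum(row[c] for row in binary) for c in range(w)]  # vertical projection
    segments, start = [], None
    for c, v in enumerate(ink):
        if v > 0 and start is None:
            start = c  # a run of inked columns begins
        elif v == 0 and start is not None:
            segments.append((start, c - 1))
            start = None
    if start is not None:  # run reaching the right edge
        segments.append((start, w - 1))
    return segments
```

Touching characters leave no empty column and come out as one wide segment, while left-right structured Chinese characters can fall apart into several segments. These are the two failure modes the drop-falling and recognition-feedback stages are meant to repair.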
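The drop-falling algorithm imitates a drop of water rolling down between two touching characters and cuts along the drop's path. The sketch below implements one common top-down variant with assumed neighbour priorities (down, down-right, down-left, right, left, otherwise cut straight down); the thesis may use different rules or several drop directions. `drop_fall`, `cut_columns`, and `split_at` are hypothetical names.

```python
def drop_fall(binary, start_col):
    """Trace a top-down drop-fall path through a binary image (1 = ink).

    Assumed neighbour priority: down, down-right, down-left, right, left;
    if every move is blocked, the drop cuts straight down through the ink.
    Returns the visited (row, col) positions in order.
    """
    h, w = len(binary), len(binary[0])

    def free(y, x):
        return 0 <= x < w and binary[y][x] == 0

    y, x, prev = 0, start_col, None
    path = [(y, x)]
    while y < h - 1:
        if free(y + 1, x):
            nxt = (y + 1, x)
        elif free(y + 1, x + 1):
            nxt = (y + 1, x + 1)
        elif free(y + 1, x - 1):
            nxt = (y + 1, x - 1)
        elif free(y, x + 1) and prev != (y, x + 1):
            nxt = (y, x + 1)  # slide sideways, but never straight back
        elif free(y, x - 1) and prev != (y, x - 1):
            nxt = (y, x - 1)
        else:
            nxt = (y + 1, x)  # trapped: cut through the ink
        prev = (y, x)
        y, x = nxt
        path.append((y, x))
    return path


def cut_columns(path, height):
    """Cut column for each row: the last position the drop reached in that row."""
    cols = [0] * height
    for y, x in path:
        cols[y] = x
    return cols


def split_at(binary, cols):
    """Split the image along per-row cut columns; the cut pixel goes to the left part."""
    left = [[v if c <= cols[r] else 0 for c, v in enumerate(row)] for r, row in enumerate(binary)]
    right = [[v if c > cols[r] else 0 for c, v in enumerate(row)] for r, row in enumerate(binary)]
    return left, right
```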
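Recognition-feedback merging can be framed as choosing how to group consecutive fragments so that the overall recognition confidence is highest. This sketch does that with dynamic programming. The `recognize` callback stands in for any classifier returning `(label, confidence)`, `max_merge` is an assumed limit on fragments per character, and scoring each group as confidence times the number of fragments it covers is an assumption: it keeps the total proportional to the fragment-averaged confidence, so splitting into more groups does not inflate the score by itself.

```python
from typing import Callable, List, Sequence, Tuple


def merge_by_feedback(
    pieces: Sequence,
    recognize: Callable[[Sequence], Tuple[str, float]],
    max_merge: int = 3,
) -> List[str]:
    """Group consecutive fragments to maximize fragment-weighted confidence.

    recognize(group) is a hypothetical classifier returning (label, confidence).
    A group of k fragments contributes k * confidence, so the total equals
    len(pieces) times the confidence averaged over fragments.
    """
    n = len(pieces)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for the first i fragments
    best[0] = 0.0
    step = [0] * (n + 1)              # size of the last group in that best solution
    label = [""] * (n + 1)
    for end in range(1, n + 1):
        for k in range(1, min(max_merge, end) + 1):
            lab, conf = recognize(pieces[end - k:end])  # confidence fed back from recognition
            score = best[end - k] + k * conf
            if score > best[end]:
                best[end], step[end], label[end] = score, k, lab
    out, end = [], n
    while end > 0:  # walk back through the chosen groups
        out.append(label[end])
        end -= step[end]
    return out[::-1]
```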
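For similar characters, the abstract divides each character into blocks and concentrates on the blocks where similar characters differ. A minimal sketch, assuming ink density on a fixed grid as the block feature and already-known templates for one group of similar characters: it ranks blocks by how much the group's templates disagree and compares a sample only on the top-ranked blocks. The grid size, the density feature, and all function names are assumptions, not the thesis method.

```python
def block_densities(bitmap, rows=2, cols=2):
    """Ink density (fraction of 1-pixels) in each cell of a rows x cols grid."""
    h, w = len(bitmap), len(bitmap[0])
    feats = []
    for i in range(rows):
        for j in range(cols):
            r0, r1 = i * h // rows, (i + 1) * h // rows
            c0, c1 = j * w // cols, (j + 1) * w // cols
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells))
    return feats


def discriminative_blocks(templates, top_k):
    """Indices of the blocks whose densities differ most across a group of similar characters."""
    n = len(next(iter(templates.values())))
    spread = [max(f[i] for f in templates.values()) - min(f[i] for f in templates.values())
              for i in range(n)]
    return sorted(range(n), key=lambda i: spread[i], reverse=True)[:top_k]


def recognize_similar(feats, templates, top_k=1):
    """Nearest template, measured only on the most discriminative blocks."""
    blocks = discriminative_blocks(templates, top_k)
    return min(templates,
               key=lambda lab: sum((feats[i] - templates[lab][i]) ** 2 for i in blocks))
```

For a toy group such as `O` and `Q`, whose 2 x 2 block densities differ only in the lower-right block, `top_k=1` makes the decision depend on that block alone.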
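When no similar characters are involved, the abstract refines the preliminary result with weighted kNN. The weighting scheme is not stated; this sketch assumes inverse-distance weights over Euclidean distances between feature vectors, with `eps` guarding against division by zero on exact matches. `weighted_knn` is a hypothetical name.

```python
import math
from collections import defaultdict


def weighted_knn(query, samples, k=3, eps=1e-9):
    """Classify query from (feature_vector, label) samples by inverse-distance-weighted kNN."""
    dists = sorted((math.dist(query, feat), lab) for feat, lab in samples)
    votes = defaultdict(float)
    for d, lab in dists[:k]:
        votes[lab] += 1.0 / (d + eps)  # closer neighbours get larger votes
    return max(votes, key=votes.get)
```

With inverse-distance weights, one very close neighbour can outvote two distant ones, which a plain majority vote would not allow.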
Keywords/Search Tags: VAT invoice recognition, Scene text detection, Drop-falling algorithm, Recognition feedback, Similar character recognition