With the rapid development of office automation technology, e-government has become an important area of social informatization, and electronic official documents are now widely used. Reviewing them, however, still poses many challenging problems. Meanwhile, deep learning has achieved breakthroughs in computer vision and natural language processing and shows promise for image and text processing. A deep-learning-based algorithm for the intelligent review of electronic official documents therefore has both social significance and practical value. Based on the characteristics of the official seal and the body text of electronic official documents, this paper studies models that recognize seal content and extract information from the document body. For official seal recognition, the CRAFT model is used to detect the text in seal images because it can handle the curved layout of seal text. Geometric post-processing is then applied to the seal image. This step uses the predicted probabilities of single-character centre regions and of the centres of regions between adjacent characters (CRAFT's region and affinity scores) to obtain nearly horizontal text images, which are fed into the CRNN network for recognition. Finally, to improve recognition accuracy, the FASPell network is used to correct glyph-level errors in the recognition results. For information extraction from the document body, text detection and recognition are likewise performed on the original image. The recognition results are then fed into the CARSEL network for relation extraction, and the relevant information is extracted by obtaining the subject and object of specific relations. Experiments show that the CRAFT model takes 11.9 ms on average to process one image. After the model is optimized by enhancing the probability of the centres of adjacent-character regions, this drops to 4 ms. In the recognition experiments, the CRNN network converges faster and performs better than the DenseNet network. On the test set it reaches a loss of 0.226, an accuracy of 96.2%, and an average recognition time of 9.53 ms. In the correction experiments, the original pre-trained BERT model is fine-tuned and the CSD (confidence-similarity decoder) algorithm is used to filter its outputs. The resulting character-level error detection and correction rates are 89.55% and 72.19%, respectively. In the CARSEL relation extraction experiments, the accuracy reaches 77.23% on Chinese and 93.13% on English.
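The abstract does not spell out the geometric post-processing. One plausible sketch assumes two things: the seal text lies on a circular arc, and character centres have already been taken from the peaks of CRAFT's region score. Under those assumptions, the approach fits a circle to the centres and resamples the surrounding ring into a straight strip. The algebraic (Kasa) least-squares fit and the nearest-neighbour polar unwrapping are swapped-in techniques. `fit_circle` and `unwrap_arc` are hypothetical names, not the thesis's code.

```python
import numpy as np


def fit_circle(points):
    """Hypothetical helper: algebraic (Kasa) least-squares circle fit.

    Solves x^2 + y^2 + D*x + E*y + F = 0 for D, E, F, then converts the
    result to a centre (cx, cy) and a radius r.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - F)
    return cx, cy, r


def unwrap_arc(image, cx, cy, r, band, theta_start, theta_end, width):
    """Hypothetical helper: resample a ring around radius r into a flat strip.

    `band` is the ring thickness in pixels and also the number of output
    rows; rows go from the outer edge inward. Nearest-neighbour sampling
    keeps the sketch free of extra dependencies.
    """
    h, w = image.shape[:2]
    thetas = np.linspace(theta_start, theta_end, width)
    radii = np.linspace(r + band / 2.0, r - band / 2.0, band)
    tt, rr = np.meshgrid(thetas, radii)  # both of shape (band, width)
    xs = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    return image[ys, xs]
```

With the image y axis pointing down, sweeping the angle from π to 2π runs left to right along the upper arc. Ordering the rows from outer to inner radius keeps those characters upright. The resulting strip is the kind of nearly horizontal image that a recognizer such as CRNN expects.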
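CRNN, as originally proposed, emits one label distribution per horizontal frame and is trained with CTC. The abstract does not say how its outputs are decoded. Below is a minimal sketch of greedy (best-path) CTC decoding. It assumes the blank label sits at index 0 and that index `k > 0` maps to `alphabet[k - 1]`. The function name and the toy `one_hot` helper are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding for a single sequence.

    frame_probs: T x C per-frame scores; column 0 is the CTC blank (assumed)
    and column k > 0 stands for alphabet[k - 1].
    """
    # Most likely label for every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for k in best:
        # Merge consecutive repeats, then drop blanks.
        if k != prev and k != 0:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def one_hot(k, n=5):
    """Toy frame distribution that puts most of its mass on label k."""
    return [0.9 if i == k else 0.1 / (n - 1) for i in range(n)]


path = [1, 1, 0, 2, 3, 3, 0, 4]
print(ctc_greedy_decode([one_hot(k) for k in path], "人民政府"))  # 人民政府
```

Because repeats are merged before blanks are removed, the model must place a blank between two identical labels to emit a doubled character.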
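FASPell pairs a BERT-based candidate generator with a confidence-similarity decoder (CSD). CSD accepts a replacement only when the candidate's model confidence and its similarity to the original character jointly clear a filtering boundary. The sketch below assumes each candidate already carries a confidence and a glyph-similarity score. Several pieces are hypothetical stand-ins rather than values from the thesis: the linear boundary, the `weight` and `threshold` defaults, and the names `Candidate` and `csd_filter`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    char: str          # candidate character proposed by the BERT-based model
    confidence: float  # model probability for this candidate
    similarity: float  # glyph similarity to the OCR character, in [0, 1] (assumed precomputed)


def csd_filter(ocr_text, candidates_per_pos, weight=1.0, threshold=0.8):
    """Keep a replacement only if it clears a linear confidence-similarity boundary.

    The rule weight * confidence + similarity >= threshold is a hypothetical
    stand-in for the filtering curve CSD uses on the confidence-similarity plane.
    """
    corrected = []
    for orig, cands in zip(ocr_text, candidates_per_pos):
        chosen = orig
        for cand in sorted(cands, key=lambda c: c.confidence, reverse=True):
            if cand.char == orig:
                break  # the model prefers the OCR character: no correction
            if weight * cand.confidence + cand.similarity >= threshold:
                chosen = cand.char
                break
        corrected.append(chosen)
    return "".join(corrected)
```

The boundary trades the two errors off against each other. Raising `threshold` rejects more false corrections at the cost of missing some true ones, which is why detection and correction rates are reported separately.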
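The abstract describes relation extraction as a two-stage process: first find subjects, then, for each specific relation, find the matching objects. Suppose the network outputs per-token start and end probabilities for subjects and, given a subject, for each relation's objects. The decoding step could then look like the sketch below. Several names are hypothetical, because the abstract gives no details of CARSEL's internals: `decode_spans`, `extract_triples`, the 0.5 threshold, and the `object_tagger` callback, which stands in for the relation-specific object taggers.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair every start position with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, later[0]))
    return spans


def extract_triples(tokens, subj_start, subj_end, object_tagger, threshold=0.5):
    """Cascade decoding: subjects first, then objects for each relation."""
    triples = []
    for s, e in decode_spans(subj_start, subj_end, threshold):
        subject = "".join(tokens[s:e + 1])
        # object_tagger is a hypothetical stand-in that maps a subject span to
        # {relation: (object_start_probs, object_end_probs)}.
        for relation, (obj_start, obj_end) in object_tagger((s, e)).items():
            for o_s, o_e in decode_spans(obj_start, obj_end, threshold):
                triples.append((subject, relation, "".join(tokens[o_s:o_e + 1])))
    return triples
```

Conditioning the object taggers on each decoded subject lets one sentence yield several triples that share a subject. The target information can then be read off directly from the (subject, relation, object) tuples.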