There are many types of information systems in the current medical system. The information systems used by hospitals in the same city, by hospitals in different regions, and even by higher- and lower-level hospitals cannot communicate with one another. As patients seek care across the country, they accumulate a huge volume of medical documentation that is difficult to manage and maintain and that new doctors cannot make use of. It is therefore important to recognize the medical indicator data on paper laboratory report forms as text and store it in electronic medical records, where it can be used for insurance claims, hospital transfers, remote consultations, and personal health records. Because of hand-held phone shake, poor shooting angles, and uneven lighting, the photos of test sheets taken by users often show perspective distortion or tilt. This paper proposes a new image-processing algorithm that effectively corrects images with perspective distortion and tilt, locates the bounding box of each character to extract the text regions, and accurately recognizes the text content of the test sheet. The method is implemented in Matlab. Edge detection, Hough line detection, and central projection (perspective) transformation are used to correct the perspective distortion and tilt of a single image; the maximally stable extremal regions (MSER) algorithm is used to locate and segment the text; the open-source OCR engine Tesseract recognizes the text; and jTessBoxEditorFX is used to correct the recognition results and retrain the engine on the text image data. Statistics computed over the recognition results show that the miss rate is only 0.6%, the correct recognition rate for Chinese exceeds 90%, and the correct recognition rate for English letters and digits exceeds 96%.
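To make the line-detection step concrete, the following is a minimal Python sketch of the standard Hough transform for straight lines. It is not the paper's Matlab code. It assumes an edge detector has already produced a set of edge-pixel coordinates, and the function name `hough_lines` and its parameters are hypothetical. Each edge pixel votes for every line rho = x·cos(theta) + y·sin(theta) passing through it, and the bins with the most votes correspond to the dominant straight edges, such as the borders of a test sheet.

```python
import math
from collections import Counter


def hough_lines(edge_points, n_theta=180, rho_step=1.0, top_k=2):
    """Return the top_k strongest lines as (rho, theta_degrees, votes).

    edge_points: iterable of (x, y) edge-pixel coordinates, assumed to come
    from an edge detector. Lines use the normal form
    rho = x*cos(theta) + y*sin(theta), with theta in [0, 180) degrees.
    """
    thetas = [math.pi * i / n_theta for i in range(n_theta)]
    # Precompute the trigonometric tables once for all pixels.
    cos_t = [math.cos(t) for t in thetas]
    sin_t = [math.sin(t) for t in thetas]
    accumulator = Counter()
    for x, y in edge_points:
        for i in range(n_theta):
            rho = x * cos_t[i] + y * sin_t[i]
            # Quantize rho so that nearly collinear pixels share a bin.
            accumulator[(round(rho / rho_step), i)] += 1
    return [
        (rho_bin * rho_step, math.degrees(thetas[i]), votes)
        for (rho_bin, i), votes in accumulator.most_common(top_k)
    ]


# Synthetic edge map: a horizontal edge (y = 10) and a vertical edge (x = 30).
edges = {(x, 10) for x in range(50)} | {(30, y) for y in range(40)}
for rho, theta, votes in hough_lines(edges):
    print(f"rho={rho:.0f}, theta={theta:.0f} deg, votes={votes}")
```

For this synthetic input the two strongest bins are rho = 10 at theta = 90 degrees and rho = 30 at theta = 0 degrees. On real photos, neighbouring bins around a true line also collect many votes, so some form of non-maximum suppression would be needed before picking the sheet borders.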
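The perspective-correction step can be illustrated with a minimal Python sketch of a central projection (homography) mapping. This is not the paper's Matlab implementation. It assumes the four corners of the test sheet are already known, for example as intersections of the border lines found by Hough detection, and are ordered top-left, top-right, bottom-right, bottom-left. The function names, the corner coordinates, and the use of nearest-neighbour sampling are hypothetical choices. Fixing h33 = 1 leaves eight unknowns, and each of the four corner correspondences contributes two linear equations, so the matrix follows from an 8-by-8 linear system.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """3x3 projective matrix H (h33 fixed to 1) with H*src[i] ~ dst[i] for 4 point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, point):
    """Map (x, y) through H, dividing by the projective coordinate w."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def rectify(image, corners, out_w, out_h):
    """Resample the quadrilateral `corners` (TL, TR, BR, BL) of `image`
    (a list of pixel rows) into an upright out_w x out_h rectangle."""
    target = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    # Inverse mapping: for each output pixel, find where it comes from in the
    # photo, so the corrected image has no holes.
    H = homography_from_corners(target, corners)
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = apply_homography(H, (u, v))
            xi, yi = round(x), round(y)  # nearest-neighbour sampling
            inside = 0 <= yi < len(image) and 0 <= xi < len(image[0])
            row.append(image[yi][xi] if inside else 0)
        out.append(row)
    return out


# A tilted sheet outline (hypothetical corner coordinates) mapped to a 400 x 600 page.
corners = [(32, 41), (410, 18), (450, 590), (12, 620)]
page = [(0, 0), (399, 0), (399, 599), (0, 599)]
H = homography_from_corners(corners, page)
print([apply_homography(H, p) for p in corners])
```

Mapping from the output rectangle back into the photo guarantees that every corrected pixel receives a value. A practical version would use bilinear interpolation, or a library routine such as OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`.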
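The abstract does not define how the miss rate and the recognition rates are computed. The sketch below assumes one plausible definition. The miss rate is taken as the fraction of ground-truth characters that the text-detection stage fails to box. The recognition rate for each script is taken as the fraction of that script's ground-truth characters that are both detected and recognized correctly, so a missed character also counts as a recognition error. The `CharResult` record, the CJK range test, and the toy sample are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class CharResult:
    truth: str      # ground-truth character on the report sheet
    detected: bool  # did text detection produce a box for it?
    predicted: str  # OCR output for that box ("" when not detected)


def is_chinese(ch):
    # Basic CJK Unified Ideographs block only (U+4E00..U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


def is_latin_or_digit(ch):
    return ch.isascii() and ch.isalnum()


def summarize(results):
    """Miss rate over all characters plus per-script correct-recognition rates."""
    stats = {"miss_rate": sum(not r.detected for r in results) / len(results)}
    for name, in_group in (("chinese", is_chinese), ("latin_digit", is_latin_or_digit)):
        group = [r for r in results if in_group(r.truth)]
        if group:
            correct = sum(r.detected and r.predicted == r.truth for r in group)
            stats[name] = correct / len(group)
    return stats


# Toy sample (not from the paper): one Chinese character misread, one digit missed.
sample = (
    [CharResult(c, True, c) for c in "血红白"]
    + [CharResult("蛋", True, "旦")]
    + [CharResult(c, True, c) for c in "HGB13"]
    + [CharResult("5", False, "")]
)
print(summarize(sample))
```

Counting a missed character against the recognition rate keeps the two figures from hiding each other. If the paper instead divides only by detected characters, its recognition rates would be somewhat higher than under this definition.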