Throughout a patient's medical treatment, the medical test sheet is an important reference for both doctors and patients. However, as the demand for medical services grows, paper test sheets have become increasingly inconvenient in practice. One important way to address this problem is to use OCR to recognize the sheets, convert them into structured electronic data, and use that data further. Recent OCR research has mainly focused on general-purpose recognition methods. This work has offered many ideas for OCR in general settings, but the text layout of medical test sheets is distinctive and complex, and general-purpose OCR does not handle it well. Few studies have addressed the recognition of medical test sheets specifically.

Therefore, starting from 1125 collected real medical test sheets, this paper uses an SVG-based data self-generation method to produce a large amount of synthetic medical test sheet data. This addresses the lack of a public medical test sheet dataset for related research and provides the data foundation for this work. Experimental results show that adding the generated data to model training effectively improves model performance.

To address the varied formats and complex contents of medical test sheets, and the low recognition accuracy of general-purpose OCR on them, this paper proposes a layout analysis model for medical test sheets that integrates regional features and table line features. As a novel step, it introduces layout analysis into the field of medical test sheet recognition. Because medical test sheets are laid out in a distinctive style with few ruling lines, the model focuses on identifying the table lines and regional features of the sheet to perform layout analysis, which enables more targeted text recognition in future work. In the layout analysis experiments, the proposed model achieves average AP values of 0.712 on patient test items and 0.706 on full test sheets, showing that it performs layout analysis on medical laboratory data well.

In traditional OCR text recognition experiments, horizontal text is located and recognized well, but performance is poor on oblique and distorted text, which is typical of medical test sheets. To solve this problem, this paper further proposes an improved deep-learning-based text detection method aimed at locating and recognizing oblique and distorted text. On the medical laboratory dataset, the proposed model achieves an F1 score of 0.833, a substantial improvement over the traditional method, and better handles the special medical test sheets on which previous text recognition models performed poorly.
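The abstract does not describe how the SVG-based self-generation works. The following is only a minimal sketch, under our own assumptions, of the general idea: render a synthetic sheet as SVG while recording layout annotations for free. The test items, column positions, sparse two-line ruling, and label names (`patient_info`, `table_line`, `test_items`) are all hypothetical and are not taken from the thesis or its 1125 collected sheets.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical laboratory items: (name, unit, reference low, reference high).
# Illustrative only; not drawn from the thesis data.
TEST_ITEMS = [
    ("WBC", "10^9/L", 3.5, 9.5),
    ("RBC", "10^12/L", 4.3, 5.8),
    ("HGB", "g/L", 130.0, 175.0),
    ("PLT", "10^9/L", 125.0, 350.0),
    ("GLU", "mmol/L", 3.9, 6.1),
]
COLUMNS_X = (40, 200, 320, 430)  # assumed x of name, result, unit, range columns
ROW_H = 24
FONT = 14
WIDTH = 600


def _text(parent, x, y, s):
    """Add an SVG <text> element whose baseline is at (x, y)."""
    t = ET.SubElement(parent, "text", attrib={"font-size": str(FONT)},
                      x=str(x), y=str(y))
    t.text = s


def generate_sheet(n_rows, seed=None):
    """Render one synthetic sheet; return (svg_string, annotations).

    annotations is a list of (label, x, y, w, h) boxes in SVG pixels,
    usable as hypothetical layout-analysis ground truth.
    """
    rng = random.Random(seed)
    items = [rng.choice(TEST_ITEMS) for _ in range(n_rows)]
    top = 80
    bottom = top + (n_rows + 1) * ROW_H
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(WIDTH), height=str(bottom + 40))
    ann = []

    # Patient-information region above the table.
    _text(svg, 40, 40, f"Patient ID: {rng.randint(100000, 999999)}")
    ann.append(("patient_info", 40, 40 - FONT, 300, FONT + 4))

    # Sparse ruling (top and bottom of the table only), imitating sheets with few lines.
    for y in (top, bottom):
        ET.SubElement(svg, "line", x1="30", y1=str(y), x2=str(WIDTH - 30),
                      y2=str(y), stroke="black")
        ann.append(("table_line", 30, y, WIDTH - 60, 1))

    # Header row, then one row per sampled test item.
    for x, h in zip(COLUMNS_X, ("Item", "Result", "Unit", "Reference")):
        _text(svg, x, top + ROW_H - 6, h)
    for i, (name, unit, lo, hi) in enumerate(items, start=2):
        y = top + i * ROW_H - 6
        # Results are sometimes sampled outside the reference range, as on real sheets.
        value = rng.uniform(lo * 0.8, hi * 1.2)
        for x, s in zip(COLUMNS_X, (name, f"{value:.1f}", unit, f"{lo}-{hi}")):
            _text(svg, x, y, s)
    ann.append(("test_items", 30, top, WIDTH - 60, bottom - top))
    return ET.tostring(svg, encoding="unicode"), ann
```

Because every element is placed programmatically, its bounding box is known exactly, so labels come at no annotation cost. To train an image model, the SVG would still need to be rasterized (for example with a library such as CairoSVG) and augmented with noise, skew, or warping; those steps are omitted here.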
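The reported F1 value of 0.833 combines detection precision and recall, but the abstract does not state the matching protocol. A common convention, assumed here, is to match each predicted text region to at most one ground-truth region at IoU ≥ 0.5, then compute F1 = 2PR / (P + R). For simplicity this sketch uses axis-aligned boxes `(x1, y1, x2, y2)`; benchmarks for oblique text usually compute IoU on rotated quadrilaterals or polygons instead.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds, gts, thresh=0.5):
    """Greedy one-to-one matching at IoU >= thresh; returns (precision, recall, f1).

    preds should ideally be sorted by descending confidence before calling.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box can be matched only once
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For instance, with three ground-truth lines and three predictions of which two overlap their targets well and one is a false alarm, precision and recall are both 2/3, so F1 is also 2/3.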