Optical music recognition analyzes the musical semantic content of sheet music images so that computers can read scores. It has considerable application value in music information retrieval, music analysis, and music search engines. Recognizing low-quality music scores remains a major difficulty in the field. Traditional recognition algorithms based on the general framework involve complicated steps and many sub-tasks. Although end-to-end deep learning strategies effectively simplify the general framework, problems remain, such as long training times and limited recognition accuracy on low-quality scores. To address these problems, we propose C-SE-Bi SRU, a low-quality music score recognition algorithm that combines an attention mechanism with the simple recurrent unit. To strengthen the network's ability to extract note features and to improve recognition accuracy, we propose a feature extraction network that combines a convolutional neural network with a squeeze-excitation module. The network applies the squeeze-excitation module after the convolution operation to form an attention mechanism over the feature channels. Squeeze-excitation processing suppresses the propagation of interfering information in low-quality scores, such as staff lines, non-note ink, and noise. To reduce the time needed to train the model, we propose a note recognition and classification network that combines a bidirectional simple recurrent unit with a connectionist temporal classification loss function. In addition, the network does not require strict alignment between notes and labels, which simplifies data processing. We verify the performance of the proposed algorithm through experiments. First, we verify the effectiveness of each improved component. Second, comparing C-SE-Bi SRU with the benchmark and with score recognition networks of the same type on the CAMERA-PRIMUS data set, we find that the symbol error rate of C-SE-Bi SRU in the agnostic encoding format is as low as 1.78% and that model training time decreases by 50%. Third, we analyze and discuss the impact of the agnostic and semantic encoding formats on the performance of the neural network. Fourth, by visualizing typical low-quality score results recognized by C-SE-Bi SRU and comparing them with commercial score recognition software, we conclude that C-SE-Bi SRU performs better on low-quality scores. Finally, we analyze the errors in note recognition to provide direction for further improvement.
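
The symbol error rate reported above is not defined in this abstract. The following is a minimal sketch under the common assumption that it is the total edit distance between predicted and reference symbol sequences, divided by the total number of reference symbols. The function names and the example tokens are hypothetical illustrations and are not taken from the paper or the data set.

```python
from typing import Sequence


def edit_distance(pred: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    # prev[j] holds the distance between the processed prefix of pred and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # drop a predicted symbol
                            curr[j - 1] + 1,      # add a missing reference symbol
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def symbol_error_rate(preds: Sequence[Sequence[str]],
                      refs: Sequence[Sequence[str]]) -> float:
    """Assumed definition: summed edit distance / summed reference length."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    total_symbols = sum(len(r) for r in refs)
    if total_symbols == 0:
        raise ValueError("reference sequences contain no symbols")
    return total_edits / total_symbols


# Hypothetical token names, used only to show the call shape.
reference = ["clef-G", "note-quarter-L1", "note-half-S2", "barline"]
predicted = ["clef-G", "note-quarter-L1", "note-half-S3", "barline"]
print(symbol_error_rate([predicted], [reference]))  # one substitution in four symbols: 0.25
```

Under this assumed definition, a rate of 1.78% would correspond to fewer than two symbol edits per hundred reference symbols.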