| Voice is an important tool for human communication, and voice quality directly affects how people express themselves in speech. The vocal folds are a key part of the vocal system, and vocal fold pathology is a major cause of voice problems. Acoustic analysis of pathological voice signals enables an objective assessment of voice quality, which provides clinical guidance for the diagnosis and treatment of laryngeal diseases. As research has deepened, the finer-grained classification of vocal fold diseases has become a focus of clinical research in pathological voice detection. Meanwhile, as computing power has improved, deep learning has achieved remarkable results in speech recognition. This paper studies the classification of vocal fold diseases and the use of convolutional neural networks to identify pathological voices. The work comprises the following three parts.
1. To address the limitations of parameter selection in classifying vocal fold diseases, a voice feature extraction method based on wavelet packet multi-scale analysis is proposed from nonlinear and statistical perspectives to improve the recognition rate of vocal fold diseases. First, the original voice signal is decomposed into sub-signals in different frequency bands using the wavelet packet transform. Then, the nonlinear features (Hurst parameter, order-2 Rényi entropy, box-counting dimension, and attractor) are extracted from each frequency band to evaluate how much each band contributes to detecting and classifying pathological voices. Finally, the extracted multi-scale features are combined, and a support vector machine (SVM) classifies normal and pathological voices from different databases. The average recognition rates on the MEEI database, a self-built clinical database, and the SVD database are 99.15%, 97.87%, and 96.76%, respectively, with a highest recognition rate of 100%. When normal, vocal fold paralysis, and vocal fold non-paralysis voices are distinguished on the MEEI and SVD databases, the average recognition rates are 98.32% and 92.89%, respectively. The experimental results show that extracting features after wavelet packet multi-scale analysis effectively improves the recognition rate of vocal fold diseases.
2. In pathological voice classification, the accuracy of traditional machine learning algorithms depends on the validity of the extracted features. To address this problem, a method combining the delay time with a convolutional neural network is proposed to classify normal and pathological voices. Based on the chaotic characteristics of the signal, the one-dimensional speech signal is transformed into an $M \times N$ two-dimensional matrix using the delay time parameter. A network with three convolutional layers, based on the LeNet-5 model, is then built in MATLAB to classify normal and pathological voices. The average recognition rates on the MEEI database and the clinical database are 94.64% and 94.26%, respectively. The experimental results demonstrate the effectiveness of introducing the delay time parameter when converting the one-dimensional signal into a two-dimensional matrix, which paves the way for introducing phase space theory in the next experiment.
3. Nonlinear dynamic characteristics can effectively describe the acoustic properties of normal and pathological voices, so phase space reconstruction theory is introduced into voice feature extraction. The phase spaces of normal and pathological voice signals are reconstructed using the delay time and embedding dimension, yielding the reconstructed trajectory of each signal. Following the principle of three-view projection, each reconstructed phase diagram is projected onto three planes, $(x(t), x(t+\tau))$, $(x(t), x(t+2\tau))$, and $(x(t+\tau), x(t+2\tau))$, where $x(t)$ is the voice signal and $\tau$ is the delay time. This converts the one-dimensional speech signal into two-dimensional images, and the three projections serve as the three RGB input channels of a convolutional neural network. A VGG-like convolutional neural network is constructed to classify normal and pathological voices. The average recognition rates for normal versus pathological voices are 99.42%, 95.88%, and 97.30% on the MEEI, self-built clinical, and SVD databases, respectively, and the average recognition rates for normal, vocal fold paralysis, and vocal fold non-paralysis voices are 96.04% and 92.27% on the MEEI and SVD databases, respectively. The experimental results show that the method achieves a high recognition rate and good robustness, and has a degree of general applicability for recognizing normal and pathological voices. |
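The wavelet packet decomposition in part 1 can be sketched in Python with NumPy. The abstract does not name the mother wavelet or the decomposition depth, so this sketch assumes a hand-written orthonormal Haar filter bank; the helper names `haar_split` and `wavelet_packet` are hypothetical. Unlike the plain discrete wavelet transform, the packet transform splits both the low-pass and the high-pass branch at every level, so a depth-L tree yields 2^L equal-width sub-bands.

```python
import numpy as np

def haar_split(x):
    """One level of the orthonormal Haar filter bank.

    Returns (low, high), each half the length of x. Odd-length input is
    padded by repeating its last sample (an assumption, not from the source).
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation (low-pass) branch
    high = (even - odd) / np.sqrt(2.0)  # detail (high-pass) branch
    return low, high

def wavelet_packet(x, level):
    """Full wavelet packet tree of the given depth.

    Both branches are split at every level, giving 2**level leaf
    sub-signals in natural (Paley) order, not frequency order.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

# Usage: a 3-level decomposition of a toy 1024-sample signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
bands = wavelet_packet(signal, level=3)
print(len(bands), [len(b) for b in bands])  # 8 sub-bands of 128 samples each
```

Because the Haar bank is orthonormal, the sub-band energies sum to the signal energy, which makes per-band contributions easy to compare. The leaves come out in natural order; reorder them if a strictly low-to-high frequency ordering is needed.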
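For the Hurst parameter, one common estimator is rescaled-range (R/S) analysis: split the series into windows of size w, divide the range of the cumulative mean-adjusted sum by the window's standard deviation, and take the slope of log(R/S) against log(w) as H. The abstract does not say which estimator the thesis uses, so the R/S choice, the minimum window of 8 samples, and the doubling window sizes are assumptions; `hurst_rs` is a hypothetical name.

```python
import numpy as np

def hurst_rs(x, min_window=8):
    """Estimate the Hurst exponent H of x by rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    w = min_window
    while w <= n // 2:
        rs = []
        for start in range(0, n - w + 1, w):            # non-overlapping windows
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())            # cumulative deviation from the window mean
            s = seg.std()
            if s > 0:
                rs.append((dev.max() - dev.min()) / s)   # range / standard deviation
        if rs:
            sizes.append(w)
            rs_means.append(np.mean(rs))
        w *= 2
    if len(sizes) < 2:
        raise ValueError("signal too short for R/S analysis")
    # H is the slope of log(R/S) versus log(window size)
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return float(slope)

rng = np.random.default_rng(1)
noise = rng.standard_normal(4096)
h_noise = hurst_rs(noise)            # uncorrelated noise: H near 0.5
h_walk = hurst_rs(np.cumsum(noise))  # integrated noise (random walk): H near 1
print(round(h_noise, 2), round(h_walk, 2))
```

Classical R/S slightly overestimates H for short windows, so values for white noise typically land a little above 0.5; applied per sub-band, the estimate becomes one entry of the multi-scale feature vector.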
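The order-2 Rényi entropy of a discrete distribution $p$ is $H_2 = \ln\left(1 / \sum_i p_i^2\right)$. A minimal sketch estimates $p$ from an amplitude histogram of one sub-band signal; the 32-bin histogram and the natural logarithm are assumptions, since the abstract does not describe the estimator, and `renyi2_entropy` is a hypothetical name.

```python
import numpy as np

def renyi2_entropy(x, bins=32):
    """Order-2 Renyi entropy H2 = ln(1 / sum p_i^2), with p from an amplitude histogram."""
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=bins)
    p = counts / counts.sum()              # empirical bin probabilities
    return float(np.log(1.0 / np.sum(p ** 2)))

# A signal spread evenly over 4 bins reaches the maximum ln(4);
# a constant signal puts all mass in one bin and gives 0.
print(renyi2_entropy(np.repeat(np.arange(4), 25), bins=4))  # ln(4) ≈ 1.386
print(renyi2_entropy(np.zeros(100)))                         # 0.0
```

Because the estimate depends on the bin count, the same `bins` value should be used for every sub-band and every recording so the features stay comparable.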
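The box-counting dimension can be estimated by covering the normalized waveform (time and amplitude both scaled to [0, 1]) with k × k grids and fitting the slope of log N(k) against log k, where N(k) is the number of occupied cells. This sketch counts cells that contain sample points, which is an assumption; the thesis may use a different covering or range of scales, and `box_counting_dimension` is a hypothetical name.

```python
import numpy as np

def box_counting_dimension(x, scales=(2, 4, 8, 16, 32, 64)):
    """Box-counting dimension of the waveform graph (t, x(t)).

    Time and amplitude are both scaled to [0, 1]; for each grid of k x k
    cells, count the cells that contain at least one sample, then fit the
    slope of log N(k) against log k.
    """
    x = np.asarray(x, dtype=float)
    t = np.linspace(0.0, 1.0, len(x))
    span = x.max() - x.min()
    y = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    counts = []
    for k in scales:
        ix = np.minimum((t * k).astype(int), k - 1)  # column index of each sample
        iy = np.minimum((y * k).astype(int), k - 1)  # row index of each sample
        counts.append(np.unique(ix * k + iy).size)   # number of occupied cells
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return float(slope)

line = np.linspace(0.0, 1.0, 4096)                      # smooth curve: dimension 1
noise = np.random.default_rng(4).standard_normal(4096)  # rough signal: closer to 2
print(box_counting_dimension(line), box_counting_dimension(noise))
```

Smooth, periodic sub-bands give values near 1, while irregular, noise-like sub-bands push the estimate toward 2, which is why the dimension can separate regular from perturbed phonation.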
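In the final step of part 1, the per-band features are concatenated into one vector per recording and passed to an SVM. The abstract does not give the kernel or hyperparameters, so this NumPy-only sketch swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method, preceded by z-score standardization because features such as the Hurst parameter, entropy, and dimension live on different scales. The toy two-cluster data stand in for real feature vectors, and all function names are hypothetical.

```python
import numpy as np

def standardize(X_train, X_test):
    """Z-score both sets with the training-set mean and standard deviation."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0
    return (X_train - mu) / sd, (X_test - mu) / sd

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss.

    Labels y must be -1 (normal) or +1 (pathological). A constant column is
    appended so the bias is learned as an ordinary (regularized) weight.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1.0  # inside the margin before the update?
            w *= 1.0 - eta * lam                 # shrink (gradient of the L2 term)
            if violates:
                w += eta * y[i] * Xb[i]          # hinge-loss sub-gradient step
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)

# Toy data: two Gaussian clusters standing in for multi-scale feature vectors.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 6)), rng.normal(2.0, 1.0, (100, 6))])
y = np.array([-1] * 100 + [1] * 100)
order = rng.permutation(200)
train, test = order[:150], order[150:]
X_tr, X_te = standardize(X[train], X[test])
w = train_linear_svm(X_tr, y[train])
accuracy = np.mean(predict(w, X_te) == y[test])
print(f"test accuracy: {accuracy:.2f}")
```

A kernel SVM (for example an RBF kernel) may suit real voice features better; the linear version is used here only because it fits in a short, dependency-free sketch.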
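Parts 2 and 3 both rely on a delay time $\tau$. The abstract does not state how $\tau$ is chosen; a common heuristic, assumed here, takes the first lag at which the normalized autocorrelation falls below 1/e (the first minimum of the average mutual information is another frequent choice). `delay_time_acf` is a hypothetical name.

```python
import numpy as np

def delay_time_acf(x, max_lag=None, threshold=1.0 / np.e):
    """First lag at which the normalized autocorrelation drops below `threshold`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    max_lag = len(x) // 2 if max_lag is None else max_lag
    energy = np.dot(x, x)
    for lag in range(1, max_lag + 1):
        r = np.dot(x[:-lag], x[lag:]) / energy  # biased autocorrelation estimate
        if r < threshold:
            return lag
    return max_lag  # threshold never crossed; fall back to the largest lag tried

# A sine with a 40-sample period: r(k) ≈ cos(2*pi*k/40), which first drops below 1/e at k = 8.
n = np.arange(2000)
tau = delay_time_acf(np.sin(2 * np.pi * n / 40))
print(tau)  # 8
```

The autocorrelation criterion only captures linear dependence; for strongly nonlinear voice signals, a mutual-information criterion is often preferred, at the cost of choosing a histogram or kernel estimator.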
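Part 2 converts the one-dimensional signal into an $M \times N$ matrix using the delay time, but the abstract does not spell out the construction. One plausible reading, assumed here, stacks delay vectors as rows so that row $i$ holds $x(i), x(i+\tau), \ldots, x(i+(N-1)\tau)$. The min-max scaling to [0, 1] and the name `delay_matrix` are also assumptions.

```python
import numpy as np

def delay_matrix(x, tau, m_rows, n_cols):
    """Build an m_rows x n_cols matrix whose row i is
    [x[i], x[i + tau], ..., x[i + (n_cols - 1) * tau]], min-max scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    needed = m_rows + (n_cols - 1) * tau
    if len(x) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(x)}")
    # rows index the start sample; columns step forward by the delay time
    idx = np.arange(m_rows)[:, None] + tau * np.arange(n_cols)[None, :]
    mat = x[idx]
    span = mat.max() - mat.min()
    return (mat - mat.min()) / span if span > 0 else np.zeros_like(mat)

# A ramp makes the index pattern easy to read: with tau = 3, row 0 picks samples 0, 3, 6, 9, 12.
ramp = np.arange(100, dtype=float)
m = delay_matrix(ramp, tau=3, m_rows=4, n_cols=5)
print(m.shape)  # (4, 5)
```

With this layout each column is the signal shifted by a further multiple of $\tau$, so a 2-D convolution kernel sees both neighboring samples and delayed copies of them at once.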
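The LeNet-5-based network in part 2 has three convolutional layers. Its layer sizes are not given in the abstract, so this sketch only walks the feature-map sizes of the classic LeNet-5 layout (32 × 32 input, 5 × 5 valid convolutions, 2 × 2 subsampling) using out = ⌊(in + 2p − k)/s⌋ + 1. Treating the $M \times N$ matrix as a 32 × 32 input is an assumption, and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Classic LeNet-5 layout: (layer name, kernel, stride, channels after the layer)
layers = [
    ("C1 conv 5x5", 5, 1, 6),
    ("S2 pool 2x2", 2, 2, 6),
    ("C3 conv 5x5", 5, 1, 16),
    ("S4 pool 2x2", 2, 2, 16),
    ("C5 conv 5x5", 5, 1, 120),
]

size = 32
for name, kernel, stride, channels in layers:
    size = conv_out(size, kernel, stride)
    print(f"{name}: {size}x{size}x{channels}")
# C1: 28x28x6, S2: 14x14x6, C3: 10x10x16, S4: 5x5x16, C5: 1x1x120
```

The three convolutional layers C1, C3, and C5 reduce a 32 × 32 input to a 1 × 1 × 120 vector that feeds the fully connected classifier; a different $M \times N$ would need different kernel sizes or padding to reach the same end point.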
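Part 3 reconstructs the phase space from the delay time and an embedding dimension $m$. The abstract does not say how $m$ is selected; the false-nearest-neighbors test is one standard option and is assumed here. For each point it finds the nearest neighbor in $m$ dimensions and flags the pair as false if appending the coordinate $x(t+m\tau)$ stretches their distance by more than a tolerance (10, also assumed). The name `false_nearest_fraction` is hypothetical, and the brute-force O(n²) distance matrix only suits short excerpts.

```python
import numpy as np

def false_nearest_fraction(x, tau, m, r_tol=10.0):
    """Fraction of nearest neighbours in an m-dimensional delay embedding that
    separate by more than r_tol times their distance when the further delay
    coordinate x(t + m*tau) is appended."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                    # points that also have the extra coordinate
    emb = np.column_stack([x[k * tau:k * tau + n] for k in range(m)])
    extra = x[m * tau:m * tau + n]          # coordinate added when going to m + 1
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    nn = dist.argmin(axis=1)
    d_nn = dist[np.arange(n), nn]
    growth = np.abs(extra - extra[nn])
    valid = d_nn > 0
    return float(np.mean(growth[valid] / d_nn[valid] > r_tol))

# A sine wave traces a closed loop, so m = 1 folds it onto a line and creates false neighbours.
x = np.sin(0.17 * np.arange(800))
f1 = false_nearest_fraction(x, tau=9, m=1)
f2 = false_nearest_fraction(x, tau=9, m=2)
print(round(f1, 2), round(f2, 2))
```

In practice $m$ is increased until the fraction drops near zero; the three-view projection in part 3 corresponds to working with $m = 3$.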
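The three-view step in part 3 can be sketched as follows, assuming embedding dimension 3 so that the reconstructed trajectory is $(x(t), x(t+\tau), x(t+2\tau))$. Each of the three plane projections is rasterized into an image with a 2-D histogram, and the images are stacked as the R, G, and B channels of one CNN input. The 64 × 64 image size, the shared amplitude range, the per-channel normalization, and the synthetic test signal are assumptions not stated in the abstract; `phase_space_rgb` is a hypothetical name.

```python
import numpy as np

def phase_space_rgb(x, tau, size=64):
    """Delay-embed x in 3-D and rasterize the three plane projections
    (x(t), x(t+tau)), (x(t), x(t+2tau)), (x(t+tau), x(t+2tau))
    into a size x size x 3 image with values in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - 2 * tau
    traj = np.column_stack([x[:n], x[tau:tau + n], x[2 * tau:2 * tau + n]])
    lo, hi = x.min(), x.max()
    if hi == lo:
        hi = lo + 1.0                      # avoid a zero-width range for constant input
    channels = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:  # the three projection planes
        img, _, _ = np.histogram2d(traj[:, a], traj[:, b], bins=size,
                                   range=[[lo, hi], [lo, hi]])
        peak = img.max()
        channels.append(img / peak if peak > 0 else img)  # point density scaled to [0, 1]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(3)
t = np.arange(4000)
voice_like = np.sin(0.05 * t) + 0.3 * np.sin(0.13 * t) + 0.05 * rng.standard_normal(4000)
image = phase_space_rgb(voice_like, tau=8)
print(image.shape)  # (64, 64, 3)
```

Using one amplitude range for all three planes keeps the channels geometrically comparable, so differences between them reflect the trajectory rather than per-channel rescaling of the axes.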