| With the rapid growth in computing speed and the surge of interest in AI, deep learning has become the mainstream approach to improving speech recognition performance. A large amount of training data is a necessary condition for speech recognition to achieve good results. At present, however, many languages lack sufficient training data, which limits the development of speech recognition for those languages. This paper studies what different languages have in common from the perspective of articulatory features, uses these features as the link between languages, and obtains relatively good cross-language speech recognition performance. The main research work and innovations of this paper are as follows. First, rule-driven cross-language phoneme recognition. In the first method, a phoneme mapping table is built from the similarities and differences between the phonemes of different languages, using the International Phonetic Alphabet. A model is trained on source-language speech data, the trained model is applied to target-language data, and its output is mapped through the phoneme mapping table, which yields the recognition result in the phoneme system of the target language. In the second method, the phonemes of different languages are coded according to their phoneme characteristics. Classifiers for multiple articulatory features are trained, the relationship between languages is established through the articulatory-feature coding, and speech recognition is carried out on that basis. The coding-based speech features are shown to give better recognition performance on the second kind of phonemes than direct forced mapping. For example, in cross-language phoneme recognition experiments with Chinese as the source language and English as the target language, the phoneme error rate of the former is nearly 10 percentage points lower than that of the latter. Second, data-driven cross-language phoneme recognition through articulatory features. From the relationship between the acoustic features and the phonemes of a large amount of source-language data, a deep neural network learns what the languages have in common, namely the articulatory features. In the subsequent recognition process, a softmax classifier for the target language is trained first, and target-language speech recognition is then carried out. The resulting phoneme recognition system has better cross-language recognition performance than the phoneme recognition system based on the coding method. For example, in the cross-language phoneme recognition experiment with Chinese as the source language and English as the target language, the phoneme error rate of the former is 26.1 percentage points lower than that of the latter. In addition, the phoneme recognition performance of this method is compared with that of the knowledge-driven method within the same language: using data-driven articulatory features improves the phoneme error rate by nearly 7 percentage points. |
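
The rule-driven idea of linking two phoneme systems through articulatory-feature codes can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the three-part code (voicing, place, manner), the tiny phoneme inventories, the helper names `distance`, `nearest_phoneme` and `map_sequence`, and the choice of a simple mismatch count as the matching rule.

```python
from typing import Dict, List, Tuple

# Assumed code layout for this sketch: (voicing, place, manner).
Features = Tuple[str, str, str]

# Hypothetical, hand-written codes for a few source-language phonemes.
SOURCE_CODES: Dict[str, Features] = {
    "t": ("voiceless", "alveolar", "plosive"),
    "m": ("voiced", "bilabial", "nasal"),
    "ɕ": ("voiceless", "alveolo-palatal", "fricative"),
}

# Hypothetical codes for a few target-language phonemes.
TARGET_CODES: Dict[str, Features] = {
    "t": ("voiceless", "alveolar", "plosive"),
    "m": ("voiced", "bilabial", "nasal"),
    "ʃ": ("voiceless", "postalveolar", "fricative"),
    "ʒ": ("voiced", "postalveolar", "fricative"),
}


def distance(a: Features, b: Features) -> int:
    """Count the articulatory features on which two codes disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_phoneme(code: Features, inventory: Dict[str, Features]) -> str:
    """Return the phoneme in `inventory` whose code is closest to `code`."""
    return min(inventory, key=lambda p: distance(code, inventory[p]))


def map_sequence(
    phonemes: List[str],
    source: Dict[str, Features],
    target: Dict[str, Features],
) -> List[str]:
    """Re-express a recognized source-language phoneme sequence
    in the target-language phoneme system via the shared codes."""
    return [nearest_phoneme(source[p], target) for p in phonemes]


if __name__ == "__main__":
    print(map_sequence(["t", "ɕ", "m"], SOURCE_CODES, TARGET_CODES))
```

In this toy inventory the source phoneme "ɕ" has no exact counterpart in the target set, so it is matched to the target phoneme whose code differs in the fewest features. A direct mapping table would instead fix that pairing by hand.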