Study On Speech Enhancement For Eardrum Stimulated Middle Ear Implant Using Deep Neural Network

Posted on:2024-03-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W B Wang

Full Text:PDF

GTID:1528307118482304

Subject:Mechanical design and theory

Abstract/Summary:

The eardrum-stimulated middle ear implant (MEI) is a new type of hearing aid device that compensates for hearing loss by directly stimulating the eardrum through an actuator. Compared with traditional hearing aids, it offers a wide working band, high definition, and no ear-blocking effect, and it has become one of the main directions in hearing aid development. It works by using a microphone to collect the speech signal, which is then processed by a signal processing module. The actuator converts the electrical signal from the signal processing module into a vibration signal and delivers it to the inner ear through the middle ear, so that the patient can perceive the sound. The performance of the MEI is therefore determined by the results of signal processing, and enhancing the speech signal fed to the actuator is an important part of the MEI. Traditional speech enhancement (SE) methods cannot effectively handle the non-stationary noises that are common in daily life. With the development of deep learning, deep neural networks have become an active research topic in SE for hearing aids. However, current SE methods do not make full use of the time-frequency structure information of speech signals, nor do they consider the specific operating environment of the MEI. Proposing an effective SE algorithm for the MEI that improves the intelligibility and quality of noisy speech is therefore of considerable theoretical and practical significance.

This thesis focuses on deep neural network SE methods for the MEI and combines theoretical analysis, simulation experiments, and MEI simulation experiments. Based on the operating environment of the MEI, and making full use of the time-frequency structure characteristics of speech and noise signals, the research addresses signal preprocessing, deep neural network learning targets, network structure, and training data in order to improve the performance of the SE algorithm. The main research contents are as follows:

(1) A deep neural network-based noise and gender classification method is proposed to address the background noise and speaker gender problems in the MEI environment. First, according to the characteristics of the MEI working environment, a neural network-based voice activity detection (VAD) method is designed to detect speech and non-speech segments effectively. Then, based on the VAD results, a network-based noise and gender classification algorithm is proposed to make the SE model more targeted. Finally, the proposed classification algorithm is used to accurately classify noise and speaker gender under various background noise conditions.

(2) A multi-target ensemble SE framework based on the time-frequency structure is designed to address the insufficient use of time-frequency structure information and the disadvantages of the masking and mapping methods. First, the framework integrates two types of SE targets, masking and mapping, and fuses the time-frequency structure of the targets into the framework. Second, the characteristics of the time-frequency structure of the targets are analyzed, and the influence of the base matrix dimension on the results is explored. Then, a target sequential floating forward selection method is proposed to select the optimal integrated target. Finally, using the proposed method, noisy speech is effectively enhanced with low network complexity, especially at low SNRs and in non-stationary noise environments.

(3) To address the vanishing gradients and information loss that may occur in deep neural networks, and inspired by the intermediate target method in the previous chapter, a time-frequency structured target SE method based on progressive learning is proposed. First, the framework uses structured targets, which are more robust, as the intermediate targets of the network to alleviate information loss and vanishing gradients. Second, a new optimization function is proposed for the structured progressive network, and the effects of this function and of the number of layers in the progressive network on enhancement performance are analyzed. Then, a post-processing method is proposed for the output of each layer of the progressive network. Finally, the feasibility of the proposed SE method is verified by simulation experiments. The results show that the classification method has a strong impact on the enhancement results of the proposed method, and that the progressive method has no advantage in unseen noise environments. In seen noise environments, however, and especially in non-stationary noise, the proposed method is more competitive.

(4) According to the characteristics of specific speakers in the MEI environment, a personalized SE model based on noisy speech is proposed. First, a progressive multi-target SE network is proposed, which combines the advantages of the multi-target network and the progressive network developed in the previous two chapters. Second, a personalized SE method based on noisy speech is proposed, which both simplifies the collection of training data and addresses the generalization problem of the general enhancement model. Then, the impact of the characteristics of the personalized training data on enhancement performance is analyzed. Finally, the proposed method makes the enhancement model more targeted, so that it outperforms the general model. In particular, when the SNR of the personalized training data is high, the performance is close to that obtained with clean speech training data.

(5) To verify the feasibility and superiority of the proposed method, an eardrum-stimulated MEI SE test platform is established. First, a piezoelectric stack is selected as the MEI for the experiment based on the performance and index requirements of the MEI. Second, a physical model of the middle ear is constructed at 1:1 scale using 3D printing, and its feasibility and accuracy are verified. Then, the eardrum-stimulated MEI SE test platform is established by coupling the MEI with the middle ear physical model. Finally, the SE algorithm is applied to this test platform, and its feasibility is verified by measuring and comparing the output displacement of the stapes before and after speech enhancement. The experimental results show that the proposed SE algorithm can effectively suppress environmental noise. The stapes output displacement contains less noise, so the vibration signal can be effectively transmitted into the cochlea to obtain better hearing performance.

This thesis has 98 figures, 24 tables and 197 references.
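The difference between the masking and mapping targets named in contribution (2) can be illustrated with a minimal sketch. The abstract does not state which mask or mapping target the thesis uses; the ideal ratio mask and the log-power spectrum below are common choices assumed here for illustration only, and the function names, the toy spectrogram values, and the assumption that speech and noise powers add are all hypothetical.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def ideal_ratio_mask(clean_mag, noise_mag):
    """Masking target (assumed form): per-bin value in [0, 1] that scales the noisy magnitude."""
    return [
        [math.sqrt(s * s / (s * s + n * n + EPS)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean_mag, noise_mag)
    ]


def log_power_target(clean_mag):
    """Mapping target (assumed form): log-power spectrum of the clean speech itself."""
    return [[math.log(s * s + EPS) for s in row] for row in clean_mag]


def apply_mask(mask, noisy_mag):
    """Enhance by multiplying each time-frequency bin of the noisy magnitude by the mask."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, noisy_mag)
    ]


# Toy magnitude spectrograms: 2 frames x 3 frequency bins (made-up values).
clean = [[3.0, 0.0, 1.0], [2.0, 4.0, 0.0]]
noise = [[4.0, 1.0, 1.0], [0.0, 3.0, 2.0]]

# Assumption: speech and noise are uncorrelated, so their powers add per bin.
noisy = [
    [math.sqrt(s * s + n * n) for s, n in zip(s_row, n_row)]
    for s_row, n_row in zip(clean, noise)
]

mask = ideal_ratio_mask(clean, noise)   # what a masking network would be trained to predict
mapping = log_power_target(clean)       # what a mapping network would be trained to predict
enhanced = apply_mask(mask, noisy)      # masking recovers the clean magnitude in this toy case
```

In this sketch a masking network predicts a bounded gain per time-frequency bin, while a mapping network predicts the clean spectrum directly; a multi-target framework of the kind described above would train on both kinds of target together.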

Keywords/Search Tags:

Eardrum stimulated, middle ear implant, deep neural network speech enhancement, time-frequency structure characteristics, personalized training

Related items

1	Study Of Speech Enhancement In Cochlear Implant Based On Characteristics Of Hearing
2	Study On Real Time Speech Enhancement Of Cochlear Implant
3	Speech Analysis and Single Channel Enhancement to Improve Speech Intelligibility for Cochlear Implant Recipient
4	Research On Speech Enhancement Algorithm Based On Neural Network In Complex Environment
5	Research On Single-channel Speech Enhancement Method Based On Deep Neural Networks And Time-frequency Masking
6	Speech Enhancement Based On Iterative Mask Estimation And Generative Adversarial Networks
7	Research On Speech Enhancement Of Implantable Middle Ear Hearing Device Based On Deep Neural Network
8	Research And Implementation Of Speech Enhancement Based On Domain-Adversarial Training Of Neural Networks
9	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
10	Research On Key Techniques Of Mono Speech Enhancement