Research And Application Of Multi-Speaker Speech Recognition

Posted on:2023-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:R Y Xu

GTID:2568306914959389

Subject:Electronic and communication engineering

Abstract/Summary:

Multi-speaker speech recognition is a speech recognition technology for a special scenario: when several speakers talk at the same time, it separates the speech of each speaker and transcribes it into text. The technology offers a novel solution for needs such as conference transcription and written record keeping. Multi-speaker speech recognition generally arises in two scenarios. The first is the multi-channel scenario, in which several microphones collect sound at the same time; the task is easier there because the multi-channel signals implicitly carry microphone position information. The second, which is more common in real life, is the single-channel scenario, in which only one microphone collects sound; this greatly increases the difficulty of the task. The goal of this thesis is to design a deep-learning-based multi-speaker speech recognition system that works in the single-channel scenario and whose recognition performance is comparable to that of a single-speaker speech recognition system. Speech separation is the main research focus. The main contents of the thesis are as follows:

1. Building on recent research results, implement a single-channel speech separation system based on deep clustering, combine it with a graph convolutional network (GCN), and propose a sliding window algorithm to overcome the difficulties encountered during training, thereby improving the performance of the deep clustering system.
2. Building on recent research results, implement an end-to-end single-channel speech separation system based on permutation invariant training, use a one-dimensional convolutional network to couple it with the speech separation system described in (1), and train the combination with multiple loss functions, yielding a composite system with excellent separation performance.
3. Implement a Transformer-based end-to-end speech recognition system and combine it with the speech separation system above, yielding a complete multi-speaker speech recognition system.

The multi-speaker speech recognition system that combines separation and recognition achieves good recognition results when tested on the LibriMix dataset.
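
The two separation objectives named in the abstract, deep clustering and permutation invariant training, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state which signal-level loss is used, so mean squared error is assumed here, and the function names `mse`, `pit_loss`, and `deep_clustering_loss` are hypothetical. The GCN component and the sliding window algorithm are not shown because the abstract gives no detail on them.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Permutation invariant loss (MSE is an assumed stand-in).

    Tries every assignment of estimated signals to reference signals and
    returns (lowest mean MSE, best permutation), where best_perm[i] is the
    index of the reference matched to estimates[i].
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def deep_clustering_loss(embeddings, labels):
    """Squared Frobenius norm ||V V^T - Y Y^T||^2.

    embeddings: one embedding vector per time-frequency bin (rows of V).
    labels: one one-hot speaker vector per bin (rows of Y).
    The loss is zero when bins of the same speaker have embedding
    similarity 1 and bins of different speakers have similarity 0.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    n = len(embeddings)
    return sum(
        (dot(embeddings[i], embeddings[j]) - dot(labels[i], labels[j])) ** 2
        for i in range(n)
        for j in range(n)
    )
```

The exhaustive search in `pit_loss` costs a factorial number of comparisons in the number of speakers, which is acceptable for the two- or three-speaker mixtures typical of this task. The pairwise loop in `deep_clustering_loss` is quadratic in the number of bins and is written for readability only.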

Keywords/Search Tags:

Single-channel Speech Separation, Deep Clustering, Graph Convolution, End-to-end, Speech Recognition

Related items

1	Research On Two Methods Of Single Channel Speech Separation
2	Research On Single-channel Speech Separation Technology Based On Deep Learning
3	Research On The Single-Channel Speech Graph Representation And Graph Signal Enhancement Algorithms
4	Single Channel Speech Separation Methods Based On Deep Neural Network
5	Research And Implementation Of Single Channel Speech Separation With Unknown Number Of Speakers
6	Research On Multi-Speaker Speech Separation And Speech Recognition In Noisy Environment
7	Research On Speech Separation And Recognition Based On Deep Learning
8	Research On Single-channel Speech Separation Technology Based On Dictionary Learning And Deep Neural Network
9	Research On Single Channel Speech Separation Algorithms Based On Deep Learning
10	Research On Speech Separation Technology Based On Deep Learning