In recent years, with the continuous development of deep learning, impressive achievements have been made in speech emotion recognition, and deep methods show clear advantages over traditional algorithms. The integration of deep learning with classroom teaching has driven new educational reforms: platforms such as air classrooms and DingTalk classrooms support online sign-in and teacher-student interaction, breaking the traditional classroom teaching model. At the same time, as deep learning technology has matured, many colleges and universities have begun to use it to recognize the body movements and facial expressions of students in class. To this end, this paper introduces deep learning technology and takes students' speech emotions in classroom teaching as the object of recognition, so that teachers can grasp the emotional atmosphere of the classroom, adjust their teaching methods to stimulate students' enthusiasm for learning, and thereby improve academic performance.

The main tasks of this research are as follows. First, a multi-feature fusion method is adopted for the selection of emotional features. Following previous research, the emotional features of speech are roughly divided into three parts: prosodic features, spectral features, and timbre features, and the most commonly used features of each type are introduced. The processing of these features is improved: several single emotional features are spliced and fused to compensate for the limited discriminative power of any single feature.

Second, the width of the neural network is increased to improve recognition accuracy. To extract deeper speech emotion features, most studies choose to increase the depth of the network, while the width of the network has received little attention. This work therefore improves on the deep neural network baseline and proposes two models: a hybrid multi-scale convolution model combined with a two-layer LSTM, and a multi-channel convolutional speech emotion recognition model based on an attention mechanism. By widening the network to extract richer emotional features, and by combining the improved models with the fused features, recognition accuracy on public datasets is improved, which verifies the effectiveness of the models.

Finally, the multi-channel convolution model with the attention mechanism is applied to classroom speech emotion recognition. The dataset for this task comes from classroom teaching videos on the Internet: the speech track is first extracted from each video, then segmented and labeled with the corresponding emotion class, and the imbalanced dataset is balanced. Real classroom speech emotions are then recognized by the model, which verifies its validity in practice.
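As an illustration of the multi-feature splicing idea described above, the following NumPy-only sketch computes one frame-level proxy for each feature family (short-time energy for prosody, zero-crossing rate for timbre, spectral centroid for the spectrum) and concatenates them into one fused vector per frame. The specific proxies, frame length, and hop size are illustrative assumptions, not the exact feature set used in this work.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # split a 1-D signal into overlapping frames (assumed 16 kHz: 25 ms / 10 ms)
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fused_features(x, sr=16000):
    frames = frame_signal(x)
    # prosodic proxy: log short-time energy per frame
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # timbre proxy: zero-crossing rate per frame
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # spectral proxy: spectral centroid per frame
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1 / sr)
    centroid = (mag @ freqs) / (np.sum(mag, axis=1) + 1e-10)
    # splice the single features into one fused vector per frame
    return np.stack([energy, zcr, centroid], axis=1)

x = np.random.default_rng(0).standard_normal(16000)  # 1 s of synthetic audio
feats = fused_features(x)
print(feats.shape)  # (98, 3): one fused 3-dim feature vector per frame
```

In practice each family would contribute many more dimensions (e.g. pitch contours, MFCCs, formants), but the fusion step is the same frame-wise concatenation.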
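The widened architecture can be sketched in PyTorch as parallel convolution branches with different kernel sizes (the multi-scale, width-increasing part) feeding a two-layer LSTM, followed by attention-based temporal pooling. All layer sizes, the three kernel scales (3, 5, 7), and the four-class output are illustrative assumptions, not the exact configuration of the proposed models.

```python
import torch
import torch.nn as nn

class MultiScaleConvLSTM(nn.Module):
    # hybrid multi-scale convolution + two-layer LSTM with attention pooling
    def __init__(self, n_feats=39, n_classes=4, channels=32, hidden=64):
        super().__init__()
        # widen the network: parallel branches with different kernel sizes
        self.branches = nn.ModuleList(
            [nn.Conv1d(n_feats, channels, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.lstm = nn.LSTM(3 * channels, hidden, num_layers=2, batch_first=True)
        self.attn = nn.Linear(hidden, 1)  # scalar attention score per time step
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, n_feats)
        x = x.transpose(1, 2)              # -> (batch, n_feats, time)
        # concatenate branch outputs along channels: richer multi-scale features
        x = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        h, _ = self.lstm(x.transpose(1, 2))      # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        pooled = (w * h).sum(dim=1)              # weighted temporal pooling
        return self.out(pooled)                  # emotion-class logits

model = MultiScaleConvLSTM()
logits = model(torch.randn(2, 100, 39))  # batch of 2, 100 frames, 39-dim features
print(logits.shape)  # torch.Size([2, 4])
```

The design point is that depth is held fixed while width grows: each branch sees the same input at a different temporal scale, and concatenation lets the LSTM consume all scales jointly.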
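One simple way to equalize an imbalanced dataset, as mentioned above, is random oversampling of minority classes until every class matches the majority count. This NumPy sketch shows one such scheme; it is an assumed illustration, not necessarily the balancing method used in this work.

```python
import numpy as np

def oversample(X, y, seed=0):
    # replicate minority-class samples (with replacement) until every class
    # has as many samples as the majority class
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, target - len(members), replace=True)
        idx.extend(members)
        idx.extend(extra)
    idx = np.array(idx)
    rng.shuffle(idx)  # shuffle so duplicated samples are not grouped
    return X[idx], y[idx]

X = np.arange(10).reshape(-1, 1).astype(float)
y = np.array([0] * 7 + [1] * 3)   # imbalanced: 7 vs 3
Xb, yb = oversample(X, y)
print(np.bincount(yb))  # [7 7]
```

Oversampling is attractive for small speech corpora because no audio is discarded, though it risks overfitting to duplicated minority segments; augmentation-based balancing is a common alternative.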