Research And Implementation Of A Vision-based Piano Transcription System

Posted on: 2021-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2505306104986329
Subject: Information and Communication Engineering
Abstract/Summary:
Automatic Music Transcription (AMT) is the process of converting acoustic music signals into symbolic annotations, and it is usually performed by analyzing audio information. However, because multiple pitches can overlap at the same time, it is difficult to obtain accurate recognition results from audio analysis alone. To address this problem, computer vision-based approaches have been adopted for transcription. In existing research, vision-based piano transcription systems rely on two essential algorithms: piano keyboard detection based on the Hough transform and pressed key detection based on a weak classifier. However, the accuracy and robustness of both algorithms need to be improved in complex environments. In this thesis, we implement a robust, higher-performance visual piano transcription system.

The system consists of four modules: piano keyboard registration, hand detection, automatic background update, and pitch detection. It takes a video of a piano performance as input. First, the piano keyboard registration module determines the background image and the key positions. Then, for each frame, the hand detection module locates the region occupied by the hands, and the automatic background update module updates the background image so that changes in lighting do not affect the result. Next, the difference image between each frame and the background image is computed. Finally, the pitch detection module detects the pressed keys to produce the transcription result.

The main contributions of this thesis are the following four:
(1) For piano keyboard detection, to address the limited detection ability of the Hough transform, semantic segmentation is used, which achieves the most accurate results reported so far.
(2) For pressed key detection, to address the insufficient performance of current pressed key classifiers, we design and implement a CNN model suited to detecting pressed keys; experiments show that it outperforms state-of-the-art approaches.
(3) The impact of different environmental conditions (light position, camera position, light intensity) on pressed key detection is analyzed, and recommendations for deploying the system are given.
(4) To address the lack of public datasets for visual piano transcription, we propose a new dataset for visual transcription of piano music, Vision Piano, which contains data recorded in the laboratory (Piano Dataset2) and video data downloaded from the Internet (Piano Dataset3).

The piano transcription datasets used in this thesis are the dataset proposed by Akbari (Piano Dataset1) and Vision Piano. The system achieves an F1-measure of 96.5% on Piano Dataset1, and 95% and 93% on Piano Dataset2 and Piano Dataset3, respectively, which is state-of-the-art.
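The automatic background update and difference-image steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes grayscale frames stored as nested lists, a boolean hand mask produced by the hand detection module, and a running-average update with a hypothetical learning rate `alpha`. The abstract does not specify the update rule. Pixels covered by the hand are left unchanged so that the hands are not blended into the background.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities
Mask = List[List[bool]]    # True where the hand detector marked a hand pixel


def update_background(background: Image, frame: Image, hand_mask: Mask,
                      alpha: float = 0.05) -> Image:
    """Running-average background update (assumed rule, not from the thesis).

    Each non-hand pixel moves a fraction `alpha` toward the current frame,
    so slow lighting changes are absorbed; hand pixels keep their old value.
    """
    updated = []
    for bg_row, fr_row, m_row in zip(background, frame, hand_mask):
        updated.append([
            bg if covered else (1 - alpha) * bg + alpha * px
            for bg, px, covered in zip(bg_row, fr_row, m_row)
        ])
    return updated


def difference_image(frame: Image, background: Image) -> Image:
    """Per-pixel absolute difference between a frame and the background."""
    return [[abs(px - bg) for px, bg in zip(fr_row, bg_row)]
            for fr_row, bg_row in zip(frame, background)]


# Usage: one 2x2 frame in which the top-right pixel is covered by a hand.
background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[110.0, 180.0], [100.0, 100.0]]
hand = [[False, True], [False, False]]
background = update_background(background, frame, hand, alpha=0.1)
diff = difference_image(frame, background)
```

Freezing hand pixels while letting the rest of the background drift slowly is one common way to tolerate gradual lighting changes; a larger `alpha` adapts faster but can also absorb a key that is held down for a long time.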
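After pressed keys are detected, each key has to be turned into a pitch for the transcription output. The sketch below assumes, hypothetically, that the keyboard registration step numbers the keys of a full 88-key keyboard from 0 (leftmost key, A0) to 87 (rightmost key, C8), and that the output uses MIDI note numbers; the abstract states neither. The function names are illustrative only.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
LOWEST_MIDI = 21   # MIDI number of A0, the lowest key on a standard 88-key piano
NUM_KEYS = 88


def key_index_to_midi(key_index: int) -> int:
    """Map a hypothetical left-to-right key index (0..87) to a MIDI note number."""
    if not 0 <= key_index < NUM_KEYS:
        raise ValueError(f"key index must be in [0, {NUM_KEYS - 1}], got {key_index}")
    return LOWEST_MIDI + key_index


def midi_to_name(midi: int) -> str:
    """Scientific pitch name for a MIDI number (60 -> 'C4')."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Usage: convert the keys detected as pressed in one frame into note names.
pressed_keys = [39, 43, 46]
notes = [midi_to_name(key_index_to_midi(k)) for k in pressed_keys]
```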
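The F1-measure reported in the abstract is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The abstract does not say whether it is computed per frame or per note, so the sketch below assumes frame-level evaluation: each frame's reference and estimate are sets of pressed keys (for example MIDI numbers), and true positives, false positives, and false negatives are pooled over all frames before computing the scores.

```python
from typing import List, Set, Tuple


def frame_f1(reference: List[Set[int]],
             estimate: List[Set[int]]) -> Tuple[float, float, float]:
    """Pooled frame-level precision, recall, and F1 (assumed protocol)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must cover the same frames")
    tp = fp = fn = 0
    for ref, est in zip(reference, estimate):
        tp += len(ref & est)   # keys correctly reported as pressed
        fp += len(est - ref)   # keys reported but not actually pressed
        fn += len(ref - est)   # pressed keys that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: two frames of ground truth versus system output.
reference = [{60, 64}, {60}]
estimate = [{60, 64, 67}, {62}]
p, r, f = frame_f1(reference, estimate)
```

Here the counts are TP = 2, FP = 2, FN = 1, so P = 0.5, R = 2/3, and F1 = 2TP / (2TP + FP + FN) = 4/7.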
Keywords/Search Tags: Automatic Music Transcription, Multi-pitch Estimation, Convolutional Neural Network