Digital voice teaching systems are the main tools for voice teaching. Students learn English through the vivid English materials these systems provide. Such systems have been shown to stimulate students' interest in learning English effectively and to improve learning outcomes. To share network resources, a digital voice teaching system makes all English materials available to all teachers and students. Although this gives students access to more English resources, their devices have limited memory and can store many short audio clips but not the full recordings. What students actually need to study, however, is the original audio rather than the clips. How to retrieve the original audio quickly and efficiently from an audio clip is therefore a serious problem. Content-based audio retrieval extracts features from the query audio and returns the original audio files whose features are similar. A digital voice teaching system can use this technology to retrieve the similar original audio. This paper therefore focuses on content-based audio retrieval for digital voice teaching systems. First, the paper studies perceptual hashing and similarity matching. It then combines perceptual hashing with content-based retrieval. Finally, it proposes a hierarchical similarity matching model that retrieves results quickly using a two-step Rough-Fine algorithm. In addition, when the system finishes the retrieval, the results are returned to the client as a list, and the student can choose whether to play an audio file as streaming media.
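The two-step Rough-Fine idea can be illustrated with a minimal Python sketch. The abstract does not specify the hash function, the thresholds, or the data layout, so everything below is an assumption made for illustration: the toy hash in `frame_hashes` (energy comparisons between adjacent sub-blocks of a frame), the names `rough_fine_search` and `make_track`, the `stride` used for the rough pass, and both threshold values are hypothetical. The sketch also assumes the clip is aligned to frame boundaries of the original, which a real perceptual hash would not require.

```python
import random

FRAME_LEN = 256  # samples per frame (assumed value)
BITS = 16        # bits per frame hash (assumed value)

def frame_hashes(samples):
    """Toy perceptual hash: one BITS-bit integer per frame.

    Bit i is 1 when sub-block i of the frame has more energy than
    sub-block i + 1. This stands in for a real perceptual hash.
    """
    sub = FRAME_LEN // (BITS + 1)
    hashes = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        energies = []
        for b in range(BITS + 1):
            block = samples[start + b * sub : start + (b + 1) * sub]
            energies.append(sum(x * x for x in block))
        h = 0
        for i in range(BITS):
            h = (h << 1) | (1 if energies[i] > energies[i + 1] else 0)
        hashes.append(h)
    return hashes

def bit_error_rate(a, b):
    """Fraction of differing bits between two equal-length hash sequences."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (len(a) * BITS)

def rough_fine_search(clip, library, stride=4, rough_thr=0.35, fine_thr=0.2):
    """Return (name, offset, ber) for each matching original, best first."""
    if not clip:
        return []
    rough_clip = clip[::stride]
    results = []
    for name, full in library.items():
        best = None
        for off in range(len(full) - len(clip) + 1):
            window = full[off : off + len(clip)]
            # Rough step: compare only every stride-th frame hash.
            if bit_error_rate(rough_clip, window[::stride]) > rough_thr:
                continue
            # Fine step: compare every frame hash of the surviving candidate.
            ber = bit_error_rate(clip, window)
            if ber <= fine_thr and (best is None or ber < best[1]):
                best = (off, ber)
        if best is not None:
            results.append((name, best[0], best[1]))
    results.sort(key=lambda r: r[2])
    return results

def make_track(seed, n_frames):
    """Synthetic stand-in for an audio recording (seeded Gaussian noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_frames * FRAME_LEN)]

# Build a small library of "original" recordings and index their hashes.
tracks = {name: make_track(seed, 60) for seed, name in enumerate(["a", "b", "c"])}
library = {name: frame_hashes(samples) for name, samples in tracks.items()}

# Cut a clip from track "b" (frames 10..29) and add slight noise to it.
noise = random.Random(99)
clip_samples = [x + noise.gauss(0.0, 0.05)
                for x in tracks["b"][10 * FRAME_LEN : 30 * FRAME_LEN]]
matches = rough_fine_search(frame_hashes(clip_samples), library)
print(matches)
```

The rough pass discards most candidate positions after comparing only a fraction of the hash bits, so the full bit-error-rate comparison runs on few candidates. The sorted result list corresponds to the list the abstract says is returned to the client; streaming playback is outside the scope of this sketch.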