
Research On Soccer Video Retrieval By Using Audio-Visual Features

Posted on: 2009-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: R Xu
Full Text: PDF
GTID: 2178360242480738
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of digital video applications in recent years, such as VOD, digital TV, digital libraries, video conferencing and tele-education, digital video has become familiar to and accepted by more and more people. Faced with vast streams of video data, conventional text-based video database retrieval can no longer meet users' needs. Since the 1990s, a new research field called Content-Based Video Analysis and Retrieval (CBVAR) has emerged. By analyzing and understanding video data from the low level to the high level, it extracts the content of the video and performs retrieval based on that content. It supports efficient querying, indexing, browsing, searching and retrieval based on visual information, working directly on the content of the video data.

Sports video retrieval is an important branch of content-based video retrieval and a challenging research direction. Among sports videos, soccer match video has a very large audience. However, although a soccer video lasts a long time, only a few spectacular shots actually attract viewers. Viewers can also be grouped according to their purpose in watching a match. The focus and the retrieval requirements differ from group to group, so it is necessary to retrieve soccer video at the semantic level.

There are two difficulties in the semantic recognition of soccer events:

(1) Soccer video is well structured to some degree, but its structure is loose compared with news video. For example, highlights (such as goals, pauses, etc.) happen only occasionally, and the time at which they will happen cannot be predicted. Furthermore, the scenes related to these highlights vary as the match changes. For these reasons, rule-based methods are not appropriate for predicting highlights in soccer video.

(2) Existing semantic recognition methods are mostly based on player/ball tracking technology. However, the complexity of lighting conditions and player interaction makes this type of method inappropriate.

On the other hand, soccer video is produced according to some regular patterns. Highlights in soccer video occur together with certain typical elements (such as typical shot transitions, typical sounds, etc.). Therefore, these highlights can be recognized by recognizing these typical elements.

This thesis first focuses on the structural analysis of soccer video. Partitioning a soccer video sequence into shots, that is, detecting shot boundaries, is one of the key techniques in video indexing. A shot change means a change of scene content in a video sequence. As the first step of all kinds of processing, shot boundary detection in soccer video has received wide attention, and a large number of methods have been proposed in previous years. Commonly used methods include algorithms based on comparing histogram differences, pixel differences, macroblocks (MB) and so on. In this thesis, we use a dual-threshold segmentation method to detect shot boundaries in soccer video. Our tests show that this algorithm gives better detection results than the conventional algorithm based on histogram comparison.

Key-frame extraction is very important for retrieving highlights in soccer video. For soccer video, one key frame is not enough to express the content of a shot. We therefore classify shots into two kinds: Field-see shots and Field-not-see shots. For a Field-not-see shot, because it is short and contains few highlights, we directly take the first frame and the last frame as key frames. For a Field-see shot, because the difference between frames is small but most highlights occur in such shots, we take five frames at equal intervals as key frames.

From the analysis of goal events in soccer video, we find that there is a temporal relationship between shots. In the course of a goal event there are three kinds of shots: the shot on goal, with the ball moving towards the goal; the players' celebration; and the slow-motion replay of the goal event. These shots are temporally continuous in the soccer video, so this sequence can serve as a visual characteristic for goal event retrieval. A goal event also has audio characteristics, such as cheers and excited commentary. These two audio characteristics can be used to assist the detection of soccer goal events.

Based on the above rules, we propose an algorithm for detecting soccer goal event shots using joint multimedia features. For the video analysis, we mainly do three things: (1) detect the goal in the key frames to find a potential goal shot; (2) extract the key frames of the two shots following the potential goal shot to detect the player celebration shot; (3) detect the slow-motion shot using an algorithm based on the histogram difference between frames. For the audio analysis, we use an SVM to classify the audio into four kinds: cheers, silence, ordinary commentary and excited commentary. Of these, cheers and excited commentary are used to assist the detection of soccer goal events.
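
The key-frame rule described in the abstract (first and last frame for a Field-not-see shot, five equally spaced frames for a Field-see shot) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `select_key_frames`, its parameters, the inclusive frame indexing, and the choice to include both end frames in the five equally spaced frames are all assumptions made for illustration.

```python
from typing import List


def select_key_frames(start: int, end: int, field_visible: bool,
                      n_field: int = 5) -> List[int]:
    """Return key-frame indices for a shot covering frames start..end (inclusive).

    Hypothetical helper: names and indexing conventions are assumptions,
    not taken from the thesis.
    """
    if end < start:
        raise ValueError("end must be >= start")

    if not field_visible:
        # Field-not-see shot: keep only the first and last frame
        # (a single-frame shot yields one index).
        return sorted({start, end})

    length = end - start
    if length < n_field - 1:
        # Shot is shorter than the requested number of key frames:
        # fall back to every frame in the shot.
        return list(range(start, end + 1))

    # Field-see shot: n_field frames at equal intervals.
    # Assumption: both end frames of the shot are included.
    return [start + (i * length) // (n_field - 1) for i in range(n_field)]


if __name__ == "__main__":
    print(select_key_frames(0, 100, field_visible=True))
    print(select_key_frames(10, 50, field_visible=False))
```

Integer arithmetic is used so that the selected indices are always valid, distinct frame numbers; a real system would first need a field-visibility classifier, which the abstract does not specify.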
Keywords/Search Tags:soccer video, video retrieval, shot boundary detection, audio classification