
Research And Implementation Of Person Recognition Method Based On Video Data

Posted on: 2022-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: W Z Wang  Full Text: PDF
GTID: 2518306338968229  Subject: Computer technology
Abstract/Summary:
Video data now dominates Internet traffic. Compared with static images, video provides richer temporal and multi-modal information. Video semantic analysis and content understanding are urgently needed in practical applications and have gradually become a research hotspot in the field of computer applications. Human beings are not only important entities in videos but also the core of social development. As a key issue in multimedia content understanding, person recognition in videos plays an important role in social relationship mining, knowledge graph construction, and behavior and sentiment analysis, and it has great social and commercial value in public security supervision, social management, information retrieval, and the entertainment ecosystem. Previous studies mainly address this problem on still images and cannot handle the temporal and multi-modal information in videos. The main research and development work can be summarized as follows:

1) This paper proposes a Multi-Cue and Temporal Attention (MCTA) model to recognize persons in videos. For the multi-cue information, it extracts features from multiple visual cue regions and uses a Multi-Cue Attention Module to integrate them. For the temporal information, it adopts a Temporal Attention Module to learn the quality of different frames adaptively. In addition, this paper constructs a novel and challenging video dataset named Character Recognition in Videos (CRV) for this task. Comparative experiments and ablation studies demonstrate the effectiveness and advanced performance of the MCTA model.

2) This paper proposes a Frame Aggregation and Multi-Modal Fusion (FAMF) model for video-based person recognition. For frame features, it proposes a Vector of Locally Aggregated Descriptors with Attention (AttentionVLAD) algorithm that learns the residual distribution of local features and measures the quality of different frames simultaneously. For multi-modal features, an improved multi-modal fusion module is adopted to jointly optimize the features of different modalities. Comparative experiments and ablation studies demonstrate the effectiveness and advanced performance of the FAMF model.

3) This paper develops and builds a video person detection and recognition system that provides data management, video processing, video person detection and recognition, and visualization of detection and recognition results. It also integrates multiple algorithm components for multi-dimensional video content understanding and analysis, and provides a friendly interactive interface to promote the application and development of video content understanding algorithms.
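
The idea of weighting frames by learned quality, as in the Temporal Attention Module described above, can be illustrated with a generic attention-pooling sketch. The abstract does not specify the module's actual form, so everything here is an assumption for illustration: the function names `softmax` and `temporal_attention_pool`, the single scoring vector `score_weights` (standing in for learned parameters), and the use of a plain dot product as the quality score. It is not the thesis's implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, score_weights):
    """Aggregate per-frame feature vectors into one video-level vector.

    frame_features: list of equal-length lists, one feature vector per frame.
    score_weights:  hypothetical scoring vector (assumed to be learned elsewhere).
    Returns (pooled_feature, attention_weights).
    """
    # Quality score per frame: dot product with the scoring vector (an assumption).
    scores = [
        sum(w * x for w, x in zip(score_weights, feat))
        for feat in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Weighted sum of frame features, dimension by dimension.
    pooled = [
        sum(a * feat[d] for a, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = temporal_attention_pool(frames, [1.0, 1.0])
    print(weights)  # the third frame scores highest, so it gets the largest weight
    print(pooled)
```

With an all-zero scoring vector every frame receives the same weight, and the pooling reduces to a plain average over frames; a non-trivial scoring vector shifts the aggregate toward frames that score as higher quality.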
Keywords/Search Tags: video content understanding, person recognition, multiple cues, multi-modal, attention mechanism