| This dissertation is concerned with the problem of monocular visual tracking, the task of estimating the position of arbitrary targets as they move across the frames of a temporal image sequence. The research described in this dissertation advances visual tracking in two significant ways, both of which serve to unify various aspects of the field. First, a common theoretical framework is derived that connects a range of visual trackers in the literature that were previously viewed as disparate. Moreover, this uniform framework permits systematic evaluation of visual trackers that retain varying amounts of spatial organization information regarding the target. Previous research has investigated trackers that incorporate differing amounts of target spatial arrangement information; however, no empirical evaluation has systematically varied this parameter in visual tracking. The second manner in which this dissertation unifies visual trackers is through a novel feature representation. The proposed features, termed spatiotemporal oriented energies, capture both spatial appearance and dynamics (e.g., velocity) in a uniform fashion. The integration of appearance and dynamics yields a compact, highly discriminative feature set with robustness to variable illumination. Previous approaches in tracking have attempted to incorporate target dynamics through prediction mechanisms (e.g., filtering) or by combining spatial and motion-based cues that are derived independently. Notably, this dissertation introduces the first representation for visual tracking that uniformly encompasses appearance and dynamics. These features are applicable to a range of trackers, outperform alternative common representations, and can lead to state-of-the-art tracking accuracy in empirical evaluation. |