
Computer Vision-Based Detection, Classification and Tracking of License Plates and Vehicles

Posted on: 2022-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z B Xu
Full Text: PDF
GTID: 1482306323963779
Subject: Information security
Abstract/Summary:
Vehicle understanding is a key part of intelligent transportation systems and has a wide range of applications in real-world surveillance systems, autonomous driving, and roadside parking. In recent years, vehicle analysis has become increasingly important due to the rapid development of computer vision and the rise of new infrastructure such as smart cities. In the traditional roadside parking scenario, fast and accurate license plate detection and recognition is crucial for efficient parking fee collection. Recently, roadside parking fee collection has been moving toward unattended operation, i.e., relying heavily on cameras to record license plate numbers as well as vehicles driving in and out. In an unattended roadside parking scenario, the camera needs to record accurately which parking spot a vehicle has driven into or out of. Determining the parking spot accurately requires an algorithm that can locate the vehicle accurately in 3D. In addition, because the license plate is often occluded by surrounding vehicles in the parking scenario, the license plate number needs to be determined before the vehicle drives in, and the vehicle then needs to be continuously located and tracked.

In this dissertation, we analyze vehicles from three research directions: (i) license plate detection and recognition; (ii) binocular (stereo) 3D detection; and (iii) multi-object tracking and segmentation. For the license plate detection and recognition task, we focus on the very challenging roadside parking scene. For the stereo 3D detection task and the multi-object tracking and segmentation task, we also consider the autonomous driving scenario, because the prevailing methods in academia are compared on autonomous driving datasets. We introduce these three aspects of vehicle perception in turn, aiming to give readers a more comprehensive understanding of vehicle perception methods. The three aspects are as follows.

1. License Plate Detection and Recognition. License plate detection and recognition (LPDR) is a relatively traditional research topic in computer vision. Most previous research relies on traditional methods to model and analyze license plates. Moreover, the poses of the camera and the license plate are relatively fixed in most scenes. Compared with those scenes, we observe that LPDR is very challenging in the roadside parking scene. We present the largest, most diverse, and most challenging publicly available dataset for LPDR. We conclude that five important factors affect the performance of LPDR: (i) severe deformation; (ii) uneven illumination; (iii) low contrast; (iv) obstruction; and (v) blurriness. To address these five problems together, we study the differences between LPDR and the more popular text detection methods and put forward a novel module named Controlled Spatial Transformer Network (CSTN) for noise removal and unsupervised shape correction. We show that CSTN handles these five factors well and can recover license plates that are originally in poor condition. Based on CSTN, we propose the first fully differentiable architecture for LPDR that can learn license plate detection in a semi-supervised way. The resulting LPDR architecture processes images at more than two hundred frames per second and achieves state-of-the-art LPDR performance.

2. Stereo Imagery-Based 3D Detection. 3D detection is a relatively new research direction. In recent years, the accuracy of LiDAR-based 3D detection has saturated, but camera-based solutions still need to be improved. Current stereo methods lag far behind LiDAR-based methods in detecting faraway vehicles. Recent pseudo-LiDAR methods bridge this gap by converting 2D pixels to a 3D point cloud and adopting 3D detectors to detect vehicles in the generated 3D point cloud. Though great performance gains have been achieved, the performance of pseudo-LiDAR methods on faraway vehicles is still unsatisfactory. Inspired by the way the human vision system accurately locates objects in 3D space by attentively watching and analyzing each object instance, we propose to analyze more fine-grained details of vehicles and to estimate the 3D locations of vehicles at the instance level rather than at the image level. The resulting stereo imagery-based framework, named ZoomNet, mimics the human vision system by adaptively zooming in and out on instances to analyze the 3D poses of vehicles in parallel. For each instance, the pixel-level disparity, the foreground segmentation, and the per-pixel part location are learned in an end-to-end manner. Moreover, to fully exploit the provided 3D bounding box labels, we introduce the KITTI fine-grained car (KFG) dataset to fill the gap in fine-grained annotations and to enable the training of ZoomNet. Evaluations on the popular KITTI dataset show that ZoomNet outperforms all existing stereo imagery-based methods by large margins.

3. Multi-Object Tracking and Segmentation. Multi-object tracking and segmentation (MOTS) is an emerging direction that has been proposed very recently. It requires pixel-level localization of each instance in the video and tracking of the instances between frames. Current approaches to MOTS first detect instances and then adopt 2D or 3D convolutions to extract instance embeddings for instance association. However, due to the large receptive field of deep convolutional neural networks, the foreground area of the current instance and the surrounding areas containing nearby instances or the environment are usually mixed up in the learned instance embeddings, resulting in ambiguities in tracking. To learn more discriminative instance embeddings, we convert the compact image representation to an unordered 2D point cloud representation. In this way, the non-overlapping nature of instance segments can be fully exploited by strictly separating the foreground point cloud from the background point cloud. Moreover, multiple informative data modalities are formulated as point-wise representations to enrich the point-wise features. For each instance, the embedding is learned on the foreground 2D point cloud, the environment 2D point cloud, and the smallest circumscribed bounding box. Similarities between instance embeddings are then measured for inter-frame association. In addition, to make MOTS practical, we modify the one-stage instance segmentation method SpatialEmbedding for instance segmentation. The resulting efficient and effective framework, named PointTrackV2, significantly outperforms the best current methods (including 3D tracking methods). Moreover, it runs at near real-time speed (20 FPS when evaluated on a single 2080Ti). Furthermore, as crowded scenes for cars are insufficient in current MOTS datasets, we provide a more challenging dataset named APOLLO MOTS with much higher instance density.

Based on the research and exploration in these three directions, we further summarize three major trends in vehicle understanding and three major points. The three major trends are: (i) vehicle localization from local to global; (ii) vehicle analysis from global to local; and (iii) vehicle analysis from single-modal to multi-modal. The three major points are: (i) fine-grained analysis; (ii) problem decomposition; and (iii) dimensional invariance. Overall, the methods proposed in this dissertation not only provide novel insights for vehicle perception in three aspects but also achieve state-of-the-art performance on popular datasets. In addition to methods, we contribute new datasets to promote future research. We also open-source our code so that the community can better understand our work.
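
As an illustration of the pseudo-LiDAR idea mentioned in the stereo 3D detection part (converting 2D pixels to a 3D point cloud), the following is a minimal sketch and not the dissertation's own implementation. It assumes a rectified stereo pair and a pinhole camera; the function name and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical names chosen for this example. It uses the standard stereo relation depth = focal length × baseline / disparity.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,      # focal length in pixels (assumed known from calibration)
    baseline_m: float,    # distance between the two cameras in meters (assumed)
    cx: float,            # principal point, horizontal pixel coordinate (assumed)
    cy: float,            # principal point, vertical pixel coordinate (assumed)
    min_disparity: float = 1e-6,
) -> List[Point3D]:
    """Back-project a disparity map into 3D points in the camera frame."""
    points: List[Point3D] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            # Zero or negative disparity has no valid depth; skip the pixel.
            if d <= min_disparity:
                continue
            z = focal_px * baseline_m / d   # depth along the optical axis
            x = (u - cx) * z / focal_px     # offset to the right of the axis
            y = (v - cy) * z / focal_px     # offset below the axis
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 disparity map; the top-left pixel has no valid match.
    toy = [[0.0, 10.0], [20.0, 5.0]]
    for p in disparity_to_points(toy, focal_px=100.0, baseline_m=0.5, cx=0.0, cy=0.0):
        print(p)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a depth error that grows with distance, which is one common explanation for why faraway vehicles are harder for stereo methods.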
Keywords/Search Tags:Multi-Object Tracking and Segmentation, 3D Detection, License Plate Detection and Recognition, Roadside Parking, Vehicle Analysis