
Research on Key Technologies of Semantic Understanding of Traffic Scene Based on Deep Representation Learning

Posted on: 2020-03-02 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S T Ding
GTID: 1482306740472874 | Subject: Traffic Information Engineering & Control
Abstract/Summary:
In recent years, the transportation industry has developed rapidly, computer vision-based traffic monitoring systems have found increasingly widespread application, and the processing, analysis, and use of the resulting big data have become a focus of research. As an emerging, cutting-edge technology, machine vision-based image semantic understanding plays a leading role in traffic image processing systems: it gathers the information produced by low-level visual processing techniques such as object detection and tracking, and uses it to support analysis and inference for high-level visual tasks such as action recognition and event search. In driver assistance systems, semantic understanding of traffic scenes based on deep representation learning treats all traffic participants as objects and places content understanding at its core, studying how image content is represented. The key tasks are to parse traffic scenes, analyze the behavior of vehicles and pedestrians, and describe scene content for traffic participants. The topic therefore has important research value.

In essence, semantic understanding of traffic scenes studies the conversion of video information into textual descriptions. Based on the detected traffic monitoring information, the technology semantically describes the vehicle characteristics, driving states, pedestrian behavior, and road environment in the current traffic scene. To address the problems encountered in semantically understanding and describing traffic scenes, this dissertation studies the practical application of intelligent monitoring systems, drawing on a large body of domestic and international literature and on key technologies in machine vision and natural language processing. The aim is to improve the accuracy and robustness of image semantic understanding algorithms in traffic scenes so that they meet the practical needs of intelligent traffic monitoring systems. The main research work is as follows:

1. Target detection methods based on spatio-temporal interest points are analyzed, and a human behavior recognition algorithm based on improved spatio-temporal interest point detection is proposed. Pedestrian and vehicle detection in complex traffic scenes is affected by factors such as object occlusion, background clutter, viewing angle, and illumination changes, and has long been a challenging problem in image processing. To address the shortcomings of existing algorithms, multi-scale information is introduced into interest point detection. Local spatial surround suppression, temporal constraints, and scale adaptation reduce the generation of background noise and improve target detection accuracy. Human behavior recognition experiments verify the improved robustness of the detector, which suppresses background noise and improves model performance.

2. Scale changes, background clutter, and object occlusion are common in traffic object detection and recognition, and deep learning-based object detection algorithms spend considerable time on region selection. An algorithm that detects regions of interest of traffic targets based on improved spatio-temporal interest points is therefore proposed. Spatio-temporal interest point optimization, multi-objective dynamic clustering, and region-of-interest construction improve the algorithm's robustness in complex traffic scenarios. In addition, because the model computes features only within the regions of interest, it runs faster and can meet the real-time requirements of traffic detection.

3. Deep learning-based image captioning is studied. An image semantic description method based on the human visual attention mechanism is proposed to address problems that image semantic understanding models encounter when generating description sentences: ambiguous subject selection, interference from redundant sentences, and low fidelity in reproducing the real scene. By filtering complex scenes that contain multiple targets, the algorithm guides the model toward descriptions that are accurate and closer to human language habits. The stimulus-driven attention mechanism stems from the uniqueness, unpredictability, and singularity of vision. The algorithm first selects the attention condition for a specific region of the image, then allocates attention resources and encodes image features according to the region selection result. Finally, the weighted image features are fed into the language model, which decodes them to generate an image description.

4. The framework of visual question answering systems based on machine vision and natural language processing is studied. Because the structure of existing question answering models is relatively simple, the model's attention is biased relative to human attention and the model lacks relational reasoning ability when answering questions. This dissertation therefore proposes an image question answering model based on multi-object relationship detection. First, an object detection model and an object relationship judgment model are pre-trained to obtain the appearance relationships of objects and the relationship predicates between them. Next, the words in the question direct attention to the corresponding regions of the image. Finally, the image appearance-relationship features and the question text are mapped into a unified vector space, and the answer is generated according to word-vector similarity. Experimental results show that the method effectively strengthens the correlation between image features and the question text and achieves satisfactory results on the validation dataset.

5. Long video semantic description algorithms are studied to address the heavy computation and inaccurate event localization that arise in content analysis and event search over long videos, and a new long video semantic understanding algorithm is proposed. Through redundant video frame detection, long video superframe segmentation, key frame selection, and other methods, a long video is converted into a text summary of its important content, which improves the accuracy of video semantic description while significantly reducing the model's computational search time.

In summary, this dissertation conducts in-depth research and analysis on machine vision-based object detection and natural language processing-based text description, targeting the object deformation, similarity interference, occlusion, and illumination changes encountered in the semantic understanding of traffic scenes. A series of solutions is proposed to handle these changes and complex object relationships. Building on accurate detection of pedestrians, vehicles, and other objects in traffic scenes, the content of traffic scenes is further expressed and objectively interpreted. Finally, the main work is summarized, conclusions are drawn, and directions for future research are outlined.
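Item 1 mentions local spatial surround suppression in interest point detection. The abstract does not give the dissertation's formulation, so the following is only a minimal sketch of the generic idea, under stated assumptions: each response value is reduced by a fraction `alpha` of the mean response in a square ring around it, so that isolated peaks survive while uniformly textured background is damped. The response map and the names `surround_suppress`, `interest_points`, `inner`, `outer`, `alpha`, and `thresh` are hypothetical.

```python
def surround_suppress(resp, inner=1, outer=2, alpha=1.0):
    """Hypothetical generic surround suppression on a 2-D response map.

    Each value is reduced by alpha * (mean response in the square ring
    inner < max(|dy|, |dx|) <= outer around it) and clipped at zero.
    """
    h, w = len(resp), len(resp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ring = []
            for dy in range(-outer, outer + 1):
                for dx in range(-outer, outer + 1):
                    if max(abs(dy), abs(dx)) <= inner:
                        continue  # skip the centre neighbourhood itself
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        ring.append(resp[yy][xx])
            surround = sum(ring) / len(ring) if ring else 0.0
            out[y][x] = max(0.0, resp[y][x] - alpha * surround)
    return out


def interest_points(suppressed, thresh):
    """Return (y, x) positions whose suppressed response exceeds thresh."""
    return [(y, x)
            for y, row in enumerate(suppressed)
            for x, v in enumerate(row) if v > thresh]


# Flat background of 1.0 with one strong peak in the middle.
resp = [[1.0] * 7 for _ in range(7)]
resp[3][3] = 5.0
suppressed = surround_suppress(resp)
print(interest_points(suppressed, 0.5))
```

On this toy map the flat background is suppressed to zero everywhere and only the central peak remains as an interest point; a real detector would apply this per scale and per frame, with temporal constraints as well.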
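Item 2 builds regions of interest by clustering interest points and then computing features only inside those regions. The "multi-objective dynamic clustering" is not specified in the abstract, so this sketch swaps in plain single-linkage clustering (connected components under a distance threshold) as a stand-in, then turns each sufficiently large cluster into a padded bounding box. The point coordinates and the parameters `eps`, `min_points`, and `pad` are hypothetical.

```python
import math
from collections import deque


def cluster_points(points, eps):
    """Label (x, y) points so that points linked by a chain of neighbours
    closer than eps share a label (single-linkage / connected components)."""
    n = len(points)
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        queue = deque([i])
        while queue:  # breadth-first expansion of the current cluster
            j = queue.popleft()
            for k in range(n):
                if labels[k] == -1 and math.dist(points[j], points[k]) < eps:
                    labels[k] = current
                    queue.append(k)
        current += 1
    return labels


def regions_of_interest(points, eps=20.0, min_points=3, pad=5):
    """Return a padded box (x_min, y_min, x_max, y_max) for every cluster
    with at least min_points members; smaller clusters are treated as noise."""
    labels = cluster_points(points, eps)
    boxes = []
    for c in sorted(set(labels)):
        members = [p for p, label in zip(points, labels) if label == c]
        if len(members) < min_points:
            continue
        xs = [p[0] for p in members]
        ys = [p[1] for p in members]
        boxes.append((min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad))
    return boxes


points = [(10, 10), (15, 12), (12, 18), (100, 100), (105, 98), (102, 110), (300, 20)]
print(regions_of_interest(points))
```

Here two dense groups become boxes and the isolated point at (300, 20) is discarded. A downstream detector would then extract features only inside these boxes, which is where the speed-up described in item 2 comes from.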
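Item 3 weights image region features by attention before passing them to the language model. The dissertation's stimulus-driven mechanism is not detailed in the abstract, so the sketch below uses ordinary dot-product soft attention as a stand-in: each region feature is scored against a query vector (for example, a decoder hidden state), the scores are normalised with a softmax, and the weighted sum of features forms the context vector. The feature values and the names `attend` and `softmax` are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_feats, query):
    """Dot-product soft attention over region features.

    Returns the attention weights (one per region) and the context vector,
    i.e. the attention-weighted sum of the region features.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
               for d in range(dim)]
    return weights, context


# Three hypothetical region features; the query favours the first region.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, context = attend(regions, [2.0, 0.0])
print(weights, context)
```

In a captioning model the context vector would be recomputed at every decoding step from the current hidden state, so the attended region can change word by word.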
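Item 4 maps image relationship features and the question into a unified vector space and selects the answer by word-vector similarity. As a rough sketch under stated assumptions, the code below fuses an image vector and a question vector that are assumed to already share one space (the element-wise product is a hypothetical fusion choice), then returns the candidate answer whose word vector has the highest cosine similarity to the fused vector. The vocabulary and all vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(image_vec, question_vec):
    """Hypothetical fusion: element-wise product of two same-space vectors."""
    return [a * b for a, b in zip(image_vec, question_vec)]


def answer(image_vec, question_vec, answer_vectors):
    """Pick the answer word whose vector is most similar to the fused vector."""
    joint = fuse(image_vec, question_vec)
    return max(answer_vectors, key=lambda word: cosine(joint, answer_vectors[word]))


# Made-up 3-D word vectors for three candidate answers.
answer_vectors = {
    "car": [1.0, 0.0, 0.0],
    "pedestrian": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}
print(answer([0.9, 0.2, 0.1], [1.0, 0.5, 0.5], answer_vectors))
```

In practice the word vectors would come from a pre-trained embedding and the fusion would be learned; the sketch only shows the final similarity-based selection step.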
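Item 5 condenses a long video through redundant frame detection, superframe segmentation, and key frame selection. The abstract does not state the criteria used, so this is a simplified sketch, assuming each frame is already summarised by a feature vector (for example, a normalised colour histogram): a frame is dropped as redundant if its L1 distance to the last kept frame is below `min_change`, segments are cut wherever consecutive kept frames differ by more than `cut` (a much simpler rule than published superframe methods), and each segment is represented by the frame closest to its mean. All thresholds and frame values are hypothetical.

```python
def l1(u, v):
    """L1 distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def drop_redundant(frames, min_change):
    """Keep indices of frames that differ from the last kept frame by >= min_change."""
    kept = [0]
    for i in range(1, len(frames)):
        if l1(frames[i], frames[kept[-1]]) >= min_change:
            kept.append(i)
    return kept


def segment(frames, indices, cut):
    """Split kept frame indices into segments wherever the jump exceeds cut."""
    segments = [[indices[0]]]
    for prev, cur in zip(indices, indices[1:]):
        if l1(frames[cur], frames[prev]) > cut:
            segments.append([])  # large change: start a new segment
        segments[-1].append(cur)
    return segments


def key_frames(frames, min_change=0.05, cut=0.5):
    """Return one key frame index per segment: the frame nearest the segment mean."""
    keys = []
    for seg in segment(frames, drop_redundant(frames, min_change), cut):
        dim = len(frames[seg[0]])
        mean = [sum(frames[i][d] for i in seg) / len(seg) for d in range(dim)]
        keys.append(min(seg, key=lambda i: l1(frames[i], mean)))
    return keys


# Two-bin "histograms": a gradual first shot, then an abrupt change of scene.
frames = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
          [0.1, 0.9], [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]
print(key_frames(frames))
```

Only the selected key frames would then be passed to the captioning model, which is the source of the reduced computation described in item 5.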
Keywords/Search Tags: Transportation monitoring and control, Driver assistance system, Deep learning, Image captioning, Video semantic description