With the continuous increase in the number of non-motor vehicles, the number of non-motor vehicle related accidents in China has also risen rapidly. There are two main reasons: first, riders lack traffic safety awareness, and second, they do not always obey traffic rules. Using video structuring technology to analyze the attribute information of non-motor vehicles and their riders offers a technical means of reducing such accidents. To this end, this thesis studies deep learning-based attribute recognition algorithms for non-motor vehicles and cyclists, including an attribute recognition method based on an improved ConvNeXt model and a training method for a recognition model using a teacher-student learning framework.

(1) The attributes to be recognized by our system include both those related to large image areas and those related only to small image areas. We therefore propose a multi-label image recognition algorithm based on a modified ConvNeXt model, which incorporates an improved feature pyramid structure and a Convolutional Block Attention Module (CBAM). The features of different scales and receptive fields extracted by the feature pyramid module are fed into the recognition branch whose receptive field size matches them, so that attributes corresponding to image regions of different scales are recognized by different branch networks. In addition, we construct an image data set from actual traffic video to verify the effectiveness of the proposed algorithm. Experiments on this data set show that the F1 value of the proposed algorithm reaches 86.94%, which is about 3% higher than that of the original ConvNeXt model.

(2) In practical applications, a recognition system should not only perform well but also take into account the computing power of the end device on which the model will be deployed. We therefore propose a method based on a teacher-student learning framework for training a non-motor vehicle and cyclist attribute recognition model. A larger model containing two feature extraction subnetworks is used as the teacher model. Through knowledge distillation, the output of the teacher model is used as supervision information to assist the training of the student model. As a result, the student model achieves recognition performance close to that of the teacher model with far fewer network parameters and a much lower computational burden. Experiments on our data sets show that the F1 value of the student model reaches 87.20%, while its inference speed is several times higher than that of the teacher model.
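
To illustrate how the teacher output can act as supervision information for a multi-label student model, the following is a minimal sketch of one common distillation objective. It is not the loss used in this thesis, which the abstract does not specify. The function names, the weighting factor `alpha`, the `temperature` parameter, and the use of per-attribute sigmoid outputs with binary cross-entropy are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over attributes; targets may be soft values in [0, 1]."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Hypothetical multi-label distillation loss (an assumption, not the thesis's loss).

    alpha weights the ground-truth term; (1 - alpha) weights the term that
    pulls the student's softened outputs toward the teacher's softened outputs.
    """
    # Hard term: student predictions against the annotated attribute labels.
    hard = bce([sigmoid(z) for z in student_logits], labels)
    # Soft term: student against the teacher, both softened by the temperature.
    soft_student = [sigmoid(z / temperature) for z in student_logits]
    soft_teacher = [sigmoid(z / temperature) for z in teacher_logits]
    soft = bce(soft_student, soft_teacher)
    return alpha * hard + (1.0 - alpha) * soft

# Example with three hypothetical attributes (values are illustrative only).
labels = [1.0, 0.0, 1.0]
teacher_logits = [2.0, -1.5, 0.5]
student_logits = [1.0, -0.5, 0.2]
print(distillation_loss(student_logits, teacher_logits, labels))
```

In this sketch, setting `alpha` to 1 reduces the objective to ordinary supervised training, while smaller values rely more heavily on the teacher's outputs.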