| Drunk driving is one of the major challenges to road safety worldwide. It threatens not only the driver's own safety but also that of other road users. Traditional detection methods rely mainly on breath alcohol and blood alcohol testing, which in practice are constrained by time, location, and manpower. In-vehicle detection is affected by mixed cabin air and other factors that reduce its accuracy. To address these problems, improve the efficiency and accuracy of drunk driving detection, and protect road safety, this study proposes a drunk driving detection method based on the deep fusion of infrared thermal image and electronic nose multimodal signals. Starting from data acquisition, we set up an acquisition environment and used infrared thermography together with an electronic nose to collect multimodal physical sign data from 15 volunteer drivers before and after alcohol consumption. The method uses electronic nose odor data as a second basis for judging drunk driving, alongside the thermal images. Because the image data from infrared imaging and the multi-channel one-dimensional data from the electronic nose differ in structure and physical signal properties, the electronic nose signal is first synchronized to the infrared image frame rate. The infrared thermal image signal and the electronic nose signal are then encoded into deep features by a Multi-Axis Vision Transformer (MaxViT) and a multilayer perceptron (MLP), respectively, and fused to construct a drunk driving detection model. Through pre-processing and evaluation of the collected multimodal sign data, drunk
driving detection data comprising 2,478,455 thermal infrared image frames temporally synchronized with electronic nose signals were produced. This dataset is characterized by high information correlation and unified data features across modalities; it offers a reference for the field of drunk driving detection and has important implications for multimodal data fusion. To classify drunk driving status efficiently, this study extends MaxViT with multimodal input so that image data and multi-channel 1D electronic nose signal values are input, encoded, and classified simultaneously. The results are then evaluated and analyzed. In both the simulated-environment tests and the experiments conducted in the cab, the detector achieves an accuracy above 80%, accurately predicting and recognizing drunk driving behavior. To compare recognition performance, this study also evaluates other models and conducts ablation experiments. Data scale and timing experiments show that the method can accurately distinguish various alcoholic beverages, such as white wine, red wine, and beer, from non-alcoholic sparkling water, effectively reducing false alarms and improving the accuracy and reliability of detection. The proposed drunk driving detection method based on the deep fusion of infrared thermal imaging and electronic nose multimodal signals is therefore effective and feasible. It offers a more accurate and efficient means of drunk driving detection, is expected to provide new ideas for its future development, and makes a modest contribution to road safety management. |
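
The abstract states that the electronic nose signal is synchronized to the infrared frame rate but does not say how. The sketch below illustrates one plausible approach, assuming linear interpolation of the multi-channel sensor readings at each frame timestamp, with both streams starting at time 0. The function name `sync_to_frames` and its parameters are hypothetical and are not taken from the study.

```python
from bisect import bisect_left


def sync_to_frames(sample_times, samples, frame_rate, n_frames):
    """Resample multi-channel sensor readings at video frame timestamps.

    Assumption (not from the paper): linear interpolation between the two
    nearest sensor samples, clamping to the first/last sample outside the
    recorded range. Frame k is assumed to occur at time k / frame_rate.
    """
    if not samples or len(sample_times) != len(samples):
        raise ValueError("sample_times and samples must be non-empty and equal in length")

    synced = []
    for k in range(n_frames):
        t = k / frame_rate
        # Index of the first sensor sample taken at or after time t.
        i = bisect_left(sample_times, t)
        if i == 0:
            synced.append(list(samples[0]))       # before first sample: clamp
        elif i == len(sample_times):
            synced.append(list(samples[-1]))      # after last sample: clamp
        else:
            t0, t1 = sample_times[i - 1], sample_times[i]
            w = (t - t0) / (t1 - t0)              # interpolation weight in (0, 1]
            synced.append(
                [(1 - w) * a + w * b for a, b in zip(samples[i - 1], samples[i])]
            )
    return synced


# Two-channel sensor sampled at 1 Hz, resampled to a 2 fps frame stream.
rows = sync_to_frames(
    sample_times=[0.0, 1.0, 2.0],
    samples=[[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]],
    frame_rate=2,
    n_frames=6,
)
```

Each returned row pairs with one thermal frame, so a frame and its sensor vector can be fed to the image encoder and the MLP together. Other alignment schemes (nearest-sample, windowed averaging) would fit the same interface.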