| With the continuous advancement of science and technology, and especially the steady improvement of electron microscopes, analyzing the shape, activity and movement trajectories of cells is no longer a fantasy. For instance, doctors can use professional instruments to detect the components of blood, calculate the ratio of the three types of blood cells, and infer the health status of the human body. Using deep neural networks to analyze cell videos can help people find cells, particles and other substances in complex images, replacing visual inspection and mental effort and freeing people from tedious manual work for more meaningful research and innovation. To address cell detection in cell-monitoring videos, this paper first proposed a cell location detection method based on traditional computer vision. Because the traditional method cannot distinguish two cells that are very close to each other, this paper then improved the YOLOv5 algorithm by adding an attention mechanism and a three-channel feature fusion structure, achieving automatic detection of cell positions and identification of adjacent cells. The main work and innovations of the paper are as follows: (1) For cell-monitoring videos, traditional computer vision methods were used to build a cell position detection model consisting of KNN background removal, threshold binarization, noise removal, and connected-component detection. Because a single cell is easily split into several regions after binarization, an NMS algorithm based on sorting connected components by area was proposed, which improved the cell detection algorithm. (2) Because the traditional cell location detection model could not distinguish adjacent cells, this paper used the YOLOv5 model to detect cell locations and, on the basis of the YOLOv5 network, combined background-removed video screenshots, background-block-removed video screenshots and the original images to propose a three-channel feature fusion structure. The structure takes into account both the appearance features of cells and feature information along the time dimension. Compared with applying the YOLOv5 model directly to the original screenshots, the YOLOv5s network with the three-channel feature fusion structure, at input sizes of 640, 1024 and 2400, increased AP50 by 1.4, 3.6 and 4.7 percentage points, to 99.1%, 98.9% and 99.1% respectively. With the larger YOLOv5m model, AP50 increased by 3.9, 3.1 and 3.8 percentage points, to 99.3%, 98.9% and 99.2%. (3) To strengthen the fusion between channels in the three-channel feature fusion structure, improve the ability to express and extract features, and enhance the generalization ability of the model, this paper optimized the YOLOv5s network with the feature fusion structure by adding an attention mechanism to its backbone network, and proposed three ways of adding the CBAM attention mechanism. On the cell image test set, high accuracy and recall were achieved, improving the effect of feature fusion. On the dual-channel fusion data set, AP50 increased by up to 0.7 percentage points. With both the feature fusion network and the attention mechanism, AP50 increased by up to 5.2 percentage points, from 94% to 99.2%. |
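
The area-sorted NMS step in item (1) can be illustrated with a minimal sketch. The abstract does not specify the suppression rule, so everything below is an assumption made for illustration: boxes are `(x, y, width, height)` bounding boxes of connected components, larger components are visited first, and a smaller box is dropped when more than `overlap_threshold` of its own area lies inside a box already kept. The names `area_sorted_nms`, `intersection_area` and `overlap_threshold` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical box format: (x, y, width, height) of a connected component.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a bounding box in pixels."""
    return box[2] * box[3]


def intersection_area(a: Box, b: Box) -> int:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0
    return inter_w * inter_h


def area_sorted_nms(boxes: List[Box], overlap_threshold: float = 0.5) -> List[Box]:
    """Keep large components; drop smaller ones mostly covered by a kept box.

    Assumed rule (not taken from the paper): boxes are ranked by area instead
    of by a confidence score, and a box is suppressed when the fraction of its
    own area covered by an already kept box exceeds overlap_threshold.
    """
    kept: List[Box] = []
    for box in sorted(boxes, key=box_area, reverse=True):
        current_area = box_area(box)
        if current_area == 0:
            continue  # degenerate component, nothing to keep
        covered = any(
            intersection_area(box, k) / current_area > overlap_threshold
            for k in kept
        )
        if not covered:
            kept.append(box)
    return kept


if __name__ == "__main__":
    # A large cell, a small fragment inside it, and a separate cell.
    components = [(10, 10, 40, 40), (15, 15, 10, 10), (100, 100, 30, 30)]
    print(area_sorted_nms(components))
```

In this example the small fragment inside the first box is suppressed while the two separate components are kept. Ranking by area rather than by a detector score is the natural choice here because connected components carry no confidence value, but the exact criterion used in the paper may differ.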