| With the rapid development of deep learning, intelligent orchard management has become a research hotspot in agricultural modernization. Intelligent orchard management can improve fruit quality and increase income while saving substantial labor costs, meeting consumer demand for high-quality fruit. Real-time recognition of fruit on the tree is one of the key technologies for intelligent orchard operation and is of great significance for intelligent yield estimation. Fruit in orchards is affected by factors such as lighting, weather, and occlusion; in addition, overlap and partial occlusion among the fruits themselves make recognition and yield estimation challenging. This paper focuses on real-time identification and tracking of on-tree fruit in the orchard environment, aiming to improve the robustness and practicality of the algorithm under small targets, severe occlusion, and strong lighting effects, and to provide better technical support for intelligent yield estimation in agricultural production. The main contributions of this paper are as follows: 1. This paper proposes a novel object detection algorithm, Swin-Transformer-YOLO, based on the Swin Transformer and YOLOv5 models, to identify and count on-tree fruit in orchard environments. The algorithm addresses severe fruit occlusion and lighting interference through four key measures. First, a novel combined Swin-Transformer-YOLO network architecture is proposed that leverages the characteristics of the Transformer to detect on-tree fruit effectively. Second, a bidirectional weighted feature pyramid network (BiFPN) is used for feature fusion, quickly and effectively combining features at different scales by exploiting context information from both low-level and high-level features. Third, a
coordinate attention module is added before each feature output layer to enhance fruit localization and recognition. Finally, Varifocal Loss is employed as the classification loss to improve convergence on, and detection of, difficult samples in complex backgrounds. Experimental results on self-built image datasets of grapes, citrus fruits, and apples, covering various growth stages, lighting conditions, and weather conditions, show that the proposed Swin-Transformer-YOLO algorithm achieves a recognition accuracy (AP50) ranging from 90.47% to 98.4%, exceeding 95% in the majority of cases. Detection takes 14.2 milliseconds per image, so the algorithm combines high accuracy with fast processing. 2. This paper proposes an on-tree fruit counting method based on the ByteTrack object tracking algorithm to address the shortcomings of single-image counting methods. Before the video sequence is processed, a de-jittering algorithm stabilizes it to improve the model’s recognition and tracking performance. Based on the position distribution of grape clusters in the video sequence, a region-specific tracking and counting scheme is proposed: fruits are tracked and counted only within a designated counting area, which effectively reduces the computational cost. During the detection and tracking stage, the Swin-Transformer-YOLO algorithm recognizes the fruits in each frame of the video sequence, and the coordinates and motion information of each fruit in each frame are recorded and passed to the ByteTrack algorithm for data association. Because Byte data association makes use of both high-confidence and low-confidence detection boxes, the algorithm can still detect occluded fruits and reduces the duplicate counting caused by occlusion. Experiments on a grape video sequence demonstrate the tracking and counting performance of the ByteTrack algorithm, with an average
counting error of 9.2%. Furthermore, experiments conducted on a mature Merlot tree demonstrate the practical value of the proposed method, which outperforms other tracking algorithms. |
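
The two-stage use of high-confidence and low-confidence boxes, combined with counting inside a designated area, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: matching is greedy on IoU, whereas ByteTrack itself predicts track positions with a Kalman filter and keeps lost tracks alive for several frames. The names `SimpleByteCounter`, `Detection`, and `greedy_match`, and all threshold values, are hypothetical.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: dict[int, Box], dets: list[Detection], min_iou: float):
    """Pair tracks with detections in descending IoU order.

    Returns ({track_id: detection_index}, [unmatched detection indices]).
    """
    pairs = sorted(
        ((iou(tb, d.box), tid, di)
         for tid, tb in tracks.items()
         for di, d in enumerate(dets)),
        reverse=True,
    )
    matches: dict[int, int] = {}
    used: set[int] = set()
    for overlap, tid, di in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    return matches, [i for i in range(len(dets)) if i not in used]


class SimpleByteCounter:
    """Simplified BYTE-style association plus counting in a fixed region."""

    def __init__(self, region: Box, high_thresh: float = 0.5,
                 low_thresh: float = 0.1, min_iou: float = 0.3):
        self.region = region            # hypothetical counting area
        self.high_thresh = high_thresh  # assumed thresholds, not from the paper
        self.low_thresh = low_thresh
        self.min_iou = min_iou
        self.tracks: dict[int, Box] = {}
        self.counted: set[int] = set()
        self.next_id = 1

    def update(self, detections: list[Detection]) -> int:
        """Process one frame and return the running fruit count."""
        high = [d for d in detections if d.score >= self.high_thresh]
        low = [d for d in detections
               if self.low_thresh <= d.score < self.high_thresh]

        # Stage 1: associate existing tracks with high-confidence boxes.
        m1, unmatched_high = greedy_match(self.tracks, high, self.min_iou)
        # Stage 2: tracks left over are tried against low-confidence boxes,
        # which often correspond to partially occluded fruit.
        rest = {t: b for t, b in self.tracks.items() if t not in m1}
        m2, _ = greedy_match(rest, low, self.min_iou)

        updated = {t: high[i].box for t, i in m1.items()}
        updated.update({t: low[i].box for t, i in m2.items()})
        # Only unmatched high-confidence boxes may start a new track.
        for i in unmatched_high:
            updated[self.next_id] = high[i].box
            self.next_id += 1
        # Simplification: tracks that matched nothing are dropped at once.
        self.tracks = updated

        # Count each track ID once, when its center lies in the region.
        x1, y1, x2, y2 = self.region
        for tid, box in self.tracks.items():
            cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                self.counted.add(tid)
        return len(self.counted)
```

In this sketch, a fruit whose confidence drops in one frame because of occlusion keeps its track ID through the second association stage, so it is not counted again when it reappears with high confidence. Calling `update` once per frame with that frame's detections returns the cumulative count for the chosen region.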