With the development of deep learning, researchers in robotics are paying increasing attention to deep learning-based methods. Precise and agile robotic arms have been used in the assembly industry for decades, but adapting robots to domestic use remains a challenging topic. The task becomes easier when visual inputs are available and well utilized: robotic arms equipped only with vision sensors can still accomplish tasks such as grasping. The general pipeline of purely vision-based robotic grasping is as follows: (a) calibrate the camera to obtain the intrinsic parameters, such as the focal length, and the extrinsic parameters, such as the transformation between the camera coordinate system and the world coordinate system; (b) detect the target object in an input image, obtain its pixel coordinates, and recover its three-dimensional coordinates; (c) estimate the pose of the robotic arm, compute its joint angles, and control the arm to grasp.

This thesis discusses the application of deep learning to purely vision-based robotic grasping, with the goal of equipping inexpensive robot systems with computer vision algorithms. The research focuses on three aspects: target detection based on deep convolutional networks, pose estimation of robotic arms based on deep keypoint detection networks, and network compression. Many kinds of objects appear in real life, and the original detection model must be retrained whenever new objects appear. QR codes offer large information capacity and strong robustness: a QR code can easily be pasted on the surface of an object, and its stored content can be updated in real time. Therefore, this thesis uses the QR code as the target object and studies QR code detection based on deep neural networks. A vision-based pose estimation method is also studied to estimate, in real time, the joint angles of the OWI-535 robotic arm and the pose between the camera and the arm. The arm is driven by ordinary servos and relies entirely on visual input, without any other sensors. To address the difficulty of running existing neural network models in real time on CPU-only systems, model compression methods are also studied and evaluated experimentally.

The specific research contents are as follows:

(1) QR code detection methods are investigated, focusing on deep neural network-based detectors; the one-stage target detection network YOLOv3 is selected as the base model. A large-scale QR code detection dataset of approximately 20,000 images is constructed for model training, together with an automatic labeling algorithm based on ZBar (a barcode detection library).

(2) Different model compression methods are studied, and YOLOv3 is improved in two directions. First, a lightweight redesign of the YOLOv3 architecture is carried out on both the backbone and the detection network, yielding a 2.4 MB model with an mAP of 90.1%. Second, the YOLOv3 model is pruned and compressed: pre-training produces a 220 MB model with an mAP of 98.1%; after sparsity training followed by channel pruning and layer pruning, a lightweight YOLOv3 model of 2.2 MB with an mAP of 96.8% is obtained. The model size is thus compressed by a factor of 100 while accuracy is effectively maintained.

(3) The pose estimation method for the robotic arm is studied, and the three-dimensional coordinates of the target object are obtained with a binocular camera. First, the keypoint detection network Simple Baseline is lightened and improved, and the model is trained on both virtual synthetic data and real data. This network predicts the pixel coordinates of 17 keypoints predefined on the OWI-535 robotic arm. Then, using prior information such as the dimensions of the arm, three-dimensional reconstruction is performed by minimizing the reprojection error, yielding the four joint angles of the arm and the rotation and translation vectors between the robot base and the camera. With the three-dimensional coordinates of the target object and the joint angles of the arm, the robotic arm can be controlled to grasp.
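The reprojection-error minimization described in (3) can be sketched in a few lines. The example below is a minimal, hypothetical illustration: it uses a simplified two-link planar arm with assumed link lengths, an assumed pinhole camera with fixed pose, and only three keypoints, rather than the thesis's actual OWI-535 model, 17 keypoints, or jointly estimated camera extrinsics. It only shows the core idea of recovering joint angles by least-squares fitting of projected keypoints to observed pixel coordinates.

```python
# Sketch: recover joint angles by minimizing keypoint reprojection error.
# The arm model, link lengths, and camera intrinsics are illustrative
# assumptions, not the actual OWI-535 setup from the thesis.
import numpy as np
from scipy.optimize import least_squares

L1, L2 = 0.10, 0.08  # assumed link lengths (meters)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])  # assumed pinhole intrinsics

def keypoints_3d(theta1, theta2):
    """3D positions of base, elbow, and tip for a 2-link planar arm."""
    p0 = np.zeros(3)
    p1 = p0 + L1 * np.array([np.cos(theta1), np.sin(theta1), 0.0])
    p2 = p1 + L2 * np.array([np.cos(theta1 + theta2),
                             np.sin(theta1 + theta2), 0.0])
    return np.stack([p0, p1, p2])

def project(points, tz=0.5):
    """Project 3D points with a camera at distance tz along the z-axis."""
    cam = points + np.array([0.0, 0.0, tz])
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]

# Synthesize "observed" pixel keypoints from known ground-truth angles.
true_angles = np.array([0.4, -0.7])
observed = project(keypoints_3d(*true_angles))

def residuals(angles):
    """Reprojection error: predicted minus observed pixel coordinates."""
    return (project(keypoints_3d(*angles)) - observed).ravel()

fit = least_squares(residuals, x0=np.zeros(2))
```

In the thesis's setting the optimization variables additionally include the rotation and translation between the robot base and the camera, and the arm's measured dimensions act as the prior that makes the reconstruction metrically consistent.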