| Machine vision is developing rapidly and has become an important technology in intelligent manufacturing. Intelligent assembly robots equipped with machine vision systems can identify parts autonomously, position them with high precision and assemble them automatically. Obtaining a high-precision 3D model of the part to be assembled during the positioning stage is a key step for accurate robotic gripping. Visual 3D reconstruction based on monocular depth estimation is increasingly used in localisation tasks because of its low cost and high efficiency, and it has attracted wide attention from both academia and industry because of its difficulty, high research value and promising prospects. Among these techniques, unsupervised depth estimation methods avoid laborious and expensive annotation and have become a research hotspot in depth estimation. However, these methods still have problems in the 3D reconstruction of electronic components. This paper summarises three key problems: 1) the depth estimation model lacks a global receptive field; 2) the depth estimation model produces distorted predictions in pathological regions; 3) model complexity is high and inference is slow. To address these problems, this paper proposes a series of improvements and the corresponding algorithms and models. The main research contents and innovations of this paper are as follows: (1) A monocular depth estimation model based on unsupervised learning, Huge Depth, is proposed. First, a large-kernel convolution module is proposed to obtain a very large effective receptive field, supplemented with small convolution kernels to fuse features at different receptive field scales. Then, a global self-attention mechanism is introduced to model the global context of the extracted features and enhance the representation of feature detail. The lack of a global receptive field in the
model is effectively mitigated by combining the high-level and low-level feature fusion modules of the encoder-decoder architecture to fuse feature maps of different scales. Next, an adaptive loss function based on the prediction mask is proposed; it adaptively adjusts the loss computed for different regions to mitigate the negative impact of pathological regions in electronic component images on network convergence, thereby reducing prediction distortion in those regions. (2) A method for accelerating model inference based on structural reparameterisation is investigated. Given the high demand for 3D reconstruction speed in electronic component assembly, this paper optimises the above depth estimation model using structural reparameterisation and designs a reparameterisable basic convolution module that replaces some of the original convolutions to improve accuracy at no extra cost. In addition, the time-consuming structures of the model are simplified with the same method to further accelerate inference. Experiments show that the proposed method speeds up model inference by up to 59.36% without loss of inference accuracy. (3) A 3D reconstruction system for electronic components is designed and validated. First, the design requirements are clarified for the electronic component assembly scenario, and the overall system scheme is designed. Then, the hardware and software systems are analysed, selected and built, completing the design and construction of the experimental platform, server, algorithm runtime environment and user interface. Finally, the system is validated and analysed using a 3D reconstruction accuracy evaluation method. The results show that the 3D reconstruction system built in this paper meets practical requirements. In summary, this paper analyses and discusses the 3D
reconstruction method for the intelligent assembly of electronic components, studies in depth monocular depth estimation, an important area of visual 3D reconstruction, proposes an unsupervised depth estimation model with a global receptive field, and proposes an adaptive loss function for training it; the resulting model performs better in local pathological regions. The model's accuracy is also improved at no extra cost, and its inference is accelerated without loss of accuracy, using structural reparameterisation. Finally, a 3D reconstruction system for electronic components is designed and built according to the needs of real scenarios and used for experimental validation. The experimental results show that the proposed visual 3D reconstruction method based on monocular depth estimation of electronic components meets the practical requirements of the scenario, and that the proposed unsupervised depth estimation model matches or exceeds the performance of mainstream models of the same type. The research therefore has scientific significance and engineering application value. |
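
The abstract does not describe the internals of the reparameterisable convolution module, so the following is only a minimal sketch of the general idea behind structural reparameterisation, under stated assumptions: a single channel, stride 1, zero padding, no normalisation layers, and three parallel branches (3x3, 1x1, identity). All function and variable names are hypothetical and are not taken from the paper. It shows why a multi-branch block used at training time can be folded into one convolution for inference without changing the output.

```python
# Illustrative sketch only: single channel, stride 1, zero padding of 1.
# Every name below is hypothetical; this is not the paper's module.

def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with a 3x3 kernel, zero-padding the border."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch, summed."""
    y3 = conv2d_same(image, k3)
    return [[y3[i][j] + k1 * image[i][j] + image[i][j]
             for j in range(len(image[0]))]
            for i in range(len(image))]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1   # a 1x1 kernel is a 3x3 kernel that is zero off-centre
    merged[1][1] += 1.0  # the identity branch is a 1x1 kernel with weight 1
    return merged


image = [[1.0, 2.0, 0.5, -1.0],
         [0.0, 3.0, 1.5, 2.0],
         [4.0, -2.0, 1.0, 0.0]]
k3 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.1],
      [0.2, 0.1, -0.3]]
k1 = 0.7

train_out = multi_branch(image, k3, k1)
deploy_out = conv2d_same(image, merge_branches(k3, k1))

# Largest element-wise difference between the two forms.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(train_out, deploy_out)
               for a, b in zip(row_a, row_b))
```

Because convolution is linear, the two forms agree up to floating-point rounding, so `max_diff` is effectively zero while the merged form runs a single convolution instead of three branches. A practical module would also have to handle multiple channels and any normalisation layers, which this sketch leaves out.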