| Weight initialization assigns suitable initial values to the weights of a convolutional neural network before training, in order to improve the convergence speed of the model during training and its eventual generalization ability. The goal of training a convolutional neural network is to find the optimal weights. A suitable weight initialization method is therefore the basis for faster convergence and better generalization. Existing random initialization methods increase the burden of the training process, while transfer learning initialization places specific requirements on the target task and the model structure, which limits its range of application. This thesis starts from a statistical analysis of the weights of pre-trained models, derives a distribution law from the weight distributions, and applies this law to model initialization, thereby reducing the blindness of random initialization and removing the dependence of transfer learning initialization on the model structure. The main work and innovations of this thesis are as follows: (1) Because existing random initialization methods lack the guidance of a distribution law, the regularities in the prior information carried by pre-trained models are investigated and a new initialization method is developed accordingly. Since existing random initialization methods are based on normal and uniform distributions, this thesis fits and tests the distribution characteristics of the weights. A normality test based on the Jarque-Bera statistic and a power-law test based on a linear fit in double-logarithmic coordinates suggest that the weights follow a symmetric power-law distribution. Analysis of how the power exponent changes shows that the power exponent of the convolutional layers decreases gradually as the layers get deeper. Based on the obtained
law, the probability distribution function in the initialization method is set to a symmetric power-law distribution function, which gives the mathematical model for weight initialization. Combining this model with the way the power exponent changes across convolutional layers yields a weight initialization method based on the convolutional-layer law. (2) Building on the above study, the spatial distribution pattern of the convolution kernel weights is derived by studying the weight distribution at each position of the kernel, and a second initialization method is developed from it. By studying the weights that share the same width and height coordinates within the convolution kernel across different channels, it is found that the power exponent of the pre-trained weights is related to the position within the kernel, from the center to the periphery. A method for dividing the kernel positions into classes is proposed according to the law of decreasing step size. Based on the symmetric power-law distribution model, a data sampling algorithm that incorporates the change law of the power exponent within the convolution kernel is proposed, and a weight initialization method based on the position law of the convolution kernel is designed. (3) Comparative experiments are set up to test the effectiveness and applicability of the two initialization methods. The commonly used He initialization method is selected as the baseline, and several network models are compared on different datasets. The experiments show that, for RGB image classification, the two weight initialization methods based on the power-law distribution improve the accuracy in the first round of training and also improve the final accuracy of the model. It can be concluded that the initialization method based on the convolution kernel is more effective than convolutional-layer-based
initialization methods. |
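
The layer-based idea summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the density, the exponent values, or the scale, so everything concrete here is an assumption. The sketch assumes a density proportional to (1 + |w|/scale)^(-alpha), sampled by inverse transform with a random sign, and a hypothetical linear schedule in which the exponent decreases with depth. The function names `sample_symmetric_power_law`, `layer_exponents`, and `jarque_bera_statistic` are illustrative only. The Jarque-Bera statistic is computed from its standard definition, JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis.

```python
import numpy as np


def sample_symmetric_power_law(shape, alpha, scale, rng):
    """Draw weights with density proportional to (1 + |w|/scale)**(-alpha).

    Assumed form (not taken from the thesis): a magnitude with a
    power-law tail, obtained by inverse-transform sampling, times a
    random sign so the distribution is symmetric about zero.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a normalizable density")
    u = rng.random(shape)  # uniform in [0, 1)
    magnitude = scale * ((1.0 - u) ** (-1.0 / (alpha - 1.0)) - 1.0)
    sign = rng.choice([-1.0, 1.0], size=shape)
    return sign * magnitude


def layer_exponents(num_layers, alpha_first=3.5, alpha_last=2.5):
    """Hypothetical schedule: the exponent decreases linearly with depth."""
    return np.linspace(alpha_first, alpha_last, num_layers)


def jarque_bera_statistic(x):
    """Jarque-Bera statistic; large values indicate departure from normality."""
    x = np.ravel(x)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Usage: one weight tensor per convolutional layer, each with its own exponent.
rng = np.random.default_rng(0)
alphas = layer_exponents(num_layers=4)
weights = [
    sample_symmetric_power_law((16, 8, 3, 3), alpha=a, scale=0.01, rng=rng)
    for a in alphas
]
```

A position-based variant would follow the same pattern but pick the exponent per kernel position (for example, by distance from the kernel center) instead of per layer; the class division and step sizes described in the abstract would be needed to fill in those values.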