
Structured Pruning Of Visual Neural Networks Based On Sparse Feature Selection

Posted on: 2024-04-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Y Wang    Full Text: PDF
GTID: 1528307340473924    Subject: Circuits and Systems
Abstract/Summary:
Deep neural networks have become the mainstream models in computer vision owing to their powerful feature extraction capabilities. Models that extract visual features from pixel-level inputs mainly follow two architectures: convolutional neural networks (CNNs) and self-attention neural networks. However, their large parameter counts and heavy computational costs limit their use on terminal devices with scarce computing and storage resources. As data-driven, adaptive feature extractors, neural networks offer weak control over the feature extraction process, which therefore proceeds somewhat blindly: to guarantee inference accuracy, a network tends to absorb as much of the input information as possible during training, forming dense feature sets that contain many low-energy, repetitive, and interfering redundant features. By judiciously selecting features and pruning the extraction channels that produce redundant features, a structurally sparse network can be constructed, achieving model compression. This greatly improves the runtime efficiency and memory footprint of deep neural networks and is of profound significance for their deployment in practical applications.

Starting from the perspective of sparse feature selection, this dissertation addresses several problems of existing structured pruning methods: the low pruning efficiency of single-layer compression, importance assessment based on isolated observations, neglect of frequency-domain characteristics, weak scene adaptability of structural optimization, the long recovery period of the pruning framework, the decline in inference accuracy after pruning, and the low level of customization for different architectures. Based on the causes and manifestations of feature sparsity, different structured pruning methods for neural networks are proposed. The main research content and innovations comprise three aspects:

1. To address low pruning efficiency, an efficient pruning method based on feature-energy sparsity and structural evolution is proposed. Under the conventional framework in which pruning and training are separate stages, pre-trained models exhibit low feature-energy sparsity and therefore low prunability: performance drops rapidly as more channels are removed, and long recovery training is needed to restore accuracy, making pruning inefficient. The proposed method embeds feature-energy sparsity into pre-training, turning redundant-channel selection into a sparse learning problem of extracting a task-specific sparse solution from a dense model; this raises the model's prunability and reduces the reliance on recovery training. Furthermore, because model constraints differ across application scenarios, structural optimization methods based on hand-crafted rules or learnable guidance models cannot be adapted to new scenes conveniently. An adaptive structural evolution method based on genetic algorithms is therefore designed: candidate compression ratios serve as the genetic code of sub-model structures, and suitable sub-model structures are evolved for each scenario under joint constraints on accuracy, parameter count, and computational cost. Experiments show that, without any recovery training, the method matches the accuracy and compression ratios of existing approaches while reducing pruning time by 36%-48%.
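To make the structural-evolution step concrete, the following is a minimal sketch of a genetic search over per-layer compression ratios. The layer count, ratio choices, fitness proxy, and computation budget are illustrative assumptions, not the dissertation's actual settings; in the real method the fitness would be evaluated on the sparsity-pretrained model under the joint accuracy, parameter, and computation constraints described above.

```python
# Hypothetical sketch: evolve per-layer compression ratios with a genetic algorithm.
import random

NUM_LAYERS = 16                                        # assumed network depth
RATIO_CHOICES = [0.25, 0.375, 0.5, 0.625, 0.75, 1.0]   # kept-channel fractions (genes)

def fitness(genome):
    """Stand-in fitness: an accuracy proxy minus a penalty for exceeding a FLOP budget.
    In the actual method this would be measured on the sparsity-pretrained model."""
    kept = sum(genome) / len(genome)
    proxy_accuracy = 1.0 - 0.4 * (1.0 - kept) ** 2     # accuracy falls as more is pruned
    budget_penalty = max(0.0, kept - 0.5)              # assumed 50% computation budget
    return proxy_accuracy - budget_penalty

def crossover(a, b):
    cut = random.randint(1, NUM_LAYERS - 1)
    return a[:cut] + b[cut:]

def mutate(genome, p=0.1):
    return [random.choice(RATIO_CHOICES) if random.random() < p else g for g in genome]

def evolve(pop_size=20, generations=30):
    population = [[random.choice(RATIO_CHOICES) for _ in range(NUM_LAYERS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)      # keep the fitter half as parents
        parents = population[: pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print("evolved per-layer compression ratios:", evolve())
```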
2. To address the decline in inference performance after pruning, a channel-clustering pruning method based on a feature diversification criterion is proposed. In the classical structured pruning pattern, each channel's redundancy is evaluated in isolation, pruned structures are removed permanently, and compression proceeds in a single direction; the richness of information inside the model therefore decreases, which manifests at the feature level as reduced diversity of the output features. This loss of diversity hampers the model's ability to recover and keep learning, limiting the inference performance of pruned models. To mitigate the issue, a feature diversification criterion is introduced that pursues sparse feature patterns while improving redundancy evaluation, structural pruning, and the compression direction. For redundancy evaluation, the method exploits the correlation between channels and uses hierarchical clustering to merge channels with repetitive functions, so the compressed model retains diverse feature expression. For structural pruning, the clustering process is embedded into training as iterative rounds of selection and relearning, giving the model opportunities to change its structure dynamically and learn additional features. For the compression direction, an integrable auxiliary component keeps information flowing smoothly during training-time compression and removes the weak layers that may form during width compression, achieving bidirectional compression of both width and depth and avoiding the loss of feature capacity caused by weak layers. Compared with existing work, the proposed clustering pruning method reduces the computational load of most models by 50% while maintaining the best predictive performance.
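A minimal sketch of the channel-clustering idea follows, assuming the filters of one convolutional layer are grouped by correlation distance and each cluster is merged into a single representative filter. The cluster count, distance metric, and averaging rule are illustrative stand-ins rather than the dissertation's actual diversification criterion, which embeds the clustering into training.

```python
# Hypothetical sketch: merge functionally repetitive channels via hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_channels(conv_weight: np.ndarray, n_clusters: int) -> np.ndarray:
    """conv_weight: (out_channels, in_channels, kH, kW). Returns a cluster label per channel."""
    flat = conv_weight.reshape(conv_weight.shape[0], -1)
    tree = linkage(flat, method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def merge_clusters(conv_weight: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average the filters inside each cluster, keeping one representative per cluster."""
    merged = [conv_weight[labels == c].mean(axis=0) for c in np.unique(labels)]
    return np.stack(merged)

if __name__ == "__main__":
    weights = np.random.randn(64, 32, 3, 3)              # dummy layer: 64 output channels
    labels = cluster_channels(weights, n_clusters=32)    # ask for ~2x width compression
    print(merge_clusters(weights, labels).shape)         # at most (32, 32, 3, 3)
```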
3. To address the low level of architectural customization, a pruning method based on frequency-domain feature sparsity is proposed. When existing pruning methods are transferred from convolutional neural networks to self-attention neural networks, they overlook the difference in frequency-domain characteristics between the two architectures, which results in low compression ratios on self-attention networks. Self-attention networks use a global self-attention mechanism to extract spatially correlated information; this strengthens semantic understanding of low-frequency signals but weakens the ability to process high-frequency signals, making these networks more sensitive to interfering features carrying high-frequency noise. Accordingly, the dissertation customizes pruning for self-attention networks from the perspective of frequency-domain feature sparsity. Exploiting their strong low-frequency processing capability, a channel pruning method based on low-frequency sensitivity is proposed: the input is low-pass filtered to induce frequency-domain feature sparsity, and redundancy is located more accurately by measuring the correlation between each channel's output features and the model output in the absence of high-frequency noise. A low-frequency-guided token fusion scheme is also designed, which filters and merges tokens according to their low-frequency information content, promoting sparsity of token-level features. Through an iterative compression process over both channels and tokens, the method significantly improves the compression efficiency of self-attention networks. Extensive experiments show that the customized pruning method raises the maximum computational compression ratio of self-attention models from 33.0% to 57.6% while maintaining the best inference performance.
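The frequency-domain side of this method can be sketched as below, assuming an ideal low-pass filter applied to the input and a simple low-frequency energy score for selecting which tokens to keep before fusion. The cutoff radius, token grid, and scoring proxy are illustrative assumptions, not the dissertation's exact formulation.

```python
# Hypothetical sketch: low-pass filtering of the input and low-frequency token scoring.
import numpy as np

def low_pass(image: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Keep only spatial frequencies inside a radius of `cutoff` * Nyquist (per channel)."""
    h, w = image.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (fy ** 2 + fx ** 2) <= (cutoff * 0.5) ** 2    # Nyquist frequency is 0.5
    spectrum = np.fft.fft2(image, axes=(-2, -1))
    return np.real(np.fft.ifft2(spectrum * mask, axes=(-2, -1)))

def keep_low_freq_tokens(tokens: np.ndarray, keep: int) -> np.ndarray:
    """tokens: (num_tokens, dim). Rank tokens by a crude low-frequency (mean/DC) energy
    proxy and return the indices of the `keep` highest-scoring ones; the rest would be fused."""
    energy = np.abs(tokens.mean(axis=1))
    return np.argsort(energy)[::-1][:keep]

if __name__ == "__main__":
    image = np.random.rand(3, 224, 224)
    smooth = low_pass(image)                             # high-frequency noise suppressed
    tokens = np.random.randn(196, 768)                   # 14 x 14 ViT-style token grid
    kept = keep_low_freq_tokens(tokens, keep=98)         # fuse away half of the tokens
    print(smooth.shape, kept.shape)
```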
Keywords/Search Tags: neural network, model compression, sparse feature selection, structured pruning