Self-restraint Deep Learning for Real Photo Recognition Based on 3D Models

Posted on: 2018-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Y D Wang | Full Text: PDF
GTID: 2348330518496948 | Subject: Information and Communication Engineering

Abstract/Summary:

Convolutional neural networks (CNNs) have shown excellent performance on object recognition when trained on huge amounts of real images. Training on synthetic data rendered from 3D models alone would remove the workload of collecting real images, so effective utilization of texture-less 3D models for deep learning is significant for recognition on real photos. We eliminate the reliance on massive real training data by modifying the convolutional neural network in four aspects: synthetic data rendering to generate training data in large quantities; a multi-triplet cost function for multi-task learning joined with a foreground object reconstruction network; a compact micro-architecture design that produces a tiny parametric model while overcoming over-fitting on texture-less models; and a modified conditional variational autoencoder (CVAE) for foreground object reconstruction. The network is first trained with the multi-triplet cost function, which establishes a sphere-like distribution of descriptors within each category according to the pose, lighting condition, background, and category information of the rendered images; this is helpful for recognition on regular photos. Fine-tuning this initial model with additional data further supports classification on special real photos. We then propose a concatenated self-restraint learning structure led by a joint triplet and softmax loss function for object recognition.
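A joint triplet-and-softmax objective of the kind described above can be sketched as follows; the margin, the weighting factor `alpha`, and all function names here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on squared Euclidean distances between descriptors:
    pull anchor-positive together, push anchor-negative apart by a margin."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    z = logits - np.max(logits)
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -log_probs[label]

def joint_loss(anchor, positive, negative, logits, label, alpha=0.5):
    """Weighted sum of the metric-learning term and the classification term."""
    return (alpha * np.mean(triplet_loss(anchor, positive, negative))
            + (1.0 - alpha) * softmax_cross_entropy(logits, label))
```

A satisfied triplet (positive near the anchor, negative far away) contributes zero loss, so during joint training the gradient comes only from violated triplets and from the softmax classification term.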
A locally connected autoencoder, trained on rendered images with and without background and used to reconstruct the foreground object against environment variations, produces an additional channel that is automatically concatenated to the RGB channels as the input of the classification network. This structure makes it possible to train a softmax classifier directly on a CNN from synthetic data with our rendering strategy. Compared to GoogleNet, our structure halves the gap between training on real photos and training on 3D models on both the PASCAL and ImageNet databases. We further propose ZigzagNet, a 6.2 MB compact parametric model based on SqueezeNet, which improves recognition performance by applying moving normalization inside the micro-architecture and adding a channel-wise convolutional bypass through the macro-architecture. Moving batch normalization yields good performance in both convergence speed and recognition accuracy. In experiments on the ImageNet and PASCAL samples provided by PASCAL3D+, the accuracy of our compact parametric model with a simple nearest-neighbor classifier is close to that of the 240 MB AlexNet trained on real images. A model trained on texture-less models, which take less time to render and collect, outperforms training on the more heavily textured models from ShapeNet. Finally, we exploit the semantic information in 3D models for real object recognition with the help of Bayesian latent rendering of additional channels, using synthetic data rendering for semantic foreground object reconstruction.

Keywords/Search Tags: CNN, AE, synthetic images, triplet loss, CVAE
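The channel-concatenation step in the abstract, stacking the autoencoder's foreground reconstruction onto the RGB input of the classifier, can be sketched as below; the array layout (height, width, channels) and the function name are illustrative assumptions.

```python
import numpy as np

def concat_foreground_channel(rgb, foreground):
    """Stack a single-channel foreground reconstruction onto an RGB image,
    producing the 4-channel input fed to the classification network."""
    assert rgb.shape[:2] == foreground.shape and rgb.shape[2] == 3
    return np.concatenate([rgb, foreground[..., np.newaxis]], axis=-1)
```

In use, the classification network's first convolution would simply accept four input channels instead of three; the extra channel carries the reconstructed foreground mask alongside the raw pixels.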
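The "moving normalization" applied inside ZigzagNet's micro-architecture can be read as normalizing activations with exponentially moving estimates of the batch statistics rather than per-batch statistics alone. A minimal sketch under that reading, with the momentum and epsilon values assumed:

```python
import numpy as np

class MovingBatchNorm:
    """Normalize activations with exponentially moving estimates of the
    batch mean and variance (momentum and eps are assumed values, and this
    is an interpretation of the thesis's moving normalization, not its code)."""
    def __init__(self, channels, momentum=0.9, eps=1e-5):
        self.mean = np.zeros(channels)
        self.var = np.ones(channels)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x):
        # x: (batch, channels). Blend the running statistics with the
        # current batch, then normalize with the updated running values,
        # which smooths the statistics across training batches.
        self.mean = self.momentum * self.mean + (1 - self.momentum) * x.mean(axis=0)
        self.var = self.momentum * self.var + (1 - self.momentum) * x.var(axis=0)
        return (x - self.mean) / np.sqrt(self.var + self.eps)
```

Smoothing the statistics this way reduces batch-to-batch noise in the normalizer, which is one plausible route to the convergence-speed benefit the abstract reports.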