| 3D models play a vital role in many industries, disciplines, programs, and software. For example, in the entertainment industry (animation, games, movie visual effects, etc.), audiences increasingly seek immersion, so there is an urgent need for large-scale 3D content to build virtual worlds that meet this requirement. Ideally, people working in the 3D field would use 3D modeling tools to easily create, edit, and manipulate 3D objects and scenes that meet their goals. The reality is less satisfactory: the complexity of modeling tools leaves this ideal far from realized. Although the demand for 3D content has increased significantly, creating 3D content is still a difficult problem. In this thesis, we introduce a model that generates object parts one by one, similar to a language model in natural language processing. It is mainly composed of an autoencoder, a Seq2Seq model, and a self-attention mechanism. The part-based method reduces the training burden on the model and achieves a divide-and-conquer effect. We introduce VT-NET, a deep neural network that represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation by a part autoencoder. The design of the autoencoder is particularly critical: if reconstruction quality is poor, the feature vectors do not fit the sample space well, and parts sampled at random during generation will naturally be poor as well. The most important contribution of this thesis is a quantization mechanism in the autoencoder module, which converts continuous values into discrete values. This appears to be a simplification, but it actually improves generation quality. The core of VT-NET is a sequence-to-sequence model, referred to as the Transformer in this thesis. It combines the output of the encoder with the input of the decoder to assemble the parts into objects in sequential order. Building on previous work, this thesis proposes three part modules. They range from an early design that feeds parts directly to the network to learn position parameters, to designs that use sequence-to-sequence models to analyze the structural and geometric information of preceding and following parts. Quantized autoencoder models built on these network modules are compared with the Transformer network. Quantitative and qualitative evaluations show that the quantization mechanism, the implicit field, and the Transformer network structure greatly improve reconstruction performance. In addition to reconstruction performance, this thesis compares the relevant data and concludes that these architectures also benefit object generation. |
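
The quantization step mentioned in the abstract, which turns a continuous feature into a discrete value, can be illustrated with a nearest-neighbour codebook lookup. This is only a minimal sketch under stated assumptions, not the implementation used in VT-NET: the abstract does not say how quantization is performed, and the codebook values, the feature dimension, and the names `quantize` and `squared_distance` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Map a continuous feature to its nearest codebook entry.

    Returns the discrete index and the corresponding codebook vector.
    """
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(feature, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical 2-D codebook with three entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]

# A continuous feature, e.g. one produced by a part encoder (assumed here).
index, code = quantize((0.9, 1.2), codebook)
```

In this sketch each part feature is replaced by a discrete index into the codebook, so a sequence model would see a finite vocabulary of codes rather than unconstrained real-valued vectors.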