The Amino Acid Analysis Based on t-SNE Clustering

Posted on: 2017-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: D P Zhang
GTID: 2311330488468716
Subject: Theoretical Physics
Abstract/Summary:
The simulation of biological macromolecules plays an important role in biological research. Quantum chemical calculations and molecular dynamics simulations are now widely used to study biomolecules. The force field is central to molecular mechanics, and both Monte Carlo and molecular dynamics simulations are built on it. Constructing a force field requires a large amount of data on molecular structures, energies, charge distributions, and other physicochemical properties. In this work, we construct a useful and representative data set for building a protein force field through large-scale calculation and clustering. To account for the biological environment surrounding each protein, we selected proteins from a variety of protein complexes and ran molecular dynamics simulations to obtain their trajectories. A large number of amino acid structures were extracted from these trajectories using the MFCC method, with caps added to the extracted fragments. In this way we obtained structures, energies, and charge distributions for the 20 kinds of amino acids and assembled a set of original data. Because the original data set is very large, contains repeated entries, and is unevenly weighted across the different kinds of data, it needs to be simplified. To preserve the features of the original data, we introduce a clustering method based on t-distributed Stochastic Neighbor Embedding (t-SNE). The method builds a low-dimensional space and minimizes the Kullback-Leibler divergence between the data distributions in the high-dimensional and low-dimensional spaces, yielding a low-dimensional representation that is easy to visualize and analyze. We find t-SNE clustering superior to K-means clustering, and it makes the original data convenient to extract and apply. The resulting data have important applications in constructing force fields and in obtaining the physicochemical properties of protein molecules quickly and accurately.
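
The divergence that t-SNE minimizes can be illustrated with a minimal sketch. The Python below is not the thesis's code. It assumes toy 3-D points and a single fixed Gaussian bandwidth `sigma`, whereas full t-SNE calibrates a per-point bandwidth from a perplexity setting and symmetrizes conditional probabilities. All function names are hypothetical. It computes the Kullback-Leibler divergence between Gaussian pairwise affinities in the high-dimensional space and Student-t pairwise affinities in a candidate low-dimensional embedding.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(weights):
    """Scale a matrix of pairwise weights so that all entries sum to 1."""
    total = sum(sum(row) for row in weights)
    return [[v / total for v in row] for row in weights]


def gaussian_affinities(points, sigma=1.0):
    """Pairwise affinities P in the high-dimensional space.

    Assumption: one fixed sigma for every point (a simplification of t-SNE).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
    return normalize(w)


def student_t_affinities(embedding):
    """Pairwise affinities Q in the low-dimensional space (heavy-tailed kernel)."""
    n = len(embedding)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
    return normalize(w)


def kl_divergence(p, q):
    """KL(P || Q) summed over all pairs i != j; terms with P_ij == 0 contribute 0."""
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and p[i][j] > 0.0:
                total += p[i][j] * math.log(p[i][j] / q[i][j])
    return total


# Toy data (made up for illustration): two well-separated groups in 3-D.
points = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (5.0, 5.0, 5.0), (5.5, 5.0, 5.0), (5.0, 5.5, 5.0),
]

# Candidate 2-D embedding that keeps each group together.
good_embedding = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
]

# Candidate 2-D embedding that swaps one point from each group into the other.
bad_embedding = [
    (0.0, 0.0), (5.0, 5.0), (0.0, 0.5),
    (0.5, 0.0), (5.5, 5.0), (5.0, 5.5),
]

P = gaussian_affinities(points, sigma=1.0)
kl_good = kl_divergence(P, student_t_affinities(good_embedding))
kl_bad = kl_divergence(P, student_t_affinities(bad_embedding))

print("KL for structure-preserving embedding:", kl_good)
print("KL for mixed-up embedding:", kl_bad)
```

An embedding that keeps the two groups together should give a lower divergence than one that mixes them. A full t-SNE implementation would start from a random embedding and lower this value by gradient descent on the embedding coordinates.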
Keywords/Search Tags:Amino Acid, MFCC, t-SNE, Clustering, Dimension Reduction