
Research And Application Of A Multi-classification Algorithm For Compositional Data Under Dirichlet Feature Embedding

Posted on: 2024-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: X P Cao
GTID: 2568307052993329
Subject: Applied statistics
Abstract/Summary:
With the rapid development of the Internet in the era of big data, new data are produced continuously and the variety of data types keeps growing. Among them, compositional data are widely used in geography, economics, biology and other fields. Compositional data carry only relative information and follow the Aitchison geometry. Classification is an important research topic in machine learning: in daily life, people tend to classify an object or event according to some of its characteristics and make decisions based on the result. For compositional data, applying a traditional multi-classification algorithm directly may lead to misleading results. Existing classification methods for compositional data mainly address either multi-classification of a single composition or binary classification of multivariate compositional data, and neither is applicable to multi-classification of multivariate compositional data. This thesis therefore studies the multi-classification of multivariate compositional data. The main research contents are as follows:

(1) This thesis proposes a classification algorithm for multivariate compositional data based on Dirichlet feature embedding (D-CoDAGSVM), which combines Dirichlet feature embedding with the directed acyclic graph support vector machine (DAGSVM). First, a conditional Dirichlet density is estimated for each class of compositional data in the training set. Following the principle of DAGSVM, Dirichlet feature embedding is then carried out for each pairwise class combination. Next, a DAGSVM classifier is constructed on the transformed training data. Finally, the test set is fed into the DAGSVM classifier to determine the class label of each sample.

(2) Using the proposed D-CoDAGSVM algorithm, numerical simulations are carried out on multivariate compositional data with different numbers of classes, numbers of parts, sample sizes and numbers of compositional features, as well as under general conditions (that is, when the numbers of parts and the sample sizes differ). Accuracy, F1, G-mean and the Kappa coefficient are used to compare the proposed algorithm with two alternatives, a DAGSVM multi-classification algorithm applied to the raw data (CoDAGSVM) and a multi-classification algorithm based on the Ilr (isometric log-ratio) transform (Ilr-CoDAGSVM), to verify the effectiveness of the proposed algorithm. The simulation results show that D-CoDAGSVM achieves high accuracy and good consistency and handles the multi-classification of multivariate compositional data well.

(3) In an empirical analysis, D-CoDAGSVM is applied to metabonomics data sets: it classifies the chemical components of Astragalus from different habitats and the endogenous metabolites that could be identified in mice after Astragalus intervention. The classification results show that the algorithm is effective.
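The abstract describes D-CoDAGSVM only as a sequence of steps: class-conditional Dirichlet density estimation, pairwise Dirichlet feature embedding, DAGSVM training and DAG-based prediction. The Python sketch below illustrates that pipeline under stated assumptions and is not the thesis implementation. Assumptions: each sample is a list of compositions (one per compositional variable); Dirichlet parameters are fitted by the method of moments rather than maximum likelihood; the embedding for class pair (i, j) is taken to be the log-density of each composition under the fitted Dirichlet of class i and of class j; and each pairwise linear SVM is trained with the Pegasos sub-gradient method so the sketch needs only the standard library. The function names and the synthetic class parameters in ALPHAS are hypothetical.

```python
import math
import random
from itertools import combinations


def fit_dirichlet(comps):
    """Method-of-moments Dirichlet estimate (assumed; the thesis may use MLE)."""
    n, d = len(comps), len(comps[0])
    mean = [sum(c[k] for c in comps) / n for k in range(d)]
    var = [sum((c[k] - mean[k]) ** 2 for c in comps) / (n - 1) for k in range(d)]
    # Dirichlet with s = sum(alpha): Var(x_k) = m_k (1 - m_k) / (s + 1); average s over parts
    s = sum(m * (1 - m) / v - 1 for m, v in zip(mean, var)) / d
    return [max(s, 1e-3) * m for m in mean]


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at composition x (all parts > 0)."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1) * math.log(v) for a, v in zip(alpha, x)))


def embed(sample, params, i, j):
    """Hypothetical embedding for class pair (i, j): per-block log-densities."""
    feats = []
    for b, comp in enumerate(sample):
        feats += [dirichlet_logpdf(comp, params[i][b]),
                  dirichlet_logpdf(comp, params[j][b])]
    return feats + [1.0]  # constant feature plays the role of a bias term


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos sub-gradient solver for a linear soft-margin SVM, y in {+1, -1}."""
    rng, w, t = random.Random(seed), [0.0] * len(X[0]), 0
    data = list(zip(X, y))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wk * xk for wk, xk in zip(w, x))
            w = [(1 - eta * lam) * wk for wk in w]  # shrink (L2 regularisation)
            if margin < 1:  # hinge loss active: step towards label * x
                w = [wk + eta * label * xk for wk, xk in zip(w, x)]
    return w


def fit_d_codagsvm(samples, labels):
    classes = sorted(set(labels))
    blocks = range(len(samples[0]))
    # Step 1: class-conditional Dirichlet fit for every compositional block
    params = {c: [fit_dirichlet([s[b] for s, lab in zip(samples, labels) if lab == c])
                  for b in blocks] for c in classes}
    # Step 2: one SVM per class pair, trained on the embedded data of that pair only
    svms = {}
    for i, j in combinations(classes, 2):
        pair = [(s, lab) for s, lab in zip(samples, labels) if lab in (i, j)]
        X = [embed(s, params, i, j) for s, _ in pair]
        svms[(i, j)] = train_linear_svm(X, [1 if lab == i else -1 for _, lab in pair])
    return classes, params, svms


def predict_dag(sample, model):
    """DAG evaluation: test first vs last remaining class and drop the loser."""
    classes, params, svms = model
    remaining = list(classes)
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        x = embed(sample, params, i, j)
        score = sum(wk * xk for wk, xk in zip(svms[(i, j)], x))
        remaining.remove(j if score >= 0 else i)
    return remaining[0]


# Usage on synthetic data: hypothetical classes, two compositional blocks per sample
def sample_dirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


ALPHAS = {0: [20, 5, 5], 1: [5, 20, 5], 2: [5, 5, 20]}  # hypothetical parameters


def make_data(n_per_class, rng):
    pairs = [([sample_dirichlet(a, rng), sample_dirichlet(a[::-1], rng)], c)
             for c, a in ALPHAS.items() for _ in range(n_per_class)]
    return [s for s, _ in pairs], [c for _, c in pairs]


rng = random.Random(42)
train_X, train_y = make_data(40, rng)
test_X, test_y = make_data(20, rng)
model = fit_d_codagsvm(train_X, train_y)
pred = [predict_dag(s, model) for s in test_X]
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
```

The prediction step follows the usual DAGSVM evaluation: with K classes, a test sample passes through K - 1 pairwise decisions, each eliminating one candidate class, so only the classifiers along one path of the graph are evaluated.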
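For comparison, the Ilr-CoDAGSVM baseline maps compositions into real coordinates with the isometric log-ratio (ilr) transform before classification. The abstract does not say which orthonormal basis is used, so this sketch assumes pivot (balance) coordinates, z_i = sqrt((D - i)/(D - i + 1)) ln(x_i / g(x_{i+1}, ..., x_D)) for i = 1, ..., D - 1, where g is the geometric mean. It also assumes that the baseline simply concatenates the ilr coordinates of all compositional variables; ilr_features is a hypothetical helper for that step.

```python
import math


def clr(x):
    """Centred log-ratio coordinates of a composition x (all parts > 0)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr_pivot(x):
    """Pivot (balance) ilr coordinates z_1..z_{D-1}; one orthonormal basis choice."""
    D = len(x)
    z = []
    for i in range(D - 1):
        rest = x[i + 1:]   # parts after the pivot x[i]
        k = len(rest)      # k = D - i - 1 with 0-based i
        log_gmean = sum(math.log(v) for v in rest) / k
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[i]) - log_gmean))
    return z


def ilr_features(sample):
    """Hypothetical baseline input: concatenate the ilr coordinates of every block."""
    return [z for comp in sample for z in ilr_pivot(comp)]


x, y = [0.2, 0.3, 0.5], [0.6, 0.1, 0.3]
# ilr is an isometry: the Euclidean distance between ilr coordinates equals the
# Aitchison distance, which is also the Euclidean distance between clr vectors.
d_ilr = math.dist(ilr_pivot(x), ilr_pivot(y))
d_clr = math.dist(clr(x), clr(y))
```

The distance check shows why an ilr baseline is a natural comparison: a Euclidean method such as a linear SVM applied to ilr coordinates operates in the Aitchison geometry, and the coordinates depend only on ratios between parts, so rescaling a composition leaves them unchanged.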
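The simulations compare the algorithms by Accuracy, F1, G-mean and the Kappa coefficient. The abstract does not give their multi-class definitions, so this sketch assumes common ones: F1 is macro-averaged over classes, G-mean is the geometric mean of the per-class recalls, and Kappa is Cohen's kappa computed from the confusion matrix. The helpers confusion_matrix and evaluate are hypothetical names.

```python
import math


def confusion_matrix(y_true, y_pred, classes):
    """Rows index the true class, columns the predicted class."""
    idx = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m


def evaluate(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    m = confusion_matrix(y_true, y_pred, classes)
    n, K = len(y_true), len(classes)
    diag = [m[k][k] for k in range(K)]
    row = [sum(m[k]) for k in range(K)]                       # true count per class
    col = [sum(m[r][k] for r in range(K)) for k in range(K)]  # predicted count per class
    accuracy = sum(diag) / n
    recall = [diag[k] / row[k] if row[k] else 0.0 for k in range(K)]
    precision = [diag[k] / col[k] if col[k] else 0.0 for k in range(K)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]
    g_mean = math.prod(recall) ** (1 / K)                     # geometric mean of recalls
    p_e = sum(row[k] * col[k] for k in range(K)) / n ** 2     # chance agreement
    # p_e == 1 only in the degenerate single-class case, where accuracy is 1
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return {"accuracy": accuracy, "macro_f1": sum(f1) / K,
            "g_mean": g_mean, "kappa": kappa}


scores = evaluate([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

In this small example one class-1 sample is predicted as class 2, giving accuracy 5/6, G-mean 0.5^(1/3) (the class-1 recall is 0.5) and Kappa 0.75. G-mean is the metric most sensitive to a single poorly recognised class, which is why it complements overall accuracy.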
Keywords/Search Tags: Compositional data, Classification, Dirichlet distribution, Directed acyclic graph support vector machine