Visualization Of High Dimensional Data

Posted on: 2021-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: X R Zhao
Full Text: PDF
GTID: 2427330626955578
Subject: Applied Statistics
Abstract/Summary:
With the arrival of the era of big data, the amount of data created by human production and daily life has been increasing at an unimaginable speed. Data visualization converts large amounts of data into graphics that can be observed directly. It lets people quickly grasp the surface information in the data and more easily infer the logical relationships implied beneath it, which makes it an important means of processing massive data efficiently and extracting valuable information.

The parallel coordinates plot is one of many visualization methods for multidimensional data. It maps the dimensions of a high-dimensional data set one by one onto parallel coordinate axes and displays the data set as a set of polylines that intersect those axes. However, when the data set has too many dimensions, the plot requires too much display space, and its effectiveness depends heavily on the order of the dimensions. Some scholars have proposed dividing a high-dimensional data set into several subsets of correlated dimensions and constructing multiple low-dimensional parallel coordinates plots. Most existing methods, however, use classical MDS to lay out the dimensions before extracting the related subsets, and the distances between dimensions in that layout can be distorted, which introduces error.

This paper studies this problem and proposes a new layout method. To address the shortcomings of MDS, it uses the Isomap algorithm in its place for the layout. Isomap replaces the direct calculation of long distances with an estimate of the intrinsic geodesic distance, so its layout can reduce the error caused by distance distortion and reflect the correlation between dimensions more accurately. The algorithm proceeds as follows. First, each dimension of the data set is regarded as a vector, and, based on the distances between these vectors, Isomap maps the dimensions to points laid out on a two-dimensional plane. Then a threshold is set as required, and the Bron-Kerbosch algorithm is used to extract subsets of related dimensions. Finally, a greedy algorithm orders the dimensions within each subset, and several low-dimensional parallel coordinates plots are constructed. To make the plots more expressive, the polylines are colored by sample category, which improves both the aesthetics of the plots and the information they convey.

Two data sets are selected for the experiments. The results show that the dimension subsets extracted from the Isomap layout have higher correlation between dimensions than those obtained from the MDS layout. The paper closes by summarizing the work, pointing out its shortcomings, and outlining directions and goals for future research.
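The subset-extraction and ordering steps described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation. The 2-D coordinates in `layout` are made-up stand-ins for an Isomap layout of the dimensions. The rule "connect two dimensions whose layout distance is at most the threshold" and the nearest-neighbour ordering are assumptions about how the threshold and the greedy step are applied. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def threshold_graph(points, threshold):
    """Connect two dimensions when their layout distance is at most `threshold`."""
    adj = {name: set() for name in points}
    for a, b in combinations(points, 2):
        if dist(points[a], points[b]) <= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph (basic Bron-Kerbosch, no pivoting)."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r  # r cannot be extended, so it is a maximal clique
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}  # v has been fully explored
        x = x | {v}  # exclude v from later cliques at this level


def greedy_order(subset, points):
    """Order axes by repeatedly appending the nearest not-yet-placed dimension."""
    remaining = sorted(subset)      # deterministic starting dimension
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda d: dist(points[order[-1]], points[d]))
        remaining.remove(nxt)
        order.append(nxt)
    return order


# Hypothetical 2-D layout of six dimensions (stand-in for an Isomap result).
layout = {
    "d1": (0.0, 0.0), "d2": (1.0, 0.0), "d3": (0.5, 0.8),
    "d4": (5.0, 5.0), "d5": (5.5, 5.0), "d6": (10.0, 0.0),
}

adj = threshold_graph(layout, threshold=1.5)
# Keep only cliques with at least two dimensions; each becomes one plot.
subsets = [clique for clique in bron_kerbosch(adj) if len(clique) >= 2]
orders = [greedy_order(subset, layout) for subset in subsets]
print(orders)
```

With this made-up layout, the sketch produces two subsets, {d1, d2, d3} and {d4, d5}, and drops the isolated dimension d6. Each ordered subset would then supply the axis order for one low-dimensional parallel coordinates plot.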
Keywords/Search Tags: High-dimensional data visualization, Parallel coordinates plots, Isomap algorithm