On The Model And Evaluation Of Linear Regression For Interval-Valued Symbolic Data

Posted on: 2011-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Zan
Full Text: PDF
GTID: 2120330338481605
Subject: Information management and information systems
Abstract/Summary:
The rapid development of Internet technology has brought an explosion of information and a great enrichment of data. When the sample space is very large, traditional regression analysis has two limitations. First, model complexity grows with the amount of data. Second, because each individual sample point is treated as the object of study, it is hard to grasp the characteristics of the data as a whole. Symbolic data analysis "packages" the data: it reduces the number of data operations and also makes it possible to capture the internal relationships of massive data from an overall perspective. This thesis proposes a new regression analysis method for such data.

Interval data is an important type of symbolic data. The existing regression methods for interval data, namely the center method (CM), the MinMax method (MINMAX), and the center and range method (CRM), all assume that a variable is uniformly distributed within its interval. In practice, non-uniform distributions such as the normal distribution or skewed distributions are more common. Based on the descriptive statistics of uniform and general interval symbolic data, the thesis proposes the Descriptive Statistics based Method (DSM). In addition, the traditional interval data distance and the Hausdorff distance apply only to uniform intervals, so the thesis develops a new interval data distance called the μσ distance.

The proposed prediction method is assessed with the new distance in the framework of a Monte Carlo experiment. The results show that, for general interval symbolic data, DSM leads to more objective results than the other regression methods. For uniform interval symbolic data, CM can be regarded as a special case of DSM. Finally, the approaches presented in the thesis are applied to a real data set from the website "Digg", and their performance is compared.
Keywords/Search Tags:Regression analysis, Monte Carlo experiment, Interval symbolic data
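
The abstract does not define DSM or the μσ distance, so they are not reproduced here. As a point of reference, the following is a minimal sketch of the baseline center method (CM) mentioned above, together with the mean and standard deviation of an interval under the uniform assumption that the existing methods rely on. The function names, the single-predictor setup, and the toy data are hypothetical and chosen for illustration only; sorting the predicted bounds is an added safeguard, not something stated in the abstract.

```python
import numpy as np


def uniform_interval_stats(lower, upper):
    """Mean and standard deviation of a variable assumed uniform on [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mean = (lower + upper) / 2.0
    std = (upper - lower) / np.sqrt(12.0)  # variance of U(a, b) is (b - a)^2 / 12
    return mean, std


def fit_center_method(x_lower, x_upper, y_lower, y_upper):
    """Fit ordinary least squares on interval midpoints (one predictor, with intercept)."""
    x_mid = (np.asarray(x_lower, dtype=float) + np.asarray(x_upper, dtype=float)) / 2.0
    y_mid = (np.asarray(y_lower, dtype=float) + np.asarray(y_upper, dtype=float)) / 2.0
    design = np.column_stack([np.ones_like(x_mid), x_mid])
    coef, *_ = np.linalg.lstsq(design, y_mid, rcond=None)
    return coef  # [intercept, slope]


def predict_center_method(coef, x_lower, x_upper):
    """Apply the midpoint model to each bound of the predictor interval."""
    at_lower = coef[0] + coef[1] * np.asarray(x_lower, dtype=float)
    at_upper = coef[0] + coef[1] * np.asarray(x_upper, dtype=float)
    # Assumption: sort the two values so the result is a valid interval
    # even when the fitted slope is negative.
    return np.minimum(at_lower, at_upper), np.maximum(at_lower, at_upper)


# Hypothetical toy data: the response bounds follow y = 1 + 2x exactly.
x_lo = np.array([0.0, 1.0, 2.0, 3.0])
x_hi = np.array([1.0, 3.0, 4.0, 6.0])
y_lo = 1.0 + 2.0 * x_lo
y_hi = 1.0 + 2.0 * x_hi

coef = fit_center_method(x_lo, x_hi, y_lo, y_hi)
pred_lo, pred_hi = predict_center_method(coef, np.array([2.0]), np.array([4.0]))
```

Because CM uses only the midpoints when fitting, any information about how values are distributed inside each interval is ignored, which is the limitation the abstract says DSM is meant to address.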