
Granular Computing Approaches To Multi-modal Data Feature Extraction With Applications

Posted on: 2018-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Wen
Full Text: PDF
GTID: 1311330542960633
Subject: Petroleum engineering calculations
Abstract/Summary:
With the development of wireless sensor networks and the Internet of Things, data in many fields, such as oil fields, are being produced at an unprecedented rate. How to preprocess these data to facilitate storage, mining and utilization is one of the core issues in the field of big data. For structured data, feature selection and feature extraction are two important dimensionality-reduction techniques, of which the latter is more challenging. Discretization is a classical feature extraction method for numerical data, and over many years new algorithms have been proposed to improve processing speed and effectiveness. Attribute value partitioning is a feature extraction method for symbolic data, but it has received little attention and has produced comparatively few results.

Granular computing is a pervasive methodology in computational intelligence and an effective tool for solving complex problems. In data preprocessing, granular computing constructs granular structures from different perspectives and at different levels, and selects an appropriate granularity to obtain a data representation that is more conducive to problem solving. Concrete theories of granular computing, such as rough sets, fuzzy sets, quotient spaces, three-way decision and concept lattices, have made great progress in feature selection and are widely used in petroleum, finance, medicine and other fields. In contrast, work on feature extraction is still relatively scarce.

In this dissertation, a general framework and specific methods for granular-computing-based feature extraction are proposed for multi-modal data, with the goal of obtaining data that require less storage space and yield better classifier quality. Using real oilfield data sets and public UCI data sets, the proposed methods are compared with other popular methods to verify their advantages. The contributions are as follows:

(1) A granular computing framework suitable for feature extraction is designed. The framework consists of two stages. In the granularity construction stage, a granular structure is established at the level of a single feature. In the granularity selection stage, the final feature extraction scheme is obtained by selecting granularity both within and among attributes.

(2) A two-stage discretization algorithm based on information entropy is proposed for numerical data. In the local discretization stage, the granularity of each individual feature is constructed and selected by minimizing the conditional information entropy. In the global discretization stage, an extended decision table is constructed from the granular structures obtained in the previous stage, and a coarser granularity is selected without losing information. Compared with classical and popular discretization algorithms, the results show that the algorithm has better generalization ability, good classification accuracy and reasonable processing speed. The method effectively balances the trade-off between the efficiency and the effectiveness of discretization.

(3) A two-stage granular computing method for attribute value partitioning is proposed for symbolic data. In the single-attribute granularity construction stage, the nodes corresponding to the attribute values are merged step by step, building a binary tree bottom-up. Each merge minimizes the information loss, so that the most informative splits of the attribute values lie as close as possible to the root node. In the global granularity selection stage, tree nodes are split gradually from top to bottom according to the maximum information gain, finally yielding the optimal attribute value partitioning scheme. This method effectively addresses many of the problems caused by scarce prior knowledge and constructs and selects the granular structure of attribute values automatically. Compared with the state-of-the-art attribute value partitioning algorithm, the results show that the proposed algorithm selects fewer attribute values while maintaining or improving classifier performance.

(4) A fusion feature extraction algorithm is proposed for mixed data. First, the local discretization method is used to discretize each numerical feature, transforming it into symbolic data. Then, the attribute value partitioning method is used to obtain the final attribute extraction scheme. The method takes into account the correlation between features of different modalities and extracts features from a global perspective. The experimental results show that the proposed algorithm can effectively normalize mixed data and obtain a more concise data representation while minimizing information loss.

(5) A tree balancing method is proposed to further improve the quality of the granular structure built for symbolic data. When building the granularity of a single feature, the algorithm computes the boundaries of the candidate nodes for each merge operation, ensuring that the two merged nodes are at the same or adjacent granularity levels. With this technique, every granular structure finally constructed is a balanced binary tree. Because a balanced binary tree has a favorable structure, the technique reduces the number of node splits in the granularity selection stage and thus the complexity of the feature extraction algorithm.

The research in this dissertation extends the scope of application of granular computing. The proposed granular-computing-based feature extraction methods for multi-modal data reduce data storage space while maintaining or improving classification capability, which is of theoretical and practical significance for research on data preprocessing in the big data field.
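For reference, the entropy quantities the abstract relies on can be written in their standard Shannon form. This is a sketch assuming D is the decision (class) attribute, A a condition attribute, and A_{ij} the attribute A with values a_i and a_j merged into one; the dissertation's own notation or normalization may differ.

```latex
\begin{aligned}
H(D) &= -\sum_{d} p(d)\,\log_2 p(d) \\
H(D \mid A) &= -\sum_{a} p(a) \sum_{d} p(d \mid a)\,\log_2 p(d \mid a) \\
\operatorname{Gain}(A) &= H(D) - H(D \mid A) \\
\Delta H(a_i, a_j) &= H(D \mid A_{ij}) - H(D \mid A) \;\ge\; 0
\end{aligned}
```

Merging two values coarsens the partition induced by A, so the conditional entropy cannot decrease; \Delta H is therefore a natural measure of the "information loss" of a bottom-up merge, and undoing that merge during a top-down split recovers exactly \Delta H as information gain.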
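The local discretization stage, selecting cut points for one numeric feature by minimizing the conditional information entropy, can be illustrated with a minimal greedy sketch. It assumes candidate cuts are midpoints between adjacent distinct values and that cuts are added one at a time; the function names and the `max_intervals`/`min_gain` stopping parameters are hypothetical, and the global stage over the extended decision table is not shown.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy H(D), in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels, cuts):
    """H(D | A) after cutting the numeric attribute A at the given cut points."""
    groups = {}
    for v, y in zip(values, labels):
        interval = sum(v > c for c in cuts)  # index of the interval containing v
        groups.setdefault(interval, []).append(y)
    n = len(values)
    return sum(len(g) / n * class_entropy(g) for g in groups.values())


def local_discretize(values, labels, max_intervals=4, min_gain=1e-9):
    """Greedily add the cut point that lowers H(D | A) the most.

    max_intervals and min_gain are hypothetical stopping parameters.
    """
    distinct = sorted(set(values))
    # Candidate cuts: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts = []
    current = conditional_entropy(values, labels, cuts)
    while candidates and len(cuts) + 1 < max_intervals:
        # Score every remaining candidate; ties go to the smaller cut value.
        scored = [(conditional_entropy(values, labels, cuts + [c]), c) for c in candidates]
        best_h, best = min(scored)
        if current - best_h <= min_gain:
            break  # no remaining cut reduces the conditional entropy
        cuts = sorted(cuts + [best])
        candidates.remove(best)
        current = best_h
    return cuts


if __name__ == "__main__":
    print(local_discretize([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "c", "c"]))  # [2.5, 4.5]
```

The greedy search avoids enumerating all cut subsets; the simple interval cap stands in for whatever stopping criterion the dissertation actually uses.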
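The attribute value partitioning steps (bottom-up merging that minimizes information loss, then top-down splitting by maximum information gain) and the balancing idea can be sketched together as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: information loss is taken to be the increase in H(D | A), the names `build_tree`, `split_top_down` and `merge_loss` are hypothetical, and the balancing rule (merge only nodes at the lowest height, or the lone lowest node with a node one level up) is one simple assumed way to keep every merge between nodes of equal or adjacent height.

```python
import math
from collections import Counter
from itertools import combinations


def weighted_entropy(counts, n):
    """p(node) * H(D | node): the node's share of H(D | A) over n rows."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values() if c)
    return total / n * h


class Node:
    """A granule: a set of attribute values plus the class counts it covers."""

    def __init__(self, values, counts, left=None, right=None):
        self.values = values          # frozenset of original attribute values
        self.counts = counts          # Counter of class labels over those rows
        self.left, self.right = left, right
        self.height = 0 if left is None else 1 + max(left.height, right.height)


def merge_loss(a, b, n):
    """Increase in H(D | A) from merging a and b; equals the gain of splitting them apart."""
    return (weighted_entropy(a.counts + b.counts, n)
            - weighted_entropy(a.counts, n) - weighted_entropy(b.counts, n))


def build_tree(column, labels, balanced=False):
    """Bottom-up: repeatedly merge the candidate pair with the smallest information loss."""
    n = len(column)
    per_value = {}
    for v, y in zip(column, labels):
        per_value.setdefault(v, Counter())[y] += 1
    nodes = [Node(frozenset([v]), c) for v, c in per_value.items()]
    while len(nodes) > 1:
        if not balanced:
            candidates = list(combinations(nodes, 2))
        else:
            # Assumed balancing rule: all heights stay within {m, m + 1}, and each
            # merge joins nodes of equal or adjacent height (height-balanced tree).
            m = min(x.height for x in nodes)
            low = [x for x in nodes if x.height == m]
            if len(low) >= 2:
                candidates = list(combinations(low, 2))
            else:
                candidates = [(low[0], x) for x in nodes if x.height == m + 1]
        a, b = min(candidates, key=lambda pair: merge_loss(pair[0], pair[1], n))
        nodes = [x for x in nodes if x is not a and x is not b]
        nodes.append(Node(a.values | b.values, a.counts + b.counts, a, b))
    return nodes[0]


def split_top_down(root, max_groups=3, min_gain=1e-9):
    """Top-down: split the frontier node whose split gives the largest information gain."""
    n = sum(root.counts.values())
    frontier = [root]
    while len(frontier) < max_groups:
        splittable = [x for x in frontier if x.left is not None]
        if not splittable:
            break
        best = max(splittable, key=lambda x: merge_loss(x.left, x.right, n))
        if merge_loss(best.left, best.right, n) <= min_gain:
            break  # no remaining split recovers any information
        frontier.remove(best)
        frontier += [best.left, best.right]
    return [set(x.values) for x in frontier]


if __name__ == "__main__":
    colors = ["r", "r", "g", "g", "b", "b", "y", "y"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    groups = split_top_down(build_tree(colors, labels))
    print(sorted(sorted(g) for g in groups))  # [['b', 'y'], ['g', 'r']]
```

Restricting merges to the lowest level is stricter than "same or adjacent level" requires, but it guarantees a valid candidate pair always exists; the boundary computation described in the abstract may be less restrictive.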
Keywords/Search Tags: multi-modal data, feature extraction, granular computing, discretization, attribute value partitioning