Research On The Distributed Storage And Computing Technology For Sequential Data Analysis

Posted on: 2019-12-20

Degree: Doctor

Type: Dissertation

Country: China

Candidate: D J Niu

GTID: 1368330590974660

Subject: Computer application technology

Abstract/Summary:

Sequential data are growing rapidly and at enormous scale in the era of big data. Unlike non-sequential data, sequences contain long-term dependencies, and learning and mining these latent dependencies is critical for sequential data analysis. Sequential data analysis is now widely used in language, speech, video, finance, medicine, biology, the Internet of Things, and traffic, and it is a hot topic in big data intelligence research. Traditional approaches adapt poorly to the explosive growth of sequential data and cannot handle long-term dependencies effectively. Dependencies that span long ranges and are deeply hidden pose a further challenge.

Recurrent neural networks (RNNs), including the standard RNN and LSTM, can in theory learn long-term dependencies in sequences of arbitrary length and are remarkably effective models for sequential data. However, training RNNs on a large number of sequences involves a very large number of parameters. These parameters are updated and optimized step by step by a classical iterative method over a massive training set, which makes RNN training a combined problem of big data processing and high-performance computing. It is therefore important to study a novel distributed storage and computing system tailored to the characteristics of RNN training, in order to improve training efficiency and the accuracy of sequential data analysis.

After introducing the relevant studies and techniques, we identify the main challenges that affect model training efficiency and the accuracy of sequential data analysis. We then present a distributed storage and computing system for sequential data analysis and describe its architecture. Focusing on training efficiency and accuracy, we carry out the research from three aspects: the storage method on an individual node, the distributed data and metadata management method, and the training method based on distributed storage and computing for sequential data analysis.

1) We design a node storage method based on non-volatile memory (NVM), which comprises a fast file system and an asymmetric access algorithm for NVM. The prototype is implemented and evaluated with general testing tools. The results verify that the node storage method substantially increases the I/O performance of data access and reduces response time, which ensures rapid access to the model parameters and the training set during sequential data analysis.

2) We design a distributed storage strategy for sequential data analysis. The metadata and the data in the distributed storage system are used to store and manage the model parameters and the training set, respectively. We then propose a hierarchical metadata management algorithm and an NVM-based data distribution management algorithm. The prototype is implemented and evaluated with general testing tools. The experimental results show that the hierarchical metadata management algorithm provides strong adaptability and reduces the space and time overhead of metadata search, and that the NVM-based data distribution management algorithm speeds up reads and writes and improves IOPS. The distributed storage strategy thus helps to improve the efficiency of training the sequential data analysis model.

3) We propose distributed RNN training methods for sequential data analysis. An RNN is used to model the sequential data, and to speed up its training, the model parameters, the training samples, and the computations are distributed among the multiple nodes of the distributed system. We present an autonomous RNN based on distributed storage and computing, an efficient training algorithm based on dynamic neuron activation, and an adaptive LSTM with duration. The prototype is implemented and evaluated in a range of experiments. The experimental results validate that the proposed approaches increase the training efficiency and accuracy of RNNs and improve model scalability for sequential data analysis.
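
The general idea behind item 3), spreading training samples and gradient computation over several nodes, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it is a generic synchronous data-parallel scheme, and everything in it is an assumption made for illustration, including the scalar RNN, the squared-error loss, the function names (`rnn_grad`, `shard_grad`, `data_parallel_step`), and the learning rate. The "nodes" are simulated by a loop in a single process.

```python
import math


def rnn_grad(w, u, xs, y):
    """Loss and gradients for one sequence under a toy scalar RNN.

    Assumed model: h_t = tanh(w * h_{t-1} + u * x_t), h_0 = 0,
    loss = (h_T - y)^2. Gradients use backpropagation through time.
    """
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = (hs[-1] - y) ** 2

    dh = 2.0 * (hs[-1] - y)  # dLoss/dh_T
    gw = gu = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        gw += da * hs[t - 1]
        gu += da * xs[t - 1]
        dh = da * w  # pass the gradient on to h_{t-1}
    return loss, gw, gu


def shard_grad(w, u, shard):
    """Mean gradient over one worker's shard of (sequence, target) pairs."""
    gw = gu = 0.0
    for xs, y in shard:
        _, dw, du = rnn_grad(w, u, xs, y)
        gw += dw
        gu += du
    n = len(shard)
    return gw / n, gu / n


def data_parallel_step(w, u, shards, lr=0.1):
    """One synchronous step: average the workers' gradients, then update."""
    # In a real system each shard_grad call would run on a separate node.
    grads = [shard_grad(w, u, shard) for shard in shards]
    gw = sum(g[0] for g in grads) / len(grads)
    gu = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, u - lr * gu


# Hypothetical toy data: four sequences split evenly over two workers.
data = [
    ([0.5, -0.2, 0.1], 0.3),
    ([1.0, 0.4], -0.1),
    ([0.2, 0.2, 0.2, 0.2], 0.5),
    ([-0.7], 0.0),
]
w_new, u_new = data_parallel_step(0.5, 0.8, [data[:2], data[2:]])
```

When the shards have equal size, averaging the per-shard mean gradients gives the same update as a single node processing the full batch, so in this scheme distribution changes where the work is done rather than the result of the step.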

Keywords/Search Tags:

Sequential Data Analysis, Recurrent Neural Network, Non-volatile Memory, Distributed Processing

Related items

1	Improved Recurrent Neural Networks And Its Application In Chinese Language Processing
2	Research And Design Of Deep Neural Network Compression Algorithm Based On Processing-in-Memory Framework
3	Non-volatile Memory Device Based Neural Network Accelerator Design
4	The Design And Implementation Of Low-latency Distributed Key/Value Storage Based On Non-volatile Memory And RDMA
5	Design And Implementation Of Distributed Memory Object System Based On RDMA
6	Research On Key Processing-in-memory Technologies With High-performance And Low-power For Deep Learning On Edge Devices
7	Analysis And Design For Associative Memory Based On Delayed Recurrent Neural Network With Memristor
8	Research On Lightweight And Reliability Of Convolutional Neural Network Edge Computing For Non-volatile Memory
9	Research On Key Technologies Of Non-volatile Memory System With High Performance And Security
10	Improved Recurrent Neural Network Method And Its Application