
Design And Implementation Of Data Lake Access Control System For Delta Lake

Posted on: 2023-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Shao
GTID: 2568307298455124 | Subject: Computer technology
Abstract/Summary:
A data lake is a new generation of data storage solution that provides a centralized repository for massive amounts of data. Unlike a data warehouse, a data lake allows data to be stored in its original format, retaining complete data information. It is built on low-cost storage such as distributed storage systems, giving data computing and analysis workloads a more efficient and economical storage foundation and thereby helping users make better-informed analyses and decisions. Most existing data lake tools add an abstraction over the data storage structure; Delta Lake, for example, layers its own Delta format on top of the storage layer. This abstraction brings new features to the data lake, but it also introduces problems. On the one hand, Delta Lake metadata is scattered across different resource files, and the lack of a unified metadata view makes data governance more difficult. On the other hand, Delta Lake uses an open-source storage system as its underlying storage, and that system's basic access control does not isolate the data of different users: users can easily access other users' resources, and permissions cannot be distinguished finely enough for specific users. Relying only on the access control mechanism of the underlying storage system therefore cannot meet Delta Lake's need for permission differentiation in complex scenarios and carries a risk of data leakage.

To address the metadata management and access control requirements of Delta Lake, this thesis uses an open-source access control framework to provide a suitable permission model, and designs and implements a data lake access control system for Delta Lake. The work covers the following aspects:

(1) To address the difficulty of data lake governance, this thesis designs a data lake metadata management method for Delta Lake. Based on Delta Lake's distinctive abstract format and metadata, the method provides the metadata management capabilities that data lake governance requires, achieving unified and efficient management of the metadata of heterogeneous data.

(2) Because the basic access control of the storage platform is weak and cannot provide effective access control for Delta Lake, this thesis designs a policy-based data lake access control mechanism. Building on the open-source permission framework, it designs and implements a policy service component, access control plug-ins, and auditing, and handles access requests from different users and roles in the Delta Lake data lake.

(3) This thesis designs and implements a prototype data lake access control system for Delta Lake. The system uses the metadata management method to govern Delta Lake's distinctive structure and the access control mechanism to control permissions on data lake resources. Its functional requirements and performance indicators are tested to verify its effectiveness in real scenarios.

In summary, this thesis studies metadata management and access control methods for data lakes and builds a prototype access control system oriented to Delta Lake. This work contributes to building efficient governance and access control capabilities for Delta Lake.
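To make the metadata problem concrete: a Delta table records its state in a transaction log, the `_delta_log` directory, where each commit is a JSON file (named by a zero-padded version number) holding one action per line, such as `metaData`, `add`, and `remove`. The sketch below is not the thesis's method; it only illustrates how replaying those commits yields one consolidated metadata view per table (schema, partition columns, live data files). `TableMetadata` and `replay_delta_log` are hypothetical names, and the sketch assumes every commit is still present as JSON, ignoring the Parquet checkpoints and log cleanup that a real reader must handle.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical unified view of one Delta table's metadata."""
    table_id: str = ""
    schema: dict = field(default_factory=dict)
    partition_columns: list = field(default_factory=list)
    active_files: dict = field(default_factory=dict)  # data file path -> size in bytes
    version: int = -1

    @property
    def total_bytes(self) -> int:
        return sum(self.active_files.values())


def replay_delta_log(table_path: str) -> TableMetadata:
    """Replay <table>/_delta_log/*.json in version order.

    Assumption: all commits exist as JSON files; checkpoints are ignored.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Zero-padded version names sort lexically in version order.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    )
    meta = TableMetadata()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "metaData" in action:
                    # The latest metaData action defines schema and partitioning.
                    md = action["metaData"]
                    meta.table_id = md["id"]
                    meta.schema = json.loads(md["schemaString"])
                    meta.partition_columns = md.get("partitionColumns", [])
                elif "add" in action:
                    meta.active_files[action["add"]["path"]] = action["add"]["size"]
                elif "remove" in action:
                    meta.active_files.pop(action["remove"]["path"], None)
                # protocol and commitInfo actions are skipped in this sketch.
        meta.version = int(name[:-5])
    return meta
```

A governance service could run such a replay for each table and store the results in a single catalog, which is one possible way to obtain the unified metadata view the abstract calls for.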
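The abstract describes a policy-based mechanism (policy service, access control plug-ins, auditing) built on an open-source permission framework that it does not name here. The following is a minimal, hypothetical sketch of the core decision logic such a policy service might run for users and roles. `Policy`, `AccessController`, the glob-style `database.table` resource names, the deny-overrides rule, and default-deny are all assumptions of this sketch, not the thesis's design or any framework's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which subjects may (or may not) act on which tables."""
    name: str
    resource: str                 # glob over "database.table", e.g. "sales.*"
    actions: frozenset            # e.g. frozenset({"read", "write"})
    users: frozenset = frozenset()
    roles: frozenset = frozenset()
    deny: bool = False


@dataclass
class AccessController:
    policies: list
    user_roles: dict              # user name -> set of role names
    audit_log: list = field(default_factory=list)

    def _matches(self, p: Policy, user: str, resource: str, action: str) -> bool:
        roles = self.user_roles.get(user, set())
        subject_ok = user in p.users or bool(p.roles & roles)
        return subject_ok and action in p.actions and fnmatchcase(resource, p.resource)

    def check(self, user: str, resource: str, action: str) -> bool:
        matched = [p for p in self.policies if self._matches(p, user, resource, action)]
        denied = [p for p in matched if p.deny]
        allowed = [p for p in matched if not p.deny]
        # Assumed semantics: deny overrides allow; no matching policy means no access.
        if denied:
            decision, by = False, denied[0].name
        elif allowed:
            decision, by = True, allowed[0].name
        else:
            decision, by = False, None
        # Every decision is recorded for auditing, whether allowed or not.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user, "resource": resource, "action": action,
            "allowed": decision, "policy": by,
        })
        return decision
```

In this sketch an access control plug-in inside the query engine would call `check` before a table is read or written and reject the request when it returns `False`; the audit entries record which policy, if any, produced each decision.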
Keywords/Search Tags: Data Lake, Delta Lake, Metadata, Access Control