| To give decision-makers better business analysis services, eBay Inc. consolidated all data distributed across the company's network so that this business data can be managed together. However, as business analysis demands grow more complex, the number of tables that must be created in the data warehouse is increasing rapidly. At the same time, describing the development of historical data in the data warehouse and its future characteristics requires storing all historical tables, which adds still more data to the warehouse. Data at this scale occupies several petabytes of storage, and complex queries consume substantial CPU resources. The problem is therefore how to obtain the relationships among the tables in the database and use them to help data warehouse administrators clean up system storage effectively and reduce processing cost. Current database management tools nearly all provide visualization of data tables, but they show relationships only at the table level. To describe the relationships between tables more precisely and to meet business needs, eBay Inc. performs relationship analysis at the column level. Based on the relationships between columns, data warehouse administrators can optimize the internal organization of tables and analyze these columns to determine how different tuples are used, which makes it easier to identify popular and unpopular data. To address these problems, this article presents the design and implementation of a data table relationship visualization system, called Lineage. The article introduces Lineage in several aspects: (1) defining the relationships between data tables; (2) implementing the interfaces of the Druid SQL Parser to analyze data tables; (3) implementing a data flow module to process data; (4) designing and implementing a data visualization module based on the B/S (browser/server) architecture. Lineage has been deployed in the production environment and is in daily use. eBay China generates almost 30 GB of query logs in the data warehouse every day. Lineage effectively provides the relationships between data tables together with other information about these tables, such as job scripts. The system also helps data warehouse administrators understand the usage of every table and manage the storage of the data warehouse. |
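
To illustrate what a column-level relationship means here, the following is a minimal sketch rather than the system's actual implementation. It does not use the Druid SQL Parser; a small regular expression stands in for the parser, and it handles only a single `INSERT INTO ... SELECT ... FROM` statement whose source columns are written as `table.column`, with no aliases and no commas inside expressions. The function name `column_lineage` and the tables `orders` and `daily_sales` are hypothetical and chosen only for this example.

```python
import re

# Toy pattern: INSERT INTO target (c1, c2, ...) SELECT e1, e2, ... FROM ...
# A stand-in for a real SQL parser; assumed form only, not the Druid API.
_INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*\((.*?)\)\s*SELECT\s+(.*?)\s+FROM\s",
    re.IGNORECASE | re.DOTALL,
)

# A qualified column reference such as orders.price (identifiers only,
# so numeric literals like 1.5 are not mistaken for columns).
_QUALIFIED_COLUMN = re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*\b")


def column_lineage(sql: str) -> dict:
    """Map each target column to the source columns it is derived from."""
    match = _INSERT_SELECT.search(sql)
    if match is None:
        raise ValueError("unsupported statement for this toy example")

    target, target_cols, select_list = match.groups()
    targets = [c.strip() for c in target_cols.split(",")]
    # Naive split: assumes no commas inside expressions (no function calls).
    exprs = [e.strip() for e in select_list.split(",")]
    if len(targets) != len(exprs):
        raise ValueError("target columns and select expressions differ in count")

    lineage = {}
    for col, expr in zip(targets, exprs):
        sources = sorted(set(_QUALIFIED_COLUMN.findall(expr)))
        lineage[f"{target}.{col}"] = sources
    return lineage


if __name__ == "__main__":
    example = (
        "INSERT INTO daily_sales (seller_id, revenue) "
        "SELECT orders.seller_id, orders.price * orders.quantity FROM orders"
    )
    for target_col, source_cols in column_lineage(example).items():
        print(target_col, "<-", source_cols)
```

A table-level view would record only that `daily_sales` depends on `orders`. The column-level mapping additionally shows that `daily_sales.revenue` is derived from `orders.price` and `orders.quantity`, which is the kind of information needed to judge how individual columns are used.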