| As data structures grow more complex and high-dimensional, data visualization analysis has become indispensable in the age of big data, and traditional visualization techniques fall far short of the needs of modern information development. Data visualization is an important branch of data structure analysis and a research hotspot in data science. Through visualization, users can understand the deeper characteristics of data more accurately. However, invalid data, incomplete data, and inaccurate data partitioning prevent data from being displayed effectively, and research on data visualization has reached a bottleneck as a result. Soil is the basis of all forestry production, and the storage, integration, and analysis of forest land soil data are an essential part of forestry informatization. Soil data structures have become increasingly rich and complex, and traditional forms of analysis fall far short of the needs of modern forestry informatization. Using graphics and images as the carrier for information-rich forest land soil data is an inevitable trend. Focusing on data visualization technology, this paper therefore takes forest land soil data as its research object and uses the Hadoop platform to perform regional visualization and geospatial visualization of forest land soil data. The main work is as follows: (1) The traditional K-nearest neighbor interpolation algorithm performs poorly when a soil data set has a high missing rate. To address this, this paper proposes an improved algorithm based on K-nearest neighbor interpolation, the weighted K-nearest neighbor multiple interpolation algorithm. The algorithm gives higher weights to nearer sample points through a
Gaussian function to reduce classification errors for attributes with missing values. At the same time, the idea of multiple interpolation is used to preserve the important properties of the data distribution. Compared with the traditional K-nearest neighbor interpolation algorithm and the multiple interpolation algorithm, this algorithm handles missing values in soil data more accurately when the missing rate is high. (2) To address the special attributes of soil data and the low accuracy of current regional visualization methods for soil data, an optimized method, the grid-based soil data regional visualization method, is proposed. Built on the Spark big data computing framework, it optimizes the regional visualization of soil data through grid division and uses the Pearson correlation coefficient to achieve finer-grained grid division. To address the monotonous coloring that results after grid division, a geospatial method for expressing changes in soil data is designed to enhance the visualization effect. (3) Based on the data sources, actual requirements, and big data standard system of forest land soil data, a forest land soil data visualization system is designed and implemented. To better express differences in soil data, the grid-based soil data regional visualization method is applied to soil data from Gaofeng Forest Farm in Guangxi Zhuang Autonomous Region to demonstrate the visualization effect. The Spark distributed computing framework on the Hadoop platform is used to compute grid data in parallel, which solves the problem of analyzing and processing massive forest land soil data during visualization. According to the characteristics and actual needs of forest land soil data, the business functions and data processing flow of the forest land soil data visualization system are designed, and a forest land soil data
visualization system integrating storage management, data analysis and computation, and data visualization is implemented, which solves problems such as the difficulty of storing forest land soil data, low utilization of data value, and data islands. In this paper, missing values in the data set are first filled in by the weighted K-nearest neighbor multiple interpolation algorithm. The regional visualization and spatiotemporal distribution visualization of forest land soil data are then performed with the grid-based soil data regional visualization method. Compared with traditional visualization methods, the method used in this system displays forest land soil data more comprehensively and intuitively. On this basis, a Hadoop-based forest land soil data visualization system is realized. |
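
The abstract does not give the details of the weighted K-nearest neighbor multiple interpolation algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: neighbors are ranked by Euclidean distance over the observed attributes, a Gaussian function turns distance into weight, and the "multiple" step is approximated by averaging estimates over bootstrap resamples of the donor rows. The function names, the `bandwidth` and `m` parameters, the bootstrap scheme, and the toy values are all hypothetical and are not taken from the paper.

```python
import math
import random


def gaussian_weight(distance, bandwidth=1.0):
    """Gaussian kernel: a smaller distance yields a larger weight (assumed form)."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def weighted_knn_impute(rows, target_row, missing_idx, k=3, bandwidth=1.0):
    """Estimate target_row[missing_idx] from the k nearest complete rows."""
    # Distance is computed over the columns that are observed in the target row.
    observed = [j for j, v in enumerate(target_row)
                if v is not None and j != missing_idx]
    candidates = []
    for row in rows:
        # Skip rows that cannot act as donors.
        if row[missing_idx] is None or any(row[j] is None for j in observed):
            continue
        dist = math.sqrt(sum((row[j] - target_row[j]) ** 2 for j in observed))
        candidates.append((dist, row[missing_idx]))
    if not candidates:
        raise ValueError("no complete neighbours available")
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    weights = [gaussian_weight(d, bandwidth) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed to zero: fall back to the plain mean.
        return sum(v for _, v in nearest) / len(nearest)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total


def multiple_impute(rows, target_row, missing_idx, m=5, k=3,
                    bandwidth=1.0, seed=0):
    """Average m estimates, each computed on a bootstrap resample of the donors.

    This is an assumed stand-in for the multiple interpolation step.
    """
    rng = random.Random(seed)
    donors = [r for r in rows if r[missing_idx] is not None]
    estimates = []
    for _ in range(m):
        sample = [rng.choice(donors) for _ in donors]
        estimates.append(
            weighted_knn_impute(sample, target_row, missing_idx, k, bandwidth))
    return sum(estimates) / m


if __name__ == "__main__":
    # Made-up rows; the three columns are hypothetical soil attributes.
    soil = [
        [1.0, 2.0, 10.0],
        [1.1, 2.1, 12.0],
        [5.0, 6.0, 50.0],
        [5.2, 6.1, 52.0],
    ]
    incomplete = [1.05, 2.05, None]
    print(weighted_knn_impute(soil, incomplete, missing_idx=2, k=2))
    print(multiple_impute(soil, incomplete, missing_idx=2, m=5, k=2))
```

A smaller `bandwidth` makes the estimate depend almost entirely on the closest neighbor, while a larger one approaches an unweighted K-nearest neighbor mean. In practice the attributes would normally be standardized before distances are computed.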