| With the continued development of science and technology, artificial intelligence has been widely applied in daily work and life, bringing great convenience to production and everyday living. The General Office of the State Council, the Ministry of Industry and Information Technology, and local government departments at all levels have successively issued policies and measures on the implementation of artificial intelligence. At the same time, researchers in related fields have steadily deepened their work on artificial intelligence technology, represented by deep learning, and a relatively complete artificial intelligence education system has gradually taken shape. Although the popularization and application of deep learning technology is in full swing, many problems remain in practice: (1) The traditional deep learning development process is cumbersome. Environment configuration and model hosting are complex, and data sets must be prepared in advance. This often requires a great deal of working time and results in a long development cycle, which hinders product iteration in companies and the development of deep learning-related education and research in colleges and universities. (2) For small and medium-sized companies or university laboratories, computing resources are expensive yet difficult to use effectively, so resource utilization is low and resources are wasted. (3) For project teams, studios, or research teams, traditional deep learning development lacks a professional project management process and suffers from low team collaboration efficiency, chaotic permission management, and chaotic project deployment, which further reduces the efficiency of deep learning development. To address these problems, this paper designs and implements a deep learning container cloud platform based on container technology. The main work is as follows: (1) We investigate the research background and development status of deep learning development platforms, summarize the characteristics and shortcomings of existing platform functions, and use them as a reference for the platform designed in this paper. We pre-research and test key technologies for platform implementation, such as Docker containerization, Kubernetes, Ceph, and Keycloak. From the perspective of platform users, we analyze the functional and non-functional requirements for developing and deploying the platform. (2) Based on technology selection and requirements analysis, the platform architecture is designed from three aspects: overall architecture, network architecture, and functional architecture. Following the platform's module division, six modules are designed and implemented: cluster management, multi-tenant management, data set management, project management, image management, and model management. (3) We perform functional and performance testing of the platform services for each functional module, and analyze and explain the test results. We summarize the design and implementation of the platform, propose improvements for the current deficiencies of its services, and outline directions for future work. The deep learning container cloud platform designed and implemented in this paper has been officially put into use. The main users are students and development teams in the laboratory. The platform provides users with a convenient one-stop deep learning development environment. Core services such as cluster management, project management, data set management, and multi-tenant management provide a convenient environment for large-scale concurrent model training and large-scale data set utilization, improving the development efficiency of deep learning. |