| In recent years, the gap between the growing number of chronic disease patients in China and the shortage of experienced nutritionists has become increasingly prominent, making it difficult for many patients to obtain timely and effective dietary guidance. In addition, preparing personalized dietary recommendations for each patient is time-consuming and cumbersome for nutritionists. How to analyze patients' daily food intake quickly and efficiently has therefore become a vital issue. The popularity of mobile devices and the strong learning ability of deep learning offer a way forward: together they enable real-time, efficient assessment and can improve the accuracy of nutrition estimation. In this thesis, we focus on joint food and ingredient recognition and on nutrition estimation, both based on deep learning algorithms. Joint food and ingredient recognition is a fine-grained classification task that lays the foundation for nutrition estimation. Nutrition estimation is a regression task that aims to provide patients with real-time and accurate estimates of their dietary intake. For joint food and ingredient recognition, we start from the observation that food categories and ingredients are correlated. On this basis, we propose a joint food and ingredient recognition network based on a region-level attention mechanism, named RLA-Net. Firstly, a two-branch structure is designed to extract global food features and local-region ingredient features under the supervision of the ground-truth labels. Secondly, by exploiting the mutual relationship between food categories and ingredients, we propose a Region-Weighted Module (RWM) that mines deep features to assist classification. Finally, the GradNorm algorithm is introduced to optimize the multi-task loss function for better performance. Experiments show that RLA-Net achieves state-of-the-art performance in ingredient recognition on the Chinese food dataset VIREO Food-172, and that its food classification results are also competitive. For nutrition estimation, existing multi-stage methods must define and optimize each stage separately, so the features optimized at one stage are difficult to use interactively with those of another. Moreover, prediction errors from the early stages accumulate and ultimately reduce the accuracy of the final estimate. Therefore, in this thesis, we put forward an end-to-end nutritional assessment scheme based on multi-task learning, which focuses on the efficient use and interaction of different kinds of information at the image feature level. Specifically, we introduce the auxiliary tasks of ingredient classification and depth estimation to learn a shared feature representation, which supports implicit interaction between the different kinds of information. Further, a dual attention module is proposed to transfer spatial information from the depth map, together with channel features optimized by ingredient classification, to the nutrition estimation task, enriching the image feature representation for better assessment performance. Finally, the experimental results show that the proposed model performs well on the nutrition dataset Nutrition5K and has advantages over both the single-task model and the multi-stage architecture. |
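
To make the multi-task formulation concrete, the following is a minimal sketch of how a main nutrition regression loss might be combined with the two auxiliary losses (ingredient classification and depth estimation). It is an illustration under stated assumptions, not the thesis's implementation: the choice of L1 and binary cross-entropy losses, the fixed weights, and all function names are hypothetical, and the dual attention module and GradNorm weighting are not reproduced here.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multilabel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over ingredient labels (probs in [0, 1])."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(nutrition, ingredients, depth, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the main loss and the two auxiliary losses.

    Each argument is a (prediction, target) pair. The weights are
    hypothetical fixed values chosen for illustration only.
    """
    w_nut, w_ing, w_depth = weights
    return (w_nut * l1_loss(*nutrition)
            + w_ing * multilabel_bce(*ingredients)
            + w_depth * l1_loss(*depth))
```

In an actual network, the three predictions would come from task-specific heads on top of a shared backbone, so that gradients from the auxiliary losses shape the shared features used for nutrition estimation.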