| With the development of food culture and Internet technology, sharing food images online has become popular. However, automatically identifying attribute information from these images remains a major challenge in food image recognition. Food image recognition has broad application prospects: it can provide recipe information such as food categories, ingredients, and cooking methods, and it can predict nutritional information for use in nutritional analysis, science-based diet planning, and medical health management. Most existing food image recognition methods, however, are designed for a single task such as food classification, while techniques that simultaneously predict ingredients, cooking methods, and calories are rarely investigated. For calorie prediction in particular, current methods usually involve several separate calculation steps and ignore the influence of cooking methods on food calories. In addition, there is currently a lack of public food datasets that contain both cooking methods and calorie information. To address these limitations of current food datasets and single-task recognition methods, this paper works on two fronts: building an improved food dataset and developing multi-task food image recognition models. The main contributions of this paper are as follows: (1) Construction of a Chinese and Western food dataset. Existing public food datasets lack information such as cooking methods and calories, and most are unbalanced in their food categories, typically covering only Western or only Chinese food. To this end, we first scrape food images and the corresponding recipes from three recipe websites; then we propose a corpus-based method for automatically extracting ingredients and cooking methods from the recipes; next, we apply a support vector machine-based
outlier detection method to the food images and clean the data in multiple steps to reduce noise; finally, we apply data augmentation to the original dataset to address the category imbalance problem. The result is a high-quality food dataset that combines Chinese and Western food categories and includes food images, categories, ingredients, cooking methods, and calories. The dataset contains 77,362 samples, 216 ingredients, 18 cooking methods, and 75 food categories, covering most common Chinese and Western dishes. (2) A food image recognition model based on a multi-task convolutional neural network is proposed to realize end-to-end, multi-task recognition from a food image to several food attributes. Most existing models can recognize only a single food attribute, and recognizing multiple attributes often requires a multi-step retrieval method. In such a method, however, the accuracy of the independent steps cannot be guaranteed and the correlation between food attributes is ignored. To solve this problem, the image feature extraction module uses a convolutional neural network to extract the global feature of the input image and feeds it into four subtask modules. Each subtask module is composed of fully connected layers: the food classification module contains a multi-class sub-model, the ingredient and cooking method extraction modules are designed as multi-label classification sub-models, and the calorie prediction module contains a regression sub-model. The multi-task model thus predicts food category, ingredients, cooking methods, and calories simultaneously. By exploiting the correlation among the four food attributes, it effectively improves the accuracy of food classification and calorie prediction. When the model is trained, validated, and tested on the food dataset constructed in (1), the top-1 test accuracy of the food classification task reaches 63.47% and the mean absolute error of
calorie prediction is 79.6 kcal. (3) A food image recognition model based on a multi-task attention network is proposed. Food image features are complex, and different recognition tasks focus on different image features. To extract these fine-grained features more accurately and further improve the performance of each task, a spatial attention module is added to each subtask branch of the model in (2), so that task-specific features are extracted from the shared feature maps. The model first generates the global feature map of the food image with the shared convolutional neural network; each subtask attention module then uses an attention mask to extract key local features from that map. The attention mask assigns task-specific weights to each part of the shared feature map, so that shared global features and task-specific features are learned simultaneously. Compared with the results obtained in (2), the top-1 food classification accuracy of the improved model rises to 68.59%, and the mean absolute error of calorie prediction is reduced to 71.4 kcal. |
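
The four-branch design in contribution (2) can be illustrated with a small, dependency-free sketch. It is not the model from this paper: the shared feature vector stands in for the output of the convolutional backbone, each head is reduced to a single fully connected layer with random weights, and `FEATURE_DIM`, `init_head`, and `predict` are hypothetical names. Only the output sizes (75 categories, 216 ingredients, 18 cooking methods, one calorie value) and the head types (multi-class, multi-label, regression) come from the text.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Multi-class output: probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Multi-label output: an independent probability per label."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def init_head(n_out, n_in, rng):
    """Random weights and zero bias for one head (hypothetical helper)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

FEATURE_DIM = 32  # assumed size of the shared global feature, not from the paper
rng = random.Random(0)
heads = {
    "category": init_head(75, FEATURE_DIM, rng),         # multi-class
    "ingredients": init_head(216, FEATURE_DIM, rng),     # multi-label
    "cooking_methods": init_head(18, FEATURE_DIM, rng),  # multi-label
    "calories": init_head(1, FEATURE_DIM, rng),          # regression
}

def predict(feature):
    """Feed one shared feature vector to all four task heads."""
    return {
        "category": softmax(linear(feature, *heads["category"])),
        "ingredients": sigmoid(linear(feature, *heads["ingredients"])),
        "cooking_methods": sigmoid(linear(feature, *heads["cooking_methods"])),
        "calories": linear(feature, *heads["calories"])[0],
    }

# Stand-in for the CNN's global feature of one food image.
feature = [rng.uniform(0.0, 1.0) for _ in range(FEATURE_DIM)]
outputs = predict(feature)
```

Because every head reads the same feature vector, training all four tasks jointly would push gradients from each task into the shared backbone, which is one plausible way the correlation between attributes can help classification and calorie prediction.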
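
The task-specific attention mask in contribution (3) can be sketched in the same spirit. The abstract does not specify how the mask is computed, so this example assumes one common form: a 1x1 convolution over the channels followed by a sigmoid, giving one weight per spatial location, after which the weighted feature map is average-pooled. The map size, the random weights, and the names `attention_mask` and `attend_and_pool` are hypothetical.

```python
import math
import random

def attention_mask(feature_map, w, b):
    """One weight in (0, 1) per location: 1x1 convolution, then sigmoid."""
    return [
        [1.0 / (1.0 + math.exp(-(sum(wi * v for wi, v in zip(w, cell)) + b)))
         for cell in row]
        for row in feature_map
    ]

def attend_and_pool(feature_map, mask):
    """Scale each location by its mask weight, then average over locations."""
    channels = len(feature_map[0][0])
    n_locations = len(feature_map) * len(feature_map[0])
    pooled = [0.0] * channels
    for row, mask_row in zip(feature_map, mask):
        for cell, m in zip(row, mask_row):
            for c in range(channels):
                pooled[c] += m * cell[c]
    return [p / n_locations for p in pooled]

H, W, C = 4, 4, 8  # assumed size of the shared feature map, not from the paper
rng = random.Random(0)

# Stand-in for the shared CNN feature map of one image: H x W cells of C channels.
shared_map = [[[rng.uniform(0.0, 1.0) for _ in range(C)] for _ in range(W)]
              for _ in range(H)]

tasks = ["category", "ingredients", "cooking_methods", "calories"]
# Each task owns its mask parameters; the feature map itself is shared.
mask_params = {t: ([rng.uniform(-1.0, 1.0) for _ in range(C)], 0.0) for t in tasks}

masks = {t: attention_mask(shared_map, *mask_params[t]) for t in tasks}
task_features = {t: attend_and_pool(shared_map, masks[t]) for t in tasks}
```

Each entry of `task_features` would then feed the corresponding task head, so every branch sees the same shared map but emphasizes different regions of it.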