| The Chumon script is an extremely valuable vehicle of Vietnamese culture and has been studied by many experts in the humanities, in fields such as philology and linguistics. However, no Chumon data sets or deep learning algorithms for recognizing its words were previously available. This thesis applies deep learning to the recognition of Chumon images. Its contributions are as follows. 1) It organizes the complete data set construction process, including data acquisition, data preprocessing, data annotation and data quality inspection. It describes the challenges encountered in building the data set, proposes corresponding solutions and, on this basis, constructs a complete crowdsourced labeling system. 2) It builds a comprehensive and diverse training data set for Chumon word recognition, containing 270,261 Chumon images and covering 92% of common words. This is the first public training data set for Chumon image recognition, and it is intended to support further research by other scholars. 3) Because the Chumon data set contains many word classes, written in different styles and with similar structures, the thesis builds a deep-learning-based word recognition model for Chumon images and optimizes it in three respects: data augmentation, the long-tail problem and Bayesian optimization of hyperparameters. The final model reaches a Weighted_F1 of 88.68% and a Macro_F1 of 79.06%. |
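The gap between the reported Weighted_F1 (88.68%) and Macro_F1 (79.06%) is consistent with a long-tailed label distribution. Weighted_F1 averages the per-class F1 scores weighted by each class's support, so frequent words dominate it. Macro_F1 gives every class equal weight, so poorly recognized rare words pull it down. The minimal Python sketch below computes both metrics from scratch under their standard definitions; the function names and the three-class toy labels are hypothetical illustrations, not code or data from the thesis.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[t] += 1  # true class t gains a false negative
    support = Counter(y_true)  # number of true samples per class
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (f1, support[c])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1: frequent classes dominate."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(s for _, s in scores.values())
    return sum(f1 * s for f1, s in scores.values()) / total


# Hypothetical toy data: one head class "A" (8 samples), two tail classes.
y_true = ["A"] * 8 + ["B", "C"]
y_pred = ["A"] * 8 + ["A", "C"]  # the single rare "B" sample is misread as "A"

print(round(macro_f1(y_true, y_pred), 3))     # 0.647
print(round(weighted_f1(y_true, y_pred), 3))  # 0.853
```

With one head class and two tail classes, misclassifying a single rare sample lowers Macro_F1 far more than Weighted_F1. The same two averages are available in scikit-learn as `f1_score(y_true, y_pred, average="macro")` and `average="weighted"`.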