| In traditional single-label learning, each real-world object is assumed to correspond to exactly one category label. Real-world objects are often ambiguous, however, and may belong to several categories at once. For example, in image classification a picture can carry multiple tags, and a report may cover multiple topics. As data volumes grow, data complexity grows with them, and traditional single-label learning can no longer meet the needs of technological development. Multi-label learning, in which one object corresponds to multiple labels, is the more general setting and has gradually attracted wide attention from researchers at home and abroad. Current multi-label learning algorithms, however, have not yet achieved satisfactory results. Unlike in single-label learning, a sample in multi-label learning may correspond to multiple labels, and those labels are often correlated. Exploiting the correlations between labels effectively can improve the performance of multi-label learning algorithms. In addition, with the continuing development of technologies such as the Internet of Things and social networks, global data volume has grown explosively over the past 20 years, and mining associations in data in a big data environment has become a research hotspot. This paper therefore focuses on how to mine associations in data in a big data environment and how to combine those associations with multi-label learning. The main contributions are as follows: 1. A Hadoop-based frequent itemset mining algorithm, Apriori_ING, is proposed. The usual strategy for parallelizing the Apriori frequent itemset mining algorithm is to parallelize support counting while candidate itemset generation and pruning still run on a single machine, which does not take full advantage of parallelization. Apriori_ING addresses these
problems. First, the Hadoop framework is combined with the Apriori algorithm: a transaction-based method for partitioning the data set is proposed, and the data set format is then transposed. Next, a storage structure of <Former Item> and <Latter Item> is designed for frequent itemsets, and a candidate itemset generation and pruning strategy is built on this structure, which speeds up both the transfer of frequent itemsets across the cluster and the generation and pruning of candidate itemsets. Experimental results show that, in a big data environment, Apriori_ING is clearly more efficient than common parallel Apriori algorithms, especially when the data set contains a very large number of items. 2. FreLP, an algorithm that combines association rules with multi-label learning, is proposed. First, frequent itemsets and association rules are obtained with Apriori_ING, and an algorithm, IETG, is proposed that generates label sets from them. Next, IETG is combined with a graph structure to give the G-IETG algorithm, which uses IETG to build a graph and derives the label sets from shortest paths in that graph. Finally, an LP classifier is trained for each label set, and the final learner is obtained by a voting mechanism. FreLP uses Apriori_ING to mine the associations between labels and exploits them to reduce the number of classes and the size of the search space, which improves accuracy. Experimental results show that FreLP outperforms classical multi-label learning algorithms under a variety of evaluation criteria. |
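
For readers unfamiliar with the steps the abstract refers to, the following is a minimal single-machine sketch of the classical Apriori loop: support counting, candidate generation by joining frequent (k-1)-itemsets, and pruning of candidates that have an infrequent subset. It is not Apriori_ING: it does not use Hadoop, the transposed data format, or the <Former Item>/<Latter Item> structure, none of which the abstract specifies in detail. The function names `count_support`, `generate_candidates`, and `apriori`, and the use of an absolute count for `min_support`, are assumptions made for illustration.

```python
from itertools import combinations


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Pruning step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption here).
    """
    transactions = [frozenset(t) for t in transactions]
    singletons = {frozenset([i]) for t in transactions for i in t}
    counts = count_support(transactions, singletons)
    current = {c: n for c, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = generate_candidates(current.keys(), k)
        counts = count_support(transactions, candidates)
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(
        apriori(data, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

In this sketch only `count_support` scans the transactions, which is why it is the step that common parallel versions distribute; `generate_candidates` works purely on the previous level's frequent itemsets, which is the part the abstract says usually stays on a single machine.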