Text Classification Model Based On Distributed Machine Learning

Posted on:2024-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:X C Sheng

Full Text:PDF

GTID:2558307136495334

Subject:Computer technology

Abstract/Summary:

Text classification is one of the key technologies for making efficient use of massive text data. Training data may contain sensitive information that must be protected from leakage or improper use, so data privacy needs to be taken seriously in text classification tasks. In this thesis, we propose a text classification method based on federated learning and differential privacy that protects the privacy of the training data. To improve the training efficiency of the server's initialized model, we study a distributed text classification method based on Spark. Finally, we implement a hybrid-mode distributed text classification system that applies the proposed methods in practice. Specifically, the main contributions of this thesis are as follows:

(1) We propose a distributed text classification method based on Spark, which combines Spark's distributed computing power with the text representation learning ability of the BERT model. The method addresses the low efficiency of training the server-side initialized model on large-scale news data, while keeping the accuracy of distributed learning comparable to that of centralized learning. Experiments show that the proposed method outperforms other word embedding methods on text classification tasks run in a distributed environment with Spark. Compared with centralized learning, it reduces computing time by 59.53%, significantly improving training efficiency.

(2) We propose a differentially private SGD algorithm that combines differential privacy with the federated learning framework to implement a differentially private federated BERT model for text classification. The algorithm also includes a privacy budget calculation method that tracks the privacy loss in detail. This method protects the federated learning process against inference attacks during the transmission of training parameters and keeps parameter information and features from being exposed. We also explore the impact of different parameters on algorithm efficiency. Experimental results show that the proposed method achieves a model accuracy of 64.8% while protecting privacy.

(3) We design and implement a hybrid-mode distributed text classification system. The system combines Spark and federated learning to perform large-scale text classification in a distributed environment while protecting the privacy of the training data. Functional and performance testing shows that the system meets the functional and real-time requirements of distributed text classification. In the text prediction scenario, classification accuracy reaches 86.3%, demonstrating good practical applicability.
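The abstract does not give the details of the thesis's differentially private SGD algorithm or its privacy budget calculation. As a rough illustration only, the sketch below shows the commonly used DP-SGD recipe (clip each per-example gradient to a fixed L2 norm, then add Gaussian noise to the sum) followed by a size-weighted federated average of client weights. All function names and parameters (`clip_gradient`, `dp_sgd_step`, `fed_avg`, `clip_norm`, `noise_multiplier`) are hypothetical and are not taken from the thesis; privacy accounting is not shown.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One local DP-SGD update (illustrative, not the thesis's algorithm).

    Each per-example gradient is clipped, the clipped gradients are summed,
    Gaussian noise with standard deviation noise_multiplier * clip_norm is
    added to every coordinate, and the result is averaged over the batch.
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    batch_size = len(clipped)
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


def fed_avg(client_weights, client_sizes):
    """Server-side average of client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical clients, each holding toy per-example gradients.
    global_weights = [0.0, 0.0]
    client_a = dp_sgd_step(global_weights, [[3.0, 4.0], [0.3, 0.4]],
                           lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    client_b = dp_sgd_step(global_weights, [[-1.0, 2.0]],
                           lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(fed_avg([client_a, client_b], client_sizes=[2, 1]))
```

In this sketch only clipped, noised updates leave a client, which is the general idea behind protecting transmitted parameters. A larger `noise_multiplier` gives stronger privacy at the cost of accuracy, which is one kind of trade-off a privacy budget is meant to track.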

Keywords/Search Tags:

Text classification, Distributed machine learning, Spark, Federated learning, Differential privacy, Data privacy protection

Related items

1	Research And Implementation Of Differential Privacy Data Protection Method In Federated Learning
2	Research And Implementation Of Differential Privacy Protection Technology Under Federated Learning
3	Research And Application Of Privacy Protection Federated Learning Methods For Heterogeneous Data
4	Research On Data Privacy Protection Method Based On Differential Privacy Mechanism
5	Research And Application Of Federated Learning Privacy Preservation Method Based On Differential Privacy
6	Research On Federal Learning Privacy Protection Method Based On Differential Privacy
7	Research On Instance-based Federated Transfer Learning Method And Its Differential Privacy Protection
8	Research On Dynamic Asynchronous Federated Learning With Privacy Protection
9	Research On Data Privacy Protection Method Of Federated Learning Across Data Silos
10	Research On Optimization Of Federated Learning Algorithm Based On Differential Privacy