
Joint Device Participation, Dataset Management, and Resource Allocation in Wireless Federated Learning via Deep Reinforcement Learning

Bibliographic Details
Published in:	IEEE Transactions on Vehicular Technology, 2024-03, Vol. 73 (3), pp. 4505-4510
Main Authors:	Chen, Jinlian, Zhang, Jun, Zhao, Nan, Pei, Yiyang, Liang, Ying-Chang, Niyato, Dusit
Format:	Article
Language:	English
Subjects:	Computational modeling; Convexity; Dataset management; Datasets; Deep learning; deep reinforcement learning; device participation; Energy consumption; Federated learning; Heterogeneity; Machine learning; Markov processes; Multiagent systems; Network latency; Optimization; Resource allocation; Resource management; Task analysis; Training; Wireless communication

Description
Summary:	Federated Learning (FL) enables large-scale machine learning without uploading the private data of wireless devices. Because the devices' resources are heterogeneous and limited, FL accuracy and latency depend substantially on device participation and training dataset size. In this letter, to strike a trade-off between FL accuracy and FL latency, a joint device participation, dataset management and resource allocation (DPDMRA) optimization problem is investigated. To solve this non-convex optimization problem, a Markov decision process is formulated for resource-limited wireless FL. Moreover, due to the high-dimensional continuous action space, a multi-agent softmax deep double deterministic policy gradients (MASD3) method is employed to obtain the optimal DPDMRA strategies. Double actor networks and a softmax operator are designed to alleviate the underestimation bias. Simulation results demonstrate that the proposed deep reinforcement learning (DRL) method can obtain the globally optimal policy without complete information in the dynamic environment. Compared with the other baseline schemes, the proposed MASD3 approach achieves a larger system utility with better convergence performance.
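The summary credits double actors and a softmax operator with alleviating estimation bias. As a rough illustration only, the sketch below shows a softmax value operator of the kind used in single-agent SD3: the one-step target is r + γ·softmax_β(Q(s′,·)), where softmax_β weights the Q-values of K sampled next actions by exp(β·Q). It assumes those Q-values (and, optionally, the sampling densities used for importance weighting) are already computed. The names `softmax_value` and `td_target` and the parameter `beta` are hypothetical, and the multi-agent structure, the actor and critic networks, and the DPDMRA state/action design of MASD3 are not modeled. This is not the authors' implementation.

```python
import math
from typing import Optional, Sequence


def softmax_value(q_values: Sequence[float], beta: float,
                  sample_pdf: Optional[Sequence[float]] = None) -> float:
    """Hypothetical helper: softmax-weighted average of Q-values for K sampled next actions.

    Each Q-value gets weight exp(beta * Q). If sample_pdf is given, each weight is
    divided by the density of its sampled action (importance sampling for a
    non-uniform proposal distribution).
    """
    if not q_values:
        raise ValueError("need at least one Q-value")
    if sample_pdf is None:
        sample_pdf = [1.0] * len(q_values)
    if len(sample_pdf) != len(q_values):
        raise ValueError("sample_pdf must match q_values in length")
    q_max = max(q_values)  # subtract the max before exp() for numerical stability
    weights = [math.exp(beta * (q - q_max)) / p for q, p in zip(q_values, sample_pdf)]
    return sum(w * q for w, q in zip(weights, q_values)) / sum(weights)


def td_target(reward: float, gamma: float, next_q_values: Sequence[float],
              beta: float, done: bool = False) -> float:
    """Hypothetical helper: one-step target y = r + gamma * softmax_beta(Q(s', .))."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * softmax_value(next_q_values, beta)
```

With β = 0 the operator returns the plain mean of the sampled Q-values, and as β grows it approaches their maximum, so β tunes the estimate between the two. For example, `softmax_value([1.0, 2.0, 3.0], 0.0)` returns 2.0, while a large β returns a value close to 3.0.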
ISSN:	0018-9545; 1939-9359
DOI:	10.1109/TVT.2023.3325843