Vasu Menon

Differential Privacy in Federated Learning

This project was completed as part of a graduate special topics course (MA 591) in collaboration with Dr. Mansoor Haider and Dr. Olivera Kotevska.


Overview

This project examined the integration of Differential Privacy (DP) into a Federated Learning (FL) framework using the MIMIC-IV ICU dataset. The goal was to understand how different privacy budgets affect model performance when training is distributed across heterogeneous clients. We implemented the system in Python using PyTorch and Flower, with Gaussian noise injection and gradient clipping as the primary privacy mechanisms.
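The clip-then-noise mechanism described above can be sketched in a few lines. This is an illustrative stand-alone version (numpy instead of PyTorch, and the function name `privatize_update` is ours, not from the project code): each client update is clipped to a fixed L2 norm, then Gaussian noise scaled to that clipping bound is added before the update leaves the client.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update's L2 norm to clip_norm, then add Gaussian noise
    with std = noise_multiplier * clip_norm (the usual DP-SGD-style
    calibration: noise is scaled to the per-update sensitivity bound)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```

Clipping bounds each client's contribution (its sensitivity), which is what lets the Gaussian noise scale be tied to a target privacy budget; a smaller epsilon demands a larger `noise_multiplier`, which is the source of the accuracy degradation measured below.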

Key Features

  • Federated Setup: Simulated multiple clients by partitioning the dataset along racial groupings to reflect real-world data heterogeneity.
  • Differential Privacy: Implemented Distributed DP (DDP) with configurable epsilon budgets, injecting calibrated Gaussian noise before gradients are aggregated.
  • Data Engineering: The MIMIC-IV dataset spans over 100GB of patient records, requiring significant preprocessing via SQL and Power Query.
  • Evaluation: Measured accuracy and convergence across a range of $\varepsilon$ values to characterize where the privacy-utility tradeoff becomes significant.
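The group-based partitioning in the federated setup amounts to assigning every record to the client shard for its group value. A minimal sketch (the field name `"race"` in the usage below is a placeholder for whatever grouping column the preprocessed MIMIC-IV table exposes):

```python
from collections import defaultdict

def partition_by_group(records, key):
    """Assign each record to the client shard for its group value,
    producing a deliberately non-IID split across clients."""
    shards = defaultdict(list)
    for record in records:
        shards[record[key]].append(record)
    return dict(shards)

# e.g. partition_by_group(rows, "race") -> {"A": [...], "B": [...]}
```

Because each shard's feature and label distributions differ, this split stresses the aggregation step more than a random (IID) partition would, which is exactly the heterogeneity the evaluation is meant to capture.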

Implementation

We used Flower to coordinate the federated training loop and PyTorch for model definition and gradient tracking. Privacy was enforced client-side before any aggregation, simulating a realistic decentralized deployment. Results showed the expected degradation in accuracy under tighter privacy budgets, with the effect more pronounced given the non-IID data distribution across clients.
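The server side of this loop is just example-count-weighted averaging of the already-privatized client updates (Flower's built-in FedAvg strategy handles this in the real system). A minimal numpy sketch of the aggregation step, to make the "privacy before aggregation" ordering concrete:

```python
import numpy as np

def fedavg(client_updates, client_sizes):
    """Example-count-weighted average of client updates. Updates are
    assumed to be noised client-side, so the server never sees a
    raw (un-privatized) gradient."""
    weights = np.array(client_sizes, dtype=float)
    weights /= weights.sum()
    # Contract the client axis: sum_i weights[i] * client_updates[i]
    return np.tensordot(weights, np.stack(client_updates), axes=1)
```

Since the noise is injected per client before this average, no single party (including the server) ever handles an exact update, which is what makes the distributed-DP deployment realistic.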