A Hospital's AI Model
A hospital trains an AI to diagnose diseases using patient records. Alice was one of the patients whose data was used.
Understand the threats that differential privacy protects against
Can an attacker tell if your data was used to train this model?
"Was Alice's medical record used to train this model?" The attacker wants to know if Alice was a patient here.
The attacker feeds Alice's data to the model and checks: "How confident is the model?" Models remember training data!
If the model is very confident, the attacker learns Alice was a patient at this hospital - a serious privacy violation!
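For concreteness, here is a minimal Python sketch of that confidence test, assuming an already-trained classifier `model` that exposes a scikit-learn-style `predict_proba`; the 0.9 threshold is purely illustrative.

```python
import numpy as np

def membership_score(model, x, true_label):
    """Confidence-based membership test: unusually high confidence on the
    true label suggests the record may have been in the training set."""
    # predict_proba follows the scikit-learn convention; any classifier
    # that returns class probabilities works the same way.
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    return probs[true_label]

def guess_membership(model, x, true_label, threshold=0.9):
    # The attacker tunes the threshold on records known to be non-members:
    # above it, they guess "this record was in the training data".
    return membership_score(model, x, true_label) >= threshold
```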
A hospital trained a model to predict heart disease risk. Let's see if an attacker can tell which patient records were in the training set!
Revealing if someone's medical data was used in training could expose their health conditions
Knowing if someone's financial data was in training could reveal their banking relationships
Determining if someone's posts were used could reveal their participation in studies
Random noise addition: DP-SGD adds carefully calibrated noise to the gradient updates during training, making Alice's record indistinguishable from any other patient's. The model gives similar confidence scores whether or not someone was in the training data - attackers can't tell the difference between training and test examples.
Recover training data from model gradients
In federated learning, a company trains a model using images from users' devices. An attacker with access to gradient updates attempts to reconstruct the original training images. Let's see how differential privacy protects against this!
Gradient clipping + noise: DP-SGD clips each person's influence on the model update, then drowns it in noise - like taking a photo and adding so much blur that faces become unrecognizable blobs. Attackers cannot reverse-engineer the original training images from the shared gradient updates.
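A minimal NumPy sketch of the clip-then-noise step at the core of DP-SGD; the clipping norm and noise multiplier below are illustrative placeholders, not recommended settings.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One simplified DP-SGD update: clip every person's gradient to a fixed
    L2 norm (bounding their influence), sum, add Gaussian noise, average."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:                       # one gradient per person
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Example: three people's gradients, each capped before noise is added.
grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2]), np.array([-5.0, 1.0])]
print(dp_sgd_step(grads))
```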
Extract representative features for each class
Feature scrambling: DP-SGD scrambles what the model "remembers" about each class. Instead of learning "digit 7 looks like this sharp image," it learns "7 is somewhere in this fuzzy cloud." Attackers cannot extract clear class representatives or features.
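To show what the attacker is attempting here, a hedged PyTorch-style sketch of class-representative extraction by gradient ascent on the input; `model`, the 28x28 image shape, and the step count are assumptions for illustration.

```python
import torch

def invert_class(model, target_class, shape=(1, 1, 28, 28), steps=200, lr=0.1):
    """Model-inversion sketch: run gradient ascent on a blank image so the
    model's logit for `target_class` becomes as large as possible. Against a
    non-private model this often yields a recognizable class prototype."""
    x = torch.zeros(shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -model(x)[0, target_class]   # maximize the target class logit
        loss.backward()
        optimizer.step()
        x.data.clamp_(0.0, 1.0)             # keep pixel values in a valid range
    return x.detach()
```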
Infer statistical properties of the training dataset
Privacy budget tracking: DP-SGD keeps a "privacy budget" (ε) - like a bank account that tracks how much information has leaked. When the budget is exhausted, training stops, limiting how much of the dataset's statistics (like gender distribution or age ranges) can be exposed.
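A toy accountant showing the budget idea with simple sequential composition; real DP-SGD implementations use tighter accounting (e.g., moments or Rényi accountants), so the numbers below are only illustrative.

```python
class PrivacyBudget:
    """Toy ε accountant using plain sequential composition: per-step epsilons
    add up, and training must stop once the total budget is spent."""
    def __init__(self, total_epsilon=3.0):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, step_epsilon):
        if self.spent + step_epsilon > self.total:
            raise RuntimeError("Privacy budget exhausted: stop training.")
        self.spent += step_epsilon
        return self.total - self.spent       # budget still remaining

budget = PrivacyBudget(total_epsilon=3.0)
steps_taken = 0
try:
    for _ in range(1000):
        budget.charge(step_epsilon=0.01)     # illustrative per-step cost
        steps_taken += 1
except RuntimeError:
    pass
print(f"Training stopped after {steps_taken} steps")
```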
Combine model outputs with auxiliary data sources
Noisy training + bounded leakage: DP-SGD injects noise during training AND tracks the total privacy loss (ε). Even when attackers combine model outputs with external datasets (social media, public records), the mathematical privacy guarantee ensures they cannot reliably link individuals or infer sensitive attributes.
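To make "bounded leakage" concrete, here is a small calculation of the standard hypothesis-testing bound for (ε, δ)-DP: an attacker's true-positive rate can never exceed e^ε times their false-positive rate plus δ, no matter what auxiliary data they combine with the model's outputs.

```python
import math

def max_true_positive_rate(epsilon, delta, false_positive_rate):
    """Hypothesis-testing view of (ε, δ)-DP: no attack, even one armed with
    auxiliary data, can achieve TPR greater than e^ε * FPR + δ when deciding
    whether a specific person's record was in the training set."""
    return min(1.0, math.exp(epsilon) * false_positive_rate + delta)

# Example: at ε = 1.0, δ = 1e-5, an attacker willing to accept a 1% false
# alarm rate can be correct at most about 2.7% of the time.
print(max_true_positive_rate(epsilon=1.0, delta=1e-5, false_positive_rate=0.01))
```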
How different privacy levels affect attack success rates