๐Ÿ›ก๏ธ Privacy Attacks in Machine Learning

Understand the threats that differential privacy protects against

🎯 Membership Inference Attack

Can an attacker tell if your data was used to train this model?

🤔 What is this attack?

🏥 A Hospital's AI Model

A hospital trains an AI to diagnose diseases using patient records. Alice was one of the patients whose data was used.

🕵️ An Attacker's Question

"Was Alice's medical record used to train this model?" The attacker wants to know if Alice was a patient here.

🎯 The Attack Trick

The attacker feeds Alice's data to the model and checks how confident the model is. Models tend to be far more confident on examples they saw during training.

⚠️ Privacy Breach!

If the model is very confident, the attacker learns Alice was a patient at this hospital - a serious privacy violation!

🧪 Try the Attack Yourself

The Setup

A hospital trained a model to predict heart disease risk. Let's see if an attacker can tell which patient records were in the training set!

Training Data Sample (✓ was in training)
🏥 Patient #A2847
Age: 58 | BP: 145/92 | Cholesterol: High
Model Confidence: 96%

Test Data Sample (✗ not in training)
🏥 Patient #B3192
Age: 56 | BP: 142/90 | Cholesterol: High
Model Confidence: 68%

🔍 The Attack Insight: The model is 28 percentage points more confident on the training record. An attacker can use this gap to infer that Patient #A2847 was treated at this hospital - a serious privacy breach.
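
To make the trick concrete, here is a minimal sketch of a confidence-threshold membership inference attack. The model, the scikit-learn-style predict_proba interface, and the 0.80 threshold are illustrative assumptions, not part of the demo above; a real attacker would calibrate the threshold on records they already know to be members or non-members.

```python
import numpy as np

def confidence_threshold_attack(model, records, threshold=0.80):
    """Toy membership-inference attack: guess "member" whenever the model's
    top-class confidence exceeds an attacker-chosen threshold."""
    probs = model.predict_proba(records)   # (n_samples, n_classes), scikit-learn style
    top_confidence = probs.max(axis=1)     # confidence in the predicted class
    return top_confidence > threshold      # True = "this record was in the training set"

# Example with the demo's numbers: 96% confidence is flagged as a member, 68% is not
class FakeModel:
    def predict_proba(self, records):
        return np.array([[0.96, 0.04], [0.68, 0.32]])

print(confidence_threshold_attack(FakeModel(), records=["#A2847", "#B3192"]))
# [ True False]
```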

๐Ÿ›ก๏ธ How Does Differential Privacy Help?

Adjust Privacy Protection

High Privacy Low Privacy

Attack Success Rate

65% Success
With medium privacy protection, the attacker can still succeed 65% of the time. Try increasing privacy!

🌍 Real-World Impact

🏥 Healthcare

Revealing if someone's medical data was used in training could expose their health conditions.

💳 Finance

Knowing if someone's financial data was in training could reveal their banking relationships.

📱 Social Media

Determining if someone's posts were used could reveal their participation in studies.

๐Ÿ›ก๏ธ How DP-SGD Defends Against This Attack

Random noise addition: DP-SGD adds carefully calibrated noise during training, so Alice's record has only a bounded, masked influence on the final model. The model then gives similar confidence scores whether or not someone was in the training data - attackers can't tell the difference between training and test examples.
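
As a rough illustration of that mechanism, here is a minimal sketch of a single DP-SGD update in NumPy. The function name, learning rate, clipping norm, and noise multiplier are assumptions for illustration; production implementations (e.g., Opacus or TensorFlow Privacy) handle per-example gradients and privacy accounting far more carefully.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One simplified DP-SGD update: clip each example's gradient, average,
    add Gaussian noise scaled to the clipping norm, and take a step."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:                              # one gradient per training example
        factor = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * factor)                           # bound any single record's influence
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)                 # noise hides individual contributions
    return params - lr * (mean_grad + noise)

# Example with three per-example gradients for a 2-parameter model
params = np.array([0.5, -0.3])
grads = [np.array([2.0, 1.0]), np.array([0.2, -0.1]), np.array([5.0, 5.0])]
print(dp_sgd_step(params, grads, rng=np.random.default_rng(0)))
```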

🔐 Data Reconstruction Attack

Recover training data from model gradients

Threat: Attackers with access to gradients can potentially reconstruct original training images, especially in federated learning scenarios.

📋 Scenario: Federated Learning Attack

In federated learning, a company trains a model using images from users' devices. An attacker with access to gradient updates attempts to reconstruct the original training images. Let's see how differential privacy protects against this!
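
The core of such attacks is gradient matching: the attacker optimizes a dummy input until the gradient it produces matches the intercepted one, in the spirit of "Deep Leakage from Gradients". Below is a toy PyTorch sketch of that idea; the function, its arguments, and the hyperparameters are illustrative assumptions, not code from the demo.

```python
import torch

def gradient_matching_attack(model, true_grads, input_shape, label, steps=300, lr=0.1):
    """Toy gradient-inversion attack: optimize a dummy input until the gradient
    it induces matches the gradient intercepted from a federated client."""
    dummy = torch.randn(1, *input_shape, requires_grad=True)   # attacker's guess, starts as noise
    target = torch.tensor([label])
    optimizer = torch.optim.Adam([dummy], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(steps):
        optimizer.zero_grad()
        grads = torch.autograd.grad(loss_fn(model(dummy), target),
                                    tuple(model.parameters()), create_graph=True)
        # How far are the simulated gradients from the intercepted ones?
        match_loss = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        match_loss.backward()
        optimizer.step()
    return dummy.detach()   # the attacker's reconstruction of the private input

# Usage (illustrative): reconstruct a 28x28 image from a client's intercepted gradients
# image_guess = gradient_matching_attack(model, intercepted_grads, (1, 28, 28), label=7)
```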

🔍 Reconstruction Quality at Different Privacy Levels

Ground Truth: the original private training images
No Privacy (ε = ∞): perfect reconstruction - privacy breach!
Your Settings (Low Privacy, ε = 8.0): adjust the sliders below to see changes - currently ⚠️ medium reconstruction quality
High Privacy (ε = 0.1): reconstruction fails - data protected!

๐Ÿ›ก๏ธ Differential Privacy Settings

๐Ÿ”’ Tight Clipping (More Private) ๐Ÿ”“ Loose Clipping (Less Private)
๐Ÿ”‡ No Noise (Vulnerable) ๐Ÿ”Š High Noise (Protected)

📊 Attack Analysis

Reconstruction SSIM Score: 0.85
(1.0 = perfect reconstruction, 0.0 = completely failed)
Privacy Level: ❌ Low (ε ≈ 8.0)
Attack Success: 92%
⚠️ Without sufficient privacy protection, attackers can reconstruct training images with high fidelity from gradient information alone!

๐Ÿ›ก๏ธ How DP-SGD Defends Against This Attack

Gradient clipping + noise: DP-SGD clips each person's influence on the model, then drowns it in noise - like taking a photo and adding so much blur that faces become unrecognizable blobs. Attackers cannot reverse-engineer the original training images from the model's outputs.

🔄 Model Inversion Attack

Extract representative features for each class

Attack Goal: Generate synthetic data that represents what the model learned about each class, potentially revealing sensitive attributes.
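
A minimal sketch of how such class representatives can be extracted from a differentiable classifier: start from a blank input and run gradient ascent on the target class's logit. The PyTorch model, function name, and hyperparameters are assumptions for illustration only.

```python
import torch

def invert_class(model, target_class, input_shape, steps=500, lr=0.05):
    """Toy model-inversion attack: ascend the gradient of the target class's
    logit to produce the model's "idea" of that class."""
    x = torch.zeros(1, *input_shape, requires_grad=True)   # start from a blank input
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class]     # maximize the target class score
        loss.backward()
        optimizer.step()
        x.data.clamp_(0.0, 1.0)             # keep the synthetic input in a valid pixel range
    return x.detach()                       # a representative "prototype" of the class

# Usage (illustrative): what does the model think a "7" looks like?
# prototype_of_7 = invert_class(model, target_class=7, input_shape=(1, 28, 28))
```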

Inversion Parameters

Privacy slider: 🔒 High Privacy (ε = 1.0) ↔ 🔓 Low Privacy (ε = 10.0)

Generated Features

Inverted Features - Confidence: 87%
⚠️ Medium privacy. Some class features are visible but degraded. Move the slider left for higher privacy!

๐Ÿ›ก๏ธ How DP-SGD Defends Against This Attack

Feature scrambling: DP-SGD scrambles what the model "remembers" about each class. Instead of learning "digit 7 looks like this sharp image," it learns "7 is somewhere in this fuzzy cloud." Attackers cannot extract clear class representatives or features.

📊 Property Inference Attack

Infer statistical properties of the training dataset

Privacy Risk: Attackers can infer sensitive dataset properties like demographic distributions, even without seeing individual records.
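
One common recipe is a shadow-model meta-classifier: the attacker trains many small models on synthetic datasets whose property (here, the fraction of female records) is known, then learns to read that property off the model parameters. The sketch below is a deliberately toy version using scikit-learn; every dataset, feature, and name in it is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shadow_model(female_fraction, rng, n=500):
    """Train a tiny "shadow" model on synthetic data with a chosen gender mix,
    and return its weights as a fingerprint of that training set."""
    gender = (rng.random(n) < female_fraction).astype(float)
    x = np.column_stack([rng.normal(50, 10, n), gender])            # features: [age, gender]
    y = (x[:, 0] + 5 * gender + rng.normal(0, 5, n) > 55).astype(int)
    clf = LogisticRegression(max_iter=1000).fit(x, y)
    return np.concatenate([clf.coef_.ravel(), clf.intercept_])

rng = np.random.default_rng(0)
# Attacker trains many shadow models with known female fractions ...
fractions = rng.uniform(0.2, 0.8, 200)
fingerprints = np.array([train_shadow_model(f, rng) for f in fractions])
labels = (fractions > 0.5).astype(int)                              # property: "majority female?"

# ... then fits a meta-classifier that predicts the property from the parameters
meta = LogisticRegression(max_iter=1000).fit(fingerprints, labels)
victim = train_shadow_model(0.7, rng)                               # stand-in for the target model
print("Attacker's estimate that the dataset is majority female:",
      meta.predict_proba(victim.reshape(1, -1))[0, 1])
```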

Target Properties

Privacy slider: 🔒 High Privacy (ε = 1.0) ↔ 🔓 Low Privacy (ε = 10.0)
Attacker access: Black-box | Gray-box | White-box

Inferred Properties

Inferred Distribution: Male: 52% ± 8%, Female: 48% ± 8%
(Confidence intervals show attack uncertainty)
⚠️ Medium privacy. The attacker can infer dataset properties with moderate accuracy. Move the slider left for higher privacy!

๐Ÿ›ก๏ธ How DP-SGD Defends Against This Attack

Privacy budget tracking: DP-SGD keeps a "privacy budget" (ฮต) - like a bank account that tracks how much information leaked. When the budget runs low, training stops to prevent dataset statistics (like gender distribution or age ranges) from being exposed.
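
As a toy illustration of budget tracking, the sketch below uses simple sequential composition, where the total ε is just the sum of per-step ε values. Real DP-SGD uses much tighter accountants (RDP / moments accountant), so treat the numbers and class name as illustrative assumptions.

```python
class PrivacyBudget:
    """Toy privacy accountant using sequential composition: total epsilon is
    the sum of per-step epsilons (real DP-SGD accounting is much tighter)."""
    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, step_epsilon):
        if self.spent + step_epsilon > self.total_epsilon:
            raise RuntimeError("Privacy budget exhausted: stop training")
        self.spent += step_epsilon
        return self.total_epsilon - self.spent   # remaining budget

budget = PrivacyBudget(total_epsilon=3.0)
for step in range(100):
    try:
        remaining = budget.charge(step_epsilon=0.05)   # cost of one noisy update
    except RuntimeError:
        print(f"Stopping at step {step}, epsilon spent = {budget.spent:.2f}")
        break
```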

🔗 Linkage Attack

Combine model outputs with auxiliary data sources

Advanced Threat: Attackers combine model predictions with external datasets (social media, public records) to identify individuals and infer sensitive attributes.

Model Outputs
• Prediction scores
• Confidence levels
• Feature importance
+
Auxiliary Data
• Public records
• Social media
• Purchase history
→
Linked Profile
• Identity revealed
• Sensitive attributes
• Behavioral patterns
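
The linkage step itself is often just a database join on quasi-identifiers. The toy pandas sketch below joins released model scores to an auxiliary table on age and zip code; all columns, names, and records are invented for illustration.

```python
import pandas as pd

# Model outputs released per (pseudonymous) record, with quasi-identifiers attached
model_outputs = pd.DataFrame({
    "age": [58, 56, 41],
    "zip": ["02139", "02139", "94105"],
    "heart_disease_score": [0.96, 0.68, 0.23],
})

# Auxiliary data the attacker already holds (public records, social media, ...)
auxiliary = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [58, 41],
    "zip": ["02139", "94105"],
})

# Linkage: join on the quasi-identifiers to attach identities to model outputs
linked = auxiliary.merge(model_outputs, on=["age", "zip"], how="inner")
print(linked)   # reveals, e.g., Alice's high heart-disease score
```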

Linkage Scenario

Linkage Success

Successful Links: 68%
Confidence: High
• 340 individuals identified
• 89% attribute accuracy
• 12% false positives

๐Ÿ›ก๏ธ How DP-SGD Defends Against This Attack

Output noise + bounded leakage: DP-SGD adds noise to model predictions AND tracks total privacy loss (ฮต). Even when attackers combine model outputs with external datasets (social media, public records), the mathematical privacy guarantee ensures they cannot reliably link individuals or infer sensitive attributes.

⚖️ Attack Effectiveness Comparison

How different privacy levels affect attack success rates

No Privacy (ε = ∞): 85% average attack success
Low Privacy (ε = 8.0): 72% average attack success
Medium Privacy (ε = 3.0): 58% average attack success
High Privacy (ε = 1.0): 42% average attack success

Key Insight: As the privacy budget (ε) decreases, attack success rates drop significantly. DP-SGD provides measurable protection against all attack types.
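
The success rates above are illustrative demo values. For context, pure ε-differential privacy also gives a worst-case ceiling on any membership-style distinguishing attack with a balanced prior: accuracy is at most e^ε / (1 + e^ε). The snippet below simply evaluates that formula; it is an upper bound, not a prediction of the empirical rates shown above.

```python
import math

def max_attack_accuracy(epsilon):
    """Upper bound on a balanced-prior membership distinguishing attack
    against a pure epsilon-DP mechanism: e^eps / (1 + e^eps)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

for eps in [0.1, 1.0, 3.0, 8.0]:
    print(f"epsilon = {eps:>4}: attack accuracy at most {max_attack_accuracy(eps):.1%}")
```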