A Hospital's AI Model
A hospital trains an AI to diagnose diseases using patient records. Alice was one of the patients whose data was used.
Understand the threats that differential privacy protects against
Can an attacker tell if your data was used to train this model?
"Was Alice's medical record used to train this model?" The attacker wants to know if Alice was a patient here.
The attacker feeds Alice's data to the model and checks: "How confident is the model?" Models remember training data!
If the model is very confident, the attacker learns Alice was a patient at this hospital - a serious privacy violation!
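For concreteness, here is a minimal Python sketch of that confidence test, assuming an already-trained classifier `model` that exposes a scikit-learn-style `predict_proba`; the 0.9 threshold is purely illustrative.

```python
import numpy as np

def membership_score(model, x, true_label):
    """Confidence-based membership test: unusually high confidence on the
    true label suggests the record may have been in the training set."""
    # predict_proba follows the scikit-learn convention; any classifier
    # that returns class probabilities works the same way.
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    return probs[true_label]

def guess_membership(model, x, true_label, threshold=0.9):
    # The attacker tunes the threshold on records known to be non-members:
    # above it, they guess "this record was in the training data".
    return membership_score(model, x, true_label) >= threshold
```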
A hospital trained a model to predict heart disease risk. Let's see if an attacker can tell which patient records were in the training set!
Revealing if someone's medical data was used in training could expose their health conditions
Knowing if someone's financial data was in training could reveal their banking relationships
Determining if someone's posts were used could reveal their participation in studies
Random noise addition: DP-SGD adds carefully calibrated noise to the gradient updates during training, making Alice's record indistinguishable from any other patient's. The model gives similar confidence scores whether or not someone was in the training data - attackers can't tell the difference between training and test examples.
Recover training data from model gradients
In federated learning, a company trains a model using images from users' devices. An attacker with access to gradient updates attempts to reconstruct the original training images. Let's see how differential privacy protects against this!
Gradient clipping + noise: DP-SGD clips each person's influence on the model update, then drowns it in noise - like taking a photo and adding so much blur that faces become unrecognizable blobs. Attackers cannot reverse-engineer the original training images from the shared gradient updates.
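A minimal NumPy sketch of the clip-then-noise step at the core of DP-SGD; the clipping norm and noise multiplier below are illustrative placeholders, not recommended settings.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One simplified DP-SGD update: clip every person's gradient to a fixed
    L2 norm (bounding their influence), sum, add Gaussian noise, average."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:                       # one gradient per person
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Example: three people's gradients, each capped before noise is added.
grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2]), np.array([-5.0, 1.0])]
print(dp_sgd_step(grads))
```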
Extract representative features for each class
Feature scrambling: DP-SGD scrambles what the model "remembers" about each class. Instead of learning "digit 7 looks like this sharp image," it learns "7 is somewhere in this fuzzy cloud." Attackers cannot extract clear class representatives or features.
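To show what the attacker is attempting here, a hedged PyTorch-style sketch of class-representative extraction by gradient ascent on the input; `model`, the 28x28 image shape, and the step count are assumptions for illustration.

```python
import torch

def invert_class(model, target_class, shape=(1, 1, 28, 28), steps=200, lr=0.1):
    """Model-inversion sketch: run gradient ascent on a blank image so the
    model's logit for `target_class` becomes as large as possible. Against a
    non-private model this often yields a recognizable class prototype."""
    x = torch.zeros(shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -model(x)[0, target_class]   # maximize the target class logit
        loss.backward()
        optimizer.step()
        x.data.clamp_(0.0, 1.0)             # keep pixel values in a valid range
    return x.detach()
```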
Infer statistical properties of the training dataset
Privacy budget tracking: DP-SGD keeps a "privacy budget" (ε) - like a bank account that tracks how much information has leaked. When the budget is exhausted, training stops, limiting how much of the dataset's statistics (like gender distribution or age ranges) can be exposed.
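A toy accountant showing the budget idea with simple sequential composition; real DP-SGD implementations use tighter accounting (e.g., moments or Rényi accountants), so the numbers below are only illustrative.

```python
class PrivacyBudget:
    """Toy ε accountant using plain sequential composition: per-step epsilons
    add up, and training must stop once the total budget is spent."""
    def __init__(self, total_epsilon=3.0):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, step_epsilon):
        if self.spent + step_epsilon > self.total:
            raise RuntimeError("Privacy budget exhausted: stop training.")
        self.spent += step_epsilon
        return self.total - self.spent       # budget still remaining

budget = PrivacyBudget(total_epsilon=3.0)
steps_taken = 0
try:
    for _ in range(1000):
        budget.charge(step_epsilon=0.01)     # illustrative per-step cost
        steps_taken += 1
except RuntimeError:
    pass
print(f"Training stopped after {steps_taken} steps")
```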
Combine model outputs with auxiliary data sources
Noisy training + bounded leakage: DP-SGD injects noise during training AND tracks the total privacy loss (ε). Even when attackers combine model outputs with external datasets (social media, public records), the mathematical privacy guarantee ensures they cannot reliably link individuals or infer sensitive attributes.
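To make "bounded leakage" concrete, here is a small calculation of the standard hypothesis-testing bound for (ε, δ)-DP: an attacker's true-positive rate can never exceed e^ε times their false-positive rate plus δ, no matter what auxiliary data they combine with the model's outputs.

```python
import math

def max_true_positive_rate(epsilon, delta, false_positive_rate):
    """Hypothesis-testing view of (ε, δ)-DP: no attack, even one armed with
    auxiliary data, can achieve TPR greater than e^ε * FPR + δ when deciding
    whether a specific person's record was in the training set."""
    return min(1.0, math.exp(epsilon) * false_positive_rate + delta)

# Example: at ε = 1.0, δ = 1e-5, an attacker willing to accept a 1% false
# alarm rate can be correct at most about 2.7% of the time.
print(max_true_positive_rate(epsilon=1.0, delta=1e-5, false_positive_rate=0.01))
```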
How different privacy levels affect attack success rates