DP-SGD Concepts

  • Introduction to Differential Privacy
  • Core DP Concepts
  • SGD Refresher
  • DP-SGD: Core Modifications
  • Hyperparameter Deep Dive
  • Privacy Accounting

Introduction to Differential Privacy

Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has a minimal effect on the output of an analysis.

Why is Differential Privacy Important?

Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.

Key Insight

Differential privacy creates plausible deniability. By adding controlled noise, it mathematically limits how confidently anyone can determine whether a particular individual's data was included in the analysis.

The Privacy-Utility Trade-off

There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.

Strong Privacy (Low ε)

  • More noise added
  • Lower accuracy
  • Better protection for sensitive data

Strong Utility (High ε)

  • Less noise added
  • Higher accuracy
  • Reduced privacy guarantees

Core Differential Privacy Concepts

The Formal Definition

A mechanism M is (ε,δ)-differentially private if for all neighboring datasets D and D' (differing in one record), and for all possible outputs S:

P(M(D) ∈ S) ≤ e^ε × P(M(D') ∈ S) + δ
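
To make the definition concrete, here is a minimal sketch of randomized response, a classic ε-DP mechanism for a single yes/no answer (the value of ε is purely illustrative):

    import math
    import random

    def randomized_response(true_answer: bool, epsilon: float) -> bool:
        """Report the true answer with probability e^eps / (1 + e^eps),
        otherwise report the opposite. This satisfies (epsilon, 0)-DP."""
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return true_answer if random.random() < p_truth else not true_answer

    # With epsilon = ln(3), the truthful answer is reported 75% of the time, so
    # P(output = yes | truth = yes) / P(output = yes | truth = no) = 3 = e^epsilon,
    # exactly matching the bound in the definition (with delta = 0).
    print(randomized_response(True, epsilon=math.log(3)))
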

Key Parameters

ε (epsilon): The privacy budget. Lower values mean stronger privacy but typically lower utility. For example, ε = 1 means the probability of any particular output can change by at most a factor of e ≈ 2.72 (up to the additive δ term) when one record is added or removed.

δ (delta): The probability of the privacy guarantee being broken. Usually set very small (e.g., 10^-5).

Differential Privacy Mechanisms

Laplace Mechanism: Adds noise from a Laplace distribution to numeric queries.

Gaussian Mechanism: Adds noise from a Gaussian (normal) distribution and provides (ε,δ)-DP rather than pure ε-DP. This is the mechanism used in DP-SGD.

Exponential Mechanism: Used for non-numeric outputs; selects an output with probability weighted exponentially by a utility score.
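
As an illustration, here is a minimal NumPy sketch of the Laplace and Gaussian mechanisms applied to a counting query (the query, sensitivity, and parameter values are made up for the example; the Gaussian calibration shown is the classical one, valid for ε < 1):

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_mechanism(true_value, sensitivity, epsilon):
        """epsilon-DP: add Laplace noise with scale sensitivity / epsilon."""
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
        """(epsilon, delta)-DP: classical calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon."""
        sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
        return true_value + rng.normal(loc=0.0, scale=sigma)

    # Example: a counting query (sensitivity 1) over a hypothetical dataset.
    true_count = 42
    print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
    print(gaussian_mechanism(true_count, sensitivity=1.0, epsilon=0.5, delta=1e-5))
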

Composition

When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.

Advanced composition theorems and privacy accountants give tighter bounds on the total privacy spend than simply summing the per-step budgets.
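
For intuition, a toy sketch of basic (sequential) composition, where the per-query budgets simply add up (the budget values are illustrative; real accountants use tighter analyses than this straightforward summation):

    # Basic composition: epsilons and deltas add up across queries.
    queries = [(0.5, 1e-6), (0.5, 1e-6), (1.0, 1e-6)]   # (epsilon, delta) per query
    total_epsilon = sum(eps for eps, _ in queries)        # 2.0
    total_delta = sum(delta for _, delta in queries)      # 3e-6
    print(f"basic composition: epsilon={total_epsilon}, delta={total_delta}")
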

Stochastic Gradient Descent Refresher

Standard SGD

Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.

The Basic Update Rule

The standard SGD update for a batch B is:

θ ← θ - η∇L(θ; B)

Where:

  • θ represents the model parameters
  • η is the learning rate
  • ∇L(θ; B) is the average gradient of the loss over the batch B
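
A minimal NumPy sketch of one such step for a toy linear model with squared loss (the model, data, and learning rate are placeholders, not tied to any particular library or task):

    import numpy as np

    def sgd_step(theta, X_batch, y_batch, lr):
        """One SGD step for linear regression with mean squared error."""
        preds = X_batch @ theta
        # Average gradient of 0.5 * (pred - y)^2 over the batch.
        grad = X_batch.T @ (preds - y_batch) / len(y_batch)
        return theta - lr * grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
    theta = sgd_step(np.zeros(5), X, y, lr=0.1)
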

Privacy Concerns with Standard SGD

Standard SGD can leak information about individual training examples through the gradients. For example:

  • Gradients might be larger for outliers or unusual examples
  • Models can memorize sensitive training data, which attackers can later extract (e.g., via membership inference or data extraction attacks)
  • Gradient values can be used in reconstruction attacks

These privacy concerns motivate the need for differentially private training methods.

DP-SGD: Core Modifications

How DP-SGD Differs from Standard SGD

Differentially Private SGD modifies standard SGD in two key ways:

1. Per-Sample Gradient Clipping

Compute gradients for each example individually, then clip each one so that its L2 norm is at most a threshold C.

This limits the influence of any single training example on the model update.

2. Noise Addition

Add Gaussian noise to the sum of clipped gradients before applying the update.

The standard deviation of the noise is the product of the clipping threshold C and the noise multiplier σ.

The DP-SGD Update Rule

The DP-SGD update can be summarized as:

  1. Compute per-sample gradients: g_i = ∇L(θ; x_i)
  2. Clip each gradient: g̃_i = g_i × min(1, C / ‖g_i‖₂)
  3. Add noise: ḡ = (1/|B|) × (∑ g̃_i + N(0, σ²C²I))
  4. Update parameters: θ ← θ − η × ḡ

Where:

  • C is the clipping norm
  • σ is the noise multiplier
  • B is the batch
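
A minimal NumPy sketch of these four steps for the same toy linear model used above (the values of C, σ, and the learning rate are illustrative, not recommendations):

    import numpy as np

    def dp_sgd_step(theta, X_batch, y_batch, lr, C, sigma, rng):
        """One DP-SGD step: per-sample gradients, clipping, noise, update."""
        # 1. Per-sample gradients of 0.5 * (x . theta - y)^2.
        residuals = X_batch @ theta - y_batch               # shape (B,)
        per_sample_grads = residuals[:, None] * X_batch     # shape (B, d)

        # 2. Clip each gradient to L2 norm at most C.
        norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
        clipped = per_sample_grads * np.minimum(1.0, C / (norms + 1e-12))

        # 3. Sum, add Gaussian noise N(0, sigma^2 C^2 I), average over the batch.
        noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=theta.shape)
        noisy_grad = noisy_sum / len(y_batch)

        # 4. Gradient step.
        return theta - lr * noisy_grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
    theta = dp_sgd_step(np.zeros(5), X, y, lr=0.1, C=1.0, sigma=1.0, rng=rng)
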

Hyperparameter Deep Dive

DP-SGD introduces several new hyperparameters that need to be tuned carefully:

Clipping Norm (C)

The maximum allowed L2 norm for any individual gradient.

  • Too small: Gradients are over-clipped, limiting learning
  • Too large: Requires more noise to achieve the same privacy guarantee
  • Typical range: 0.1 to 10.0, depending on the dataset and model

Noise Multiplier (σ)

Controls the amount of noise added to the gradients.

  • Higher σ: Better privacy, worse utility
  • Lower σ: Better utility, worse privacy
  • Typical range: 0.5 to 2.0 for most practical applications

Batch Size

Affects both training dynamics and privacy accounting.

  • Larger batches: Average the noise over more examples (less relative noise per update), but increase the sampling rate q = batch size / dataset size used in privacy accounting
  • Smaller batches: More update steps per epoch, each of which incurs privacy loss
  • Typical range: 64 to 1024, often larger than in non-private SGD

Learning Rate (η)

May need adjustment compared to non-private training.

  • DP-SGD often requires: Lower learning rates or careful scheduling
  • Reason: Added noise can destabilize training with high learning rates

Number of Epochs

More epochs consume more privacy budget.

  • Trade-off: More training vs. privacy budget consumption
  • Early stopping: Often beneficial for balancing accuracy and privacy
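
In practice these hyperparameters are passed to a DP training library rather than wired up by hand. A hedged sketch using PyTorch Opacus (the toy model, data, and all values shown are placeholders; the PrivacyEngine.make_private call follows the Opacus 1.x API and may differ in other versions):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from opacus import PrivacyEngine

    # Toy model and data; in a real setting these come from your task.
    model = torch.nn.Linear(5, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)   # learning rate (eta)
    data = TensorDataset(torch.randn(256, 5), torch.randn(256, 1))
    loader = DataLoader(data, batch_size=64)                   # batch size

    privacy_engine = PrivacyEngine()
    model, optimizer, loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=loader,
        noise_multiplier=1.0,   # sigma
        max_grad_norm=1.0,      # clipping norm C
    )
    # Training then proceeds as usual for the chosen number of epochs.
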

Privacy Accounting

Tracking Privacy Budget

Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.

Common Methods

Moments Accountant

Used in the original DP-SGD paper (Abadi et al., 2016); provides tight bounds on the privacy loss.

Tracks the moments of the privacy loss random variable.

Rényi Differential Privacy (RDP)

Alternative accounting method based on Rényi divergence.

Often used in modern implementations like TensorFlow Privacy and Opacus.

Analytical Gaussian Mechanism

Closed-form analysis for the Gaussian Mechanism specifically.

Simple to compute, but it must be combined with composition (and subsampling) analysis to account for a full DP-SGD training run.
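
As a rough illustration of accountant-based tracking, a hedged sketch using the RDP accountant from Opacus (the sample rate, noise multiplier, and step count are made-up values; class and method names follow the Opacus 1.x API and may vary across versions):

    from opacus.accountants import RDPAccountant

    accountant = RDPAccountant()
    sample_rate = 64 / 50_000        # batch size / dataset size
    noise_multiplier = 1.0           # sigma
    steps = 10_000                   # total number of DP-SGD updates

    for _ in range(steps):
        accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

    print("epsilon spent:", accountant.get_epsilon(delta=1e-5))
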

Privacy Budget Allocation

With a fixed privacy budget (ε), you must decide how to allocate it:

  • Fixed noise, variable epochs: Set noise level, train until budget is exhausted
  • Fixed epochs, variable noise: Set desired epochs, calculate required noise
  • Advanced techniques: Privacy filters, odometers, and adaptive mechanisms
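
For the "fixed epochs, variable noise" strategy, libraries typically offer a calibration helper. A hedged sketch assuming the Opacus utility get_noise_multiplier is available in your version (the target budget, sample rate, and epoch count are illustrative):

    from opacus.accountants.utils import get_noise_multiplier

    sigma = get_noise_multiplier(
        target_epsilon=3.0,        # total privacy budget
        target_delta=1e-5,
        sample_rate=64 / 50_000,   # batch size / dataset size
        epochs=20,                 # fixed number of epochs
    )
    print("required noise multiplier:", sigma)
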

Practical Implementation

In practice, privacy accounting is handled by libraries like:

  • TensorFlow Privacy
  • PyTorch Opacus
  • Diffprivlib (IBM)