DP-SGD Concepts

  • Introduction to Differential Privacy
  • Core DP Concepts
  • SGD Refresher
  • DP-SGD: Core Modifications
  • Hyperparameter Deep Dive
  • Privacy Accounting

Introduction to Differential Privacy

Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has a minimal effect on the output of an analysis.

Why is Differential Privacy Important?

Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.

Key Insight

Differential privacy creates plausible deniability. By adding controlled noise, it mathematically limits how confidently anyone can determine whether a particular individual's data was included in the analysis.

The Privacy-Utility Trade-off

There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.

Strong Privacy (Low ε)

  • More noise added
  • Lower accuracy
  • Better protection for sensitive data

Strong Utility (High ε)

  • Less noise added
  • Higher accuracy
  • Reduced privacy guarantees

Core Differential Privacy Concepts

The Formal Definition

A mechanism M is (ε,δ)-differentially private if for all neighboring datasets D and D' (differing in one record), and for all possible outputs S:

P(M(D) ∈ S) ≤ e^ε × P(M(D') ∈ S) + δ
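
To make the definition concrete, here is a minimal sketch of randomized response, a classic ε-DP mechanism for a single yes/no answer (the value of ε is purely illustrative):

    import math
    import random

    def randomized_response(true_answer: bool, epsilon: float) -> bool:
        """Report the true answer with probability e^eps / (1 + e^eps),
        otherwise report the opposite. This satisfies (epsilon, 0)-DP."""
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return true_answer if random.random() < p_truth else not true_answer

    # With epsilon = ln(3), the truthful answer is reported 75% of the time, so
    # P(output = yes | truth = yes) / P(output = yes | truth = no) = 3 = e^epsilon,
    # exactly matching the bound in the definition (with delta = 0).
    print(randomized_response(True, epsilon=math.log(3)))
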

Key Parameters

ε (epsilon): The privacy budget. Lower values mean stronger privacy but typically lower utility. For example, ε = 1 means the probability of any particular output can change by at most a factor of e ≈ 2.72 (up to the additive δ term) when one record is added or removed.

δ (delta): The probability of the privacy guarantee being broken. Usually set very small (e.g., 10^-5).

Differential Privacy Mechanisms

Laplace Mechanism: Adds noise from a Laplace distribution to numeric queries.

Gaussian Mechanism: Adds noise from a Gaussian (normal) distribution and provides (ε,δ)-DP rather than pure ε-DP. This is the mechanism used in DP-SGD.

Exponential Mechanism: Used for non-numeric outputs; selects an output with probability weighted exponentially by a utility score.
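
As an illustration, here is a minimal NumPy sketch of the Laplace and Gaussian mechanisms applied to a counting query (the query, sensitivity, and parameter values are made up for the example; the Gaussian calibration shown is the classical one, valid for ε < 1):

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_mechanism(true_value, sensitivity, epsilon):
        """epsilon-DP: add Laplace noise with scale sensitivity / epsilon."""
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
        """(epsilon, delta)-DP: classical calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon."""
        sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
        return true_value + rng.normal(loc=0.0, scale=sigma)

    # Example: a counting query (sensitivity 1) over a hypothetical dataset.
    true_count = 42
    print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
    print(gaussian_mechanism(true_count, sensitivity=1.0, epsilon=0.5, delta=1e-5))
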

Composition

When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.

Advanced composition theorems and privacy accountants give tighter bounds on the total privacy spend than simply summing the per-step budgets.
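
For intuition, a toy sketch of basic (sequential) composition, where the per-query budgets simply add up (the budget values are illustrative; real accountants use tighter analyses than this straightforward summation):

    # Basic composition: epsilons and deltas add up across queries.
    queries = [(0.5, 1e-6), (0.5, 1e-6), (1.0, 1e-6)]   # (epsilon, delta) per query
    total_epsilon = sum(eps for eps, _ in queries)        # 2.0
    total_delta = sum(delta for _, delta in queries)      # 3e-6
    print(f"basic composition: epsilon={total_epsilon}, delta={total_delta}")
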

Stochastic Gradient Descent Refresher

Standard SGD

Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.

The Basic Update Rule

The standard SGD update for a batch B is:

θ ← θ - η∇L(θ; B)

Where:

  • θ represents the model parameters
  • η is the learning rate
  • ∇L(θ; B) is the average gradient of the loss over the batch B
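
A minimal NumPy sketch of one such step for a toy linear model with squared loss (the model, data, and learning rate are placeholders, not tied to any particular library or task):

    import numpy as np

    def sgd_step(theta, X_batch, y_batch, lr):
        """One SGD step for linear regression with mean squared error."""
        preds = X_batch @ theta
        # Average gradient of 0.5 * (pred - y)^2 over the batch.
        grad = X_batch.T @ (preds - y_batch) / len(y_batch)
        return theta - lr * grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
    theta = sgd_step(np.zeros(5), X, y, lr=0.1)
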

Privacy Concerns with Standard SGD

Standard SGD can leak information about individual training examples through the gradients. For example:

  • Gradients might be larger for outliers or unusual examples
  • Models can memorize sensitive training data, which attackers can later extract (e.g., via membership inference or data extraction attacks)
  • Gradient values can be used in reconstruction attacks

These privacy concerns motivate the need for differentially private training methods.

DP-SGD: Core Modifications

How DP-SGD Differs from Standard SGD

Differentially Private SGD modifies standard SGD in two key ways:

1. Per-Sample Gradient Clipping

Compute gradients for each example individually, then clip each one so that its L2 norm is at most a threshold C.

This limits the influence of any single training example on the model update.

2. Noise Addition

Add Gaussian noise to the sum of clipped gradients before applying the update.

The standard deviation of the noise is the product of the clipping threshold C and the noise multiplier σ.

The DP-SGD Update Rule

The DP-SGD update can be summarized as:

  1. Compute per-sample gradients: g_i = ∇L(θ; x_i)
  2. Clip each gradient: g̃_i = g_i × min(1, C / ‖g_i‖₂)
  3. Add noise: ḡ = (1/|B|) × (∑ g̃_i + N(0, σ²C²I))
  4. Update parameters: θ ← θ − η × ḡ

Where:

  • C is the clipping norm
  • σ is the noise multiplier
  • B is the batch
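
A minimal NumPy sketch of these four steps for the same toy linear model used above (the values of C, σ, and the learning rate are illustrative, not recommendations):

    import numpy as np

    def dp_sgd_step(theta, X_batch, y_batch, lr, C, sigma, rng):
        """One DP-SGD step: per-sample gradients, clipping, noise, update."""
        # 1. Per-sample gradients of 0.5 * (x . theta - y)^2.
        residuals = X_batch @ theta - y_batch               # shape (B,)
        per_sample_grads = residuals[:, None] * X_batch     # shape (B, d)

        # 2. Clip each gradient to L2 norm at most C.
        norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
        clipped = per_sample_grads * np.minimum(1.0, C / (norms + 1e-12))

        # 3. Sum, add Gaussian noise N(0, sigma^2 C^2 I), average over the batch.
        noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=theta.shape)
        noisy_grad = noisy_sum / len(y_batch)

        # 4. Gradient step.
        return theta - lr * noisy_grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
    theta = dp_sgd_step(np.zeros(5), X, y, lr=0.1, C=1.0, sigma=1.0, rng=rng)
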

Hyperparameter Deep Dive

DP-SGD introduces several new hyperparameters that need to be tuned carefully:

Clipping Norm (C)

The maximum allowed L2 norm for any individual gradient.

  • Too small: Gradients are over-clipped, limiting learning
  • Too large: Requires more noise to achieve the same privacy guarantee
  • Typical range: 0.1 to 10.0, depending on the dataset and model

Noise Multiplier (σ)

Controls the amount of noise added to the gradients.

  • Higher σ: Better privacy, worse utility
  • Lower σ: Better utility, worse privacy
  • Typical range: 0.5 to 2.0 for most practical applications

Batch Size

Affects both training dynamics and privacy accounting.

  • Larger batches: Average the noise over more examples (less relative noise per update), but increase the sampling rate q = batch size / dataset size used in privacy accounting
  • Smaller batches: More update steps per epoch, each of which incurs privacy loss
  • Typical range: 64 to 1024, often larger than in non-private SGD

Learning Rate (η)

May need adjustment compared to non-private training.

  • DP-SGD often requires: Lower learning rates or careful scheduling
  • Reason: Added noise can destabilize training with high learning rates

Number of Epochs

More epochs consume more privacy budget.

  • Trade-off: More training vs. privacy budget consumption
  • Early stopping: Often beneficial for balancing accuracy and privacy
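
In practice these hyperparameters are passed to a DP training library rather than wired up by hand. A hedged sketch using PyTorch Opacus (the toy model, data, and all values shown are placeholders; the PrivacyEngine.make_private call follows the Opacus 1.x API and may differ in other versions):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from opacus import PrivacyEngine

    # Toy model and data; in a real setting these come from your task.
    model = torch.nn.Linear(5, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)   # learning rate (eta)
    data = TensorDataset(torch.randn(256, 5), torch.randn(256, 1))
    loader = DataLoader(data, batch_size=64)                   # batch size

    privacy_engine = PrivacyEngine()
    model, optimizer, loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=loader,
        noise_multiplier=1.0,   # sigma
        max_grad_norm=1.0,      # clipping norm C
    )
    # Training then proceeds as usual for the chosen number of epochs.
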

Privacy Accounting

Tracking Privacy Budget

Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.

Common Methods

Moments Accountant

Used in the original DP-SGD paper (Abadi et al., 2016); provides tight bounds on the privacy loss.

Tracks the moments of the privacy loss random variable.

Rényi Differential Privacy (RDP)

Alternative accounting method based on Rényi divergence.

Often used in modern implementations like TensorFlow Privacy and Opacus.

Analytical Gaussian Mechanism

Closed-form analysis for the Gaussian Mechanism specifically.

Simple to compute, but it must be combined with composition (and subsampling) analysis to account for a full DP-SGD training run.
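
As a rough illustration of accountant-based tracking, a hedged sketch using the RDP accountant from Opacus (the sample rate, noise multiplier, and step count are made-up values; class and method names follow the Opacus 1.x API and may vary across versions):

    from opacus.accountants import RDPAccountant

    accountant = RDPAccountant()
    sample_rate = 64 / 50_000        # batch size / dataset size
    noise_multiplier = 1.0           # sigma
    steps = 10_000                   # total number of DP-SGD updates

    for _ in range(steps):
        accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

    print("epsilon spent:", accountant.get_epsilon(delta=1e-5))
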

Privacy Budget Allocation

With a fixed privacy budget (ε), you must decide how to allocate it:

  • Fixed noise, variable epochs: Set noise level, train until budget is exhausted
  • Fixed epochs, variable noise: Set desired epochs, calculate required noise
  • Advanced techniques: Privacy filters, odometers, and adaptive mechanisms
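
For the "fixed epochs, variable noise" strategy, libraries typically offer a calibration helper. A hedged sketch assuming the Opacus utility get_noise_multiplier is available in your version (the target budget, sample rate, and epoch count are illustrative):

    from opacus.accountants.utils import get_noise_multiplier

    sigma = get_noise_multiplier(
        target_epsilon=3.0,        # total privacy budget
        target_delta=1e-5,
        sample_rate=64 / 50_000,   # batch size / dataset size
        epochs=20,                 # fixed number of epochs
    )
    print("required noise multiplier:", sigma)
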

Practical Implementation

In practice, privacy accounting is handled by libraries like:

  • TensorFlow Privacy
  • PyTorch Opacus
  • Diffprivlib (IBM)