Introduction to Differential Privacy
Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has only a strictly bounded effect on the output of an analysis.
Why is Differential Privacy Important?
Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.
Key Insight
Differential privacy creates plausible deniability. By adding controlled noise, it ensures that no observer can confidently determine whether any particular individual's data was included in the analysis; the strength of any such inference is mathematically bounded by the privacy parameters.
The Privacy-Utility Trade-off
There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.
Strong Privacy (Low ε)
- More noise added
- Lower accuracy
- Better protection for sensitive data
Strong Utility (High ε)
- Less noise added
- Higher accuracy
- Reduced privacy guarantees
Core Differential Privacy Concepts
The Formal Definition
A mechanism M is (ε, δ)-differentially private if, for all neighboring datasets D and D' (differing in one record), and for all sets of possible outputs S:
Pr[M(D) ∈ S] ≤ e^ε × Pr[M(D') ∈ S] + δ
Key Parameters
ε (epsilon): The privacy budget. Lower values mean stronger privacy but typically lower utility.
δ (delta): The probability that the ε guarantee fails. Usually set to a very small value (e.g., 10^-5), typically smaller than the inverse of the dataset size.
Differential Privacy Mechanisms
Laplace Mechanism: Adds noise from a Laplace distribution to numeric queries.
Gaussian Mechanism: Adds noise from a Gaussian (normal) distribution. This is used in DP-SGD.
Exponential Mechanism: Used for non-numeric outputs; selects an output with probability weighted by a utility score.
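To make this concrete, here is a minimal NumPy sketch of the Laplace mechanism (the function and parameter names are illustrative, not taken from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value, sensitivity, epsilon):
    # Standard Laplace calibration: noise scale = sensitivity / epsilon.
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# A counting query has sensitivity 1: adding or removing one person
# changes the count by at most 1.
true_count = 1234
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

Smaller ε gives a larger noise scale, which is exactly the privacy-utility trade-off described above.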
Privacy Accounting
When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.
Advanced composition theorems and privacy accountants help track the total privacy spend.
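As a minimal illustration of basic (sequential) composition, the epsilons of independent DP releases on the same data simply add up; the values below are made up, and advanced composition or RDP accounting usually gives a tighter total:

```python
# Three DP queries over the same dataset, each with its own epsilon.
epsilons = [0.5, 0.5, 1.0]

# Basic composition: the total privacy loss is at most the sum.
total_epsilon = sum(epsilons)  # 2.0
```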
Stochastic Gradient Descent Refresher
Standard SGD
Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.
The Basic Update Rule
The standard SGD update for a mini-batch B is:
θ ← θ − η × ∇L(θ; B)
Where:
- θ represents the model parameters
- η is the learning rate
- ∇L(θ; B) is the average gradient of the loss over the batch B
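For reference, here is one step of this update as a NumPy sketch (grad_fn, a function returning the loss gradient for a single example, is an illustrative stand-in for whatever model is being trained):

```python
import numpy as np

def sgd_step(theta, grad_fn, batch, lr=0.1):
    # Average the per-example loss gradients over the mini-batch, then update.
    grad = np.mean([grad_fn(theta, x) for x in batch], axis=0)
    return theta - lr * grad
```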
Privacy Concerns with Standard SGD
Standard SGD can leak information about individual training examples through the gradients. For example:
- Gradients might be larger for outliers or unusual examples
- Sensitive data memorized by the model can be extracted through targeted attacks
- Gradient values can be used in reconstruction attacks
These privacy concerns motivate the need for differentially private training methods.
DP-SGD: Core Modifications
How DP-SGD Differs from Standard SGD
Differentially Private SGD modifies standard SGD in two key ways:
1. Per-Sample Gradient Clipping
Compute gradients for each example individually, then clip their L2 norm to a threshold C.
This limits the influence of any single training example on the model update.
2. Noise Addition
Add Gaussian noise to the sum of clipped gradients before applying the update.
The noise scale is proportional to the clipping threshold and the noise multiplier.
The DP-SGD Update Rule
The DP-SGD update can be summarized as:
- Compute per-sample gradients: gᵢ = ∇L(θ; xᵢ)
- Clip each gradient: g̃ᵢ = gᵢ × min(1, C/‖gᵢ‖₂)
- Add noise and average: ḡ = (1/|B|) × (∑ᵢ g̃ᵢ + N(0, σ²C²I))
- Update parameters: θ ← θ − η × ḡ
Where:
- C is the clipping norm
- σ is the noise multiplier
- B is the mini-batch and |B| its size
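Putting the four steps together, here is a minimal NumPy sketch of one DP-SGD update (grad_fn is again an illustrative stand-in for the per-example loss gradient; this mirrors the formulas above rather than any particular library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(theta, grad_fn, batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    clipped = []
    for x in batch:
        g = grad_fn(theta, x)                                      # per-sample gradient
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip L2 norm to C
        clipped.append(g)
    # Add Gaussian noise with std sigma * C to the sum, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    g_bar = (np.sum(clipped, axis=0) + noise) / len(batch)
    return theta - lr * g_bar
```

The small 1e-12 term only guards against division by zero for an all-zero gradient; it does not otherwise change the clipping behavior.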
Hyperparameter Deep Dive
DP-SGD introduces several new hyperparameters that need to be tuned carefully:
Clipping Norm (C)
The maximum allowed L2 norm for any individual gradient.
- Too small: Gradients are over-clipped, limiting learning
- Too large: Requires more noise to achieve the same privacy guarantee
- Typical range: 0.1 to 10.0, depending on the dataset and model
Noise Multiplier (σ)
Controls the amount of noise added to the gradients.
- Higher σ: Better privacy, worse utility
- Lower σ: Better utility, worse privacy
- Typical range: 0.5 to 2.0 for most practical applications
Batch Size
Affects both training dynamics and privacy accounting.
- Larger batches: Averaging over more examples dampens the relative effect of the noise, but the higher sampling rate changes the privacy accounting
- Smaller batches: More update steps per epoch, which can consume the privacy budget faster
- Typical range: 64 to 1024, often larger than in non-private SGD
Learning Rate (η)
May need adjustment compared to non-private training.
- DP-SGD often requires: Lower learning rates or careful scheduling
- Reason: Added noise can destabilize training with high learning rates
Number of Epochs
More epochs consume more privacy budget.
- Trade-off: More training vs. privacy budget consumption
- Early stopping: Often beneficial for balancing accuracy and privacy
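Purely as an illustration, a starting configuration drawn from the typical ranges above (not a recommendation for any specific dataset or model) might look like:

```python
dp_sgd_config = {
    "clip_norm": 1.0,         # C, within the typical 0.1-10.0 range
    "noise_multiplier": 1.1,  # sigma, within the typical 0.5-2.0 range
    "batch_size": 256,        # larger than is common for non-private SGD
    "learning_rate": 0.05,    # often lower than the non-private setting
    "epochs": 20,             # tuned against the available privacy budget
    "delta": 1e-5,            # target delta for the (epsilon, delta) guarantee
}
```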
Privacy Accounting
Tracking Privacy Budget
Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.
Common Methods
Moment Accountant
Used in the original DP-SGD paper (Abadi et al., 2016); provides tight bounds on the privacy loss.
Tracks the moments of the privacy loss random variable.
Rényi Differential Privacy (RDP)
Alternative accounting method based on Rényi divergence.
Often used in modern implementations like TensorFlow Privacy and Opacus.
Analytical Gaussian Mechanism
A closed-form analysis specific to the Gaussian mechanism.
Easier to compute, but its bounds under repeated composition are typically looser than those from RDP-based accountants.
Privacy Budget Allocation
With a fixed privacy budget (ε), you must decide how to allocate it:
- Fixed noise, variable epochs: Set noise level, train until budget is exhausted
- Fixed epochs, variable noise: Set the desired number of epochs, then calculate the required noise (see the sketch after this list)
- Advanced techniques: Privacy filters, odometers, and adaptive mechanisms
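The "fixed epochs, variable noise" strategy can be implemented by searching for the smallest noise multiplier that keeps the accountant's reported ε under the target. The sketch below assumes a hypothetical compute_epsilon(noise_multiplier, steps, sample_rate, delta) function standing in for whichever accountant is used:

```python
def smallest_noise_for_budget(target_epsilon, steps, sample_rate, delta,
                              compute_epsilon, lo=0.3, hi=10.0, iters=40):
    # Epsilon decreases as the noise multiplier grows, so bisect for the
    # smallest multiplier whose epsilon still fits within the target budget.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if compute_epsilon(mid, steps, sample_rate, delta) > target_epsilon:
            lo = mid   # too little noise: epsilon exceeds the budget
        else:
            hi = mid   # enough noise: try to shrink it further
    return hi
```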
Practical Implementation
In practice, privacy accounting is handled by libraries like:
- TensorFlow Privacy
- PyTorch Opacus
- Diffprivlib (IBM)
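As a rough sketch of a typical setup with one of these libraries (based on the Opacus 1.x API; consult the library documentation for current names and signatures):

```python
import torch
from opacus import PrivacyEngine

# A toy model, optimizer, and data loader standing in for a real training setup.
model = torch.nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
dataset = torch.utils.data.TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))
data_loader = torch.utils.data.DataLoader(dataset, batch_size=256)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,  # sigma
    max_grad_norm=1.0,     # clipping norm C
)

# Train with the wrapped objects as usual; afterwards the accountant can be
# queried, e.g. privacy_engine.get_epsilon(delta=1e-5).
```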