FIG.02

Adversarial Evasion

Test-time input perturbations designed to force misclassification. Analysis of threat models, robustness thresholds, and certified defenses in operational environments.

Threat Model Scope
White-Box · Black-Box · Targeted · Untargeted · Perturbation Budget (ε)
EVASION MECHANICS
Pipeline: input x → perturbed input x + δ (||δ||_p ≤ ε) → model f → prediction y' = argmax_y P(y | x + δ). The perturbation pushes x across the decision boundary, so the output y' ≠ y_true: a misclassification.
DECISION BOUNDARY SKETCH
[Diagram: a perturbation δ inside the ε-ball (L_p norm) moves a point from Class A (safe) across the decision boundary into Class B (adversarial target).]
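To make these mechanics concrete, here is a minimal sketch of the fast gradient sign method (FGSM) under an L_inf budget. The names `model`, `x`, and `y_true` are illustrative placeholders, not from the figure; the sketch assumes inputs normalized to [0, 1].

```python
# Minimal FGSM sketch: construct delta with ||delta||_inf <= eps from the
# sign of the loss gradient, the first-order direction that most
# increases the loss.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y_true, eps):
    """Return x + delta with ||delta||_inf <= eps (single gradient step)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    loss.backward()
    delta = eps * x.grad.sign()                 # maximal L_inf step
    return (x + delta).clamp(0.0, 1.0).detach() # stay in valid input range

# Usage: y_adv = model(fgsm_perturb(model, x, y, eps=8/255)).argmax(dim=1)
# Evasion succeeds when y_adv != y_true.
```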

Robustness Ladder

L0: Baseline Model

HIGH RISK

Standard empirical risk minimization. No explicit defense against test-time perturbations.

Pros
High standard accuracy, fast training.
Cons
Fails under even small ε perturbations.

L1: Data Augmentation

MODERATE RISK

Training with random noise, rotations, or heuristic perturbations.

Pros
Improves average-case corruption robustness.
Cons
Vulnerable to optimization-based attacks such as FGSM and PGD (see the sketch below).
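As a sketch of this level, the function below (illustrative name and σ value) adds i.i.d. Gaussian noise during training. Because the noise is sampled independently of the model, it covers average-case corruptions but not the worst-case direction an attacker optimizes for.

```python
# Minimal random-noise augmentation sketch; sigma is illustrative,
# not a recommendation from the figure. Assumes inputs in [0, 1].
import torch

def noise_augment(x, sigma=0.1):
    """Add i.i.d. Gaussian noise, clamped to the valid input range."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)

# During training: loss = F.cross_entropy(model(noise_augment(x)), y)
```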

L2: Adversarial Training

LOW RISK

Min-max optimization: training on actively generated worst-case adversarial examples.

Pros
Empirically robust against known attacks.
Cons
Trades off standard accuracy; computationally expensive.
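A minimal sketch of the min-max loop under an L_inf threat model: the inner maximization finds a worst-case example with projected gradient descent (PGD), and the outer minimization fits the model on it. All names (`model`, `opt`) and hyperparameters (step count, step size, ε) are illustrative.

```python
# Minimal PGD adversarial-training sketch (L_inf threat model).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=None, steps=10):
    """Inner maximization: find a worst-case x_adv inside the eps-ball."""
    alpha = alpha or 2.5 * eps / steps                    # common heuristic
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()               # ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)          # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                     # valid input range
    return x_adv.detach()

def adv_train_step(model, opt, x, y, eps=8/255):
    """Outer minimization: fit the model on the worst-case examples."""
    x_adv = pgd_attack(model, x, y, eps)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```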

L3: Certified Defenses

SECURE

Mathematical guarantees (e.g., Randomized Smoothing) that no perturbation within ε can change the prediction.

Pros
Provable lower bound on robustness.
Cons
Scalability issues, conservative bounds.
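A minimal sketch of randomized smoothing in the style of Cohen et al. (2019): the smoothed classifier g(x) = argmax_c P(f(x + N(0, σ²I)) = c) is certifiably constant within an L2 radius R = σ·Φ⁻¹(p_A), where p_A is the top-class probability. Names, σ, and the Monte Carlo sample count are illustrative; a real certification procedure also applies a confidence correction to the empirical p_A and abstains when p_A ≤ 0.5.

```python
# Minimal randomized-smoothing sketch (single input, not certified-grade).
import torch
from torch.distributions import Normal

def smoothed_predict(model, x, sigma=0.25, n=1000, num_classes=10):
    """Majority vote over Gaussian-noised copies of x (batch of 1)."""
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            counts[model(noisy).argmax(dim=1)] += 1
    top = counts.argmax()
    p_a = counts[top] / n                        # empirical top-class mass
    # Certified L2 radius: no ||delta||_2 < R can change the prediction.
    radius = sigma * Normal(0.0, 1.0).icdf(p_a)
    return top.item(), radius.item()
```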

Attack Success vs Perturbation Budget (ε)

Comparing defense strategies under a PGD attack (L_inf norm).

[Chart: attack success rate as a function of ε for the Baseline, Adversarially Trained, and Certified models, with the operating threshold marked.]