This technical report is structured as follows: In section 1 we provide an overview of Foolbox and demonstrate how to benchmark a model and report the result. In section 2 we describe the adversarial attack methods that are implemented in Foolbox and explain the internal hyperparameter tuning.

Foolbox also allows the predictions of one model to be combined with the gradient of another. This makes it possible to attack non-differentiable models using gradient-based attacks and allows transfer attacks of the type described by Papernot et al. (2016c).
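Such a combination might be set up as follows. This is a minimal sketch assuming the Foolbox 1.x API, in which foolbox.models.CompositeModel combines the forward pass of one wrapped model with the backward pass of another; the two model arguments are assumed to exist already.

import foolbox

def composite_attack(forward_model, backward_model, image, label):
    """Attack a (possibly non-differentiable) model using another
    model's gradients. Both arguments are assumed to already be
    wrapped as Foolbox models."""
    composite = foolbox.models.CompositeModel(
        forward_model=forward_model,
        backward_model=backward_model)
    # gradient-based attacks now query forward_model for predictions
    # and backward_model for gradients
    attack = foolbox.attacks.GradientSignAttack(composite)
    return attack(image, label)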
• MAE
foolbox.distances.MeanAbsoluteDistance
Calculates the mean absolute error $d(x, y) = \frac{1}{N} \sum_i |x_i - y_i|$ between two vectors x and y.

• L∞
foolbox.distances.Linfinity
Calculates the L∞-norm $d(x, y) = \max_i |x_i - y_i|$ between two vectors x and y.

• L0
foolbox.distances.L0
Calculates the L0-norm $d(x, y) = \sum_i \mathbb{1}_{x_i \neq y_i}$ between two vectors x and y.

To achieve invariance to the scale of the input values, we normalize each element of x, y by the difference between the smallest and largest allowed value (e.g. 0 and 255).
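These measures are straightforward to express in NumPy. The following standalone sketch (not Foolbox's own implementation) includes the normalization by the allowed value range described above:

import numpy as np

def mae(x, y, bounds=(0, 255)):
    """Mean absolute error, normalized by the allowed value range."""
    scale = bounds[1] - bounds[0]
    return np.mean(np.abs(x - y) / scale)

def linf(x, y, bounds=(0, 255)):
    """L-infinity norm, normalized by the allowed value range."""
    scale = bounds[1] - bounds[0]
    return np.max(np.abs(x - y) / scale)

def l0(x, y):
    """L0 norm: the number of differing elements (already scale-invariant)."""
    return np.sum(x != y)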
When reporting benchmark results obtained with Foolbox, the following information should be stated:

• the version number of Foolbox,
• the set of input samples,
• the set of attacks applied to the inputs,
• any non-default hyperparameter setting,
• the criterion and
• the distance metric.
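Most of these items can be collected programmatically. In the following sketch only foolbox.__version__ comes from the package; the remaining keys and example values are hypothetical bookkeeping:

import foolbox

report = {
    "foolbox_version": foolbox.__version__,
    "inputs": "first 1000 images of the MNIST test set",  # input samples
    "attacks": ["GradientSignAttack", "DeepFoolL2Attack"],
    "hyperparameters": {},        # any non-default settings
    "criterion": "Misclassification",
    "distance": "MeanAbsoluteDistance",
}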
1.3. Versioning System

Each release of Foolbox is tagged with a version number of the type MAJOR.MINOR.PATCH that follows the principles of semantic versioning¹ with some additional precautions for comparable benchmarking. We increment the

1. MAJOR version when we make changes to the API that break compatibility with previous versions.
Gradient Attack
foolbox.attacks.GradientAttack
This attack computes the gradient $g(x_0) = \nabla_x L(x_0, \ell_0)$ once and then seeks the minimum step size $\epsilon$ such that $x_0 + \epsilon g(x_0)$ is adversarial.

Gradient Sign Attack (FGSM)
foolbox.attacks.GradientSignAttack
foolbox.attacks.FGSM
This attack computes the gradient $g(x_0) = \nabla_x L(x_0, \ell_0)$ once and then seeks the minimum step size $\epsilon$ such that $x_0 + \epsilon \operatorname{sign}(g(x_0))$ is adversarial (Goodfellow et al., 2014).
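At its core, each of these single-step attacks is a search over the step size. A minimal NumPy sketch of the sign variant, assuming hypothetical predict (returns the predicted label) and gradient (returns $\nabla_x L$) functions standing in for a real model wrapper:

import numpy as np

def gradient_sign_attack(x0, label, predict, gradient,
                         bounds=(0, 255), num_steps=1000):
    """Single-step FGSM: scan step sizes and return the smallest
    perturbation that changes the predicted label."""
    g = np.sign(gradient(x0, label))  # the gradient is computed only once
    for eps in np.linspace(0, bounds[1] - bounds[0], num=num_steps)[1:]:
        candidate = np.clip(x0 + eps * g, *bounds)
        if predict(candidate) != label:
            return candidate  # smallest adversarial step size found
    return None  # no adversarial example found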
Iterative Gradient Attack
foolbox.attacks.IterativeGradientAttack
Iterative gradient ascent seeks adversarial perturbations by maximizing the loss along small steps in the gradient direction $g(x) = \nabla_x L(x, \ell_0)$, i.e. the algorithm iteratively updates $x_{k+1} \leftarrow x_k + \epsilon g(x_k)$. The step-size $\epsilon$ is tuned internally to find the minimum perturbation.

Iterative Gradient Sign Attack
foolbox.attacks.IterativeGradientSignAttack
Similar to iterative gradient ascent, this attack seeks adversarial perturbations by maximizing the loss along small steps in the ascent direction $\operatorname{sign}(g(x)) = \operatorname{sign}(\nabla_x L(x, \ell_0))$, i.e. the algorithm iteratively updates $x_{k+1} \leftarrow x_k + \epsilon \operatorname{sign}(g(x_k))$. The step-size $\epsilon$ is tuned internally to find the minimum perturbation.
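Unlike the single-step attacks, the iterative variants recompute the gradient after every update. A sketch of the sign variant under the same hypothetical predict/gradient interface as above; the internal step-size tuning is simplified here to a fixed grid:

import numpy as np

def iterative_gradient_sign_attack(x0, label, predict, gradient,
                                   bounds=(0, 255), steps=10):
    """Iterative FGSM: several small steps with a fresh gradient each
    time; returns the first adversarial iterate found while scanning
    step sizes from small to large."""
    for eps in np.logspace(-4, 0, num=20) * (bounds[1] - bounds[0]):
        x = x0.astype(float)
        for _ in range(steps):
            x = np.clip(x + eps * np.sign(gradient(x, label)), *bounds)
            if predict(x) != label:
                return x
    return None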
DeepFool L2 Attack
foolbox.attacks.DeepFoolL2Attack
In each iteration DeepFool (Moosavi-Dezfooli et al., 2015) computes for each class $\ell \neq \ell_0$ the minimum distance $d(\ell, \ell_0)$ that it takes to reach the class boundary by approximating the model classifier with a linear classifier. It then makes a corresponding step in the direction of the class with the smallest distance.

DeepFool L∞ Attack
foolbox.attacks.DeepFoolLinfinityAttack
Like the DeepFool L2 Attack, but minimizes the L∞-norm instead.
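The linearization step admits a compact closed form. A sketch of a single DeepFool L2 iteration, assuming hypothetical logits (vector of class scores) and logits_grad (gradient of one class score) functions:

import numpy as np

def deepfool_l2_step(x, label, logits, logits_grad):
    """One DeepFool iteration: linearize the classifier around x and
    step towards the closest approximated class boundary."""
    f = logits(x)
    g0 = logits_grad(x, label)
    best_step, best_dist = None, np.inf
    for k in range(len(f)):
        if k == label:
            continue
        w = logits_grad(x, k) - g0   # normal of the linearized boundary
        f_diff = f[k] - f[label]     # score gap to class k
        dist = abs(f_diff) / (np.linalg.norm(w) + 1e-12)
        if dist < best_dist:         # closest boundary so far
            best_dist = dist
            best_step = abs(f_diff) / (np.linalg.norm(w) ** 2 + 1e-12) * w
    return x if best_step is None else x + best_step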
L-BFGS Attack
foolbox.attacks.LBFGSAttack
L-BFGS-B is a second-order optimiser that we here use to find the minimum of

$L(x + \rho, \ell) + \lambda \|\rho\|_2^2 \quad \text{s.t.} \quad x_i + \rho_i \in [b_{min}, b_{max}]$

where $\ell \neq \ell_0$ is the target class (Szegedy et al., 2013). A line-search is performed over the regularisation parameter $\lambda > 0$ to find the minimum adversarial perturbation. If the target class is not specified we choose $\ell$ as the class of the adversarial example generated by the gradient attack.
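For a fixed $\lambda$, the inner problem maps directly onto SciPy's L-BFGS-B interface. A sketch assuming a hypothetical differentiable loss(x, label) function; gradients are approximated numerically for brevity, and the outer line-search over $\lambda$ is omitted:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def lbfgs_inner(x, target, loss, lam, bounds=(0, 255)):
    """Minimize loss(x + rho, target) + lam * ||rho||_2^2 subject to
    box constraints on the perturbed input x + rho."""
    def objective(rho):
        return loss(x + rho.reshape(x.shape), target) + lam * np.sum(rho ** 2)

    # per-element bounds ensure x_i + rho_i stays in [b_min, b_max]
    box = [(bounds[0] - v, bounds[1] - v) for v in x.ravel()]
    rho, _, _ = fmin_l_bfgs_b(objective, np.zeros(x.size),
                              approx_grad=True, bounds=box)
    return x + rho.reshape(x.shape)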
SLSQP Attack
foolbox.attacks.SLSQPAttack
Compared to L-BFGS-B, SLSQP allows one to additionally specify non-linear constraints. This enables us to skip the line-search and to directly optimise

$\|\rho\|_2^2 \quad \text{s.t.} \quad L(x + \rho, \ell) = l \;\wedge\; x_i + \rho_i \in [b_{min}, b_{max}]$

where $\ell \neq \ell_0$ is the target class. If the target class is not specified we choose $\ell$ as the class of the adversarial example generated by the gradient attack.
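This constrained problem translates directly to scipy.optimize.minimize. A sketch in which loss and the constraint level l are hypothetical stand-ins for the quantities in the formula above:

import numpy as np
from scipy.optimize import minimize

def slsqp_attack(x, target, loss, l=0.0, bounds=(0, 255)):
    """Minimize ||rho||_2^2 subject to loss(x + rho, target) == l and
    box constraints on the perturbed input."""
    shape = x.shape

    def objective(rho):
        return np.sum(rho ** 2)

    constraint = {
        "type": "eq",  # equality constraint L(x + rho, target) = l
        "fun": lambda rho: loss(x + rho.reshape(shape), target) - l,
    }
    box = [(bounds[0] - v, bounds[1] - v) for v in x.ravel()]
    result = minimize(objective, np.zeros(x.size), method="SLSQP",
                      bounds=box, constraints=[constraint])
    return x + result.x.reshape(shape)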
Jacobian-Based Saliency Map Attack
foolbox.attacks.SaliencyMapAttack
This targeted attack (Papernot et al., 2016a) uses the gradient to compute a saliency score for each input feature (e.g. pixel). This saliency score reflects how strongly each feature can push the model classification from the reference to the target class. This process is iterated, and in each iteration only the feature with the maximum saliency score is perturbed.
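One common form of the saliency score from Papernot et al. (2016a) combines the target-class gradient with the summed gradients of all other classes. A sketch assuming a hypothetical jacobian(x) function that returns the num_classes x num_features Jacobian of the class scores:

import numpy as np

def saliency_scores(x, target, jacobian):
    """A feature is salient if increasing it raises the target-class
    score while lowering the scores of all other classes."""
    J = jacobian(x)                      # shape: (num_classes, num_features)
    d_target = J[target]                 # effect of each feature on the target
    d_others = J.sum(axis=0) - d_target  # summed effect on all other classes
    scores = d_target * np.abs(d_others)
    scores[(d_target < 0) | (d_others > 0)] = 0  # keep only helpful features
    return scores  # the feature with the maximum score gets perturbed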
2.2. Score-Based Attacks

Score-based attacks do not require gradients of the model, but they expect meaningful scores such as probabilities or logits which can be used to approximate gradients.

Single Pixel Attack
foolbox.attacks.SinglePixelAttack
This attack (Narodytska & Kasiviswanathan, 2016) probes the robustness of a model to changes of single pixels by setting a single pixel to white or black. It repeats this process for every pixel in the image.
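The attack itself is a simple exhaustive loop. A NumPy sketch for a single-channel image, assuming a hypothetical predict function that returns the predicted label:

import numpy as np

def single_pixel_attack(image, label, predict, bounds=(0, 255)):
    """Set each pixel to the minimum or maximum allowed value and
    return the first image that changes the predicted label."""
    height, width = image.shape
    for i in range(height):
        for j in range(width):
            for value in bounds:          # black, then white
                candidate = image.copy()
                candidate[i, j] = value
                if predict(candidate) != label:
                    return candidate
    return None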
Local Search Attack
foolbox.attacks.LocalSearchAttack
This attack (Narodytska & Kasiviswanathan, 2016) measures the model's sensitivity to individual pixels by applying extreme perturbations and observing the effect on the probability of the correct class. It then perturbs the pixels to which the model is most sensitive. It repeats this process until the image is adversarial, searching for additional critical pixels in the neighborhood of previously found ones.

Approximate L-BFGS Attack
foolbox.attacks.ApproximateLBFGSAttack
Same as L-BFGS except that gradients are computed numerically. Note that this attack is only suitable if the input dimensionality is small.
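Numerical differentiation explains the dimensionality caveat: a central-difference estimate requires two loss evaluations per input dimension. A sketch, with loss again a hypothetical function of the input:

import numpy as np

def numerical_gradient(loss, x, eps=1e-3):
    """Central-difference gradient estimate; needs 2 * x.size loss
    evaluations, hence only practical for small inputs."""
    grad = np.zeros(x.shape)
    for i in np.ndindex(x.shape):
        e = np.zeros(x.shape)
        e[i] = eps
        grad[i] = (loss(x + e) - loss(x - e)) / (2 * eps)
    return grad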