In the previous post, we reviewed a series of black-box score-based adversarial attacks where the adversary has to estimate the gradient by querying the target model and retrieving the labels’ confidence score.

In this post, we are going to explore the third category of black-box attacks, namely, black-box decision-based attacks. Under these settings, the only knowledge the attacker has about the model is its discrete hard-label predictions.

Compared to score-based or transfer-based attacks, decision-based attacks are much more relevant to real-world scenarios, where confidence scores are usually not available.

This is a very challenging problem, since directly extending state-of-the-art white-box attacks to the hard-label black-box setting would require minimizing a discontinuous step function, a combinatorial problem that cannot be solved with a gradient-based optimizer.

More specifically, we are going to learn about the following attacks:

- Boundary Attack;
- QEBA;
- Evolutionary Attack;
- Hop-Skip-Jump-Attack.

### Overview

This post is part of a **collection** including the following 6 posts:

- Threat models;
- White-box attacks;
- Black-box transfer-based attacks;
- Black-box score-based attacks;
- Black-box decision-based attacks (current post);
- Defence methods.

### Boundary Attack

**Boundary attack** is one of the first methods developed for decision-based black-box attacks. The algorithm starts from a point that is already adversarial and then performs a random walk along the boundary between the adversarial and the non-adversarial regions, such that it stays in the adversarial region while the distance to the target image is reduced.

When applied to images, where each pixel is constrained to the range [0, 255], each pixel of the initial image *x*_{adv}^{0} is sampled from a uniform distribution U(0, 255). In the untargeted setting, non-adversarial initial samples are rejected, while in the targeted scenario *x*_{adv}^{0} can be any sample that the model classifies as belonging to the target class.

Once *x*_{adv}^{0} has been initialized, the algorithm iteratively samples a perturbation μ^{t} from the proposal distribution P subject to the following constraints:

(1) the perturbed sample lies within the input domain: *x*_{adv}^{t-1} + μ^{t} ∈ [0, 255]^{n};
(2) the perturbation has a relative size of δ: ‖μ^{t}‖_{2} = δ·*d*(*x*, *x*_{adv}^{t-1});
(3) the perturbation reduces the distance to the original image by a relative amount Ɛ: *d*(*x*, *x*_{adv}^{t-1}) − *d*(*x*, *x*_{adv}^{t}) = Ɛ·*d*(*x*, *x*_{adv}^{t-1});

where P is a standard normal distribution N(0,1). To satisfy the constraints, a sample μ^{t} is first rescaled and clipped so as to meet (1) and (2). In a second step, *x*_{adv}^{t-1} + μ^{t} is projected onto a sphere around the original image *x* such that *d*(*x*, *x*_{adv}^{t-1}) = *d*(*x*, *x*_{adv}^{t-1} + μ^{t}) and (1) holds.

Finally, in the last step, a small movement is made toward the original image (3). At each iteration, the adversarial example is updated as *x*_{adv}^{t} = *x*_{adv}^{t-1} + μ^{t} only if *x*_{adv}^{t-1} + μ^{t} is still adversarial.

The only two parameters involved are thus δ, which controls the length of the total perturbation, and Ɛ, which controls the length of the step toward the original input. These two parameters are adjusted dynamically according to the local geometry of the decision boundary, so as to find progressively smaller adversarial perturbations under a given adversarial criterion.

The attack gives up further exploration as Ɛ converges to 0 (link).
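The random walk above can be sketched with a toy hard-label model. Here `predict`, the [0, 1] pixel range, and the fixed values of `delta` and `eps` are illustrative assumptions (the real attack adjusts both parameters dynamically based on the success rates of recent steps):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hard-label "model": class 1 if the mean pixel value exceeds 0.5.
def predict(x):
    return int(x.mean() > 0.5)

def boundary_attack(x, steps=500, delta=0.1, eps=0.1):
    y = predict(x)
    # Initialize from a random point that is already adversarial.
    x_adv = rng.uniform(0, 1, x.shape)
    while predict(x_adv) == y:
        x_adv = rng.uniform(0, 1, x.shape)
    for _ in range(steps):
        # (2): random perturbation of relative size delta.
        mu = rng.normal(size=x.shape)
        mu *= delta * np.linalg.norm(x - x_adv) / np.linalg.norm(mu)
        # Project back onto the sphere around x, keeping d(x, x_adv) unchanged.
        cand = x_adv + mu
        cand = x + (cand - x) * np.linalg.norm(x - x_adv) / np.linalg.norm(cand - x)
        # (3): small step toward the original image; (1): clip to the domain.
        cand = np.clip(cand + eps * (x - cand), 0, 1)
        # Accept the candidate only if it is still adversarial.
        if predict(cand) != y:
            x_adv = cand
    return x_adv

x = np.full(16, 0.2)                  # original input, class 0
x_adv = boundary_attack(x)
print(np.linalg.norm(x_adv - x))      # distance shrinks over the walk
```

With this linear toy boundary the walk converges toward the closest adversarial point; on a real model, the dynamic adjustment of `delta` and `eps` is what makes progress possible in regions of high curvature.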

### QEBA

Boundary Attack, the first decision-based method introduced, can successfully find adversarial examples, but it suffers from a long search time as well as a lack of convergence guarantees.

The Query Efficient Boundary Attack (**QEBA**) method reformulates the boundary attack as the optimization of a continuous function *g*, defined in the untargeted case as

*g*(θ) = min{λ > 0 : *f*(*x* + λ·θ/‖θ‖) ≠ *y*}

where θ represents the search direction (note that in practice θ represents adversarial noise), *f* is the hard-label black-box function (classifier), and *g*(θ) is the distance from *x* to the nearest adversarial example along the direction θ.

Thus, instead of searching directly for an adversarial example, the algorithm searches for the direction θ that minimizes the distortion *g*(θ). To compute *g*(θ) for a normalized θ, a fine-grained search is first performed by querying the points {*x*+αθ, *x*+2αθ, …} one by one until *f*(*x*+*i*αθ) ≠ *y*.

In the second phase, a binary search is conducted to find the boundary within that interval. This procedure is used to find the initial θ^{0} and the corresponding *g*(θ^{0}), and to evaluate the subsequent *g*(θ^{t}).
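The two-phase evaluation of *g*(θ) can be sketched as follows. The classifier `f`, the step size `alpha`, and the tolerance are toy assumptions, chosen so that the true boundary distance along the search direction is known to be 1.0:

```python
import numpy as np

# Toy hard-label classifier: the label flips when the first coordinate
# exceeds 1.0, so the boundary distance along e_1 is exactly 1.0.
def f(x):
    return int(x[0] > 1.0)

def g(x, y, theta, alpha=0.2, tol=1e-6):
    """Distance from x to the decision boundary along direction theta."""
    theta = theta / np.linalg.norm(theta)
    # Phase 1: fine-grained search, stepping outward until the label flips.
    i = 1
    while f(x + i * alpha * theta) == y:
        i += 1
    lo, hi = (i - 1) * alpha, i * alpha
    # Phase 2: binary search between the last non-adversarial point
    # and the first adversarial one.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(x + mid * theta) == y:
            lo = mid
        else:
            hi = mid
    return hi

x = np.zeros(3)
dist = g(x, f(x), np.array([1.0, 0.0, 0.0]))
print(round(dist, 3))  # ≈ 1.0
```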

The subsequent search directions θ^{t} are instead optimized with the Randomized Gradient-Free (RGF) method, which estimates the gradient at each iteration as

∇*g*(θ) ≈ (*g*(θ + β*u*) − *g*(θ))/β · *u*

where *u* is a random Gaussian vector and β > 0 is a smoothing parameter, usually set to 0.005. The solution is then updated as θ^{t+1} = θ^{t} − η·∇*g*(θ^{t}), where η is the step size. Overall, this method achieves smaller or similar distortion using 3–4 times fewer queries than Boundary Attack (link).
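A minimal sketch of the RGF estimator and one descent step. Here *g* is a stand-in quadratic (an assumption, so the true gradient 2θ is known and the estimate can be checked); in the actual attack, each evaluation of *g* would cost model queries via the search procedure described earlier, and the estimate is typically averaged over several random directions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the distortion g(theta): a quadratic with known gradient.
def g(theta):
    return np.sum(theta ** 2)

def rgf_gradient(g, theta, beta=0.005, n_samples=100):
    # Average single-direction finite-difference estimates:
    # (g(theta + beta*u) - g(theta)) / beta * u, with u ~ N(0, I).
    grad = np.zeros_like(theta)
    g0 = g(theta)
    for _ in range(n_samples):
        u = rng.normal(size=theta.shape)
        grad += (g(theta + beta * u) - g0) / beta * u
    return grad / n_samples

theta = np.array([1.0, -2.0, 0.5])
est = rgf_gradient(g, theta)
true_grad = 2 * theta
eta = 0.05
theta_next = theta - eta * est   # one RGF descent step on g
print(np.round(est, 2), true_grad)
```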

### Evolutionary Attack

Originally applied to fool face recognition models in the decision-based black-box scenario, the **Evolutionary Attack** method is based on an efficient covariance matrix adaptation evolution strategy (CMA-ES), which models the local geometry of the search directions while reducing the dimension of the search space.

At the beginning of the algorithm, *x*_{adv}^{0} is initialized with a sample that already satisfies the adversarial criterion. The search is then performed in a lower-dimensional space R^{m} with *m* < *n*. In each iteration, a random vector *z* ∈ R^{m} is first sampled from N(0, σ^{2}C), where C is a diagonal covariance matrix used to model the local geometry of the search directions, whose diagonal is updated after each successful trial as

*c*_{ii} = (1 − *c*_{cov})·*c*_{ii} + *c*_{cov}·((*p*_{c})_{i})^{2}

where *p*_{c} ∈ R^{m} is called the evolution path, as it stores the exponentially decayed successful search directions. For *i* = 1, …, *m*, *c*_{ii} is the *i*-th diagonal element of C and (*p*_{c})_{i} is the *i*-th element of *p*_{c}, while *c*_{c} and *c*_{cov} are two fixed hyperparameters.

Then, since the elements of the diagonal covariance matrix C encode the most successful coordinates of past trials, in each iteration a subset of *k* (*k* < *m*) coordinates is selected to generate the random vector *z*, with the probability of selecting the *i*-th coordinate proportional to *c*_{ii}.

In this way, the dimensionality of the search space is reduced and the convergence of the black-box attack is accelerated. Leaving the other coordinates set to 0, *z* is upscaled to the input space by bilinear interpolation to obtain *z*′. A bias is further added to *z*′, so as to sample from a distribution biased toward minimizing the distance of *x*_{adv}^{t} from the original image *x*:

*z*′ ← *z*′ + μ(*x* − *x*_{adv}^{t})

where the strength of the bias is controlled by the hyperparameter μ. Finally, if the new offspring *x*_{adv}^{t} + *z*′ is better than its parent *x*_{adv}^{t}, the current adversarial example is updated as *x*_{adv}^{t+1} = *x*_{adv}^{t} + *z*′ (link).
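A simplified sketch of this loop, under loud assumptions: the hard-label model is a toy threshold on the mean pixel, nearest-neighbour upscaling replaces bilinear interpolation, σ is held fixed (the paper adapts it), and all hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def predict(x):                    # toy hard-label model
    return int(x.mean() > 0.5)

def evolutionary_attack(x, steps=2000, m=4, k=2, sigma=0.1, mu=0.02,
                        c_c=0.01, c_cov=0.001):
    n = x.size
    y = predict(x)
    x_adv = rng.uniform(0, 1, n)   # random adversarial starting point
    while predict(x_adv) == y:
        x_adv = rng.uniform(0, 1, n)
    p_c = np.zeros(m)              # evolution path
    C = np.ones(m)                 # diagonal of the covariance matrix
    for _ in range(steps):
        # Sample z in R^m on k coordinates chosen with probability ∝ C_ii.
        z = np.zeros(m)
        idx = rng.choice(m, size=k, replace=False, p=C / C.sum())
        z[idx] = rng.normal(size=k) * sigma * np.sqrt(C[idx])
        # Upscale to the input space (nearest-neighbour stand-in for bilinear).
        z_up = np.repeat(z, n // m)
        # Bias the sample toward the original image.
        z_up = z_up + mu * (x - x_adv)
        cand = np.clip(x_adv + z_up, 0, 1)
        # Accept the offspring if it is adversarial and closer to x.
        if predict(cand) != y and np.linalg.norm(cand - x) < np.linalg.norm(x_adv - x):
            x_adv = cand
            # Update the evolution path and covariance with the success.
            p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c)) * z / sigma
            C = (1 - c_cov) * C + c_cov * p_c ** 2
    return x_adv

x = np.full(16, 0.2)
x_adv = evolutionary_attack(x)
print(np.linalg.norm(x_adv - x))
```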

### Hop-Skip-Jump-Attack

One of the main drawbacks of Boundary Attack is that it rejects perturbations that deviate from the target class, thus leading to query inefficiency. Conversely, in **Hop-Skip-Jump-Attack** perturbations are used to estimate the direction of the gradient.

During initialization, *x*′_{adv}^{0} is set to a sample from the target class for a targeted attack, and to a random misclassified sample for an untargeted attack. Each iteration of the algorithm is divided into three parts.

First, the iterate from the previous iteration, *x*′_{adv}^{t}, is pushed toward the boundary via a binary search, as in Boundary Attack, yielding *x*_{adv}^{t}. Second, the gradient direction is estimated near the boundary in a way that reduces the estimate's variance when the perturbed samples are unevenly distributed on the two sides of the boundary.

Third, the step size along the gradient direction is initialized and then decreased via geometric progression until the perturbation becomes successful, and the adversarial example is updated accordingly to *x*′_{adv}^{t+1}.

The next iteration then starts by projecting the perturbed sample *x*′_{adv}^{t+1} back onto the boundary. Compared with the vanilla Boundary Attack, Hop-Skip-Jump-Attack is hyperparameter-free and more query-efficient.

However, one limitation of this algorithm, shared with many existing decision-based methods, is that it requires evaluating the target model near the boundary, which may not be possible in environments that limit queries near the boundary (link).
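The variance-reduced gradient-direction estimate at a boundary point can be sketched as follows. The toy model, its boundary point, and the sampling radius are assumptions chosen so the true gradient direction is known; the key ingredient is subtracting the mean of the hard-label indicator (the baseline), which compensates for an uneven split of samples across the boundary:

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x, y):
    # +1 if x is adversarial (label differs from y), else -1.
    return 1.0 if int(x.mean() > 0.5) != y else -1.0

def estimate_gradient(x_b, y, delta=0.01, B=200):
    """Monte Carlo estimate of the gradient direction at boundary point x_b."""
    u = rng.normal(size=(B, x_b.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)      # unit directions
    vals = np.array([phi(x_b + delta * ub, y) for ub in u])
    # Baseline subtraction: weight each direction by (phi - mean(phi)).
    grad = ((vals - vals.mean())[:, None] * u).mean(axis=0)
    return grad / np.linalg.norm(grad)

# Boundary point of the toy model (mean exactly 0.5); the true gradient
# of the decision function is proportional to the all-ones vector.
x_b = np.full(16, 0.5)
g_dir = estimate_gradient(x_b, y=0)
true_dir = np.ones(16) / 4.0
print(np.dot(g_dir, true_dir))  # cosine similarity, close to 1
```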