In the previous post, we reviewed a series of black-box score-based adversarial attacks where the adversary has to estimate the gradient by querying the target model and retrieving the labels’ confidence score.

In this post, we are going to explore the third category of black-box attacks, namely, black-box decision-based attacks. Under these settings, the only knowledge the attacker has about the model is its discrete hard-label predictions.

Compared to score-based or transfer-based attacks, decision-based attacks are much more relevant to real-world scenarios, where confidence scores are usually not available.

This is a very challenging problem, since directly extending state-of-the-art white-box attacks to the hard-label black-box setting would require minimizing a discontinuous step function, a combinatorial problem that cannot be solved with a gradient-based optimizer.

More specifically, we are going to learn about the following attacks:

- Boundary Attack;
- QEBA;
- Evolutionary Attack;
- Hop-Skip-Jump-Attack.

### Overview

This post is part of a **collection** including the following 6 posts:

- Threat models;
- White-box attacks;
- Black-box transfer-based attacks;
- Black-box score-based attacks;
- Black-box decision-based attacks (current post);
- Defence methods.

### Boundary Attack

**Boundary attack** is one of the first methods developed for decision-based black-box attacks. The algorithm starts from a point that is already adversarial and then performs a random walk along the boundary between the adversarial and the non-adversarial regions, such that it stays in the adversarial region while the distance to the target image is reduced.

When applied to images, where each pixel is constrained to the range [0, 255], each pixel of the initial image *x*_{adv}^{0} is sampled from a uniform distribution U(0, 255). In the untargeted setting, non-adversarial initial samples are rejected, while in the targeted scenario *x*_{adv}^{0} can be any sample that the model classifies as belonging to the target class.

Once *x*_{adv}^{0} has been initialized, the algorithm iteratively samples a perturbation μ^{t} from the proposal distribution P subject to the following constraints:

(1) the perturbed sample lies within the input domain: *x*_{adv}^{t-1} + μ^{t} ∈ [0, 255]^{n};
(2) the perturbation has a relative size of δ: ‖μ^{t}‖_{2} = δ·*d*(*x*, *x*_{adv}^{t-1});
(3) the perturbation reduces the distance to the original image by a relative amount Ɛ: *d*(*x*, *x*_{adv}^{t-1}) − *d*(*x*, *x*_{adv}^{t}) = Ɛ·*d*(*x*, *x*_{adv}^{t-1});

where P is a standard normal distribution N(0,1). To satisfy the constraints, a sample μ^{t} is first rescaled and clipped so as to meet (1) and (2). In a second step, *x*_{adv}^{t-1} + μ^{t} is projected onto a sphere around the original image *x* such that *d*(*x*, *x*_{adv}^{t-1}) = *d*(*x*, *x*_{adv}^{t-1} + μ^{t}) and (1) holds.

Finally, in the last step, a small movement is made toward the original image (3). At each iteration, the adversarial example is updated as *x*_{adv}^{t} = *x*_{adv}^{t-1} + μ^{t} only if *x*_{adv}^{t-1} + μ^{t} is still adversarial.

The only two parameters involved are thus δ, which controls the length of the total perturbation, and Ɛ, which controls the length of the step toward the original input. These two parameters are adjusted dynamically according to the local geometry of the decision boundary, so as to find progressively smaller adversarial perturbations under a given adversarial criterion.

The attack gives up further exploration as Ɛ converges to 0 (link).
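The random walk above can be sketched with a toy hard-label model. Here `predict`, the [0, 1] pixel range, and the fixed values of `delta` and `eps` are illustrative assumptions (the real attack adjusts both parameters dynamically based on the success rates of recent steps):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hard-label "model": class 1 if the mean pixel value exceeds 0.5.
def predict(x):
    return int(x.mean() > 0.5)

def boundary_attack(x, steps=500, delta=0.1, eps=0.1):
    y = predict(x)
    # Initialize from a random point that is already adversarial.
    x_adv = rng.uniform(0, 1, x.shape)
    while predict(x_adv) == y:
        x_adv = rng.uniform(0, 1, x.shape)
    for _ in range(steps):
        # (2): random perturbation of relative size delta.
        mu = rng.normal(size=x.shape)
        mu *= delta * np.linalg.norm(x - x_adv) / np.linalg.norm(mu)
        # Project back onto the sphere around x, keeping d(x, x_adv) unchanged.
        cand = x_adv + mu
        cand = x + (cand - x) * np.linalg.norm(x - x_adv) / np.linalg.norm(cand - x)
        # (3): small step toward the original image; (1): clip to the domain.
        cand = np.clip(cand + eps * (x - cand), 0, 1)
        # Accept the candidate only if it is still adversarial.
        if predict(cand) != y:
            x_adv = cand
    return x_adv

x = np.full(16, 0.2)                  # original input, class 0
x_adv = boundary_attack(x)
print(np.linalg.norm(x_adv - x))      # distance shrinks over the walk
```

With this linear toy boundary the walk converges toward the closest adversarial point; on a real model, the dynamic adjustment of `delta` and `eps` is what makes progress possible in regions of high curvature.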

### QEBA

Boundary Attack, the first decision-based method introduced, can successfully find adversarial examples, but it suffers from a long search time as well as a lack of convergence guarantees.

The Query Efficient Boundary Attack (**QEBA**) method reformulates the boundary attack as the optimization of a continuous function *g*, defined in the untargeted case as

*g*(θ) = min{λ > 0 : *f*(*x* + λ·θ/‖θ‖) ≠ *y*}

where θ represents the search direction (note that in practice θ represents adversarial noise), *f* is the hard-label black-box function (classifier), and *g*(θ) is the distance from *x* to the nearest adversarial example along the direction θ.

Thus, instead of searching directly for an adversarial example, the algorithm searches for the direction θ that minimizes the distortion *g*(θ). To compute *g*(θ) for a normalized θ, a fine-grained search is first performed by querying the points {*x*+αθ, *x*+2αθ, …} one by one until *f*(*x*+*i*αθ) ≠ *y*.

In the second phase, a binary search is conducted to find the boundary within that interval. This procedure is used to find the initial θ^{0} and the corresponding *g*(θ^{0}), and to evaluate the subsequent *g*(θ^{t}).
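The two-phase evaluation of *g*(θ) can be sketched as follows. The classifier `f`, the step size `alpha`, and the tolerance are toy assumptions, chosen so that the true boundary distance along the search direction is known to be 1.0:

```python
import numpy as np

# Toy hard-label classifier: the label flips when the first coordinate
# exceeds 1.0, so the boundary distance along e_1 is exactly 1.0.
def f(x):
    return int(x[0] > 1.0)

def g(x, y, theta, alpha=0.2, tol=1e-6):
    """Distance from x to the decision boundary along direction theta."""
    theta = theta / np.linalg.norm(theta)
    # Phase 1: fine-grained search, stepping outward until the label flips.
    i = 1
    while f(x + i * alpha * theta) == y:
        i += 1
    lo, hi = (i - 1) * alpha, i * alpha
    # Phase 2: binary search between the last non-adversarial point
    # and the first adversarial one.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(x + mid * theta) == y:
            lo = mid
        else:
            hi = mid
    return hi

x = np.zeros(3)
dist = g(x, f(x), np.array([1.0, 0.0, 0.0]))
print(round(dist, 3))  # ≈ 1.0
```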

The subsequent search directions θ^{t} are instead optimized with the Randomized Gradient-Free (RGF) method, which estimates the gradient at each iteration as

∇*g*(θ) ≈ (*g*(θ + β*u*) − *g*(θ))/β · *u*

where *u* is a random Gaussian vector and β > 0 is a smoothing parameter, usually set to 0.005. The solution is then updated as θ^{t+1} = θ^{t} − η·∇*g*(θ^{t}), where η is the step size. Overall, this method achieves smaller or similar distortion using 3–4 times fewer queries than Boundary Attack (link).
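A minimal sketch of the RGF estimator and one descent step. Here *g* is a stand-in quadratic (an assumption, so the true gradient 2θ is known and the estimate can be checked); in the actual attack, each evaluation of *g* would cost model queries via the search procedure described earlier, and the estimate is typically averaged over several random directions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the distortion g(theta): a quadratic with known gradient.
def g(theta):
    return np.sum(theta ** 2)

def rgf_gradient(g, theta, beta=0.005, n_samples=100):
    # Average single-direction finite-difference estimates:
    # (g(theta + beta*u) - g(theta)) / beta * u, with u ~ N(0, I).
    grad = np.zeros_like(theta)
    g0 = g(theta)
    for _ in range(n_samples):
        u = rng.normal(size=theta.shape)
        grad += (g(theta + beta * u) - g0) / beta * u
    return grad / n_samples

theta = np.array([1.0, -2.0, 0.5])
est = rgf_gradient(g, theta)
true_grad = 2 * theta
eta = 0.05
theta_next = theta - eta * est   # one RGF descent step on g
print(np.round(est, 2), true_grad)
```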

### Evolutionary Attack

Originally applied to fool face recognition models in the decision-based black-box scenario, the **Evolutionary Attack** method is based on an efficient covariance matrix adaptation evolution strategy (CMA-ES), which models the local geometry of the search directions while reducing the dimension of the search space.

At the beginning of the algorithm, *x*_{adv}^{0} is initialized with a sample that already satisfies the adversarial criterion. The search is then performed in a lower-dimensional space R^{m} with *m* < *n*. In each iteration, a random vector *z* ∈ R^{m} is first sampled from N(0, σ^{2}C), where C is a diagonal covariance matrix used to model the local geometry of the search directions, whose diagonal is updated after each successful trial as

*c*_{ii} = (1 − *c*_{cov})·*c*_{ii} + *c*_{cov}·((*p*_{c})_{i})^{2}

where *p*_{c} ∈ R^{m} is called the evolution path, as it stores the exponentially decayed successful search directions. For *i* = 1, …, *m*, *c*_{ii} is the *i*-th diagonal element of C and (*p*_{c})_{i} is the *i*-th element of *p*_{c}, while *c*_{c} and *c*_{cov} are two fixed hyperparameters.

Then, since the elements of the diagonal covariance matrix C encode the most successful coordinates of past trials, in each iteration a subset of *k* (*k* < *m*) coordinates is selected to generate the random vector *z*, with the probability of selecting the *i*-th coordinate proportional to *c*_{ii}.

In this way, the dimensionality of the search space is reduced and the convergence of the black-box attack is accelerated. Leaving the other coordinates set to 0, *z* is upscaled to the input space by bilinear interpolation to obtain *z*′. A bias is further added to *z*′, so as to sample from a distribution biased toward minimizing the distance of *x*_{adv}^{t} from the original image *x*:

*z*′ ← *z*′ + μ(*x* − *x*_{adv}^{t})

where the strength of the bias is controlled by the hyperparameter μ. Finally, if the new offspring *x*_{adv}^{t} + *z*′ is better than its parent *x*_{adv}^{t}, the current adversarial example is updated as *x*_{adv}^{t+1} = *x*_{adv}^{t} + *z*′ (link).
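A simplified sketch of this loop, under loud assumptions: the hard-label model is a toy threshold on the mean pixel, nearest-neighbour upscaling replaces bilinear interpolation, σ is held fixed (the paper adapts it), and all hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def predict(x):                    # toy hard-label model
    return int(x.mean() > 0.5)

def evolutionary_attack(x, steps=2000, m=4, k=2, sigma=0.1, mu=0.02,
                        c_c=0.01, c_cov=0.001):
    n = x.size
    y = predict(x)
    x_adv = rng.uniform(0, 1, n)   # random adversarial starting point
    while predict(x_adv) == y:
        x_adv = rng.uniform(0, 1, n)
    p_c = np.zeros(m)              # evolution path
    C = np.ones(m)                 # diagonal of the covariance matrix
    for _ in range(steps):
        # Sample z in R^m on k coordinates chosen with probability ∝ C_ii.
        z = np.zeros(m)
        idx = rng.choice(m, size=k, replace=False, p=C / C.sum())
        z[idx] = rng.normal(size=k) * sigma * np.sqrt(C[idx])
        # Upscale to the input space (nearest-neighbour stand-in for bilinear).
        z_up = np.repeat(z, n // m)
        # Bias the sample toward the original image.
        z_up = z_up + mu * (x - x_adv)
        cand = np.clip(x_adv + z_up, 0, 1)
        # Accept the offspring if it is adversarial and closer to x.
        if predict(cand) != y and np.linalg.norm(cand - x) < np.linalg.norm(x_adv - x):
            x_adv = cand
            # Update the evolution path and covariance with the success.
            p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c)) * z / sigma
            C = (1 - c_cov) * C + c_cov * p_c ** 2
    return x_adv

x = np.full(16, 0.2)
x_adv = evolutionary_attack(x)
print(np.linalg.norm(x_adv - x))
```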

### Hop-Skip-Jump-Attack

One of the main drawbacks of Boundary Attack is that it rejects perturbations that deviate from the target class, thus leading to query inefficiency. Conversely, in **Hop-Skip-Jump-Attack** perturbations are used to estimate the direction of the gradient.

During initialization, *x*′_{adv}^{0} is set to a sample from the target class for a targeted attack, and to a random misclassified sample for an untargeted attack. Each iteration of the algorithm is divided into three parts.

First, the iterate from the previous iteration, *x*′_{adv}^{t}, is pushed toward the boundary via a binary search, as in Boundary Attack, yielding *x*_{adv}^{t}. Second, the gradient direction is estimated near the boundary in a way that reduces the estimate's variance when the perturbed samples are unevenly distributed on the two sides of the boundary.

Third, the step size along the gradient direction is initialized and then decreased via geometric progression until the perturbation becomes successful, and the adversarial example is updated accordingly to *x*′_{adv}^{t+1}.

The next iteration then starts by projecting the perturbed sample *x*′_{adv}^{t+1} back onto the boundary. Compared with the vanilla Boundary Attack, Hop-Skip-Jump-Attack is hyperparameter-free and more query-efficient.

However, one limitation of this algorithm, shared with many existing decision-based methods, is that it requires evaluating the target model near the boundary, which may not be possible in environments that limit queries near the boundary (link).
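The variance-reduced gradient-direction estimate at a boundary point can be sketched as follows. The toy model, its boundary point, and the sampling radius are assumptions chosen so the true gradient direction is known; the key ingredient is subtracting the mean of the hard-label indicator (the baseline), which compensates for an uneven split of samples across the boundary:

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x, y):
    # +1 if x is adversarial (label differs from y), else -1.
    return 1.0 if int(x.mean() > 0.5) != y else -1.0

def estimate_gradient(x_b, y, delta=0.01, B=200):
    """Monte Carlo estimate of the gradient direction at boundary point x_b."""
    u = rng.normal(size=(B, x_b.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)      # unit directions
    vals = np.array([phi(x_b + delta * ub, y) for ub in u])
    # Baseline subtraction: weight each direction by (phi - mean(phi)).
    grad = ((vals - vals.mean())[:, None] * u).mean(axis=0)
    return grad / np.linalg.norm(grad)

# Boundary point of the toy model (mean exactly 0.5); the true gradient
# of the decision function is proportional to the all-ones vector.
x_b = np.full(16, 0.5)
g_dir = estimate_gradient(x_b, y=0)
true_dir = np.ones(16) / 4.0
print(np.dot(g_dir, true_dir))  # cosine similarity, close to 1
```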