RobustBench
A standardized benchmark for adversarial robustness

The goal of RobustBench is to systematically track the real progress in adversarial robustness. There are already more than 2,000 papers on this topic, but it is still unclear which approaches really work and which only lead to overestimated robustness. We start by benchmarking \(\ell_\infty\)- and \(\ell_2\)-robustness, since these are the most studied settings in the literature. We use AutoAttack, an ensemble of white-box and black-box attacks, to standardize the evaluation (for details, see our paper). Additionally, we open-source the RobustBench library, which contains the models used for the leaderboard, to facilitate their use in downstream applications.
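Concretely, the quantity tracked in the leaderboards below is the robust accuracy, i.e., the fraction of test points whose prediction cannot be changed by any perturbation within the allowed set (stated informally here; see the paper for the precise setup):
\[ \frac{1}{n} \sum_{i=1}^{n} \min_{\|\delta\|_p \leq \varepsilon} \mathbb{1}\left[ f(x_i + \delta) = y_i \right], \]
where \(f\) denotes the classifier's predicted label, \(p \in \{2, \infty\}\), and \(\varepsilon\) is the perturbation budget (e.g., \(\varepsilon = 8/255\) for \(\ell_\infty\) on CIFAR-10).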

Up-to-date leaderboard based on 30+ recent papers

Unified access to 20+ state-of-the-art robust models via the Model Zoo

Model Zoo


Check out the available models and our Colab tutorials.
# Install the RobustBench library (the `!` prefix assumes a Colab/Jupyter environment)
!pip install git+https://github.com/RobustBench/robustbench

# Load a robust model from the Model Zoo by its identifier
from robustbench.utils import load_model
model = load_model(model_name='Carmon2019Unlabeled')

# Load a fixed subset of the CIFAR-10 test set
from robustbench.data import load_cifar10
x_test, y_test = load_cifar10(n_examples=100)

# Install AutoAttack and evaluate the model under the Linf threat model with eps = 8/255
!pip install git+https://github.com/fra31/auto-attack
from autoattack import AutoAttack
adversary = AutoAttack(model, norm='Linf', eps=8/255)
x_adv = adversary.run_standard_evaluation(x_test, y_test)
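As a sanity check, the robust accuracy on the returned adversarial examples can be recomputed directly in PyTorch (a minimal sketch; run_standard_evaluation already reports this number, and the tensors are assumed to live on the same device as the model):

import torch

# Fraction of adversarial examples that are still classified correctly
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f'Robust accuracy on {len(x_test)} examples: {robust_acc:.1%}')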

Analysis


Check out our paper with a detailed analysis.
[Figure: robustness vs. venues]

Leaderboard: CIFAR-10, \( \ell_\infty = 8/255 \), Untargeted, AutoAttack

| Rank | Method | Standard accuracy | Robust accuracy | Extra data | Architecture | Venue |
|------|--------|-------------------|-----------------|------------|--------------|-------|
| 1 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples [1] | 91.10% | 65.87% | ✓ | WideResNet-70-16 | arXiv, Oct 2020 |
| 2 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples [2] | 89.48% | 62.76% | ✓ | WideResNet-28-10 | arXiv, Oct 2020 |
| 3 | Do Wider Neural Networks Really Help Adversarial Robustness? | 87.67% | 60.65% | ✓ | WideResNet-34-15 | arXiv, Oct 2020 |
| 4 | Adversarial Weight Perturbation Helps Robust Generalization | 88.25% | 60.04% | ✓ | WideResNet-28-10 | NeurIPS 2020 |
| 5 | Unlabeled Data Improves Adversarial Robustness | 89.69% | 59.53% | ✓ | WideResNet-28-10 | NeurIPS 2019 |
| 6 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples [3] | 85.29% | 57.14% | × | WideResNet-70-16 | arXiv, Oct 2020 |
| 7 | HYDRA: Pruning Adversarially Robust Neural Networks | 88.98% | 57.14% | ✓ | WideResNet-28-10 | NeurIPS 2020 |
| 8 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples [4] | 85.64% | 56.82% | × | WideResNet-34-20 | arXiv, Oct 2020 |
| 9 | Improving Adversarial Robustness Requires Revisiting Misclassified Examples | 87.50% | 56.29% | ✓ | WideResNet-28-10 | ICLR 2020 |
| 10 | Adversarial Weight Perturbation Helps Robust Generalization | 85.36% | 56.17% | × | WideResNet-34-10 | NeurIPS 2020 |
| 11 | Are Labels Required for Improving Adversarial Robustness? | 86.46% | 56.03% | ✓ | WideResNet-28-10 | NeurIPS 2019 |
| 12 | Using Pre-Training Can Improve Model Robustness and Uncertainty | 87.11% | 54.92% | ✓ | WideResNet-28-10 | ICML 2019 |
| 13 | Bag of Tricks for Adversarial Training | 86.43% | 54.39% | × | WideResNet-34-20 | arXiv, Oct 2020 |
| 14 | Boosting Adversarial Training with Hypersphere Embedding | 85.14% | 53.74% | × | WideResNet-34-20 | NeurIPS 2020 |
| 15 | Learnable Boundary Guided Adversarial Training [5] | 88.70% | 53.57% | × | WideResNet-34-20 | arXiv, Nov 2020 |
| 16 | Attacks Which Do Not Kill Training Make Adversarial Learning Stronger | 84.52% | 53.51% | × | WideResNet-34-10 | ICML 2020 |
| 17 | Overfitting in adversarially robust deep learning | 85.34% | 53.42% | × | WideResNet-34-20 | ICML 2020 |
| 18 | Self-Adaptive Training: beyond Empirical Risk Minimization [5] | 83.48% | 53.34% | × | WideResNet-34-10 | NeurIPS 2020 |
| 19 | Theoretically Principled Trade-off between Robustness and Accuracy [5] | 84.92% | 53.08% | × | WideResNet-34-10 | ICML 2019 |
| 20 | Learnable Boundary Guided Adversarial Training [5] | 88.22% | 52.86% | × | WideResNet-34-10 | arXiv, Nov 2020 |
| 21 | Adversarial Robustness through Local Linearization [6] | 86.28% | 52.81% | × | WideResNet-40-8 | NeurIPS 2019 |
| 22 | Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [7] | 86.04% | 51.56% | × | ResNet-50 | CVPR 2020 |
| 23 | Efficient Robust Training via Backward Smoothing | 85.32% | 51.12% | × | WideResNet-34-10 | arXiv, Oct 2020 |
| 24 | Improving Adversarial Robustness Through Progressive Hardening | 86.84% | 50.72% | × | WideResNet-34-10 | arXiv, Mar 2020 |
| 25 | Robustness library | 87.03% | 49.25% | × | ResNet-50 | GitHub, Oct 2019 |
| 26 | Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models | 87.80% | 49.12% | × | WideResNet-34-10 | IJCAI 2019 |
| 27 | Metric Learning for Adversarial Robustness | 86.21% | 47.41% | × | WideResNet-34-10 | NeurIPS 2019 |
| 28 | You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle | 87.20% | 44.83% | × | WideResNet-34-10 | NeurIPS 2019 |
| 29 | Towards Deep Learning Models Resistant to Adversarial Attacks | 87.14% | 44.04% | × | WideResNet-34-10 | ICLR 2018 |
| 30 | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | 80.89% | 43.48% | × | ResNet-32 | ICLR 2020 |
| 31 | Fast is better than free: Revisiting adversarial training | 83.34% | 43.21% | × | ResNet-18 | ICLR 2020 |
| 32 | Adversarial Training for Free! | 86.11% | 41.47% | × | WideResNet-34-10 | NeurIPS 2019 |
| 33 | MMA Training: Direct Input Space Margin Maximization through Adversarial Training | 84.36% | 41.44% | × | WideResNet-28-4 | ICLR 2020 |
| 34 | Controlling Neural Level Sets [5] | 81.30% | 40.22% | × | ResNet-18 | NeurIPS 2019 |
| 35 | Robustness via Curvature Regularization, and Vice Versa | 83.11% | 38.50% | × | ResNet-18 | CVPR 2019 |
| 36 | Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training | 89.98% | 36.64% | × | WideResNet-28-10 | NeurIPS 2019 |
| 37 | Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness | 90.25% | 36.45% | × | WideResNet-28-10 | OpenReview, Sep 2019 |
| 38 | Adversarial Defense via Learning to Generate Diverse Attacks | 78.91% | 34.95% | × | ResNet-20 | ICCV 2019 |
| 39 | Sensible adversarial learning | 91.51% | 34.22% | × | WideResNet-34-10 | OpenReview, Sep 2019 |
| 40 | Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks | 92.80% | 29.35% | × | WideResNet-28-10 | ICCV 2019 |
| 41 | Enhancing Adversarial Defense by k-Winners-Take-All [5] | 79.28% | 18.50% | × | DenseNet-121 | ICLR 2020 |
| 42 | Manifold Regularization for Adversarial Robustness | 90.84% | 1.35% | × | ResNet-18 | arXiv, Mar 2020 |
| 43 | Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks | 89.16% | 0.28% | × | ResNet-110 | ICCV 2019 |
| 44 | Jacobian Adversarially Regularized Networks for Robustness | 93.79% | 0.26% | × | WideResNet-34-10 | ICLR 2020 |
| 45 | ClusTR: Clustering Training for Robustness | 91.03% | 0.00% | × | WideResNet-28-10 | arXiv, Jun 2020 |
| 46 | Standardly trained model | 94.78% | 0.00% | × | WideResNet-28-10 | N/A |

[1] We show the robust accuracy reported in the paper since AutoAttack performs slightly worse (65.88%).
[2] We show the robust accuracy reported in the paper since AutoAttack performs slightly worse (62.80%).
[3] We show the robust accuracy reported in the paper since AutoAttack performs slightly worse (57.20%).
[4] We show the robust accuracy reported in the paper since AutoAttack performs slightly worse (56.86%).
[5] Uses \(\ell_\infty = 0.031 \approx 7.9/255\) instead of \(8/255\).
[6] We show the robust accuracy reported in the paper since AutoAttack performs slightly worse (52.84%).
[7] Uses an ensemble of 3 models.

Leaderboard: CIFAR-10, \( \ell_2 = 0.5 \), Untargeted, AutoAttack

| Rank | Method | Standard accuracy | Robust accuracy | Extra data | Architecture | Venue |
|------|--------|-------------------|-----------------|------------|--------------|-------|
| 1 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples | 94.74% | 80.53% | ✓ | WideResNet-70-16 | arXiv, Oct 2020 |
| 2 | Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples | 90.90% | 74.50% | × | WideResNet-70-16 | arXiv, Oct 2020 |
| 3 | Adversarial Weight Perturbation Helps Robust Generalization | 88.51% | 73.66% | × | WideResNet-34-10 | NeurIPS 2020 |
| 4 | Adversarial Robustness on In- and Out-Distribution Improves Explainability | 91.08% | 72.91% | ✓ | ResNet-50 | ECCV 2020 |
| 5 | Robustness library | 90.83% | 69.24% | × | ResNet-50 | GitHub, Sep 2019 |
| 6 | Overfitting in adversarially robust deep learning | 88.67% | 67.68% | × | ResNet-18 | ICML 2020 |
| 7 | Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses | 89.05% | 66.44% | × | WideResNet-28-10 | CVPR 2019 |
| 8 | MMA Training: Direct Input Space Margin Maximization through Adversarial Training | 88.02% | 66.09% | × | WideResNet-28-4 | ICLR 2020 |
| 9 | Standardly trained model | 94.78% | 0.00% | × | WideResNet-28-10 | N/A |
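The tutorial above covers the \(\ell_\infty\) threat model; the setting of this leaderboard only changes the norm and budget passed to AutoAttack (a sketch reusing the model and data loaded earlier; for a faithful comparison, the model should of course be one trained for \(\ell_2\)-robustness):

# Evaluate under the L2 threat model used by this leaderboard (eps = 0.5)
adversary = AutoAttack(model, norm='L2', eps=0.5)
x_adv = adversary.run_standard_evaluation(x_test, y_test)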

FAQ

➤ Wait, how does this leaderboard differ from the AutoAttack leaderboard? 🤔
The AutoAttack leaderboard is maintained by Francesco Croce in parallel with the RobustBench L2 / Linf leaderboards, and all changes to either of them are synchronized (provided that the three restrictions on the models are met for the RobustBench leaderboard). One can view the current L2 / Linf RobustBench leaderboards as a continuously updated fork of the AutoAttack leaderboard, extended with adaptive evaluations, the Model Zoo, and clear restrictions on the models we accept. In the future, we will extend RobustBench with other threat models, and potentially with a different standardized attack if it is shown to perform better than AutoAttack.

➤ Wait, how is it different from robust-ml.org? 🤔
robust-ml.org focuses on adaptive evaluations, whereas we provide a standardized benchmark. Adaptive evaluations are great (e.g., see Tramer et al., 2020), but they are very time-consuming and cannot be standardized. Instead, we argue that one can estimate robustness accurately without adaptive attacks, but to do so one has to introduce some restrictions on the models considered. See our paper for more details.

➤ How is it related to libraries like foolbox / cleverhans / advertorch? 🤔
These libraries provide implementations of different attacks. Besides the standardized benchmark, RobustBench additionally provides a repository of the most robust models, so you can start using them in one line of code (see the tutorial here).
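For example, loading a leaderboard model takes a single call (the same code as in the tutorial above):

from robustbench.utils import load_model
model = load_model(model_name='Carmon2019Unlabeled')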

➤ Why is Lp-robustness still interesting in 2020? 🤔
There are numerous interesting applications of Lp-robustness, spanning transfer learning (Salman et al. (2020), Utrera et al. (2020)), interpretability (Tsipras et al. (2018), Kaur et al. (2019), Engstrom et al. (2019)), security (Tramèr et al. (2018), Saadatpanah et al. (2019)), generalization (Xie et al. (2019), Zhu et al. (2019), Bochkovskiy et al. (2020)), robustness to unseen perturbations (Xie et al. (2019), Kang et al. (2019)), and stabilization of GAN training (Zhong et al. (2020)).

➤ Does this benchmark only focus on Lp-robustness? 🤔
Lp-robustness is the most well-studied setting, so we focus on it first. However, in the future, we plan to extend the benchmark to other perturbation sets beyond Lp-balls.

➤ What about verified adversarial robustness? 🤔
We specifically focus on defenses that improve empirical robustness, given the lack of clarity regarding which approaches really improve robustness and which only make some particular attacks unsuccessful. For methods targeting verified robustness, we encourage readers to check out Salman et al. (2019) and Li et al. (2020).

➤ What if I have a better attack than the one used in this benchmark? 🤔
We will be happy to add a better attack or any adaptive evaluation that complements our default standardized attacks.

Citation

Consider citing our whitepaper if you want to reference our leaderboard or if you are using the models from the Model Zoo:
@article{croce2020robustbench,
    title={RobustBench: a standardized adversarial robustness benchmark},
    author={Croce, Francesco and Andriushchenko, Maksym and Sehwag, Vikash and Flammarion, Nicolas and Chiang, Mung and Mittal, Prateek and Hein, Matthias},
    journal={arXiv preprint arXiv:2010.09670},
    year={2020}
}

Contribute to RobustBench!


We welcome contributions of both new robust models and new evaluations. Please check here for more details.

Feel free to contact us at adversarial.benchmark@gmail.com