Bluff: Interactively Deciphering Adversarial Attacks on Deep Neural Networks

Nilaksh Das*

Haekyu Park*

Duen Horng (Polo) Chau

(* Authors contributed equally)

IEEE Visualization Conference (VIS), 2020

Figure: With Bluff, users interactively visualize how adversarial attacks penetrate a deep neural network to induce incorrect outcomes. Here, a user inspects why Inception V1 misclassifies adversarial giant panda images, crafted by the Projected Gradient Descent (PGD) attack, as armadillo. PGD successfully perturbed pixels to induce the “brown bird” feature, an appearance more likely shared by an armadillo (small, roundish, brown body) than by a panda. This feature in turn activates more features that contribute to the armadillo (mis)classification (e.g., “scales,” “bumps,” “mesh”). The adversarial pathways, formed by these neurons and their connections, overwhelm the benign panda pathways and lead to the ultimate misclassification. (A) The Control Sidebar allows users to specify what data is to be included and highlighted. (B) The Graph Summary View visualizes the pathways most activated or changed by an attack as a network graph of neurons (each labeled by the channel ID in its layer) and their connections. When the user hovers over a neuron, (C) the Detail View displays its feature visualization, representative dataset examples, and activation patterns over attack strengths.

Abstract

Deep neural networks (DNNs) are now commonly used in many domains. However, they are vulnerable to adversarial attacks: carefully crafted perturbations on data inputs that can fool a model into making incorrect predictions. Despite significant research on developing DNN attack and defense techniques, people still lack an understanding of how such attacks penetrate a model's internals. We present Bluff, an interactive system for visualizing, characterizing, and deciphering adversarial attacks on vision-based neural networks. Bluff allows people to flexibly visualize and compare the activation pathways for benign and attacked images, revealing mechanisms that adversarial attacks employ to inflict harm on a model. Bluff is open-sourced and runs in modern web browsers.
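To make "carefully crafted perturbations" concrete, here is a minimal sketch of an L-infinity PGD attack, the attack named in the figure caption. It is not Bluff's code and not the setup used in the paper: the toy linear softmax classifier (`W`, `b`) stands in for a real network such as Inception V1, and the function name `pgd_attack` and the parameters `eps`, `step`, and `n_steps` are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def pgd_attack(x, y_true, W, b, eps=0.1, step=0.02, n_steps=20):
    """L-infinity PGD against a toy linear softmax classifier.

    Assumption: the model is logits = W @ x + b, a stand-in for a real DNN.
    Each iteration takes a signed-gradient step that increases the
    cross-entropy loss, then projects back into the eps-ball around x.
    """
    onehot = np.zeros(W.shape[0])
    onehot[y_true] = 1.0
    x_adv = x.copy()
    for _ in range(n_steps):
        probs = softmax(W @ x_adv + b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = W.T @ (probs - onehot)
        # Ascend the loss along the sign of the gradient.
        x_adv = x_adv + step * np.sign(grad)
        # Project onto the L-infinity ball of radius eps around x.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # Keep values in a valid pixel range.
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Hypothetical two-class example with a four-"pixel" input.
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.6, 0.6, 0.5, 0.5])

x_adv = pgd_attack(x, y_true=0, W=W, b=b)
print("benign prediction:", int(np.argmax(W @ x + b)))
print("adversarial prediction:", int(np.argmax(W @ x_adv + b)))
print("max perturbation:", float(np.max(np.abs(x_adv - x))))
```

The `eps` bound here plays the role of the attack strength: a larger radius lets the perturbation move the input further from the benign example.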

Citation