Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning

(*Authors contributed equally)
Figure: The Massif interface. A user, Hailey, is studying a targeted Fast Gradient Method (FGM) attack performed on the InceptionV1 model. Using the control panel (A), she selects “giant panda” as the benign class and “armadillo” as the attack target class. Massif generates an attribution graph (B), which shows Hailey the neurons within the network that are suppressed in the attacked images (B1, blue), shared by both benign and attacked images (B2, purple), and emphasized only in the attacked images (B3, orange). Each neuron is represented by a node and its feature visualization (C). Hovering over a neuron displays example dataset patches that maximally activate it, providing stronger evidence for what the neuron has learned to detect; hovering also highlights the neuron's most influential connections from the previous layer (D), letting Hailey trace where in the network the prediction diverges from the benign class to the attack target class.
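For readers unfamiliar with the attack shown above: a targeted FGM perturbs an image in the direction that makes the attacker's chosen class more likely. The following is a minimal PyTorch sketch of one targeted L-infinity FGM step, not Massif's implementation; the function name, the eps value, and the assumptions that the model returns raw logits and that pixels lie in [0, 1] are all illustrative.

import torch
import torch.nn.functional as F

def targeted_fgm(model, images, target_class, eps=0.02):
    # One targeted L-infinity FGM step (illustrative sketch, not Massif's code).
    images = images.clone().detach().requires_grad_(True)
    logits = model(images)  # assumes the model returns raw class logits
    targets = torch.full((images.shape[0],), target_class, dtype=torch.long)
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    # Step *against* the gradient of the target-class loss,
    # so the perturbed image is pushed toward the target class.
    adv = images - eps * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]

Iterating this step with a smaller eps gives the standard iterative variant of the attack; Massif visualizes the effect of such attacks rather than generating them.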
Abstract
Deep neural networks (DNNs) are increasingly powering high-stakes applications such as autonomous cars and healthcare; however, DNNs are often treated as "black boxes" in such applications. Recent research has also revealed that DNNs are highly vulnerable to adversarial attacks, raising serious concerns over deploying DNNs in the real world. To overcome these deficiencies, we are developing Massif, an interactive tool for deciphering adversarial attacks. Massif identifies and interactively visualizes neurons and their connections inside a DNN that are strongly activated or suppressed by an adversarial attack. Massif provides both a high-level, interpretable overview of the effect of an attack on a DNN, and a low-level, detailed description of the affected neurons. Massif's tightly coupled views help people better understand which input features are most vulnerable and important for correct predictions.
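Massif's attribution graphs rank neurons by their influence on the prediction; a much simpler proxy, sketched below, is to diff per-channel activations at one layer between benign and attacked batches, then threshold the difference to flag suppressed and emphasized channels. Everything here is an illustrative assumption rather than Massif's method: the layer choice, the random stand-in batches, and the threshold tau.

import torch
import torchvision

# Illustrative setup: an InceptionV1-style network and random stand-in batches.
model = torchvision.models.googlenet(weights=None).eval()
layer = model.inception4e                   # intermediate layer, chosen arbitrarily
benign_images = torch.rand(8, 3, 224, 224)  # stand-ins for real benign images
attacked_images = torch.rand(8, 3, 224, 224)  # stand-ins for attacked images

def channel_activations(net, target_layer, images):
    # Record the mean activation of each channel at target_layer via a forward hook.
    recorded = {}
    def hook(module, inputs, output):
        recorded["acts"] = output.mean(dim=(0, 2, 3))  # average over batch + spatial dims
    handle = target_layer.register_forward_hook(hook)
    with torch.no_grad():
        net(images)
    handle.remove()
    return recorded["acts"]

benign = channel_activations(model, layer, benign_images)
attacked = channel_activations(model, layer, attacked_images)
delta = attacked - benign
tau = 0.1  # illustrative threshold
suppressed = (delta < -tau).nonzero(as_tuple=True)[0]  # channels dampened by the attack
emphasized = (delta > tau).nonzero(as_tuple=True)[0]   # channels amplified by the attack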
Citation
Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning
@inproceedings{das2020massif,
  title={Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning},
  author={Das, Nilaksh and Park, Haekyu and Wang, Zijie J. and Hohman, Fred and Firstman, Robert and Rogers, Emily and Chau, Duen Horng},
  booktitle={Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems},
  publisher={ACM},
  year={2020}
}