This paper, together with the authors' previous work "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images", is good material if you are interested in adversarial examples or CNN visualization.
I removed my summary of the paper because I found the authors' blog does a good job of explaining their work.
[DR017]: Starting from recognition, the computer vision community is now aiming to tackle more complex tasks, such as one-shot learning and image understanding. In particular, since deep learning is the most popular yet most mysterious method dominating the field, researchers are eager to analyze its mechanism. Visualizing what a CNN has learned is just a baby step toward understanding it. Although we can visualize the meaning of some neurons, how a CNN builds such abstractions, and how we can use that insight to improve learning, still remain unknown.
Combined with the development of adversarial examples, it would be interesting to simultaneously visualize the neuron activations of the original image and of the tampered one. If the attack procedure is continuous, or if we can interpolate between the two to produce intermediate images, then we can watch how the neurons are being attacked.
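A minimal sketch of this idea, with a toy fully-connected network and an FGSM-style one-step attack standing in for a real CNN and attack procedure (all weights, sizes, and the perturbation budget here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network standing in for a CNN; weights are random
# placeholders, not a trained model.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(3, 16))

def forward(x):
    h = np.maximum(W1 @ x, 0.0)            # hidden "neuron" activations
    z = W2 @ h
    p = np.exp(z - z.max()); p /= p.sum()  # softmax probabilities
    return h, p

def input_gradient(x, label):
    # Gradient of the cross-entropy loss w.r.t. the input (manual backprop).
    h, p = forward(x)
    dz = p.copy(); dz[label] -= 1.0        # d(loss)/d(logits)
    dh = W2.T @ dz
    dh[h <= 0.0] = 0.0                     # ReLU gate
    return W1.T @ dh

x = rng.normal(size=8)
label = 0

# FGSM-style attack: one signed gradient step that increases the loss.
eps = 0.5
x_adv = x + eps * np.sign(input_gradient(x, label))

# Interpolate between the clean and attacked input and watch the neurons.
for t in np.linspace(0.0, 1.0, 5):
    h, p = forward((1.0 - t) * x + t * x_adv)
    print(f"t={t:.2f}  p[label]={p[label]:.3f}  ||h||={np.linalg.norm(h):.2f}")
```

Each step of the interpolation prints the true-class probability and the hidden-activation norm, which is exactly the trace one would inspect to see where along the path the neurons start to "break".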
From another perspective, is this the correct way to depict a human's ability to imagine objects? Finding the image that most activates a certain category can generate an image that looks like:
However, the encoding procedure should be irreversible, since it has to be invariant, or at least tolerant, to intra-class variability, lighting, scale, in-plane and in-depth rotation, background clutter, etc., in order to generalize the learned knowledge. Yet we can now, to some extent, reverse this procedure by maximizing the classification probability. There should be some connection between the generalization ability and the reversibility of a CNN.
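The reversal described here, gradient ascent on the input to maximize a class probability, can be sketched with a toy linear-softmax model (the random weights, step size, and the small L2 decay are illustrative assumptions; real activation-maximization visualizations use a trained CNN and stronger image priors):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random linear-softmax "classifier" standing in for a trained CNN.
W = rng.normal(size=(3, 8))

def prob(x, c):
    z = W @ x
    p = np.exp(z - z.max()); p /= p.sum()
    return p[c]

def grad_log_prob(x, c):
    # Gradient of log p(c | x) w.r.t. the input for a linear-softmax model.
    z = W @ x
    p = np.exp(z - z.max()); p /= p.sum()
    return W[c] - W.T @ p

# Gradient ascent on the input: "imagine" an input for class c.
c = 2
x0 = rng.normal(size=8) * 0.01  # start from near-zero noise
x = x0.copy()
for _ in range(200):
    x += 0.1 * grad_log_prob(x, c)
    x *= 0.99  # mild L2 decay, a common regularizer in activation maximization

print(f"p(c|x): {prob(x0, c):.3f} -> {prob(x, c):.3f}")
```

The decay term is what stands in for the image priors used in practice: without some regularizer, the "imagined" input drifts toward the unrealistic high-frequency patterns that the fooling-images paper exploits.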
Speaking of adversarial examples, if we can find the most activating image, why not use it to locate the regions of image space that are more "solid", namely harder to attack? Could this be a new way to defend against such attacks? Another approach is to visualize the neuron activations of attacked images and look for patterns they may obey. In that case, we might defend not by solidifying the image but by proposing a new classification protocol in which both the classification confidence and the inner activations are taken into consideration.
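A naive version of such a protocol might look like the following hypothetical detector: the toy network, the per-neuron activation statistics, and both thresholds are assumptions for illustration, not a proven defense.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy network whose hidden activations we monitor; weights are random stand-ins.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(3, 16))

def hidden_and_confidence(x):
    h = np.maximum(W1 @ x, 0.0)
    z = W2 @ h
    p = np.exp(z - z.max()); p /= p.sum()
    return h, p.max()

# Fit simple per-neuron statistics on "clean" data.
clean = rng.normal(size=(500, 8))
H = np.array([hidden_and_confidence(x)[0] for x in clean])
mu, sigma = H.mean(axis=0), H.std(axis=0) + 1e-8

def accept(x, conf_thresh=0.5, z_thresh=4.0):
    # Accept a prediction only if the confidence is high AND the inner
    # activations look typical of clean inputs (small max z-score).
    h, conf = hidden_and_confidence(x)
    z_score = np.abs((h - mu) / sigma).max()
    return bool(conf >= conf_thresh and z_score <= z_thresh)

print(accept(rng.normal(size=8)))   # an ordinary input
print(accept(np.ones(8) * 100.0))   # wildly scaled input: atypical activations
```

The point of the sketch is the two-sided check: a fooling image can push the confidence arbitrarily high, but if its inner activations deviate strongly from the clean-data statistics, the prediction is rejected anyway.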