[024] Latest generative models

Posted on July 08, 2018

> Relativistic (average) GAN/RGAN/RaGAN: The relativistic discriminator: a key element missing from standard GAN

This is an interesting work that was just published on arXiv. The author has also posted a blog post giving a quick review of the paper.

According to the paper, the standard GAN framework has two drawbacks: (1) the discriminator cannot use the fact that half of the data it examines is fake; (2) the generator can only make the fake samples look more like the real ones, but it cannot make the real samples look less real than the fake ones. The second drawback limits the optimal solution of the generator: “it can at best generate as realistic as the given one”. If possible, we would like the generator to output samples that confuse the discriminator even more than the real ones do (given the same discriminator).

Previously, the discriminator was limited to outputting a single probability for a single sample, D(C(x)), where D(\cdot) is usually an activation function with range from 0 to 1 (such as the sigmoid) and C(\cdot) is the score assigned to the sample x. In this paper, the author replaces it with D(C(x_r)-C(x_f))=\text{sigmoid}(C(x_r)-C(x_f)), which means the discriminator compares a real sample x_r with a fake sample x_f to make its judgement.

With this new design, (1) the discriminator can really “compare” two samples at once rather than assigning each a probability independently; (2) the generator can push the score of the fake data C(x_f) above that of the real data C(x_r), rather than only pushing the probability D(C(x_f)) toward 1.
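To make the pairwise comparison concrete, here is a minimal NumPy sketch, assuming the scores C(x_r) and C(x_f) have already been computed for paired real and fake samples. The function names `rgan_d_loss` and `rgan_g_loss` and the example scores are hypothetical, and the negative log-sigmoid loss is one common choice, not necessarily the exact form used in every experiment of the paper.

```python
import numpy as np

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)

def rgan_d_loss(c_real, c_fake):
    # Discriminator: push D(C(x_r) - C(x_f)) toward 1,
    # i.e. score each real sample above its paired fake sample.
    return -np.mean(log_sigmoid(c_real - c_fake))

def rgan_g_loss(c_real, c_fake):
    # Generator: the same comparison with the roles swapped,
    # i.e. score each fake sample above its paired real sample.
    return -np.mean(log_sigmoid(c_fake - c_real))

# Made-up scores for three (real, fake) pairs, for illustration only.
c_real = np.array([2.0, 1.5, 0.5])
c_fake = np.array([-1.0, 0.0, 0.5])

d_loss = rgan_d_loss(c_real, c_fake)
g_loss = rgan_g_loss(c_real, c_fake)
```

Note that the generator loss depends on C(x_r) as well as C(x_f), which is how the generator can lower the relative standing of the real data instead of only raising D(C(x_f)).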

To avoid the randomness of sampling pairs, the author proposes an averaged version: the original RGAN term D(C(x_r)-C(x_f)) is replaced by D(C(x_r)-\overline{C(x_f)}), where \overline{C(x_f)} is the average score of the fake data. Please refer to the original paper for more details.
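The averaged version can be sketched in NumPy as follows. The term D(C(x_r)-\overline{C(x_f)}) comes from the description above; the symmetric term that compares each fake sample with the average real score is an assumption based on my reading of the paper, and the names `ragan_d_loss` and `ragan_g_loss` are hypothetical. Treat this as an illustration rather than the reference implementation.

```python
import numpy as np

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)

def ragan_d_loss(c_real, c_fake):
    # Each real score is compared with the batch-average fake score ...
    real_vs_avg_fake = c_real - np.mean(c_fake)
    # ... and (assumed symmetric term) each fake score with the average real score.
    fake_vs_avg_real = c_fake - np.mean(c_real)
    # log(1 - sigmoid(z)) equals log_sigmoid(-z).
    return (-np.mean(log_sigmoid(real_vs_avg_fake))
            - np.mean(log_sigmoid(-fake_vs_avg_real)))

def ragan_g_loss(c_real, c_fake):
    # Generator: same two comparisons with the targets swapped.
    real_vs_avg_fake = c_real - np.mean(c_fake)
    fake_vs_avg_real = c_fake - np.mean(c_real)
    return (-np.mean(log_sigmoid(fake_vs_avg_real))
            - np.mean(log_sigmoid(-real_vs_avg_fake)))

# Made-up scores, for illustration only.
c_real = np.array([2.0, 1.5, 0.5])
c_fake = np.array([-1.0, 0.0, 0.5])

d_loss = ragan_d_loss(c_real, c_fake)
# Shuffling the fake batch does not change the loss: no pairing is involved.
d_loss_shuffled = ragan_d_loss(c_real, c_fake[::-1])
```

Because every sample is compared with a batch average, the loss no longer depends on which real sample happens to be paired with which fake sample.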

According to the paper, RGAN or RaGAN (when combined with other techniques) can (1) train more stably, (2) use fewer update iterations of the discriminator, and (3) generate better-quality samples from limited training data.

TL;DR: Make the single-sample judgment D(C(x)) pairwise, D(C(x_r)-C(x_f)), which works better in its averaged version D(C(x_r)-\overline{C(x_f)}).


> Glow: Better Reversible Generative Models