Imagine a scenario in which a forger tries to produce fake currency and a policeman has to distinguish the fakes from the real bills. At the beginning, neither has much experience: the forger simply shows up with a piece of paper with a dollar bill scribbled on it. It is obviously fake, but the inexperienced policeman still struggles to decide whether it is genuine or not. After the first iteration, both learn to improve at their own task thanks to the experience gained: if the policeman detects the fake, the forger learns how to make fake money that looks more like the real thing. Otherwise, the policeman develops new, more sophisticated techniques to correctly classify fake and real currency, and so on.

This is equivalent to a zero-sum game in which both parties have to improve their own strategy in order to succeed at their task and beat their opponent. Eventually, this competition drives the forger to produce fake currency that is almost indistinguishable from the real thing, and drives the policeman to improve his detection methods until no fake currency can trick him.

What if we replace the forger with a generative model and the policeman with a classification model? This is analogous to how generative adversarial networks (GANs) work. The generative model is pitted against a discriminative model whose task is to learn to determine whether a sample comes from the model distribution (it was produced by the generative model) or from the data distribution (it is a real sample). This is why these models are also known as adversarial networks: two networks compete with each other, each trying to maximize its own reward.

### Generative models

What I cannot create, I do not understand

Richard Feynman

Generative models, such as VAEs, are networks whose goal is to generate new samples from a distribution that is as similar as possible to the distribution of the original data, using unsupervised learning techniques. The first step consists in collecting a certain amount of data. The model is then trained to find parameters W that define a distribution resembling the distribution of the real data, and new samples are eventually drawn from it. Thus, a generative model's objective is to find parameters W such that the distance between the estimated distribution and the distribution of the actual data is minimized.

One of their useful applications is dealing with unbalanced datasets. Imagine a binary labeled dataset in which 95% of the labels are positive and only the remaining 5% are negative. This is typical of anomaly detection, where under normal conditions anomalies are much less frequent than normal events, for example when predicting player departure in online video games or detecting system anomalies. To relieve this imbalance, we can use a GAN to generate new, previously unseen samples for the less frequent label until the negative and positive classes have comparable sizes.

The goal of the generator is to learn the distribution pg over the target data x by mapping input noise z, drawn from a prior pz(z), to a new sample x′ through the differentiable function G(z;θg) represented by the generative model G with parameters θg. On the other hand, the discriminator D(x;θd) is a model that outputs a single scalar value D(x) representing the probability that x came from the real data rather than from the distribution pg. Therefore, we simultaneously train D to maximize the probability of assigning the correct label to both training examples and samples from the generative model, and train G to minimize the probability that D correctly detects a sample generated by G. Overall, the objective is the following minimax equation:

$\min_G \, \max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+ \mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$

From the above function we can observe that when the discriminator predicts correctly, D(x) approaches 1 and D(G(z)) approaches 0, so both terms tend toward log 1, which is equal to 0. Conversely, when the generator successfully deceives the discriminator, D(G(z)) approaches 1 and the second term becomes large and negative as it approaches log 0, thus minimizing the value of the function.
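To make this behaviour concrete, here is a minimal sketch that evaluates a minibatch estimate of V(D,G) for three hand-picked cases. The helper `value_function` and the probability values are illustrative assumptions, not part of any library.

```python
import math

def value_function(d_real, d_fake):
    """Minibatch estimate of V(D, G) (hypothetical helper).

    d_real: discriminator outputs D(x) on real samples
    d_fake: discriminator outputs D(G(z)) on generated samples
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator is confidently right: D(x) near 1, D(G(z)) near 0.
v_correct = value_function([0.99, 0.98], [0.01, 0.02])
# Generator fools the discriminator: D(G(z)) near 1.
v_fooled = value_function([0.99, 0.98], [0.99, 0.98])
# Discriminator cannot tell real from fake: it outputs 0.5 everywhere.
v_chance = value_function([0.5, 0.5], [0.5, 0.5])

print(v_correct, v_fooled, v_chance)
```

The value is close to 0 when the discriminator is right, strongly negative when it is fooled, and equal to -log 4 when it outputs 0.5 for every sample.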

Since in the early stages of training G is a very poor model with random parameters, D detects fake samples with very high confidence because the samples generated by G are clearly different from the ones in the training data. Thus, the term log(1 − D(G(z))) saturates, since D(G(z)) is almost always very close to 0. To overcome this problem, rather than training G to minimize log(1 − D(G(z))) we can train G to maximize log D(G(z)). This objective provides much stronger gradients early in learning.
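The difference between the two generator objectives can be checked numerically. The sketch below assumes that the discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(a) for some logit a (an assumption for illustration; the text does not fix an architecture), and compares the derivative of each objective with respect to that logit.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da log(sigmoid(a)) = 1 - sigmoid(a)
    return 1.0 - sigmoid(a)

# Very negative logits mean D is confident that the sample is fake.
for a in (-6.0, -3.0, 0.0):
    print(a, sigmoid(a), grad_saturating(a), grad_non_saturating(a))
```

When D(G(z)) is close to 0, the gradient of log(1 − D(G(z))) almost vanishes, while the gradient of log D(G(z)) stays close to 1, so the generator still receives a useful learning signal.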

### Training phase

During training, we alternate between two phases: one in which we train the discriminator and one in which we train the generator. The three steps involved in the first phase, which updates the weights of the discriminator, are listed below:

• Sample a minibatch of m noise samples {z(1),…,z(m)} from the noise prior pz(z).
• Sample a minibatch of m examples {x(1),…,x(m)} from the data-generating distribution pdata(x).
• Update the discriminator by ascending its stochastic gradient:

$\nabla_{\theta_d} \frac{1}{m}\sum_{i=1}^{m}\left[\log D(x^{(i)})+\log(1-D(G(z^{(i)})))\right]$

Then, we alternate the process with the generative model training phase:

• Sample a minibatch of m noise samples {z(1),…,z(m)} from the noise prior pz(z).
• Update the generator by descending its stochastic gradient:

$\nabla_{\theta_g} \frac{1}{m}\sum_{i=1}^{m}\log(1-D(G(z^{(i)})))$

As for the gradient update rule, we can use any standard gradient-based learning rule, such as Adam or SGD.
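The two alternating phases can be sketched in plain Python on a toy one-dimensional problem. Everything specific here is an assumption made for illustration: the "real" data is drawn from a Gaussian with mean 4, the generator is the linear map G(z) = a·z + b, the discriminator is the logistic model D(x) = sigmoid(w·x + c), the gradients are written out by hand, and plain SGD is used with an arbitrary learning rate. It shows the shape of the loop only and makes no claim about convergence.

```python
import math
import random

def sigmoid(a):
    # Clamp the logit so math.exp cannot overflow.
    a = max(-30.0, min(30.0, a))
    return 1.0 / (1.0 + math.exp(-a))

def D(x, w, c):
    """Toy discriminator: probability that x is real."""
    return sigmoid(w * x + c)

def G(z, a, b):
    """Toy generator: maps noise z to a sample."""
    return a * z + b

def d_objective(xs, zs, w, c, a, b):
    """(1/m) * sum[log D(x) + log(1 - D(G(z)))], which D ascends."""
    return sum(math.log(D(x, w, c)) + math.log(1.0 - D(G(z, a, b), w, c))
               for x, z in zip(xs, zs)) / len(xs)

def g_loss(zs, w, c, a, b):
    """(1/m) * sum[log(1 - D(G(z)))], which G descends."""
    return sum(math.log(1.0 - D(G(z, a, b), w, c)) for z in zs) / len(zs)

def d_step(xs, zs, w, c, a, b, lr):
    """One gradient-ascent step on the discriminator parameters (w, c)."""
    m = len(xs)
    gw = gc = 0.0
    for x, z in zip(xs, zs):
        fake = G(z, a, b)
        # d/dw log D(x) = (1 - D(x)) * x ; d/dw log(1 - D(x')) = -D(x') * x'
        gw += (1.0 - D(x, w, c)) * x - D(fake, w, c) * fake
        gc += (1.0 - D(x, w, c)) - D(fake, w, c)
    return w + lr * gw / m, c + lr * gc / m

def g_step(zs, w, c, a, b, lr):
    """One gradient-descent step on the generator parameters (a, b)."""
    m = len(zs)
    ga = gb = 0.0
    for z in zs:
        d_fake = D(G(z, a, b), w, c)
        # d/da log(1 - D(a*z + b)) = -D(G(z)) * w * z
        ga += -d_fake * w * z
        gb += -d_fake * w
    return a - lr * ga / m, b - lr * gb / m

def train(steps=200, m=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, c, a, b = 0.1, 0.0, 1.0, 0.0  # arbitrary starting parameters
    for _ in range(steps):
        # Phase 1: update the discriminator.
        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]  # noise minibatch
        xs = [rng.gauss(4.0, 1.0) for _ in range(m)]  # "real" minibatch
        w, c = d_step(xs, zs, w, c, a, b, lr)
        # Phase 2: update the generator with fresh noise.
        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        a, b = g_step(zs, w, c, a, b, lr)
    return w, c, a, b

w, c, a, b = train()
print("discriminator:", w, c, "generator:", a, b)
```

In practice the hand-written gradients would be replaced by automatic differentiation in a deep learning framework, and the plain SGD updates by an optimizer such as Adam.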

Ideally, we would like to train the GAN model until the discriminator D can no longer differentiate between real and fake samples because the estimated distribution is almost indistinguishable from the actual data distribution, that is, pg=pdata. At this point, D will output a 50% probability for every sample it encounters.
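The 50% figure follows from the form of the best possible discriminator for a fixed generator. As a short worked step, using the standard result from the original analysis of GANs (stated here without derivation), the optimal discriminator is:

$D^*_G(x)=\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}$

Substituting $p_g=p_{data}$ gives $D^*_G(x)=\frac{1}{2}$ for every x, and the value function becomes:

$V(D^*_G,G)=\log\frac{1}{2}+\log\frac{1}{2}=-\log 4$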

### Interesting projects using GAN

The next sections present some real applications of GAN models, both to show what kinds of problems GANs can deal with and to give an idea of the powerful applications that can be developed with artificial intelligence.

### U-GAT-IT (Face to Anime)

The adversarial network U-GAT-IT, developed by South Korean developers in 2019, converts real people’s faces into Japanese anime style. More examples can be found in this gallery.

### FaceApp (Face translation)

FaceApp is a face-editing app that lets users edit their own face or someone else’s to make it appear older or younger than it actually is. The app uses generative adversarial networks trained to create specific categories of realistic images. It then transfers the features of the chosen category to the photo uploaded by the user, according to the selected filter.

### AnimeGAN

AnimeGAN is a deep convolutional GAN (DCGAN) trained to draw anime faces from scratch. It improves over the course of training, gradually generating faces that resemble the ones drawn by Japanese artists. A post about this project can be found at this link.