In a previous post, we already discussed whether machines can think. The next question I would like to pose is: “Can machines dream?”.
After reading that, you may think I have gone completely crazy, that I lost my head somewhere in the cosmos and came back with new, extravagant ideas.
Well… actually, you are not so wrong. My head, while navigating the deepest layers of a particular neural network, got stuck trying to find out what they were dreaming.
And it found the answer: yes, machines really can dream. Here I would like to show some of the dreams I traveled through.
Do you want to keep dreaming?
If yes, then click here to travel inside more machine dreams as I did. Be careful, though: you are going to be catapulted into a very strange world, and it might not be easy to get out…
In the next part of the post, we leave the deep dream world, get back to the real world, and try to understand what goes on under the hood.
Welcome to the convolutional network’s world.
Convolutional neural networks
Convolutional neural networks (CNNs) are a class of artificial neural networks typically used for image classification: given an image as input, their output is the label associated with that image. Basically, their task consists in recognizing which object is depicted in the input picture. To make this possible, these networks are trained on millions of images called training samples, and the parameters of the network are adjusted until it achieves a satisfactory rate of correctly classified images.
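As a toy illustration of that training loop, here is a minimal sketch. It is not the actual ImageNet pipeline: the synthetic data, the single-layer linear “network”, and the 0.95 accuracy threshold are all made up for the example.

```python
import numpy as np

# Toy sketch of the training process described above (illustrative only --
# real CNNs are trained on millions of images via backpropagation).
# A single linear "network" learns to separate bright from dark 4x4
# "images"; its parameters are adjusted until accuracy is satisfactory.

rng = np.random.default_rng(0)

# Synthetic training samples: label 1 = bright image, label 0 = dark image.
X_bright = rng.uniform(0.6, 1.0, size=(50, 16))
X_dark = rng.uniform(0.0, 0.4, size=(50, 16))
X = np.vstack([X_bright, X_dark])
y = np.array([1] * 50 + [0] * 50)

w = np.zeros(16)  # the network's parameters
b = 0.0
lr = 0.1

def accuracy(w, b):
    preds = (X @ w + b > 0).astype(int)
    return (preds == y).mean()

# Keep adjusting the parameters until the classification rate is satisfactory.
while accuracy(w, b) < 0.95:
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad_w = X.T @ (p - y) / len(y)          # gradient of the logistic loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

print(f"training accuracy: {accuracy(w, b):.2f}")
```

A real CNN replaces the linear map with stacked convolutional layers, but the adjust-until-satisfactory loop is the same idea.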
The main difference between CNNs and traditional neural networks is that the former are built from a number of convolutional layers stacked on top of each other instead of ordinary fully connected layers. Convolutional layers themselves are just collections of filters: each filter in the lower layers extracts low-level features from the input image, such as edges or corners, while the higher-level ones compose these basic features into higher-level structures like people or cars. Finally, the last layer assembles these features into a complete interpretation of the image and activates the right neurons to classify it with the appropriate label.
As an example, the lowest layers might detect edges and segments; some intermediate layers can then put these shapes together to recognize eyes, mouths, or noses. Higher-level layers may in turn assemble these intermediate-level features to recognize faces, and the output layer will try to guess whom those faces belong to, based on the extracted features, which together form a feature map.
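To make the idea of a low-level filter concrete, here is a minimal NumPy sketch. The 8×8 step image and the hand-made Sobel-like kernel are illustrative assumptions: in a trained CNN these filters are learned, not hand-crafted.

```python
import numpy as np

# A convolutional filter is just a small array of weights slid across the
# image. A hand-made 3x3 vertical-edge kernel (the kind of low-level
# feature a first layer might learn) responds strongly where a dark
# region meets a bright one.

def correlate2d(image, kernel):
    """Valid cross-correlation: the core operation of a convolutional layer."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 8x8 image: dark left half, bright right half -> one vertical edge.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Sobel-like vertical-edge detector.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = correlate2d(image, kernel)
print(np.argmax(response[0]))  # → 2, the output position centered on the edge
```

The filter's output (its "activation map") is high exactly where its pattern appears; stacking many such filters, layer after layer, is what lets the network build up from edges to faces.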
What do they actually learn?
What exactly do these layers and filters learn? One easy way to find out consists of feeding the network an arbitrary image and letting the model analyze it. Afterward, we choose a layer or a filter and iteratively enhance its output: this makes the network recognize the enhanced patterns more strongly and gradually draw them out of nowhere. For example, if we choose to enhance low-level layers, strokes or other simple, very abstract patterns will tend to emerge. Conversely, by enhancing high-level layers that have been trained to identify complex patterns, we will see birds, dogs, faces, and cars magically begin to populate the output image.
Can machines dream?
The results are amazing: a simple neural network designed by humans can reinterpret an image with its own features. Just as happens in a dream, we enter an abstract world where our thoughts interact with real scenery we saw during our life and modify it in ways we can’t control. We can say that the layers’ filters generate the machine’s thoughts and turn the original input image into the network’s dream. So yes, machines can dream.
This network was trained mostly on images of animals taken from the ImageNet dataset, so naturally it tends to interpret shapes as animals. But because the learned features are stored at such a high level of abstraction, the results are an interesting remix of them.
We can apply this algorithm iteratively, zoom into the image after each iteration, or even enhance two or more filters or layers together, obtaining an endless number of new, unbelievable impressions able to surpass even the creativity of the greatest artists of the modern age. We just have to try different combinations and explore what the network can dream and what its thoughts are. We can even feed the network a purely random noise image, giving its creativity more freedom and letting it draw its dreams from scratch.
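The zoom-between-iterations trick can be sketched like this. The `zoom` helper is hypothetical: a real pipeline would interleave it with enhancement steps and use smoother interpolation than the nearest-neighbor upsampling used here.

```python
import numpy as np

def zoom(image, factor=2):
    """Crop the center of the image and upsample it back to full size,
    producing the "zooming in" effect between deep dream iterations.
    Nearest-neighbor upsampling via np.kron keeps the sketch dependency-free.
    """
    h, w = image.shape
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]
    return np.kron(crop, np.ones((factor, factor)))

img = np.arange(64, dtype=float).reshape(8, 8)
print(zoom(img).shape)  # same size as the input, so enhancement can continue
```

Alternating enhancement steps with this zoom is what produces the endless "falling into the dream" videos.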
Now we are going to be catapulted into a new world: the computer science world. But don’t worry, it will just be a gentle introduction.
The key to creating the deep dream images is quite simple: we just run a gradient ascent process on a particular layer or filter of the network in order to maximize the value of its activation. In this case, the gradient expresses how much we have to modify each pixel of the image to enhance the pattern the layer has been trained to identify. That’s why, after each iteration, the enhanced patterns become gradually more visible. Basically, by adding the gradient computed at the k-th iteration to the image at the k-th iteration, we gradually draw out the represented pattern, which becomes more and more visible with successive iterations. Of course, there are also a few tricks to improve the overall result, such as applying gradient ascent across multiple scales or normalizing the magnitude of the gradient ascent steps.
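Here is a minimal NumPy sketch of this gradient ascent step. It assumes a single hand-made edge filter stands in for a layer of the trained network, which is not what the repository code does (that uses a real trained model); the activation being maximized is simply the sum of squared filter responses.

```python
import numpy as np

def correlate2d(x, k):
    """Valid cross-correlation (the filter's forward pass)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def grad_wrt_image(r, k, shape):
    """Gradient of 0.5 * sum(r**2) with respect to the input image.

    Each response r[i, j] depends only on the patch at (i, j), so its
    contribution r[i, j] * k is scattered back onto that patch -- this is
    exactly what backpropagation through a convolutional layer computes.
    """
    g = np.zeros(shape)
    kh, kw = k.shape
    for i in range(r.shape[0]):
        for j in range(r.shape[1]):
            g[i:i + kh, j:j + kw] += r[i, j] * k
    return g

rng = np.random.default_rng(0)
image = rng.normal(0, 0.1, size=(32, 32))  # start from random noise

# Hypothetical stand-in for a layer's filter: a vertical-edge detector.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

initial_activation = 0.5 * np.sum(correlate2d(image, kernel) ** 2)

step = 0.1
for _ in range(50):
    r = correlate2d(image, kernel)
    g = grad_wrt_image(r, kernel, image.shape)
    g /= np.abs(g).mean() + 1e-8  # normalize the gradient magnitude (one of the tricks)
    image += step * g             # gradient *ascent*: add the gradient to the image

final_activation = 0.5 * np.sum(correlate2d(image, kernel) ** 2)
print(final_activation > initial_activation)  # → True
```

After enough iterations, vertical stripes emerge from the noise: the image has been pushed toward whatever makes this filter fire, which is the whole deep dream effect in miniature.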
The full code is available on my GitHub repository; it contains a convolutional model already trained on ImageNet and the code to generate new images by enhancing some of its layers or filters:
More machine-generated images can be found here: