One major difference between human and machine learning is the way we retain important aspects of our knowledge, as we gather more data. All throughout our life, we keep enforcing those concepts/facts which help us most in our day-to-day activities. ML algorithms are much less selective – if you train a NN to perform task A and then retrain it to perform task B, the parameters for A will be forgotten ‘uniformly’.
- While training for task A, the algorithm measures the ‘importance‘ of a parameter as the gradient of the L2 norm of the output-error. In essence, this quantifies the absolute change in output for a small change in the param-value.
- Now, while re-training for a different task, the importance values computed in the above step are used as regularization parameters. This penalizes changes to the important params for task A, even while training for task B – as a result, the NN still performs decently at task A even after being re-trained.
This article gives a very good overview of distributed-computing mechanisms in TensorFlow. The main problem being tackled is the sharing of parameters/variables across different machines. Some take-aways:
- tf.Session, by itself, is like an isolated execution engine.
- However, multiple sessions can be made to share variables using tf.train.Servers, which are grouped together into ‘clusters’. All servers in the same cluster share variable values (this is done using namespaces).
- With tf.device, if you have multiple devices each with their own process, you can choose which one holds the original copy of a Variable.
- Each server is responsible for building its own Graph though – the elements of the graph can include process-specific params, as well as global ones shared across servers.
The post gives multiple small examples, as well as one cumulative piece detailed multiple feature – do take a look!
Another good paper from NIPS2017. The problem addressed here is that of unsupervised image-to-image translation, also shortened as UNIT. Consider the conversion of a street-photo in sunny weather, to the same street on a rainy day.
If it was supervised, we would have pairs of photos of the same streets, in sunny & rainy weather. But such data is hard to come by, especially in the quantities needed for deep learning. So what if you just have a bunch of sunny street photos, and a set of rainy ones? (with no ‘common’ street). This is basically the problem being solved here.
Nvidia uses a VAE-GAN for this purpose, with some twists. Consider the sunny/rainy example from above:
- The raw image is first converted to a latent vector by a Variational Autoencoder.
- This latent vector is then given as input to the Generative part of a GAN.
The twist is that the last few layers of the VAE, and the first few layers of the GAN are shared by both sunny-to-rainy and rainy-to-sunny networks. Why so?
You can intuitively see that the VAE is converting the raw pixels into a vector encoding basic attributes of the street – irrespective of the weather. While the first few layers of the VAE deal with raw pixel data, the higher ones understand abstract street-attributes (which are weather-independent). As a result, the latter layers get shared.
The same logic is applied (but in reverse), in keeping the first few layers of the Generative network common.
Heres a video of the method to give you a taste of the results obtained. They are surprisingly good!
A Redditor going by the name of ‘deepfake’ uses Tensorflow-based deep learning to paste celebrity faces onto pornstar bodies in videos. While the results are not perfect, they are good enough to cause concerns over consent.
(To give you an example of the progress that has made in video manipulation, take a look at Face2Face – They use a video of some celebrity, and combine it with actions by a user on their live feed, to generate a video of the celebrity doing the same.).