Weekly Review: 12/16/2017

Hebbian Learning in Neural Networks

One major difference between human and machine learning is the way we retain important aspects of our knowledge, as we gather more data. All throughout our life, we keep enforcing those concepts/facts which help us most in our day-to-day activities. ML algorithms are much less selective – if you train a NN to perform task A and then retrain it to perform task B, the parameters for A will be forgotten ‘uniformly’.

This paper tries to apply the Hebbian-learning principle of ‘Neurons that fire together, wire together’ to NNs. Its done as follows:

  1. While training for task A, the algorithm measures the ‘importance‘ of a parameter as the gradient of the L2 norm of the output-error. In essence, this quantifies the absolute change in output for a small change in the param-value.
  2. Now, while re-training for a different task, the importance values computed in the above step are used as regularization parameters. This penalizes changes to the important params for task A, even while training for task B – as a result, the NN still performs decently at task A even after being re-trained.

Distributed Tensorflow

This article gives a very good overview of distributed-computing mechanisms in TensorFlow. The main problem being tackled is the sharing of parameters/variables across different machines. Some take-aways:

  1. tf.Session, by itself, is like an isolated execution engine.
  2. However, multiple sessions can be made to share variables using tf.train.Servers, which are grouped together into ‘clusters’. All servers in the same cluster share variable values (this is done using namespaces).
  3. With tf.device, if you have multiple devices each with their own process, you can choose which one holds the original copy of a Variable.
  4. Each server is responsible for building its own Graph though – the elements of the graph can include process-specific params, as well as global ones shared across servers.

The post gives multiple small examples, as well as one cumulative piece detailed multiple feature – do take a look!

UNsupervised Image-to-Image Translation by Nvidia

Another good paper from NIPS2017. The problem addressed here is that of unsupervised image-to-image translation, also shortened as UNIT. Consider the conversion of a street-photo in sunny weather, to the same street on a rainy day.

If it was supervised, we would have pairs of photos of the same streets, in sunny & rainy weather. But such data is hard to come by, especially in the quantities needed for deep learning. So what if you just have a bunch of sunny street photos, and a set of rainy ones? (with no ‘common’ street). This is basically the problem being solved here.

Nvidia uses a VAE-GAN for this purpose, with some twists. Consider the sunny/rainy example from above:

  1. The raw image is first converted to a latent vector by a Variational Autoencoder.
  2. This latent vector is then given as input to the Generative part of a GAN.

The twist is that the last few layers of the VAE, and the first few layers of the GAN are shared by both sunny-to-rainy and rainy-to-sunny networks. Why so?

You can intuitively see that the VAE is converting the raw pixels into a vector encoding basic attributes of the street – irrespective of the weather. While the first few layers of the VAE deal with raw pixel data, the higher ones understand abstract street-attributes (which are weather-independent). As a result, the latter layers get shared.

The same logic is applied (but in reverse), in keeping the first few layers of the Generative network common.

Heres a video of the method to give you a taste of the results obtained. They are surprisingly good!

Fake AI porn (NSFW)

A Redditor going by the name of ‘deepfake’ uses Tensorflow-based deep learning to paste celebrity faces onto pornstar bodies in videos. While the results are not perfect, they are good enough to cause concerns over consent.

(To give you an example of the progress that has made in video manipulation, take a look at Face2Face – They use a video of some celebrity, and combine it with actions by a user on their live feed, to generate a video of the celebrity doing the same.).

Simple Turing Machine using TensorFlow

A post after many days! This one describes a simple code I wrote to simulate a Deterministic Turing Machine using TensorFlow. Its pretty basic with respect to the design, using TensorFlow only for the state change computations. The movement on the tape, and handling of halts is taken care of by old-school Python. This is not intended to be a practical usage of TensorFlow, its just something I did and am writing about here :-).

To demonstrate the code, I will use the 3-state Busy Beaver example from the Wiki page.

Encoding the Problem

Each of the states are encoded as real numbers (integers in this case, but for whatever reasons you could use floats too). So ‘A’, ‘B’, ‘C’ and ‘HALT’ become 0, 1, 2 and 3 respectively. Accordingly, the initial and final states are defined as follows:

initial_state = 0
final_states = [3]

Since there can be multiple final states generally, final_states is defined as a list. As with the states, the potential symbols are numbers too. The blank symbol, as per the Busy Beaver problem definition, is 0.

blank_input = 0

Now we come to the transition function. It is defined using two 2-D NumPy matrices, one for the inputs and the other for the corresponding outputs.

import numpy as np

transition_input = np.array([[0, 0],
                             [0, 1],
                             [1, 0],
                             [1, 1],
                             [2, 0],
                             [2, 1]])

transition_output = np.array([[1, 1, 1],
                              [2, 1, -1],
                              [0, 1, -1],
                              [1, 1, 1],
                              [1, 1, -1],
                              [3, 1, 1]])

Consider the transition_input[2] and transition_output[2]. The input denotes [current state, current symbol]. The output denotes [next state, next symbol, movement]. The movement value can either be 1(move pointer right, or increment index value) or -1(move pointer left, or decrement index value). Accordingly, transition_input[2] requires the current state to be 1 and current symbol being read to be 0. If these conditions are met, then the next state and next symbol are 0 and 1 respectively, with the pointer on the tape moving one step to the left.

The Turing Machine Code

The actual Turing Machine code follows. Its well documented as always, so you shouldn’t have a problem following whats going on if you are well-acquainted with Turing Machines and TensorFlow.

import tensorflow as tf


class TuringMachine(object):
    """
    Implementation of a basic Turing Machine using TensorFlow.
    """

    def __init__(self, initial_state, final_states, blank_input,
                 transition_input, transition_output):
        """
        Initializes the TensorFlow Graph required for running the Turing
        Machine.

        States and symbols should both be encoded as real numbers prior to
        initialization. Accordingly, 'initial_state' should be a float
        denoting the starting state. 'final_states' should be an iterable
        of numbers, denoting all the states the Machine can halt at.
        'blank_input' should be a float denoting the blank input for the
        Machine.
        'transition_input' and 'transition_output' will together denote
        the transition function followed by the Machine. Both should be
        2-D NumPy matrices. Each inner vector in 'transition_input' will
        contain [current state, current symbol], while each inner vector
        in 'transition_output' will have [next state, next symbol, move].
        "move" can either be 1: to increase pointer index by 1 (move
        right), or -1: decrease pointer index by 1 (move left).

        """
        
        #Store the necessary information as attributes
        self._initial_state = float(initial_state)
        self._final_states = set(final_states)
        self._blank_input = blank_input

        #Initialize Graph
        self._graph = tf.Graph()

        n_transitions = len(transition_input)
        if len(transition_input) != len(transition_output):
            raise ValueError("Number of transition inputs and outputs " +
                             "not same")

        #Initialize all components in the Graph
        with self._graph.as_default():

            #Placeholder for current symbol thats being read from the
            #tape
            self._current_symbol = tf.placeholder("float")

            #Variable to hold current state
            self._current_state = tf.Variable(self._initial_state)

            #Constant Ops for transitions
            transition_in = tf.constant(transition_input)
            transition_out = tf.constant(transition_output)

            #Generate Ops to compute transition
            concat_in = tf.pack([self._current_state,
                                 self._current_symbol])
            packed_in = tf.pack([concat_in for i in
                                 range(len(transition_input))])
            transition_diff = tf.reduce_sum(
                tf.pow(tf.sub(packed_in, transition_input), 2), 1)
            #This will return a 0 if theres an accurate match for the
            #transition condition (current state, current symbol). If not,
            #this will return a non-zero value.
            self._transition_match = tf.reduce_min(transition_diff, 0)
            #This will correspond to the index of the match
            match_index = tf.cast(tf.argmin(transition_diff, 0), "int32")
            #Pick out the transition output
            self._transition = tf.reshape(
                tf.slice(transition_out,  tf.pack([match_index, 0]),
                         tf.pack([1, 3])), [3])
            #This will change state Variable
            self._state_change = tf.assign(
                self._current_state, tf.reshape(
                    tf.cast(tf.slice(self._transition, tf.constant([0]),
                                     tf.constant([1])), "float32"), []))

            #This will reset the state Variable
            self._state_reset = tf.assign(self._current_state,
                                          self._initial_state)

            #Initialize Session
            self._sess = tf.Session()
 
            #Initialize Variables
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

    def run(self, input_tape, start_index, max_iter=1000):
        """
        Runs the Turing Machine with the given input.
        'input_tape' must be an index-able object, with 'start_index'
        denoting the index number to start computation at.

        Obviously, beware of inputs that may cause the machine to run
        indefinitely. To control this, set 'max_iter' to some number of
        iterations you would want to allow.

        Will return final state and final index. If Machine halts at a
        non-final state, or goes beyond max_iter, i.e. the Machine does
        not accept the input, will return None, None.
        The input tape will be modified in-place.
        """

        #First reset the state
        self._sess.run(self._state_reset)

        #Run the Machine
        index = start_index
        for i in range(max_iter):
            #Figure out input to be given
            if (index > -1 and len(input_tape) > index):
                machine_input = input_tape[index]
            else:
                machine_input = self._blank_input

            #Run one step of the Machine
            _, match_value, transition = self._sess.run(
                [self._state_change, self._transition_match,
                 self._transition],
                feed_dict = {self._current_symbol: machine_input})

            #If match_value is not zero, return None, None
            if match_value != 0:
                return None, None

            #First modify tape contents at current index as per transition
            if (index > -1 and len(input_tape) > index):
                #Index is within input tape limits
                input_tape[index] = transition[1]
            elif index == len(input_tape):
                #Index has gone beyond the range of the input tape,
                #to the right
                input_tape.append(transition[1])
            else:
                #Index has gone beyond the range of the input tape,
                #to the left
                input_tape.insert(0, transition[1])
                index = 0

            #Change index as per transition
            index = index + transition[2]

            #If its a final state, return state, index
            if transition[0] in self._final_states:
                return transition[0], index

        return None, None

A few things:

1. If the tape ‘pointer’ goes beyond the limits of input_tape, the blank input is automatically given to the machine (lines 114-118). Moreover,  depending on which end of the tape the pointer has gone across, the output value from the machine is inserted into the list (lines 130-142).

2. Since there is no way to check if the Machine will stop, the run method lets the user provide a maximum number of iterations. If the machine keeps running beyond the limit, (None, None) is returned.

3. The input_tape list is modified in-place.

4. If you want to track the transitions the Turing Machine follows, add a ‘print’ statement (or some kind of logging) at line 125.

Running the Turing Machine

First we define the contents of the input tape, and the index to start at.

>>> input_tape = [0, 1, 1, 1, 1, 1]
>>> start_index = 0

Then, we initialize a TuringMachine instance with the algorithm as encoded above:

>>> machine = TuringMachine(initial_state, final_states, blank_input,
                        transition_input, transition_output)

Finally, we ‘run’ the machine, and check if the input tape has been modified correctly:

>>> machine.run(input_tape, start_index)
(3, 5)
>>> input_tape
[1, 1, 1, 1, 1, 1, 1]
>>> machine.run(input_tape, start_index)
(3, 9)
>>> input_tape
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Thats it! This is my first implementation of a Turing Machine, so do leave a comment if there are any errors/improvements possible. Cheers!

Self-Organizing Maps with Google’s TensorFlow

[This post assumes that you know the basics of Google’s TensorFlow library. If you don’t, have a look at my earlier post to get started.]

kohonen1

A Self-Organizing Map, or SOM, falls under the rare domain of unsupervised learning in Neural Networks. Its essentially a grid of neurons, each denoting one cluster learned during training. Traditionally speaking, there is no concept of neuron ‘locations’ in ANNs. However, in an SOM, each neuron has a location, and neurons that lie close to each other represent clusters with similar properties. Each neuron has a weightage vector, which is equal to the centroid of its particular cluster.

AI-Junkie’s post does a great job of explaining how an SOM is trained, so I won’t re-invent the wheel.

The Code

Here’s my code for a 2-D version of an SOM. Its written with TensorFlow as its core training architecture: (Its heavily commented, so look at the inline docs if you want to hack/dig around)


import tensorflow as tf
import numpy as np


class SOM(object):
    """
    2-D Self-Organizing Map with Gaussian Neighbourhood function
    and linearly decreasing learning rate.
    """

    #To check if the SOM has been trained
    _trained = False

    def __init__(self, m, n, dim, n_iterations=100, alpha=None, sigma=None):
        """
        Initializes all necessary components of the TensorFlow
        Graph.

        m X n are the dimensions of the SOM. 'n_iterations' should
        should be an integer denoting the number of iterations undergone
        while training.
        'dim' is the dimensionality of the training inputs.
        'alpha' is a number denoting the initial time(iteration no)-based
        learning rate. Default value is 0.3
        'sigma' is the the initial neighbourhood value, denoting
        the radius of influence of the BMU while training. By default, its
        taken to be half of max(m, n).
        """

        #Assign required variables first
        self._m = m
        self._n = n
        if alpha is None:
            alpha = 0.3
        else:
            alpha = float(alpha)
        if sigma is None:
            sigma = max(m, n) / 2.0
        else:
            sigma = float(sigma)
        self._n_iterations = abs(int(n_iterations))

        ##INITIALIZE GRAPH
        self._graph = tf.Graph()

        ##POPULATE GRAPH WITH NECESSARY COMPONENTS
        with self._graph.as_default():

            ##VARIABLES AND CONSTANT OPS FOR DATA STORAGE

            #Randomly initialized weightage vectors for all neurons,
            #stored together as a matrix Variable of size [m*n, dim]
            self._weightage_vects = tf.Variable(tf.random_normal(
                [m*n, dim]))

            #Matrix of size [m*n, 2] for SOM grid locations
            #of neurons
            self._location_vects = tf.constant(np.array(
                list(self._neuron_locations(m, n))))

            ##PLACEHOLDERS FOR TRAINING INPUTS
            #We need to assign them as attributes to self, since they
            #will be fed in during training

            #The training vector
            self._vect_input = tf.placeholder("float", [dim])
            #Iteration number
            self._iter_input = tf.placeholder("float")

            ##CONSTRUCT TRAINING OP PIECE BY PIECE
            #Only the final, 'root' training op needs to be assigned as
            #an attribute to self, since all the rest will be executed
            #automatically during training

            #To compute the Best Matching Unit given a vector
            #Basically calculates the Euclidean distance between every
            #neuron's weightage vector and the input, and returns the
            #index of the neuron which gives the least value
            bmu_index = tf.argmin(tf.sqrt(tf.reduce_sum(
                tf.pow(tf.sub(self._weightage_vects, tf.pack(
                    [self._vect_input for i in range(m*n)])), 2), 1)),
                                  0)

            #This will extract the location of the BMU based on the BMU's
            #index
            slice_input = tf.pad(tf.reshape(bmu_index, [1]),
                                 np.array([[0, 1]]))
            bmu_loc = tf.reshape(tf.slice(self._location_vects, slice_input,
                                          tf.constant(np.array([1, 2]))),
                                 [2])

            #To compute the alpha and sigma values based on iteration
            #number
            learning_rate_op = tf.sub(1.0, tf.div(self._iter_input,
                                                  self._n_iterations))
            _alpha_op = tf.mul(alpha, learning_rate_op)
            _sigma_op = tf.mul(sigma, learning_rate_op)

            #Construct the op that will generate a vector with learning
            #rates for all neurons, based on iteration number and location
            #wrt BMU.
            bmu_distance_squares = tf.reduce_sum(tf.pow(tf.sub(
                self._location_vects, tf.pack(
                    [bmu_loc for i in range(m*n)])), 2), 1)
            neighbourhood_func = tf.exp(tf.neg(tf.div(tf.cast(
                bmu_distance_squares, "float32"), tf.pow(_sigma_op, 2))))
            learning_rate_op = tf.mul(_alpha_op, neighbourhood_func)

            #Finally, the op that will use learning_rate_op to update
            #the weightage vectors of all neurons based on a particular
            #input
            learning_rate_multiplier = tf.pack([tf.tile(tf.slice(
                learning_rate_op, np.array([i]), np.array([1])), [dim])
                                               for i in range(m*n)])
            weightage_delta = tf.mul(
                learning_rate_multiplier,
                tf.sub(tf.pack([self._vect_input for i in range(m*n)]),
                       self._weightage_vects))                                         
            new_weightages_op = tf.add(self._weightage_vects,
                                       weightage_delta)
            self._training_op = tf.assign(self._weightage_vects,
                                          new_weightages_op)                                       

            ##INITIALIZE SESSION
            self._sess = tf.Session()

            ##INITIALIZE VARIABLES
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

    def _neuron_locations(self, m, n):
        """
        Yields one by one the 2-D locations of the individual neurons
        in the SOM.
        """
        #Nested iterations over both dimensions
        #to generate all 2-D locations in the map
        for i in range(m):
            for j in range(n):
                yield np.array([i, j])

    def train(self, input_vects):
        """
        Trains the SOM.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Current weightage vectors for all neurons(initially random) are
        taken as starting conditions for training.
        """

        #Training iterations
        for iter_no in range(self._n_iterations):
            #Train with each vector one by one
            for input_vect in input_vects:
                self._sess.run(self._training_op,
                               feed_dict={self._vect_input: input_vect,
                                          self._iter_input: iter_no})

        #Store a centroid grid for easy retrieval later on
        centroid_grid = [[] for i in range(self._m)]
        self._weightages = list(self._sess.run(self._weightage_vects))
        self._locations = list(self._sess.run(self._location_vects))
        for i, loc in enumerate(self._locations):
            centroid_grid[loc[0]].append(self._weightages[i])
        self._centroid_grid = centroid_grid

        self._trained = True

    def get_centroids(self):
        """
        Returns a list of 'm' lists, with each inner list containing
        the 'n' corresponding centroid locations as 1-D NumPy arrays.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        return self._centroid_grid

    def map_vects(self, input_vects):
        """
        Maps each input vector to the relevant neuron in the SOM
        grid.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Returns a list of 1-D NumPy arrays containing (row, column)
        info for each input vector(in the same order), corresponding
        to mapped neuron.
        """

        if not self._trained:
            raise ValueError("SOM not trained yet")

        to_return = []
        for vect in input_vects:
            min_index = min([i for i in range(len(self._weightages))],
                            key=lambda x: np.linalg.norm(vect-
                                                         self._weightages[x]))
            to_return.append(self._locations[min_index])

        return to_return

A few points about the code:

1) Since my post on K-Means Clustering, I have gotten more comfortable with matrix operations in TensorFlow. You need to be comfortable with matrices if you want to work with TensorFlow (or any data flow infrastructure for that matter, even SciPy). You can code pretty much any logic or operational flow with TensorFlow, you just need to be able to build up complex functionality from basic components(ops), and structure the flow of data(tensors/variables) well.

2) It took quite a while for me to build the whole graph in such a way that the entire training functionality could be enclosed in a single op. This op is called during each iteration, for every vector, during training. Such an implementation is more in line with TensorFlow’s way of doing things, than my previous attempt with clustering.

3) I have used a 2-D grid for the SOM, you can use any geometry you wish. You would just have to modify the _neuron_locations method appropriately, and also the method that returns the centroid outputs. You could return a dict that maps neuron location to the corresponding cluster centroid.

4) To keep things simple, I haven’t provided for online training. You could do that by having bounds for the learning rate(s).

Sample Usage

I have used PyMVPA’s example of RGB colours to confirm that the code does work. PyMVPA provides functionality to train SOMs too (along with many other learning techniques).

Here’s how you would do it with my code:


#For plotting the images
from matplotlib import pyplot as plt

#Training inputs for RGBcolors
colors = np.array(
     [[0., 0., 0.],
      [0., 0., 1.],
      [0., 0., 0.5],
      [0.125, 0.529, 1.0],
      [0.33, 0.4, 0.67],
      [0.6, 0.5, 1.0],
      [0., 1., 0.],
      [1., 0., 0.],
      [0., 1., 1.],
      [1., 0., 1.],
      [1., 1., 0.],
      [1., 1., 1.],
      [.33, .33, .33],
      [.5, .5, .5],
      [.66, .66, .66]])
color_names = \
    ['black', 'blue', 'darkblue', 'skyblue',
     'greyblue', 'lilac', 'green', 'red',
     'cyan', 'violet', 'yellow', 'white',
     'darkgrey', 'mediumgrey', 'lightgrey']

#Train a 20x30 SOM with 400 iterations
som = SOM(20, 30, 3, 400)
som.train(colors)

#Get output grid
image_grid = som.get_centroids()

#Map colours to their closest neurons
mapped = som.map_vects(colors)

#Plot
plt.imshow(image_grid)
plt.title('Color SOM')
for i, m in enumerate(mapped):
    plt.text(m[1], m[0], color_names[i], ha='center', va='center',
             bbox=dict(facecolor='white', alpha=0.5, lw=0))
plt.show()

Here’s a sample of the output you would get (varies each time you train, but the color names should go to the correct locations in the image):

figure_1

Recursively copying elements from one Graph to another in TensorFlow

In my previous post on Google’s TensorFlow, I had mentioned the idea of using the library for Genetic Programming applications. In my free time, I tried to map out how such an application would be run. The focus obviously wasn’t on building the smartest way to do GP, but rather on exploring if it was practically possible. One of the ideas that stuck out, was to have a Graph for each candidate solution – which would be populated based on cross-overed elements from the parent solutions’ Graphs. This would require a good API to copy computational elements from one Graph to another in TensorFlow. I tried digging around to see if such functionality was available, but couldn’t find any (atleast from a good exploring of the github repo).

I wrote some rudimentary code to accomplish this, and heres a basic outline of how it works:

Consider the example given on TensorFlow’s Get Started page.


import tensorflow as tf
import numpy as np

# Make 100 phony data points in NumPy.
x_data = np.float32(np.random.rand(2, 100)) # Random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Construct a linear model.
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane.
for step in xrange(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print step, sess.run(W), sess.run(b)

# Learns best fit is W: [[0.100  0.200]], b: [0.300]

Lets say you want to copy the main training element train to another graph as defined here:


to_graph = tf.Graph()

 

1) You first decide on a namespace inside which all the copied elements will exist in to_graph. This is not really required if the Graph you are copying to is empty. But its important to remember that element names matter a lot in TensorFlow’s workings. Therefore, to avoid naming conflicts, its better to define such a namespace. So basically, what would be called “C” in to_graph‘s namespace, would now be called “N/C” where “N” is the namespace String.

Lets assume the namespace we define the copied elements in our example to, is “CopiedOps”.


namespace = "CopiedOps"

 

2) You then copy all the variables first, one by one, using a dedicated function copy_variable_to_graph. Each time, you supply the original instance, the target graph (to_graph in this case), the namespace, and an extra dictionary called copied_variables. This dictionary is useful while copying the computational nodes (Operation instances, we will call them ops) in the next step. Since Variable instances act as inputs for ops, we need a way to keep track of them for later.

I initially wanted to combine this initialization of variables with the function that copies the computational elements, but I found it really tricky to capture the appropriate Variable instances based on their connections to ops. Anyways, since variables are more like parameter-storing units whose values are needed frequently, its better to initialize them separately.

The Variable instances in the above example are b and W. Heres how you would do it with my code:


copied_variables = {}
b1 = copy_variable_to_graph(b, to_graph, namespace, copied_variables)
W1 = copy_variable_to_graph(W, to_graph, namespace, copied_variables)

Ofcourse, if your code has a lot of variables, you could just store them all in a list and run the above function over all of them with a common dictionary for copied variables.

 

3) You then recursively copy all the computational nodes (ops, Placeholders) to the other graph. Now heres the nice things about the method- You only need to do it for the topmost node computational node. All connected inputs and Tensors are automatically taken care of!

For the above example, the train object constructed on line 16 is the ‘topmost’ node. So copying the whole learner is as simple as:

train_copy = copy_to_graph(train, to_graph, copied_variables, namespace)

Thats it! The other instances like y, optimizer, loss are automatically replicated in to_graph.

Theres also a helper function in case you want to find the equivalent of an element from the original graph, in to_graph:


loss_copy = get_copied(loss, to_graph, copied_variables, namespace)

 

4) You can now run the new node in to_graph. Remember to initialize a new Session instance linked to the graph, and initialize all Variables. So heres how you would go about it:


with to_graph.as_default():
    init1 = tf.initialize_all_variables()
    sess1 = tf.Session()
    sess1.run(init1)

    for step in xrange(0, 201):
        sess1.run(train_copy)
        if step % 20 == 0:
            print step, sess1.run(W1), sess1.run(b1)

This provides an output similar to what you would get from the Get Started original example.

The Code


import tensorflow as tf
from tensorflow.python.framework import ops
from copy import deepcopy


def copy_variable_to_graph(org_instance, to_graph, namespace,
                           copied_variables={}):
    """
    Copies the Variable instance 'org_instance' into the graph
    'to_graph', under the given namespace.
    The dict 'copied_variables', if provided, will be updated with
    mapping the new variable's name to the instance.
    """

    if not isinstance(org_instance, tf.Variable):
        raise TypeError(str(org_instance) + " is not a Variable")

    #The name of the new variable
    if namespace != '':
        new_name = (namespace + '/' +
                    org_instance.name[:org_instance.name.index(':')])
    else:
        new_name = org_instance.name[:org_instance.name.index(':')]

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace,
    #except the special ones required for variable initialization and
    #training.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if (name == ops.GraphKeys.VARIABLES or
                name == ops.GraphKeys.TRAINABLE_VARIABLES or
                namespace == ''):
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)

    #See if its trainable.
    trainable = (org_instance in org_instance.graph.get_collection(
        ops.GraphKeys.TRAINABLE_VARIABLES))
    #Get the initial value
    with org_instance.graph.as_default():
        temp_session = tf.Session()
        init_value = temp_session.run(org_instance.initialized_value())

    #Initialize the new variable
    with to_graph.as_default():
        new_var = tf.Variable(init_value,
                              trainable,
                              name=new_name,
                              collections=collections,
                              validate_shape=False)

    #Add to the copied_variables dict
    copied_variables[new_var.name] = new_var

    return new_var


def copy_to_graph(org_instance, to_graph, copied_variables={}, namespace=""):
    """
    Makes a copy of the Operation/Tensor instance 'org_instance'
    for the graph 'to_graph', recursively. Therefore, all required
    structures linked to org_instance will be automatically copied.
    'copied_variables' should be a dict mapping pertinent copied variable
    names to the copied instances.
    
    The new instances are automatically inserted into the given 'namespace'.
    If namespace='', it is inserted into the graph's global namespace.
    However, to avoid naming conflicts, its better to provide a namespace.
    If the instance(s) happens to be a part of collection(s), they are
    are added to the appropriate collections in to_graph as well.
    For example, for collection 'C' which the instance happens to be a
    part of, given a namespace 'N', the new instance will be a part of
    'N/C' in to_graph.

    Returns the corresponding instance with respect to to_graph.

    TODO: Order of insertion into collections is not preserved
    """

    #The name of the new instance
    if namespace != '':
        new_name = namespace + '/' + org_instance.name
    else:
        new_name = org_instance.name

    #If a variable by the new name already exists, return the
    #correspondng tensor that will act as an input
    if new_name in copied_variables:
        return to_graph.get_tensor_by_name(
            copied_variables[new_name].name)

    #If an instance of the same name exists, return appropriately
    try:
        already_present = to_graph.as_graph_element(new_name,
                                                    allow_tensor=True,
                                                    allow_operation=True)
        return already_present
    except:
        pass

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if namespace == '':
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)
    
    #Take action based on the class of the instance

    if isinstance(org_instance, tf.python.framework.ops.Tensor):

        #If its a Tensor, it is one of the outputs of the underlying
        #op. Therefore, copy the op itself and return the appropriate
        #output.
        op = org_instance.op
        new_op = copy_to_graph(op, to_graph, copied_variables, namespace)
        output_index = op.outputs.index(org_instance)
        new_tensor = new_op.outputs[output_index]
        #Add to collections if any
        for collection in collections:
            to_graph.add_to_collection(collection, new_tensor)

        return new_tensor

    elif isinstance(org_instance, tf.python.framework.ops.Operation):

        op = org_instance

        #If it has an original_op parameter, copy it
        if op._original_op is not None:
            new_original_op = copy_to_graph(op._original_op, to_graph,
                                            copied_variables, namespace)
        else:
            new_original_op = None

        #If it has control inputs, call this function recursively on each.
        new_control_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                            namespace)
                              for x in op.control_inputs]

        #If it has inputs, call this function recursively on each.
        new_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                    namespace)
                      for x in op.inputs]

        #Make a new node_def based on that of the original.
        #An instance of tensorflow.core.framework.graph_pb2.NodeDef, it
        #stores String-based info such as name, device and type of the op.
        #Unique to every Operation instance.
        new_node_def = deepcopy(op._node_def)
        #Change the name
        new_node_def.name = new_name

        #Copy the other inputs needed for initialization
        output_types = op._output_types[:]
        input_types = op._input_types[:]

        #Make a copy of the op_def too.
        #Its unique to every _type_ of Operation.
        op_def = deepcopy(op._op_def)

        #Initialize a new Operation instance
        new_op = tf.python.framework.ops.Operation(new_node_def,
                                                   to_graph,
                                                   new_inputs,
                                                   output_types,
                                                   new_control_inputs,
                                                   input_types,
                                                   new_original_op,
                                                   op_def)
        #Use Graph's hidden methods to add the op
        to_graph._add_op(new_op)
        to_graph._record_op_seen_by_control_dependencies(new_op)
        for device_function in reversed(to_graph._device_function_stack):
            new_op._set_device(device_function(new_op))

        return new_op

    else:
        raise TypeError("Could not copy instance: " + str(org_instance))


def get_copied(original, graph, copied_variables={}, namespace=""):
    """
    Get a copy of the instance 'original', present in 'graph', under
    the given 'namespace'.
    'copied_variables' is a dict mapping pertinent variable names to the
    copy instances.
    """

    #The name of the copied instance
    if namespace != '':
        new_name = namespace + '/' + original.name
    else:
        new_name = original.name

    #If a variable by the name already exists, return it
    if new_name in copied_variables:
        return copied_variables[new_name]

    return graph.as_graph_element(new_name, allow_tensor=True,
                                  allow_operation=True)

Working with feeding is pretty simple too:


>>> x = tf.placeholder("float")
>>> a = tf.constant(3, "float")
>>> y = tf.add(x, a)
>>> namespace = "CopiedOps"
>>> to_graph = tf.Graph()
>>> copied_variables = {}
>>> y1 = copy_to_graph(y, to_graph, namespace)
>>> x1 = get_copied(x, to_graph, namespace)
>>> with to_graph.as_default():
    sess = tf.Session()
    print sess.run(y1, feed_dict={x1: 5})

    
8.0

I guess thats all for my hacking around with TensorFlow for the week. If you intend to use this code, please note that it may not be perfect at doing what it says. I haven’t tried it out with all sorts of TensorFlow data structures as yet, so be open to getting an Exception or two that you may have to fix. Infact, do drop me a comment or mail so I can make this code as fool-proof as I can. Cheers!

K-Means Clustering with TensorFlow

Google recently open-sourced its Artificial Intelligence/Numerical Computing library called TensorFlow. TensorFlow was developed by members of the Google Brain team, and has the flexibility to run on a variety of platforms – including GPUs and mobile devices.

687474703a2f2f7777772e616e64726f696463656e7472616c2e636f6d2f73697465732f616e64726f696463656e7472616c2e636f6d2f66696c65732f7374796c65732f6c617267652f7075626c69632f61727469636c655f696d616765732f323031352f31312f74656e736f72666c6f772e706e67

TensorFlow’s methodology uses what they called data-flow graphs. Consider the following diagram from the Wikipedia page on Genetic Programming (which could have some interesting applications with TensorFlow, I think):

Genetic_Program_TreeAs you probably understood, the graphical structure is a way of representing a computational expression in the form of a Tree. Every node is an operation (TensorFlow calls them ops, short for operations). The non-leaf nodes are pretty easy to understand. Some leaf nodes are a special case of an operation, always ‘returning’ a constant value (like 7 or 2.2 in the Tree). Others (like X or Y) act as placeholders that will be fed in at the time of execution. If you look at the arrows, you will realize that their directions denote the dependencies between outputs of different nodes. Hence, data (TensorFlow calls them Tensors) will flow in the opposite direction along each node – Hence the name TensorFlow. TensorFlow provides other components over this graphical abstraction, like persistent memory elements that retain data (called Variables), and optimization techniques to fine-tune the parameters in these Variables in applications like Neural Networks.

TensorFlow has a powerful Python API. The TensorFlow team has done an awesome job of writing the documentation (which is a little tricky to navigate). If you are completely new to this, heres a few links to get you started (in the order you should visit them):

1. Basic setup

2. Read this example to get a vague idea of what a TensorFlow code looks like.

3. Now read this explanation of the basic components of TensorFlow. It helps if you read the above example again, or simultaneously.

4. Read this detailed example of using TensorFlow for a common ML problem.

5. Once you have a decent understanding of the basic components and how they work, you can look at the Python docs for reference.

Now here is the code I wrote for K-Means clustering using TensorFlow. As a disclaimer, I will mention that this code is based on my (at the time of writing this) 2-day old understanding of how the library works. If you find any errors or know any optimizations possible, do drop a comment! The code is heavily documented, so do go through in-line docs.


import tensorflow as tf
from random import choice, shuffle
from numpy import array


def TFKMeansCluster(vectors, noofclusters):
    """
    K-Means Clustering using TensorFlow.
    'vectors' should be a n*k 2-D NumPy array, where n is the number
    of vectors of dimensionality k.
    'noofclusters' should be an integer.
    """

    noofclusters = int(noofclusters)
    assert noofclusters < len(vectors)

    #Find out the dimensionality
    dim = len(vectors[0])

    #Will help select random centroids from among the available vectors
    vector_indices = list(range(len(vectors)))
    shuffle(vector_indices)

    #GRAPH OF COMPUTATION
    #We initialize a new graph and set it as the default during each run
    #of this algorithm. This ensures that as this function is called
    #multiple times, the default graph doesn't keep getting crowded with
    #unused ops and Variables from previous function calls.

    graph = tf.Graph()

    with graph.as_default():

        #SESSION OF COMPUTATION

        sess = tf.Session()

        ##CONSTRUCTING THE ELEMENTS OF COMPUTATION

        ##First lets ensure we have a Variable vector for each centroid,
        ##initialized to one of the vectors from the available data points
        centroids = [tf.Variable((vectors[vector_indices[i]]))
                     for i in range(noofclusters)]
        ##These nodes will assign the centroid Variables the appropriate
        ##values
        centroid_value = tf.placeholder("float64", [dim])
        cent_assigns = []
        for centroid in centroids:
            cent_assigns.append(tf.assign(centroid, centroid_value))

        ##Variables for cluster assignments of individual vectors(initialized
        ##to 0 at first)
        assignments = [tf.Variable(0) for i in range(len(vectors))]
        ##These nodes will assign an assignment Variable the appropriate
        ##value
        assignment_value = tf.placeholder("int32")
        cluster_assigns = []
        for assignment in assignments:
            cluster_assigns.append(tf.assign(assignment,
                                             assignment_value))

        ##Now lets construct the node that will compute the mean
        #The placeholder for the input
        mean_input = tf.placeholder("float", [None, dim])
        #The Node/op takes the input and computes a mean along the 0th
        #dimension, i.e. the list of input vectors
        mean_op = tf.reduce_mean(mean_input, 0)

        ##Node for computing Euclidean distances
        #Placeholders for input
        v1 = tf.placeholder("float", [dim])
        v2 = tf.placeholder("float", [dim])
        euclid_dist = tf.sqrt(tf.reduce_sum(tf.pow(tf.sub(
            v1, v2), 2)))

        ##This node will figure out which cluster to assign a vector to,
        ##based on Euclidean distances of the vector from the centroids.
        #Placeholder for input
        centroid_distances = tf.placeholder("float", [noofclusters])
        cluster_assignment = tf.argmin(centroid_distances, 0)

        ##INITIALIZING STATE VARIABLES

        ##This will help initialization of all Variables defined with respect
        ##to the graph. The Variable-initializer should be defined after
        ##all the Variables have been constructed, so that each of them
        ##will be included in the initialization.
        init_op = tf.initialize_all_variables()

        #Initialize all variables
        sess.run(init_op)

        ##CLUSTERING ITERATIONS

        #Now perform the Expectation-Maximization steps of K-Means clustering
        #iterations. To keep things simple, we will only do a set number of
        #iterations, instead of using a Stopping Criterion.
        noofiterations = 100
        for iteration_n in range(noofiterations):

            ##EXPECTATION STEP
            ##Based on the centroid locations till last iteration, compute
            ##the _expected_ centroid assignments.
            #Iterate over each vector
            for vector_n in range(len(vectors)):
                vect = vectors[vector_n]
                #Compute Euclidean distance between this vector and each
                #centroid. Remember that this list cannot be named
                #'centroid_distances', since that is the input to the
                #cluster assignment node.
                distances = [sess.run(euclid_dist, feed_dict={
                    v1: vect, v2: sess.run(centroid)})
                             for centroid in centroids]
                #Now use the cluster assignment node, with the distances
                #as the input
                assignment = sess.run(cluster_assignment, feed_dict = {
                    centroid_distances: distances})
                #Now assign the value to the appropriate state variable
                sess.run(cluster_assigns[vector_n], feed_dict={
                    assignment_value: assignment})

            ##MAXIMIZATION STEP
            #Based on the expected state computed from the Expectation Step,
            #compute the locations of the centroids so as to maximize the
            #overall objective of minimizing within-cluster Sum-of-Squares
            for cluster_n in range(noofclusters):
                #Collect all the vectors assigned to this cluster
                assigned_vects = [vectors[i] for i in range(len(vectors))
                                  if sess.run(assignments[i]) == cluster_n]
                #Compute new centroid location
                new_location = sess.run(mean_op, feed_dict={
                    mean_input: array(assigned_vects)})
                #Assign value to appropriate variable
                sess.run(cent_assigns[cluster_n], feed_dict={
                    centroid_value: new_location})

        #Return centroids and assignments
        centroids = sess.run(centroids)
        assignments = sess.run(assignments)
        return centroids, assignments

Never, ever, EVER, do something like this:


for i in range(100):
    x = sess.run(tf.assign(variable1, placeholder))

This may seem pretty harmless at first glance, but every time you initialize an op, (like tf.assign or even tf.zeros, you are adding new ops instances to the default graph. Instead, as shown in the code, define a particular op for each task (however specialized) just once in the code. Then, during every of your iterations, call sess.run over the required nodes. To check if you are crowding your graph with unnecessary ops, just print out the value of len(graph.get_operations()) during every iteration and see if its is increasing. In fact, sess.run should be the only way you interact with the graph during every iteration.

As visible on lines 138 and 139, you can call sess.run over a list of ops/Variables to return a list of the outputs in the same order.

There are a lot of intricacies of TensorFlow that this code does not go into, such as assigning devices to nodes, Graph collections, dependencies, etc. Thats partially because I am still understanding these aspects one by one. But at first glance, TensorFlow seems to be a pretty powerful and flexible way of doing AI/ML-based computations. I would personally like to explore its applications in developing dependency-based statistical metrics for data – for which I am currently using custom tree-like data structures. Lets hope this gesture by Google does lead to an increase in the applications and research in AI. Cheers!