The magic behind Attribute Access in Python

Most people know just one thing when it comes to attribute access – the dot ‘.’ (as in x.some_attribute). In simple terms, attribute access is the way you retrieve an object linked to the one you already have. To someone who uses Python without delving too much into the details, it may seem pretty straightforward. However, under the hood, theres a lot that goes on for this seemingly trivial task.

Lets look at each of the components one by one.

The __dict__ attribute

Every object in Python has an attribute denoted by __dict__. This dictionary/dictionary-like (I will explain this shortly) object contains all the attributes defined for the object itself. It maps the attribute name to its value.

Heres an example:


>>> class C(object):
	x = 4

>>> c = C()
>>> c.y = 5
>>> c.__dict__
{'y': 5}

Notice how 'x' is not in c.__dict__. The reason for this is simple enough. While y was defined for the object c, x was defined for its class (C). Therefore, it will appear in the __dict__ of C. In fact, C‘s __dict__ contains a lot of other keys too (including '__dict__'):


>>> c.__class__.__dict__['x']
4
>>> c.__class__.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4, 
'__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>, 
'__doc__': None})

We will look at what dictproxy means soon.

The __dict__ of an object is simple enough to understand. It behaves like a Python dict, and is one too.


>>> c.__dict__
{'y': 5}
>>> c.__dict__.__class__
<type 'dict'>
>>> c.__dict__ = {}
>>> c.y

Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    c.y
AttributeError: 'C' object has no attribute 'y'
>>> c.__dict__['y'] = 5
>>> c.y
5

The __dict__ of a class however, is not that straight-forward. Its actually an object of a class called dictproxy. dictproxy is a special class whose objects behave like normal dicts, but they differ in some key behaviours.


>>> C.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>, '__doc__': None})
>>> C.__dict__.__class__
<type 'dictproxy'>
>>> C.__dict__['x']
4
>>> C.__dict__['x'] = 6

Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    C.__dict__['x'] = 4
TypeError: 'dictproxy' object does not support item assignment
>>> C.x = 6
>>> C.__dict__ = {}

Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    C.__dict__ = {}
AttributeError: attribute '__dict__' of 'type' objects is not writable

Notice how you cannot set a key in a dictproxy directly (C.__dict__['x'] = 4 does not work). You can accomplish the same using C.x = 6 however, since the internal behaviour then is different. Also notice how you cannot set the __dict__ attribute itself either(C.__dict__ = {} does not work).

Theres a reason behind this weird implementation. If you don’t want to get into the details, just know that its for the Python interpreter to keep working properly, and to enforce some optimizations. If you want a more detailed explanation, have a look at Scott H’s answer to this StackOverflow question.

Descriptors

A descriptor is an object that has atleast one of the following magic methods in its attributes: __get__, __set__ or __delete__ (Remember, methods are ultimately objects in Python). Mind you, its the object we are talking about. Its class may or may not have implemented them.

Descriptors can help you define the behaviour of an object’s attribute in Python. With each of the magic methods just mentioned, you implement how the attribute (‘described’ by the descriptor) will be retrieved, set and deleted in the object respectively. There are two types of descriptors – Data Descriptors, and Non-Data Descriptors.

Non-Data Descriptors only have __get__ defined. All others are Data Descriptors. You would naturally think, why these two types are called so. The answer is intuitive. Usually, its data-related attributes that we tend to ‘set’ or ‘delete’ with respect to an object. Other attributes, like methods themselves, we don’t. So their descriptors are called Non-Data Descriptors. As with a lot of other things in Python, this is not a hard-and-fast rule, but a convention. You could just as well describe a method with a Data Descriptor. But then, its __get__ should return a function.

Heres an example of two classes that will come up with data and non-data descriptor objects respectively:


class DataDesc(object):
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objclass):
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except:
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))
    
    def __set__(self, obj, value):
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "set in " + str(obj))

    def __delete__(self, obj):
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "deleted in " + str(obj))

class NonDataDesc(object):
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objclass):
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except:
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))

Notice how the __get__ function takes in an object obj and (its) class objclass. Similarly, setting the value requires obj and some candidate value. Deletion just needs obj. Taking these parameters in (along with the initializer __init__) helps you differentiate between objects of the same descriptor class. Mind you, its the objects that are intended to be the descriptors.
(P.S. If you don’t define the __get__ method for a descriptor, the descriptor object itself will get returned).

Lets use these classes in some code.


class ParentClass(object):
    x = "x1"
    y = "y1"
    data_attr_parent = DataDesc("desc1")
    data_attr_child = DataDesc("desc2")

class ChildClass(ParentClass):
    x = "x2"
    y = "y2"
    data_attr_child = DataDesc("desc3")
    non_data_attr_child = NonDataDesc("desc4")

some_object = ChildClass()

Thats it! You can access the ‘described’ objects as usual in Python.


>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x1062c5790>...
'x2 + y2'

Descriptors are used for a lot of attribute and method related functionality in Python, including static methods, class methods and properties. Using descriptors, you can gain better control over how attributes and methods of a class/its objects are accessed – including defining some ‘behind the scenes’ functionality like logging.

Now lets look at the high-level rules governing attribute access in Python.

The Rules

Quoting Shalabh Chaturvedi’s book verbatim, the workflow is as follows:

  1. If attrname is a special (i.e. Python-provided) attribute for objectname, return it.
  2. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, return the descriptor result. Search all bases of objectname.__class__ for the same case.
  3. Check objectname.__dict__ for attrname, and return if found. If objectname is a class, search its bases too. If it is a class and a descriptor exists in it or its bases, return the descriptor result.
  4. Check objectname.__class__.__dict__ for attrname. If it exists and is a non-data descriptor, return the descriptor result. If it exists, and is not a descriptor, just return it. If it exists and is a data descriptor, we shouldn’t be here because we would have returned at point 2. Search all bases of objectname.__class__for same case.
  5. Raise AttributeError

 

To make things clearer, heres some tinkering using the code we wrote in the Descriptors section (Have a look at it again just to be clear about things):

data_attr_child is a Data descriptor in some_object‘s class. So you cant write over it. Also, the version in ChildClass (‘desc3’) is used, not the one in ParentClass.


>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x1110c9790>...
'x2 + y2'
>>> some_object.data_attr_child = 'xyz'

Traceback (most recent call last):
  File "<pyshell#112>", line 1, in <module>
    some_object.data_attr_child = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc3 cannot be set in <__main__.ChildClass object at 0x10883f790>

Infact, even if you make an appropriate entry in some_object‘s dict, it still won’t matter (as per Rule 1).


>>> some_object.__dict__['data_attr_child'] = 'xyz'
>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'

The Non-Data Descriptor attribute, on the other hand, can be easily overwritten.


>>> some_object.non_data_attr_child
Retrieving attr desc4 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.non_data_attr_child = 'xyz'
>>> some_object.non_data_attr_child
'xyz'
>>> some_object.__dict__
{'data_attr_child': 'xyz', 'non_data_attr_child': 'xyz'}

You can, however, change the behaviour of data_attr_child, if you go to some_object‘s class and modify it in the dictproxy there itself.


>>> some_object.__class__.data_attr_child = 'abc'
>>> some_object.data_attr_child
'xyz'

Notice how the moment you replace the Data-Descriptor in the class with some non-data descriptor (or some object like a String in this case), the entry that we initially made in some_object‘s __dict__ comes into play. Therefore, some_object.data_attr_child returns 'xyz', not 'abc'.

The data_attr_parent attribute behaves similar to data_attr_child.


>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.data_attr_parent = 'xyz'

Traceback (most recent call last):
  File "<pyshell#127>", line 1, in <module>
    some_object.data_attr_parent = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc1 cannot be set in <__main__.ChildClass object at 0x10883f790>
>>> some_object.__class__.data_attr_parent = 'xyz'
>>> some_object.__class__.data_attr_parent
'xyz'

Notice how you cant ‘write-over’ data_attr_parent in ChildClass itself. Once you do that, we go through Rules 1-2-3 and stop at 4, to get the result 'xyz'.

Rules for Setting Attributes

Way simpler than the rules for ‘getting them’. Quoting Shalabh’s book again,

  1. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, use the descriptor to set the value. Search all bases of objectname.__class__ for the same case.
  2. Insert something into objectname.__dict__ for key "attrname".

Thats it! :-).

__slots__

To put it concisely, __slots__ is a way to disallow objects from having their own __dict__ in Python. This means, that if you define __slots__ in a Class, then you cannot set arbitrary attributes(apart from the ones mentioned in the ‘slots’) on its objects.

Heres an example of such a class:


class SomeClass(object):
    __slots__ = ['x', 'y']

obj = SomeClass()

Now see how this behaves:


>>> obj.x = 4
>>> obj.y = 5
>>> obj.x
4
>>> obj.y
5
>>> obj.z = 6

Traceback (most recent call last):
  File "<pyshell#135>", line 1, in <module>
    obj.z = 6
AttributeError: 'SomeClass' object has no attribute 'z'

You can ofcourse do this:


>>> obj.__class__.z = 6
>>> obj.z
6

But then, remember you have now defined z in SomeClass‘s __dict__, not in obj‘s.

As Guido van Rossum himself mentions in his blog post, __slots__ were implemented in Python to introduce efficiency, not ‘stricter’ attribute-setting. The basic intuition is this: Suppose you have a class, whose objects you intend to construct in a large number. You don’t really need the flexibility of having ‘dynamic’ attributes on the objects themselves, but you want efficiency. Since slots essentially eliminates the __dict__ attribute in each one of the objects, you get a lot of memory savings this way.

Interestingly, slots are implemented using descriptors in Python.

 

Further Reading

Have a look at this book I have already quoted in the post. It goes into a lot of detail regarding attribute access in Python, including method resolutions.

Thats all for now. Cheers!

Simple Turing Machine using TensorFlow

A post after many days! This one describes a simple code I wrote to simulate a Deterministic Turing Machine using TensorFlow. Its pretty basic with respect to the design, using TensorFlow only for the state change computations. The movement on the tape, and handling of halts is taken care of by old-school Python. This is not intended to be a practical usage of TensorFlow, its just something I did and am writing about here :-).

To demonstrate the code, I will use the 3-state Busy Beaver example from the Wiki page.

Encoding the Problem

Each of the states are encoded as real numbers (integers in this case, but for whatever reasons you could use floats too). So ‘A’, ‘B’, ‘C’ and ‘HALT’ become 0, 1, 2 and 3 respectively. Accordingly, the initial and final states are defined as follows:

initial_state = 0
final_states = [3]

Since there can be multiple final states generally, final_states is defined as a list. As with the states, the potential symbols are numbers too. The blank symbol, as per the Busy Beaver problem definition, is 0.

blank_input = 0

Now we come to the transition function. It is defined using two 2-D NumPy matrices, one for the inputs and the other for the corresponding outputs.

import numpy as np

transition_input = np.array([[0, 0],
                             [0, 1],
                             [1, 0],
                             [1, 1],
                             [2, 0],
                             [2, 1]])

transition_output = np.array([[1, 1, 1],
                              [2, 1, -1],
                              [0, 1, -1],
                              [1, 1, 1],
                              [1, 1, -1],
                              [3, 1, 1]])

Consider the transition_input[2] and transition_output[2]. The input denotes [current state, current symbol]. The output denotes [next state, next symbol, movement]. The movement value can either be 1(move pointer right, or increment index value) or -1(move pointer left, or decrement index value). Accordingly, transition_input[2] requires the current state to be 1 and current symbol being read to be 0. If these conditions are met, then the next state and next symbol are 0 and 1 respectively, with the pointer on the tape moving one step to the left.

The Turing Machine Code

The actual Turing Machine code follows. Its well documented as always, so you shouldn’t have a problem following whats going on if you are well-acquainted with Turing Machines and TensorFlow.

import tensorflow as tf


class TuringMachine(object):
    """
    Implementation of a basic Turing Machine using TensorFlow.
    """

    def __init__(self, initial_state, final_states, blank_input,
                 transition_input, transition_output):
        """
        Initializes the TensorFlow Graph required for running the Turing
        Machine.

        States and symbols should both be encoded as real numbers prior to
        initialization. Accordingly, 'initial_state' should be a float
        denoting the starting state. 'final_states' should be an iterable
        of numbers, denoting all the states the Machine can halt at.
        'blank_input' should be a float denoting the blank input for the
        Machine.
        'transition_input' and 'transition_output' will together denote
        the transition function followed by the Machine. Both should be
        2-D NumPy matrices. Each inner vector in 'transition_input' will
        contain [current state, current symbol], while each inner vector
        in 'transition_output' will have [next state, next symbol, move].
        "move" can either be 1: to increase pointer index by 1 (move
        right), or -1: decrease pointer index by 1 (move left).

        """
        
        #Store the necessary information as attributes
        self._initial_state = float(initial_state)
        self._final_states = set(final_states)
        self._blank_input = blank_input

        #Initialize Graph
        self._graph = tf.Graph()

        n_transitions = len(transition_input)
        if len(transition_input) != len(transition_output):
            raise ValueError("Number of transition inputs and outputs " +
                             "not same")

        #Initialize all components in the Graph
        with self._graph.as_default():

            #Placeholder for current symbol thats being read from the
            #tape
            self._current_symbol = tf.placeholder("float")

            #Variable to hold current state
            self._current_state = tf.Variable(self._initial_state)

            #Constant Ops for transitions
            transition_in = tf.constant(transition_input)
            transition_out = tf.constant(transition_output)

            #Generate Ops to compute transition
            concat_in = tf.pack([self._current_state,
                                 self._current_symbol])
            packed_in = tf.pack([concat_in for i in
                                 range(len(transition_input))])
            transition_diff = tf.reduce_sum(
                tf.pow(tf.sub(packed_in, transition_input), 2), 1)
            #This will return a 0 if theres an accurate match for the
            #transition condition (current state, current symbol). If not,
            #this will return a non-zero value.
            self._transition_match = tf.reduce_min(transition_diff, 0)
            #This will correspond to the index of the match
            match_index = tf.cast(tf.argmin(transition_diff, 0), "int32")
            #Pick out the transition output
            self._transition = tf.reshape(
                tf.slice(transition_out,  tf.pack([match_index, 0]),
                         tf.pack([1, 3])), [3])
            #This will change state Variable
            self._state_change = tf.assign(
                self._current_state, tf.reshape(
                    tf.cast(tf.slice(self._transition, tf.constant([0]),
                                     tf.constant([1])), "float32"), []))

            #This will reset the state Variable
            self._state_reset = tf.assign(self._current_state,
                                          self._initial_state)

            #Initialize Session
            self._sess = tf.Session()
 
            #Initialize Variables
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

    def run(self, input_tape, start_index, max_iter=1000):
        """
        Runs the Turing Machine with the given input.
        'input_tape' must be an index-able object, with 'start_index'
        denoting the index number to start computation at.

        Obviously, beware of inputs that may cause the machine to run
        indefinitely. To control this, set 'max_iter' to some number of
        iterations you would want to allow.

        Will return final state and final index. If Machine halts at a
        non-final state, or goes beyond max_iter, i.e. the Machine does
        not accept the input, will return None, None.
        The input tape will be modified in-place.
        """

        #First reset the state
        self._sess.run(self._state_reset)

        #Run the Machine
        index = start_index
        for i in range(max_iter):
            #Figure out input to be given
            if (index > -1 and len(input_tape) > index):
                machine_input = input_tape[index]
            else:
                machine_input = self._blank_input

            #Run one step of the Machine
            _, match_value, transition = self._sess.run(
                [self._state_change, self._transition_match,
                 self._transition],
                feed_dict = {self._current_symbol: machine_input})

            #If match_value is not zero, return None, None
            if match_value != 0:
                return None, None

            #First modify tape contents at current index as per transition
            if (index > -1 and len(input_tape) > index):
                #Index is within input tape limits
                input_tape[index] = transition[1]
            elif index == len(input_tape):
                #Index has gone beyond the range of the input tape,
                #to the right
                input_tape.append(transition[1])
            else:
                #Index has gone beyond the range of the input tape,
                #to the left
                input_tape.insert(0, transition[1])
                index = 0

            #Change index as per transition
            index = index + transition[2]

            #If its a final state, return state, index
            if transition[0] in self._final_states:
                return transition[0], index

        return None, None

A few things:

1. If the tape ‘pointer’ goes beyond the limits of input_tape, the blank input is automatically given to the machine (lines 114-118). Moreover,  depending on which end of the tape the pointer has gone across, the output value from the machine is inserted into the list (lines 130-142).

2. Since there is no way to check if the Machine will stop, the run method lets the user provide a maximum number of iterations. If the machine keeps running beyond the limit, (None, None) is returned.

3. The input_tape list is modified in-place.

4. If you want to track the transitions the Turing Machine follows, add a ‘print’ statement (or some kind of logging) at line 125.

Running the Turing Machine

First we define the contents of the input tape, and the index to start at.

>>> input_tape = [0, 1, 1, 1, 1, 1]
>>> start_index = 0

Then, we initialize a TuringMachine instance with the algorithm as encoded above:

>>> machine = TuringMachine(initial_state, final_states, blank_input,
                        transition_input, transition_output)

Finally, we ‘run’ the machine, and check if the input tape has been modified correctly:

>>> machine.run(input_tape, start_index)
(3, 5)
>>> input_tape
[1, 1, 1, 1, 1, 1, 1]
>>> machine.run(input_tape, start_index)
(3, 9)
>>> input_tape
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Thats it! This is my first implementation of a Turing Machine, so do leave a comment if there are any errors/improvements possible. Cheers!

Predicting Trigonometric Waves few steps ahead with LSTMs in TensorFlow

I have recently been revisiting my study of Deep Learning, and I thought of doing some experiments with Wave prediction using LSTMs. This is nothing new, just more of a log of some tinkering done using TensorFlow.

The Problem

The basic input to the model is a 2-D vector – each number corresponding to the value attained by the corresponding wave. Each wave in turn is: (a constant + a sine wave + a cosine wave). The waves themselves have different magnitudes, initial phases and frequencies. The goal is to predict the values that will be attained a certain (I chose 23) steps ahead on the curve.

So first off, heres the wave-generation code:


##Producing Training/Testing inputs+output
from numpy import array, sin, cos, pi
from random import random

#Random initial angles
angle1 = random()
angle2 = random()

#The total 2*pi cycle would be divided into 'frequency'
#number of steps
frequency1 = 300
frequency2 = 200
#This defines how many steps ahead we are trying to predict
lag = 23

def get_sample():
    """
    Returns a [[sin value, cos value]] input.
    """
    global angle1, angle2
    angle1 += 2*pi/float(frequency1)
    angle2 += 2*pi/float(frequency2)
    angle1 %= 2*pi
    angle2 %= 2*pi
    return array([array([
        5 + 5*sin(angle1) + 10*cos(angle2),
        7 + 7*sin(angle2) + 14*cos(angle1)])])


sliding_window = []

for i in range(lag - 1):
    sliding_window.append(get_sample())


def get_pair():
    """
    Returns an (current, later) pair, where 'later' is 'lag'
    steps ahead of the 'current' on the wave(s) as defined by the
    frequency.
    """

    global sliding_window
    sliding_window.append(get_sample())
    input_value = sliding_window[0]
    output_value = sliding_window[-1]
    sliding_window = sliding_window[1:]
    return input_value, output_value

Essentially, you just need to call get_pair to get an ‘input, output’ pair – the output being 23 time intervals ahead on the curve. Each have the NumPy dimensionality of [1, 2]. The first value ‘1’ means that the batch size is 1 – we will feed one input at a time while training/testing.

Now, I don’t pass the input directly into the LSTM. I try to improve the LSTM’s understanding of the input, by providing its first and second derivatives as well. So, if the input at time t is x(t), the derivative is x'(t) = (x(t) – x(t-1)). Following the analogy, x”(t) = (x'(t) – x'(t-1)). Here’s the code for that:


#Input Params
input_dim = 2

#To maintain state
last_value = array([0 for i in range(input_dim)])
last_derivative = array([0 for i in range(input_dim)])


def get_total_input_output():
    """
    Returns the overall Input and Output as required by the model.
    The input is a concatenation of the wave values, their first and
    second derivatives.
    """
    global last_value, last_derivative
    raw_i, raw_o = get_pair()
    raw_i = raw_i[0]
    l1 = list(raw_i)
    derivative = raw_i - last_value
    l2 = list(derivative)
    last_value = raw_i
    l3 = list(derivative - last_derivative)
    last_derivative = derivative
    return array([l1 + l2 + l3]), raw_o

So the overall input to the model becomes a concatenated version of x(t), x'(t), x”(t). The obvious question to ask would be- Why not do this in the TensorFlow Graph itself? I did try it, and for some reason (which I don’t understand yet), there seems to seep in some noise into the Variables that act as memory units to maintain state.

But anyways, here’s the code for that too:


#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

#Input Params
input_dim = 2


##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim])

##First Order Derivative Layer
#This will store the last recorded value
last_value1 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value1 = tf.sub(input_layer, last_value1)
#Update last recorded value
last_assign_op1 = last_value1.assign(input_layer)

##Second Order Derivative Layer
#This will store the last recorded derivative
last_value2 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value2 = tf.sub(sub_value1, last_value2)
#Update last recorded value
last_assign_op2 = last_value2.assign(sub_value1)

##Overall input to the LSTM
#x and its first and second order derivatives as outputs of
#earlier layers
zero_order = last_assign_op1
first_order = last_assign_op2
second_order = sub_value2
#Concatenated
total_input = tf.concat(1, [zero_order, first_order, second_order])

If you have an idea of what might be going wrong, do leave a comment! In any case, the core model follows.

 

The Model

So heres the the TensorFlow model:

1) The Imports:


#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

 

2) Our input layer, as always, will be a Placeholder instance with the appropriate type and dimensions:


#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim*3])

 

3) We then define out LSTM layer. If you are new to Recurrent Neural Networks or LSTMs, here are two excellent resources:

  1. This blog post by Christopher Olah
  2. This deeplearning.net post. It defines the math behind the LSTM cell pretty succinctly.

If you like to see implementation-level details too, then heres the relevant portion of the TensorFlow source for you.

Now the LSTM layer:


##The LSTM Layer-1
#The LSTM Cell initialization
lstm_layer1 = rnn_cell.BasicLSTMCell(input_dim*3)
#The LSTM state as a Variable initialized to zeroes
lstm_state1 = tf.Variable(tf.zeros([1, lstm_layer1.state_size]))
#Connect the input layer and initial LSTM state to the LSTM cell
lstm_output1, lstm_state_output1 = lstm_layer1(input_layer, lstm_state1,
                                              scope=&quot;LSTM1&quot;)
#The LSTM state will get updated
lstm_update_op1 = lstm_state1.assign(lstm_state_output1)

We only use 1 LSTM layer. Providing a scope to the LSTM layer call (on line 8) helps in avoiding variable-scope conflicts if you have multiple LSTM layers.

The LSTM layer is followed by a simple linear regression layer, whose output becomes the final output.


##The Regression-Output Layer1
#The Weights and Biases matrices first
output_W1 = tf.Variable(tf.truncated_normal([input_dim*3, input_dim]))
output_b1 = tf.Variable(tf.zeros([input_dim]))
#Compute the output
final_output = tf.matmul(lstm_output1, output_W1) + output_b1

 

We have finished defining the model itself. But now, we need to initialize the training components. These help fine-tune the parameters/state of the model to make it ready for deployment. We won’t be using these components post training (ideally).

 

4) First, a Placeholder for the correct output associated with the input:


##Input for correct output (for training)
correct_output = tf.placeholder(tf.float32, [1, input_dim])

Then, the error will be computed using the LSTM output and the correct output as the Sum-of-Squares loss.


##Calculate the Sum-of-Squares Error
error = tf.pow(tf.sub(final_output, correct_output), 2)

Finally, we initialize an Optimizer to adjust the weights for the LSTM layer. I tried Gradient Descent, RMSProp as well as Adam Optimization. Adam works best for this model. Gradient Descent works really bad on LSTMs for some reason (that I can’t grasp right now). If you want to read more about Adam-Optimization, read this paper. I decided on the learning rate of 0.0006 after a lot of trial-and-error, and it seems to work best for the number of iterations I use (100k).


##The Optimizer
#Adam works best
train_step = tf.train.AdamOptimizer(0.0006).minimize(error)

 

5) Finally, we initialize the Session and all required Variables as always.


##Session
sess = tf.Session()
#Initialize all Variables
sess.run(tf.initialize_all_variables())

The Training

Here’s the rudimentary code I used for training the model:


##Training

actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []


for i in range(80000):
    input_v, output_v = get_total_input_output()
    _, _, network_output = sess.run([lstm_update_op1,
                                     train_step,
                                     final_output],
                                    feed_dict = {
                                        input_layer: input_v,
                                        correct_output: output_v})

    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()

Training takes almost a minute on my Intel i5 machine.

Consider the first wave. Initially, the network output is far from the correct one(The red one is the LSTM output):

w1_init

But by the end, it fits pretty well:

w1_final

Similar trends are seen for the second wave:

w2_overall

Testing

In practical scenarios, the state at which you end training would rarely be the state at which you deploy. Therefore, prior to testing, I ‘fastforward’ both the waves first. Then, I flush the contents of the LSTM cell (mind you, the learned matrix parameters for the individual functions don’t change).


##Testing

for i in range(200):
    get_total_input_output()

#Flush LSTM state
sess.run(lstm_state1.assign(tf.zeros([1, lstm_layer1.state_size])))

And here’s the rest of the testing code:


actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []


for i in range(1000):
    input_v, output_v = get_total_input_output()
    _, network_output = sess.run([lstm_update_op1,
                                  final_output],
                                 feed_dict = {
                                     input_layer: input_v,
                                     correct_output: output_v})

    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()

Its pretty similar to the training one, except for one small difference: I don’t run the training op anymore. Therefore, those components of the Graph don’t work at all.

Here’s the correct output with the model’s output for the first wave:

w1_test

And the second wave:w2_test

Thats all for now! I am not a deep learning expert, and I still experimenting with RNNs, so do leave comments/suggestions if you have any! Cheers!

Redis + Celery: Reactive Computing in Django for IoT applications

To quote Wikipedia, Reactive Programming is a programming paradigm oriented around data flows and the propagation of change.

Consider the simple equation a = b + c. This relationship defines a flow of data, in that it describes how the data for a is computed given the data for b and c. Suppose the values of b and c at time t=0 are 4 and 5 respectively. Therefore, a_{t=0} = 9. Now, lets say that the value of b changes at t=1 so that b_{t=1} = 10. By the definition of a mentioned before, it follows that a_{t=1}=15. Thus, the change in the value of b got propagated, causing the value of its ‘parent’ a to change. I use the word parent because if we were to draw the relationship between a, b and c as a tree, a be the parent of b and c (like in a data flow graph). A common example of reactive programming is seen in spreadsheet software like MS-Excel, where the change in the value of a particular cell causes a change in the values of cells that depend on it by some pre-defined equation.

RxJS and Meteor are two javascript-based frameworks that implement reactive programming for web-based applications.

Reactivity, in general, is a paradigm that’s difficult to find in Python-based systems, and that includes Django. This post describes a method to perform mathematical computations (not reactive programming in general, just mathematical calculations) in a reactive manner in Django-based systems. I will soon describe the exact problem solved by my code, with suggestions on how to modify it to suit your needs.

Prerequisites

Heres the prerequisites, apart from Django:

i. We will use Celery to perform tasks asynchronously.

ii. We will use Redis as for temporary data storage, and synchronization.

This is a good place to look if you want to set up both.

iii. We will use python-redis-lock to use Redis for locking mechanisms. Locking ensures that one data update/read doesn’t get messed up due to another update that’s running in parallel.

The ‘Problem Statement’

Consider the two equations:

A = C + D + E …(1)

B = A * F …(2)

We will refer to A, B, C, D, E and F as computational nodes. C, D, E and F are input values that we acquire from certain sources of data (whatever they might be). They are essentially the independent variables in this context. A‘s value is derived C, D and E (making it their ‘parent’ node). You can also say that C, D and E are the arguments for A. B in turn has two dependencies, A and another independent variable F.

You don’t know when you will receive the values of the independent variables. You also don’t know what order they will arrive in. Some of them may come together, some very frequently, etc. etc. Lets say that your application dictates that the value of any node remains valid for about a minute its received or computed. Whenever you have had the latest values of C, D and E come within the past 60 seconds, you would want to compute the value of A. The same logic applies to B.

Once the latest value of A is computed, you would like the node to ‘forget’ the values of its arguments(children). Therefore, whenever comes the next time that we have fresh values of C, D and E together, A will be calculated again. The same logic once again applies to B. The code can be easily modified (and I will mention how) so that this restriction is lifted. In that case, A would be recomputed whenever the value of any of C, D and E changes, provided that the other values are still valid.

Whenever you receive/compute the value of a computational node (whether independent or derived), you would want to store its value – along with the appropriate timestamp – in your database.

Now lets see how this data-flow logic can be implemented in a flexible, efficient manner.

The Implementation

The whole functionality will be enclosed in a Django App called ‘reactive_compute’. It will have the following files:

i. models.py

ii. admin.py

iii. db.py

iv. functions.py

v. functions.py

and obviously

vi. __init__.py

First of all, we will need to define the derived nodes A and B, in terms of their Function and Number of Arguments. For example, A‘s function can be described as an Addition, with the number of arguments being 3. B on the other hand is a Multiplication, with 2 arguments. To remember this, we will first need to define the database model in models.py. You don’t store this information in Redis, since its ‘permanent’.

The model:

from django.db import models


class FunctionType(models.Model):
    """
    Stores information about the type of function/computation
    associated with a node.
    The function name should have a corresponding mapping in
    reactivecompute.functions.Functions.mapping
    """

    node = models.CharField('Node', max_length=25)
    functionname = models.CharField('Function Name', max_length=25)
    noofargs = models.IntegerField('Number of Arguments')

As you can see, functionname is a String that will act as an identifier for the Function. We will have “ADD” for Addition, and “MUL” for Multiplication. But there must be someplace to implement those functions. Thats where the functions.py files comes in.


##File with all computational functions


def add(args):
    """
    Addition Function.
    """
    return sum(args)


def sub(args):
    """
    Subtraction Function.
    """
    return (args[0] - args[1])


def mul(args):
    """
    Multiplication function.
    """
    prod = 1
    for arg in args:
        prod *= arg
    return prod


class Functions(object):
    """
    Class to store mapping between function name and function
    implementation.
    """

    mapping = {
        'ADD': add,
        'SUB': sub,
        'MUL': mul
        }

Adding the mapping as a dictionary to the Functions class allows easy importing. Each of the functions takes in the arguments(in appropriate order) as a list.

The following screenshot shows the appropriate database entries made for A and B in the Django Admin.

Screenshot from 2016-01-04 22:51:04

Now that we have defined the ‘nodes’ we need to direct the appropriate args to them. This is done by creating another model in models.py, for storing the data-flow information as tree ‘edges’. Every child is an argument for the parent.


class Dependency(models.Model):
    """
    Models a Parent-Child relationship between computational
    nodes. Also defines the order in which the children are to be
    arranged, to get the Parent's value.
    """

    parent = models.CharField('Parent Node', max_length=25)
    child = models.CharField('Child Node', max_length=25)
    argno = models.IntegerField('Argument Number')

 

argno doesn’t really matter for the operations of multiplication and addition, but for others(like subtraction), it might.

Heres a Django-Admin screenshot with the required entries for A and B.

Screenshot from 2016-01-04 22:52:05

As I mentioned earlier, we would like to store the values of the nodes, whenever they are updated. So heres a model for that:


class DataLog(models.Model):
    """
    Stores information about the type of function/computation
    associated with nodes.
    """

    node = models.CharField('Node', max_length=25)
    timestamp = models.DateTimeField('Time Stamp')
    value = models.FloatField('Value', null=True)

And heres the contents of the db.py file, that define a function to store the latest value of a node into the database:


from reactive_compute.models import DataLog
from django.utils import timezone


def save_node_value(node, value):
    """
    Saves the latest computed value of the given node.
    """
    data_entry = DataLog(node=str(node),
                         timestamp=timezone.now(),
                         value=value)
    data_entry.save()

To present a complete example, the following are the contents of the admin.py file.

from django.contrib import admin
from reactive_compute.models import *


class DependencyAdmin(admin.ModelAdmin):
    list_display = ('parent', 'child', 'argno')
    list_filter = ['parent', 'child']
    search_fields = ['parent', 'child']

admin.site.register(Dependency, DependencyAdmin)


class FunctionTypeAdmin(admin.ModelAdmin):
    list_display = ('node', 'functionname', 'noofargs')
    list_filter = ['functionname']
    search_fields = ['node', 'functionname']

admin.site.register(FunctionType, FunctionTypeAdmin)


class DataLogAdmin(admin.ModelAdmin):
    list_display = ('node', 'timestamp', 'value')
    list_filter = ['node', 'timestamp']
    search_fields = ['node']

admin.site.register(DataLog, DataLogAdmin)

And finally, we come to the crux of the implementation – the tasks.py file. If you read the resource I mentioned for Celery installation, you know that the Celery task is defined in this file. I have heavily commented it, so do go through the in-line docs.


from __future__ import absolute_import
from celery import shared_task
from reactive_compute.models import Dependency, FunctionType
from reactive_compute.db import save_node_value
from reactive_compute.functions import Functions
from redis import StrictRedis
import redis_lock


@shared_task
def compute_nodes(nodes, timeout=60):
    """
    Computes values for all computational nodes as defined in the
    FunctionType model. Recursive calls are made based on information
    present in respective Dependency entries.

    'nodes' should be a dict mapping name of computational node, to:
    A floating point number as input, OR
    None, if the value is to be (possibly) computed.
    'timeout' defines the time interval for which the values of the
    nodes mentioned in 'nodes' will be valid. Default is a minute.
    """

    #First handle the boundary case
    if len(nodes) == 0:
        return None

    #Get a connection to Redis
    conn = StrictRedis()

    #This will contain all the parent nodes that we will try to compute
    #recursively based on the args currently provided.
    parents = set([])

    #Default initialization for Celery
    value = None

    #Iterate over all nodes
    for node in nodes:
        ##First obtain the value of the node
        if nodes[node] is not None:
            #Value has been provided as input
            try:
                #Ensure the given value can be parsed/converted to
                #a float.
                value = float(nodes[node])
            except:
                #If representing the value as a float fails,
                #skip this one.
                continue
        else:
            #Value has to be computed.

            #First acquire lock for the particular node.
            #This ensures that the inputs needed for computing
            #the current node don't get modified midway(unless
            #one expires, in which case the computation may or may
            #not go through).
            lock = redis_lock.RedisLock(conn, node + '_lock')
            if lock.acquire():
                try:
                    #This will indicate if all args are present for
                    #computation of the result
                    all_args_flag = True
                    #This will store all the required arguments in order
                    args = []
                    #Get the pertinent FunctionType instance
                    func_info = FunctionType.objects.get(node=node)
                    #Iterate over all arguments
                    for i in range(func_info.noofargs):
                        #Get Redis value
                        v = conn.get(node + '_' + `i`)
                        if v is None or v == 'None':
                            #If value not present stop iterations
                            all_args_flag = False
                            break
                        else:
                            args.append(float(v))
                    #If any arg was absent, abort processing of this node
                    if not all_args_flag:
                        continue
                    #Compute the value, since all args are present
                    value = Functions.mapping[func_info.functionname](
                        args)
                    #Delete info about current args
                    for i in range(func_info.noofargs):
                        conn.delete(node + '_' + `i`)
                except:
                    pass
                finally:
                    #Release lock
                    lock.release()

        ##Now that the value has been obtained, update the args info
        ##for all parent nodes
        parent_objs = Dependency.objects.filter(child=node)
        for parent_obj in parent_objs:
            #Get lock for parent
            lock = redis_lock.RedisLock(conn, parent_obj.parent + '_lock')
            if lock.acquire():
                try:
                    #Set value
                    conn.set(parent_obj.parent + '_' + `parent_obj.argno`,
                             value)
                    #Set expiry time
                    conn.expire(parent_obj.parent + '_' + `parent_obj.argno`,
                                timeout)
                except:
                    pass
                finally:
                    #Release lock
                    lock.release()
            #Add this parent to the set of parents to process
            parents.add(parent_obj.parent)

        #Save value in database as needed
        save_node_value(node, value)

    #Make the recursive call on parent nodes
    compute_nodes.delay(dict((parent, None) for parent in parents),
                        timeout)

Given below is an example of using the above task in a simple Django view. It accepts the complete input as a stringified JSON object (like “\{"C": 4, "D": 5\}“), and makes a call to the Celery task.

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from reactive_compute.tasks import compute_nodes


@csrf_exempt
def input_values(request):
    #Get JSON-ed values dictionary
    value_dict = eval(request.GET.get('input'))
    #Make Celery call
    compute_nodes.delay(value_dict)

    return HttpResponse("OK")

Lets run the server and see how things work.
Heres that the data-logs look like given the input “\{"C": 4, "D": 5\}“.

Screenshot from 2016-01-04 22:55:12

Within a minute, we provide “\{"E": 6\}“.

Screenshot from 2016-01-04 22:55:29

Notice how A gets computed as well. Now let us provide all inputs, with new values: “\{"C": 1, "D": 1, "E": 1, "F": 3\}“.

Screenshot from 2016-01-04 22:56:08

As you must have observed, B is computed with the latest value of A, which is 3.

One unusual flexibility that this codes provide, is that we can update the values of nodes that are ‘usually’ derived, manually. Heres what happens if we provide “\{"A": 4, "F": 5\}“.

Screenshot from 2016-01-04 22:56:33

You don’t need to Copy-Paste this code anywhere, its all present in my github repo. Feel free to fork it and send a PR if you can modify it in some useful way!

Thoughts

1. The way its implemented, you have complete control over how each computational node works. I have used simple mathematical operations, but you could have your own metrics/functionality written in. As long as you define the arguments and order them right, things will work well.

2. I have worked with floating-point numbers, but your arguments could as well be vectors! Then you might have to run eval over your Redis values, instead of a simple float() conversion. But since every call is made as a Celery task, your runtime will still be optimized.

3. Earlier, I mentioned that you could remove the restriction of deleting previous values of arguments (as long as they are valid). You could do this pretty easily by removing lines 85-87 in the tasks.py code. If you don’t want your node values to have a sense of time-based validity at all, you could just remove lines 106 and 107 from tasks.py.

4. If you want different values to be stored in different ways into the database, you could add another parameter to the FunctionType model, that specified a String that denotes how information about that particular node is added to the database. Ideally, this database storing would also be a Celery task.

5. You could also store the Data-Flow diagram better using a No-SQL database such as MongoDB.

Thats all for now :-). Cheers!

 

Using Circular Self-Organizing Maps to solve the Symmetric TSP

Prerequisites:

1. Knowing what the Traveling Salesman Problem (TSP) is. The wiki article is a good place to start. The symmetrical form of the problem is where the distance from one city to another is the same in both directions.

TravelingSalesmanProblem_1000

2. A decent understanding of what Kohonen/Self-Organizing Maps are. This blog post will point you in the right direction to learn more.

After I wrote my former blog post on implementing SOMs with TensorFlow, someone on Reddit mentioned that there are ways to solve TSPs using them. When I did some research, it appeared that there have been a few attempts to do so. Heres one, and heres another. So in my spare time, I decided to try and have a crack at it myself.

I will describe my approach (along with the code) in a short while, but let me give a disclaimer first. This post in no way claims to have found the ‘best’ way to solve TSPs (ACOs are pretty much the state of the art). It just describes my attempt, and the results I obtained (which were interesting, with respect to the methodology implemented).

I obviously make a few assumptions, and here they are:

1. Every city has a corresponding vector available, denoting its coordinates/location in a 2D/3D space. This source presents many TSP benchmark data-sets in this format. I also tried assigning every city a vector where every dimension is its distance from all the cities(including itself). This gives pretty sub-par results, so I decided not to pursue that. Maybe you could use a dimensionality reduction technique like PCA on these ‘distance’ vectors and get good results.

2. Because of the above point, my method is pretty much restricted to solving the symmetric form of a TSP (atleast the way I have done it).

 

The Method

Suppose the total number of neurons is N. The total number of cities in question is C. After some trial and error, I set N = 2C. It can be higher/lower as well. The lower the number, the less optimal is the understanding of the solution space. And as you go higher, after a certain point, there seems to be no improvement in the output (but your execution time increases many-fold).

The individual neurons are provided locations around a circle, as follows:

l_i = ( cos(\frac{2i\pi}{N}), sin(\frac{2i\pi}{N}) )

where l_i is the location vector for the i th neuron in a 2-D space. i goes from 0 to N - 1. It is important to not confuse this space with the space in which the cities lie (even though both are 2-dimensional). Lets call one the Neuron-Space, and the other, the City-Space.

Because of the circular nature of this SOM, we inherently have a closed loop-structure to our framework (as is needed by the TSP solution), where neurons in proximity of each other in the Neuron-Space lie near-by in the City-Space. Also, do note how every neuron essentially corresponds to an angle ( \frac{2i\pi}{N}) around the circle. This point would be useful soon.

The initialization of the neuron vectors in City-Space is done by using the mean and standard deviation of the dimensions of the city vectors in City-Space. This is because I observed that the scale of all the dimensions in the city-vectors is pretty much the same. If it isn’t, you might have to resort to building a mean vector and covariance matrix. For training, I use \alpha = 0.3 and \sigma = 0.3 as parameters for the SOM. Putting a constant value of \sigma ensures that the same proportion of neurons around the Best-Matching Unit get affected everytime, irrespective of the total number of neurons being used.

Once the SOM trained, we can say we have the rough ‘skeleton’ for a good route in the City-Space. But how do you convert this skeleton to the appropriate route? By using those angles I just talked about. Consider a city c. Lets say the k nearest neurons to it, are n_1, n_2, ..., n_k.

So, we can construct a vector for every city in Neuron-Space, using this simple formula:

l_c = \sum_k \frac{1}{d_{c, n_k}} l_c

where d_{c, n_k} is the distance between city c and neuron n_k in City-Space. Using this vector and some pretty simple (inverse) trigonometry, you can assign the city an angle around the circle on which the SOM-neurons themselves lie in Neuron-Space. Now all you need to do to come up with an approximately good solution, is just sort the cities with respect to the angles they are assigned. Voila!

Ofcourse, the solution that this algorithm gives you would most likely be in the neighbourhood of a slightly better one, so I use a standard TSP local search heuristic like 3-opt over the solution obtained from my algorithm. What 3-opt essentially does, is change around the solution obtained from some other algorithm over so slightly in some pre-defined ways, in the hope of hitting on a better solution. 3-opt is better than 2-opt for this algorithm. I dint try higher-order k-opt searches.

One interesting possibility about this algorithm, is the fact that it could provide great solutions to solving TSPs for different target cities, without retraining the SOM– as long as they lie in the same City-Space. As long as the set of cities you try to solve for follows (atleast approximately) the trends shown by the dataset used for training, all you would need to do is compute their respective angles (which does not involve re-training), and just apply 3-opt. In short, you could use the same SOM, once trained, to solve for multiple subsets of cities.

The Implementation

This code builds on my Tensorflow-based SOM code. Its pretty rough around the edges, especially the 3-opt search. It was meant just as a proof-of-concept, so feel free to hack around if you like.


import tensorflow as tf
import numpy as np


def _distance(tsp_solution, distance_matrix):
    """
    Total distance for a solution.
    """
    total_distance = 0
    for i in range(len(tsp_solution)-1):
        total_distance += (distance_matrix[tsp_solution[i]][
            tsp_solution[i + 1]])
    total_distance += (distance_matrix[tsp_solution[0]][
        tsp_solution[-1]])
    return total_distance


def _3_opt(route, distance_matrix):
    """
    Uses 3-opt to find a better solution in the neighbourhood.
    """

    current_dist = _distance(route, distance_matrix)
    #Find three connections to break
    for i in range(len(route) - 1):
        for j in range(i + 2, len(route) - 1):
            for k in range(j + 2, len(route) - 1):
                #Now reconnect the graph in every other way possible,
                #and see if any gives a better solution
                #Way 1
                temp = route[:]
                temp[j+1:k+1] = reversed(route[j+1:k+1])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 2
                temp = route[:]
                temp[i+1:j+1] = reversed(route[i+1:j+1])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 3
                temp = route[:]
                temp[i+1:j+1], temp[j+1:k+1] = (
                    reversed(route[i+1:j+1]), reversed(route[j+1:k+1]))
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 4
                temp = (route[:i+1] + route[j+1:k+1] +
                        route[i+1:j+1] + route[k+1:])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 5
                temp = route[:i+1] + route[j+1:k+1]
                temp += reversed(route[i+1:j+1])
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 6
                temp = route[:i+1]
                temp += reversed(route[j+1:k+1])
                temp += route[i+1:j+1]
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
                #Way 7
                temp = route[:i+1]
                temp += reversed(route[j+1:k+1])
                temp += reversed(route[i+1:j+1])
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist
    return route

 
class TSPSolver(object):
    """
    Uses a Circular Self-Organizing Map with a Gaussian Neighbourhood
    function and linearly decreasing learning rate, to solve TSP.
    """
 
    #To check if the SOM has been trained
    _trained = False
 
    def __init__(self, input_vects, n_iterations=200):
        """
        Initializes all necessary components of the TensorFlow
        Graph.

        'input_vects' should be a list of 1-D NumPy vectors denoting
        location vectors for the cities.
        'n_iterations' is an integer denoting the number of iterations
        that would be run while training the SOM.
        """
 
        #Assign required variables first
        self._n = len(input_vects)
        dim = len(input_vects[0])
        self._n_centroids = self._n*3
        alpha = 0.3
        sigma = 0.3
        self._n_iterations = abs(int(n_iterations))

        #For initialization of the centroid(neuron) vectors
        init_values = []
        for vect in input_vects:
            init_values.extend(list(vect))
        init_mean = np.mean(init_values)
        init_dev = np.sqrt(np.var(init_values))
 
        ##INITIALIZE GRAPH
        self._graph = tf.Graph()
 
        ##POPULATE GRAPH WITH NECESSARY COMPONENTS
        with self._graph.as_default():
 
            ##VARIABLES AND CONSTANT OPS FOR DATA STORAGE
 
            #Randomly initialized weightage vectors for all neurons,
            #stored together as a matrix Variable of size [self._n, dim]
            self._weightage_vects = tf.Variable(tf.random_normal(
                [self._n_centroids, dim], init_mean, init_dev))
 
            #Matrix of size [self._n, 2] for SOM grid locations
            #of neurons
            self._location_vects = tf.constant(np.array(
                list(self._neuron_locations(self._n_centroids))))
 
            ##PLACEHOLDERS FOR TRAINING INPUTS
            #We need to assign them as attributes to self, since they
            #will be fed in during training
 
            #The training vector
            self._vect_input = tf.placeholder("float", [dim])
            #Iteration number
            self._iter_input = tf.placeholder("float")
 
            ##CONSTRUCT TRAINING OP PIECE BY PIECE
            #Only the final, 'root' training op needs to be assigned as
            #an attribute to self, since all the rest will be executed
            #automatically during training
 
            #To compute the Best Matching Unit given a vector
            #Basically calculates the Euclidean distance between every
            #neuron's weightage vector and the input, and returns the
            #index of the neuron which gives the least value
            bmu_index = tf.argmin(tf.sqrt(tf.reduce_sum(
                tf.pow(tf.sub(self._weightage_vects, tf.pack(
                    [self._vect_input for i in range(
                        self._n_centroids)])), 2), 1)),
                                  0)
 
            #This will extract the location of the BMU based on the BMU's
            #index
            slice_input = tf.pad(tf.reshape(bmu_index, [1]),
                                 np.array([[0, 1]]))
            bmu_loc = tf.reshape(tf.slice(self._location_vects, slice_input,
                                          tf.constant(np.array([1, 2]))),
                                 [2])
 
            #To compute the alpha and sigma values based on iteration
            #number
            learning_rate_op = tf.sub(1.0, tf.div(self._iter_input,
                                                  self._n_iterations))
            _alpha_op = tf.mul(alpha, learning_rate_op)
            _sigma_op = tf.mul(sigma, learning_rate_op)
 
            #Construct the op that will generate a vector with learning
            #rates for all neurons, based on iteration number and location
            #wrt BMU.
            bmu_distance_squares = tf.reduce_sum(tf.pow(tf.sub(
                self._location_vects, tf.pack(
                    [bmu_loc for i in range(self._n_centroids)])), 2), 1)
            neighbourhood_func = tf.exp(tf.neg(tf.div(tf.cast(
                bmu_distance_squares, "float32"), tf.pow(_sigma_op, 2))))
            learning_rate_op = tf.mul(_alpha_op, neighbourhood_func)
 
            #Finally, the op that will use learning_rate_op to update
            #the weightage vectors of all neurons based on a particular
            #input
            learning_rate_multiplier = tf.pack([tf.tile(tf.slice(
                learning_rate_op, np.array([i]), np.array([1])), [dim])
                                               for i in range(
                                                   self._n_centroids)])
            weightage_delta = tf.mul(
                learning_rate_multiplier,
                tf.sub(tf.pack([self._vect_input for i in range(
                    self._n_centroids)]),
                       self._weightage_vects))                                         
            new_weightages_op = tf.add(self._weightage_vects,
                                       weightage_delta)
            self._training_op = tf.assign(self._weightage_vects,
                                          new_weightages_op)                                       
 
            ##INITIALIZE SESSION
            self._sess = tf.Session()
 
            ##INITIALIZE VARIABLES
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

            #Train the Solver
            self.train(input_vects)

    @property
    def solution(self):
        """
        Returns the TSP solution order as a list, and the total
        Euclidean distance with respect to the input vectors.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        return self._sol, self._sol_dist

    @classmethod
    def _neuron_locations(cls, n_centroids):
        """
        Yields one by one the 2-D locations of the individual neurons
        in the SOM.
        """

        for i in range(n_centroids):
            yield np.array([np.cos(i*2*np.pi/float(n_centroids)),
                            np.sin(i*2*np.pi/float(n_centroids))])
 
    def train(self, input_vects):
        """
        Trains the SOM.
        'input_vects' should be a list of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        The points in the solution are enumerated as they are ordered
        in 'input_vects'.
        Current weightage vectors for all neurons(initially random) are
        taken as starting conditions for training.
        """
 
        #Training iterations
        for iter_no in range(self._n_iterations):
            #Train with each vector one by one
            for input_vect in input_vects:
                self._sess.run(self._training_op,
                               feed_dict={self._vect_input: input_vect,
                                          self._iter_input: iter_no})
 
        self._input_vects = input_vects
        self._compute_solution()
        self._trained = True
 
    def _compute_solution(self):
        """
        Computes the solution to the TSP with respect to the input
        vectors.
        """
        #TODO: All nearest-neighbour searches could be speeded up
        #significantly with a KD-Tree/Ball Tree.

        #Get the centroid data 
        centroid_vects = list(self._sess.run(self._weightage_vects))
        centroid_locations = list(self._sess.run(self._location_vects))

        #Distance matrix mapping input point number to list of distances
        #from centroids
        distances = {}

        for point in range(self._n):
            distances[point] = []
            for centroid in range(self._n_centroids):
                distances[point].append(
                    np.linalg.norm(centroid_vects[centroid] -
                                   self._input_vects[point]))

        #Distance matrix mapping input point number to list of distances
        #from other input points
        point_distances = {}

        for point in range(self._n):
            point_distances[point] = []
            for other_point in range(self._n):
                point_distances[point].append(
                    np.linalg.norm(self._input_vects[other_point] -
                                   self._input_vects[point]))

        #Compute angle with respect to each city(point)
        point_angles = {}
        for point in range(self._n):
            total_vect = 0
            cents = [j for j in range(self._n_centroids)]
            cents.sort(key=lambda x: distances[point][x])
            for centroid in cents[:2]:
                total_vect += (1.0/(distances[point][centroid]) *
                               centroid_locations[centroid])
            total_vect = total_vect/np.linalg.norm(total_vect)
            if total_vect[0] > 0 and total_vect[1] > 0:
                angle = np.arcsin(total_vect[1])
            elif total_vect[0] < 0 and total_vect[1] > 0:
                angle = np.arccos(total_vect[0])
            elif total_vect[0] < 0 and total_vect[1] < 0:
                angle = np.pi - np.arcsin(total_vect[1])
            else:
                angle = 2*np.pi - np.arccos(total_vect[0])
            point_angles[point] = angle

        #Find the rough solution
        tsp_solution = [i for i in range(self._n)]
        tsp_solution.sort(key=lambda x: point_angles[x])

        tsp_solution = _3_opt(tsp_solution, point_distances)

        #Compute the total distance for the solution
        total_distance = _distance(tsp_solution, point_distances)

        self._sol = tsp_solution
        self._sol_dist = total_distance

Results

Each of these problem datasets can be found at the source here. All the presented results are over 10 runs. The max runtime was experienced with krA100, around one minute.

Berlin-52: Best Error: 0.0%, Average Error: 4.1%.

krA100: Best Error: 1.6%, Average Error: 2.7%.

eli76: Best Error: 2%, Average Error: 4.4%.

These results aren’t ‘close’ to those provided by ACOs, but they are practically feasible, and show a good promise (Oh now I just sound like I am writing a research paper.) Plus, the algorithm isn’t a time-hogger either (judging by runtimes on my laptop).

Anyways. Thanks for reading! Do drop a comment if you think of an improvement over this work.

Self-Organizing Maps with Google’s TensorFlow

[This post assumes that you know the basics of Google’s TensorFlow library. If you don’t, have a look at my earlier post to get started.]

kohonen1

A Self-Organizing Map, or SOM, falls under the rare domain of unsupervised learning in Neural Networks. Its essentially a grid of neurons, each denoting one cluster learned during training. Traditionally speaking, there is no concept of neuron ‘locations’ in ANNs. However, in an SOM, each neuron has a location, and neurons that lie close to each other represent clusters with similar properties. Each neuron has a weightage vector, which is equal to the centroid of its particular cluster.

AI-Junkie’s post does a great job of explaining how an SOM is trained, so I won’t re-invent the wheel.

The Code

Here’s my code for a 2-D version of an SOM. Its written with TensorFlow as its core training architecture: (Its heavily commented, so look at the inline docs if you want to hack/dig around)


import tensorflow as tf
import numpy as np


class SOM(object):
    """
    2-D Self-Organizing Map with Gaussian Neighbourhood function
    and linearly decreasing learning rate.
    """

    #To check if the SOM has been trained
    _trained = False

    def __init__(self, m, n, dim, n_iterations=100, alpha=None, sigma=None):
        """
        Initializes all necessary components of the TensorFlow
        Graph.

        m X n are the dimensions of the SOM. 'n_iterations' should
        should be an integer denoting the number of iterations undergone
        while training.
        'dim' is the dimensionality of the training inputs.
        'alpha' is a number denoting the initial time(iteration no)-based
        learning rate. Default value is 0.3
        'sigma' is the the initial neighbourhood value, denoting
        the radius of influence of the BMU while training. By default, its
        taken to be half of max(m, n).
        """

        #Assign required variables first
        self._m = m
        self._n = n
        if alpha is None:
            alpha = 0.3
        else:
            alpha = float(alpha)
        if sigma is None:
            sigma = max(m, n) / 2.0
        else:
            sigma = float(sigma)
        self._n_iterations = abs(int(n_iterations))

        ##INITIALIZE GRAPH
        self._graph = tf.Graph()

        ##POPULATE GRAPH WITH NECESSARY COMPONENTS
        with self._graph.as_default():

            ##VARIABLES AND CONSTANT OPS FOR DATA STORAGE

            #Randomly initialized weightage vectors for all neurons,
            #stored together as a matrix Variable of size [m*n, dim]
            self._weightage_vects = tf.Variable(tf.random_normal(
                [m*n, dim]))

            #Matrix of size [m*n, 2] for SOM grid locations
            #of neurons
            self._location_vects = tf.constant(np.array(
                list(self._neuron_locations(m, n))))

            ##PLACEHOLDERS FOR TRAINING INPUTS
            #We need to assign them as attributes to self, since they
            #will be fed in during training

            #The training vector
            self._vect_input = tf.placeholder("float", [dim])
            #Iteration number
            self._iter_input = tf.placeholder("float")

            ##CONSTRUCT TRAINING OP PIECE BY PIECE
            #Only the final, 'root' training op needs to be assigned as
            #an attribute to self, since all the rest will be executed
            #automatically during training

            #To compute the Best Matching Unit given a vector
            #Basically calculates the Euclidean distance between every
            #neuron's weightage vector and the input, and returns the
            #index of the neuron which gives the least value
            bmu_index = tf.argmin(tf.sqrt(tf.reduce_sum(
                tf.pow(tf.sub(self._weightage_vects, tf.pack(
                    [self._vect_input for i in range(m*n)])), 2), 1)),
                                  0)

            #This will extract the location of the BMU based on the BMU's
            #index
            slice_input = tf.pad(tf.reshape(bmu_index, [1]),
                                 np.array([[0, 1]]))
            bmu_loc = tf.reshape(tf.slice(self._location_vects, slice_input,
                                          tf.constant(np.array([1, 2]))),
                                 [2])

            #To compute the alpha and sigma values based on iteration
            #number
            learning_rate_op = tf.sub(1.0, tf.div(self._iter_input,
                                                  self._n_iterations))
            _alpha_op = tf.mul(alpha, learning_rate_op)
            _sigma_op = tf.mul(sigma, learning_rate_op)

            #Construct the op that will generate a vector with learning
            #rates for all neurons, based on iteration number and location
            #wrt BMU.
            bmu_distance_squares = tf.reduce_sum(tf.pow(tf.sub(
                self._location_vects, tf.pack(
                    [bmu_loc for i in range(m*n)])), 2), 1)
            neighbourhood_func = tf.exp(tf.neg(tf.div(tf.cast(
                bmu_distance_squares, "float32"), tf.pow(_sigma_op, 2))))
            learning_rate_op = tf.mul(_alpha_op, neighbourhood_func)

            #Finally, the op that will use learning_rate_op to update
            #the weightage vectors of all neurons based on a particular
            #input
            learning_rate_multiplier = tf.pack([tf.tile(tf.slice(
                learning_rate_op, np.array([i]), np.array([1])), [dim])
                                               for i in range(m*n)])
            weightage_delta = tf.mul(
                learning_rate_multiplier,
                tf.sub(tf.pack([self._vect_input for i in range(m*n)]),
                       self._weightage_vects))                                         
            new_weightages_op = tf.add(self._weightage_vects,
                                       weightage_delta)
            self._training_op = tf.assign(self._weightage_vects,
                                          new_weightages_op)                                       

            ##INITIALIZE SESSION
            self._sess = tf.Session()

            ##INITIALIZE VARIABLES
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

    def _neuron_locations(self, m, n):
        """
        Yields one by one the 2-D locations of the individual neurons
        in the SOM.
        """
        #Nested iterations over both dimensions
        #to generate all 2-D locations in the map
        for i in range(m):
            for j in range(n):
                yield np.array([i, j])

    def train(self, input_vects):
        """
        Trains the SOM.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Current weightage vectors for all neurons(initially random) are
        taken as starting conditions for training.
        """

        #Training iterations
        for iter_no in range(self._n_iterations):
            #Train with each vector one by one
            for input_vect in input_vects:
                self._sess.run(self._training_op,
                               feed_dict={self._vect_input: input_vect,
                                          self._iter_input: iter_no})

        #Store a centroid grid for easy retrieval later on
        centroid_grid = [[] for i in range(self._m)]
        self._weightages = list(self._sess.run(self._weightage_vects))
        self._locations = list(self._sess.run(self._location_vects))
        for i, loc in enumerate(self._locations):
            centroid_grid[loc[0]].append(self._weightages[i])
        self._centroid_grid = centroid_grid

        self._trained = True

    def get_centroids(self):
        """
        Returns a list of 'm' lists, with each inner list containing
        the 'n' corresponding centroid locations as 1-D NumPy arrays.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        return self._centroid_grid

    def map_vects(self, input_vects):
        """
        Maps each input vector to the relevant neuron in the SOM
        grid.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Returns a list of 1-D NumPy arrays containing (row, column)
        info for each input vector(in the same order), corresponding
        to mapped neuron.
        """

        if not self._trained:
            raise ValueError("SOM not trained yet")

        to_return = []
        for vect in input_vects:
            min_index = min([i for i in range(len(self._weightages))],
                            key=lambda x: np.linalg.norm(vect-
                                                         self._weightages[x]))
            to_return.append(self._locations[min_index])

        return to_return

A few points about the code:

1) Since my post on K-Means Clustering, I have gotten more comfortable with matrix operations in TensorFlow. You need to be comfortable with matrices if you want to work with TensorFlow (or any data flow infrastructure for that matter, even SciPy). You can code pretty much any logic or operational flow with TensorFlow, you just need to be able to build up complex functionality from basic components(ops), and structure the flow of data(tensors/variables) well.

2) It took quite a while for me to build the whole graph in such a way that the entire training functionality could be enclosed in a single op. This op is called during each iteration, for every vector, during training. Such an implementation is more in line with TensorFlow’s way of doing things, than my previous attempt with clustering.

3) I have used a 2-D grid for the SOM, you can use any geometry you wish. You would just have to modify the _neuron_locations method appropriately, and also the method that returns the centroid outputs. You could return a dict that maps neuron location to the corresponding cluster centroid.

4) To keep things simple, I haven’t provided for online training. You could do that by having bounds for the learning rate(s).

Sample Usage

I have used PyMVPA’s example of RGB colours to confirm that the code does work. PyMVPA provides functionality to train SOMs too (along with many other learning techniques).

Here’s how you would do it with my code:


#For plotting the images
from matplotlib import pyplot as plt

#Training inputs for RGBcolors
colors = np.array(
     [[0., 0., 0.],
      [0., 0., 1.],
      [0., 0., 0.5],
      [0.125, 0.529, 1.0],
      [0.33, 0.4, 0.67],
      [0.6, 0.5, 1.0],
      [0., 1., 0.],
      [1., 0., 0.],
      [0., 1., 1.],
      [1., 0., 1.],
      [1., 1., 0.],
      [1., 1., 1.],
      [.33, .33, .33],
      [.5, .5, .5],
      [.66, .66, .66]])
color_names = \
    ['black', 'blue', 'darkblue', 'skyblue',
     'greyblue', 'lilac', 'green', 'red',
     'cyan', 'violet', 'yellow', 'white',
     'darkgrey', 'mediumgrey', 'lightgrey']

#Train a 20x30 SOM with 400 iterations
som = SOM(20, 30, 3, 400)
som.train(colors)

#Get output grid
image_grid = som.get_centroids()

#Map colours to their closest neurons
mapped = som.map_vects(colors)

#Plot
plt.imshow(image_grid)
plt.title('Color SOM')
for i, m in enumerate(mapped):
    plt.text(m[1], m[0], color_names[i], ha='center', va='center',
             bbox=dict(facecolor='white', alpha=0.5, lw=0))
plt.show()

Here’s a sample of the output you would get (varies each time you train, but the color names should go to the correct locations in the image):

figure_1

Recursively copying elements from one Graph to another in TensorFlow

In my previous post on Google’s TensorFlow, I had mentioned the idea of using the library for Genetic Programming applications. In my free time, I tried to map out how such an application would be run. The focus obviously wasn’t on building the smartest way to do GP, but rather on exploring if it was practically possible. One of the ideas that stuck out, was to have a Graph for each candidate solution – which would be populated based on cross-overed elements from the parent solutions’ Graphs. This would require a good API to copy computational elements from one Graph to another in TensorFlow. I tried digging around to see if such functionality was available, but couldn’t find any (atleast from a good exploring of the github repo).

I wrote some rudimentary code to accomplish this, and heres a basic outline of how it works:

Consider the example given on TensorFlow’s Get Started page.


import tensorflow as tf
import numpy as np

# Make 100 phony data points in NumPy.
x_data = np.float32(np.random.rand(2, 100)) # Random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Construct a linear model.
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane.
for step in xrange(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print step, sess.run(W), sess.run(b)

# Learns best fit is W: [[0.100  0.200]], b: [0.300]

Lets say you want to copy the main training element train to another graph as defined here:


to_graph = tf.Graph()

 

1) You first decide on a namespace inside which all the copied elements will exist in to_graph. This is not really required if the Graph you are copying to is empty. But its important to remember that element names matter a lot in TensorFlow’s workings. Therefore, to avoid naming conflicts, its better to define such a namespace. So basically, what would be called “C” in to_graph‘s namespace, would now be called “N/C” where “N” is the namespace String.

Lets assume the namespace we define the copied elements in our example to, is “CopiedOps”.


namespace = "CopiedOps"

 

2) You then copy all the variables first, one by one, using a dedicated function copy_variable_to_graph. Each time, you supply the original instance, the target graph (to_graph in this case), the namespace, and an extra dictionary called copied_variables. This dictionary is useful while copying the computational nodes (Operation instances, we will call them ops) in the next step. Since Variable instances act as inputs for ops, we need a way to keep track of them for later.

I initially wanted to combine this initialization of variables with the function that copies the computational elements, but I found it really tricky to capture the appropriate Variable instances based on their connections to ops. Anyways, since variables are more like parameter-storing units whose values are needed frequently, its better to initialize them separately.

The Variable instances in the above example are b and W. Heres how you would do it with my code:


copied_variables = {}
b1 = copy_variable_to_graph(b, to_graph, namespace, copied_variables)
W1 = copy_variable_to_graph(W, to_graph, namespace, copied_variables)

Ofcourse, if your code has a lot of variables, you could just store them all in a list and run the above function over all of them with a common dictionary for copied variables.

 

3) You then recursively copy all the computational nodes (ops, Placeholders) to the other graph. Now heres the nice things about the method- You only need to do it for the topmost node computational node. All connected inputs and Tensors are automatically taken care of!

For the above example, the train object constructed on line 16 is the ‘topmost’ node. So copying the whole learner is as simple as:

train_copy = copy_to_graph(train, to_graph, copied_variables, namespace)

Thats it! The other instances like y, optimizer, loss are automatically replicated in to_graph.

Theres also a helper function in case you want to find the equivalent of an element from the original graph, in to_graph:


loss_copy = get_copied(loss, to_graph, copied_variables, namespace)

 

4) You can now run the new node in to_graph. Remember to initialize a new Session instance linked to the graph, and initialize all Variables. So heres how you would go about it:


with to_graph.as_default():
    init1 = tf.initialize_all_variables()
    sess1 = tf.Session()
    sess1.run(init1)

    for step in xrange(0, 201):
        sess1.run(train_copy)
        if step % 20 == 0:
            print step, sess1.run(W1), sess1.run(b1)

This provides an output similar to what you would get from the Get Started original example.

The Code


import tensorflow as tf
from tensorflow.python.framework import ops
from copy import deepcopy


def copy_variable_to_graph(org_instance, to_graph, namespace,
                           copied_variables={}):
    """
    Copies the Variable instance 'org_instance' into the graph
    'to_graph', under the given namespace.
    The dict 'copied_variables', if provided, will be updated with
    mapping the new variable's name to the instance.
    """

    if not isinstance(org_instance, tf.Variable):
        raise TypeError(str(org_instance) + " is not a Variable")

    #The name of the new variable
    if namespace != '':
        new_name = (namespace + '/' +
                    org_instance.name[:org_instance.name.index(':')])
    else:
        new_name = org_instance.name[:org_instance.name.index(':')]

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace,
    #except the special ones required for variable initialization and
    #training.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if (name == ops.GraphKeys.VARIABLES or
                name == ops.GraphKeys.TRAINABLE_VARIABLES or
                namespace == ''):
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)

    #See if its trainable.
    trainable = (org_instance in org_instance.graph.get_collection(
        ops.GraphKeys.TRAINABLE_VARIABLES))
    #Get the initial value
    with org_instance.graph.as_default():
        temp_session = tf.Session()
        init_value = temp_session.run(org_instance.initialized_value())

    #Initialize the new variable
    with to_graph.as_default():
        new_var = tf.Variable(init_value,
                              trainable,
                              name=new_name,
                              collections=collections,
                              validate_shape=False)

    #Add to the copied_variables dict
    copied_variables[new_var.name] = new_var

    return new_var


def copy_to_graph(org_instance, to_graph, copied_variables={}, namespace=""):
    """
    Makes a copy of the Operation/Tensor instance 'org_instance'
    for the graph 'to_graph', recursively. Therefore, all required
    structures linked to org_instance will be automatically copied.
    'copied_variables' should be a dict mapping pertinent copied variable
    names to the copied instances.
    
    The new instances are automatically inserted into the given 'namespace'.
    If namespace='', it is inserted into the graph's global namespace.
    However, to avoid naming conflicts, its better to provide a namespace.
    If the instance(s) happens to be a part of collection(s), they are
    are added to the appropriate collections in to_graph as well.
    For example, for collection 'C' which the instance happens to be a
    part of, given a namespace 'N', the new instance will be a part of
    'N/C' in to_graph.

    Returns the corresponding instance with respect to to_graph.

    TODO: Order of insertion into collections is not preserved
    """

    #The name of the new instance
    if namespace != '':
        new_name = namespace + '/' + org_instance.name
    else:
        new_name = org_instance.name

    #If a variable by the new name already exists, return the
    #correspondng tensor that will act as an input
    if new_name in copied_variables:
        return to_graph.get_tensor_by_name(
            copied_variables[new_name].name)

    #If an instance of the same name exists, return appropriately
    try:
        already_present = to_graph.as_graph_element(new_name,
                                                    allow_tensor=True,
                                                    allow_operation=True)
        return already_present
    except:
        pass

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if namespace == '':
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)
    
    #Take action based on the class of the instance

    if isinstance(org_instance, tf.python.framework.ops.Tensor):

        #If its a Tensor, it is one of the outputs of the underlying
        #op. Therefore, copy the op itself and return the appropriate
        #output.
        op = org_instance.op
        new_op = copy_to_graph(op, to_graph, copied_variables, namespace)
        output_index = op.outputs.index(org_instance)
        new_tensor = new_op.outputs[output_index]
        #Add to collections if any
        for collection in collections:
            to_graph.add_to_collection(collection, new_tensor)

        return new_tensor

    elif isinstance(org_instance, tf.python.framework.ops.Operation):

        op = org_instance

        #If it has an original_op parameter, copy it
        if op._original_op is not None:
            new_original_op = copy_to_graph(op._original_op, to_graph,
                                            copied_variables, namespace)
        else:
            new_original_op = None

        #If it has control inputs, call this function recursively on each.
        new_control_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                            namespace)
                              for x in op.control_inputs]

        #If it has inputs, call this function recursively on each.
        new_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                    namespace)
                      for x in op.inputs]

        #Make a new node_def based on that of the original.
        #An instance of tensorflow.core.framework.graph_pb2.NodeDef, it
        #stores String-based info such as name, device and type of the op.
        #Unique to every Operation instance.
        new_node_def = deepcopy(op._node_def)
        #Change the name
        new_node_def.name = new_name

        #Copy the other inputs needed for initialization
        output_types = op._output_types[:]
        input_types = op._input_types[:]

        #Make a copy of the op_def too.
        #Its unique to every _type_ of Operation.
        op_def = deepcopy(op._op_def)

        #Initialize a new Operation instance
        new_op = tf.python.framework.ops.Operation(new_node_def,
                                                   to_graph,
                                                   new_inputs,
                                                   output_types,
                                                   new_control_inputs,
                                                   input_types,
                                                   new_original_op,
                                                   op_def)
        #Use Graph's hidden methods to add the op
        to_graph._add_op(new_op)
        to_graph._record_op_seen_by_control_dependencies(new_op)
        for device_function in reversed(to_graph._device_function_stack):
            new_op._set_device(device_function(new_op))

        return new_op

    else:
        raise TypeError("Could not copy instance: " + str(org_instance))


def get_copied(original, graph, copied_variables={}, namespace=""):
    """
    Get a copy of the instance 'original', present in 'graph', under
    the given 'namespace'.
    'copied_variables' is a dict mapping pertinent variable names to the
    copy instances.
    """

    #The name of the copied instance
    if namespace != '':
        new_name = namespace + '/' + original.name
    else:
        new_name = original.name

    #If a variable by the name already exists, return it
    if new_name in copied_variables:
        return copied_variables[new_name]

    return graph.as_graph_element(new_name, allow_tensor=True,
                                  allow_operation=True)

Working with feeding is pretty simple too:


>>> x = tf.placeholder("float")
>>> a = tf.constant(3, "float")
>>> y = tf.add(x, a)
>>> namespace = "CopiedOps"
>>> to_graph = tf.Graph()
>>> copied_variables = {}
>>> y1 = copy_to_graph(y, to_graph, namespace)
>>> x1 = get_copied(x, to_graph, namespace)
>>> with to_graph.as_default():
    sess = tf.Session()
    print sess.run(y1, feed_dict={x1: 5})

    
8.0

I guess thats all for my hacking around with TensorFlow for the week. If you intend to use this code, please note that it may not be perfect at doing what it says. I haven’t tried it out with all sorts of TensorFlow data structures as yet, so be open to getting an Exception or two that you may have to fix. Infact, do drop me a comment or mail so I can make this code as fool-proof as I can. Cheers!

K-Means Clustering with TensorFlow

Google recently open-sourced its Artificial Intelligence/Numerical Computing library called TensorFlow. TensorFlow was developed by members of the Google Brain team, and has the flexibility to run on a variety of platforms – including GPUs and mobile devices.

687474703a2f2f7777772e616e64726f696463656e7472616c2e636f6d2f73697465732f616e64726f696463656e7472616c2e636f6d2f66696c65732f7374796c65732f6c617267652f7075626c69632f61727469636c655f696d616765732f323031352f31312f74656e736f72666c6f772e706e67

TensorFlow’s methodology uses what they called data-flow graphs. Consider the following diagram from the Wikipedia page on Genetic Programming (which could have some interesting applications with TensorFlow, I think):

Genetic_Program_TreeAs you probably understood, the graphical structure is a way of representing a computational expression in the form of a Tree. Every node is an operation (TensorFlow calls them ops, short for operations). The non-leaf nodes are pretty easy to understand. Some leaf nodes are a special case of an operation, always ‘returning’ a constant value (like 7 or 2.2 in the Tree). Others (like X or Y) act as placeholders that will be fed in at the time of execution. If you look at the arrows, you will realize that their directions denote the dependencies between outputs of different nodes. Hence, data (TensorFlow calls them Tensors) will flow in the opposite direction along each node – Hence the name TensorFlow. TensorFlow provides other components over this graphical abstraction, like persistent memory elements that retain data (called Variables), and optimization techniques to fine-tune the parameters in these Variables in applications like Neural Networks.

TensorFlow has a powerful Python API. The TensorFlow team has done an awesome job of writing the documentation (which is a little tricky to navigate). If you are completely new to this, heres a few links to get you started (in the order you should visit them):

1. Basic setup

2. Read this example to get a vague idea of what a TensorFlow code looks like.

3. Now read this explanation of the basic components of TensorFlow. It helps if you read the above example again, or simultaneously.

4. Read this detailed example of using TensorFlow for a common ML problem.

5. Once you have a decent understanding of the basic components and how they work, you can look at the Python docs for reference.

Now here is the code I wrote for K-Means clustering using TensorFlow. As a disclaimer, I will mention that this code is based on my (at the time of writing this) 2-day old understanding of how the library works. If you find any errors or know any optimizations possible, do drop a comment! The code is heavily documented, so do go through in-line docs.


import tensorflow as tf
from random import choice, shuffle
from numpy import array


def TFKMeansCluster(vectors, noofclusters):
    """
    K-Means Clustering using TensorFlow.
    'vectors' should be a n*k 2-D NumPy array, where n is the number
    of vectors of dimensionality k.
    'noofclusters' should be an integer.
    """

    noofclusters = int(noofclusters)
    assert noofclusters < len(vectors)

    #Find out the dimensionality
    dim = len(vectors[0])

    #Will help select random centroids from among the available vectors
    vector_indices = list(range(len(vectors)))
    shuffle(vector_indices)

    #GRAPH OF COMPUTATION
    #We initialize a new graph and set it as the default during each run
    #of this algorithm. This ensures that as this function is called
    #multiple times, the default graph doesn't keep getting crowded with
    #unused ops and Variables from previous function calls.

    graph = tf.Graph()

    with graph.as_default():

        #SESSION OF COMPUTATION

        sess = tf.Session()

        ##CONSTRUCTING THE ELEMENTS OF COMPUTATION

        ##First lets ensure we have a Variable vector for each centroid,
        ##initialized to one of the vectors from the available data points
        centroids = [tf.Variable((vectors[vector_indices[i]]))
                     for i in range(noofclusters)]
        ##These nodes will assign the centroid Variables the appropriate
        ##values
        centroid_value = tf.placeholder("float64", [dim])
        cent_assigns = []
        for centroid in centroids:
            cent_assigns.append(tf.assign(centroid, centroid_value))

        ##Variables for cluster assignments of individual vectors(initialized
        ##to 0 at first)
        assignments = [tf.Variable(0) for i in range(len(vectors))]
        ##These nodes will assign an assignment Variable the appropriate
        ##value
        assignment_value = tf.placeholder("int32")
        cluster_assigns = []
        for assignment in assignments:
            cluster_assigns.append(tf.assign(assignment,
                                             assignment_value))

        ##Now lets construct the node that will compute the mean
        #The placeholder for the input
        mean_input = tf.placeholder("float", [None, dim])
        #The Node/op takes the input and computes a mean along the 0th
        #dimension, i.e. the list of input vectors
        mean_op = tf.reduce_mean(mean_input, 0)

        ##Node for computing Euclidean distances
        #Placeholders for input
        v1 = tf.placeholder("float", [dim])
        v2 = tf.placeholder("float", [dim])
        euclid_dist = tf.sqrt(tf.reduce_sum(tf.pow(tf.sub(
            v1, v2), 2)))

        ##This node will figure out which cluster to assign a vector to,
        ##based on Euclidean distances of the vector from the centroids.
        #Placeholder for input
        centroid_distances = tf.placeholder("float", [noofclusters])
        cluster_assignment = tf.argmin(centroid_distances, 0)

        ##INITIALIZING STATE VARIABLES

        ##This will help initialization of all Variables defined with respect
        ##to the graph. The Variable-initializer should be defined after
        ##all the Variables have been constructed, so that each of them
        ##will be included in the initialization.
        init_op = tf.initialize_all_variables()

        #Initialize all variables
        sess.run(init_op)

        ##CLUSTERING ITERATIONS

        #Now perform the Expectation-Maximization steps of K-Means clustering
        #iterations. To keep things simple, we will only do a set number of
        #iterations, instead of using a Stopping Criterion.
        noofiterations = 100
        for iteration_n in range(noofiterations):

            ##EXPECTATION STEP
            ##Based on the centroid locations till last iteration, compute
            ##the _expected_ centroid assignments.
            #Iterate over each vector
            for vector_n in range(len(vectors)):
                vect = vectors[vector_n]
                #Compute Euclidean distance between this vector and each
                #centroid. Remember that this list cannot be named
                #'centroid_distances', since that is the input to the
                #cluster assignment node.
                distances = [sess.run(euclid_dist, feed_dict={
                    v1: vect, v2: sess.run(centroid)})
                             for centroid in centroids]
                #Now use the cluster assignment node, with the distances
                #as the input
                assignment = sess.run(cluster_assignment, feed_dict = {
                    centroid_distances: distances})
                #Now assign the value to the appropriate state variable
                sess.run(cluster_assigns[vector_n], feed_dict={
                    assignment_value: assignment})

            ##MAXIMIZATION STEP
            #Based on the expected state computed from the Expectation Step,
            #compute the locations of the centroids so as to maximize the
            #overall objective of minimizing within-cluster Sum-of-Squares
            for cluster_n in range(noofclusters):
                #Collect all the vectors assigned to this cluster
                assigned_vects = [vectors[i] for i in range(len(vectors))
                                  if sess.run(assignments[i]) == cluster_n]
                #Compute new centroid location
                new_location = sess.run(mean_op, feed_dict={
                    mean_input: array(assigned_vects)})
                #Assign value to appropriate variable
                sess.run(cent_assigns[cluster_n], feed_dict={
                    centroid_value: new_location})

        #Return centroids and assignments
        centroids = sess.run(centroids)
        assignments = sess.run(assignments)
        return centroids, assignments

Never, ever, EVER, do something like this:


for i in range(100):
    x = sess.run(tf.assign(variable1, placeholder))

This may seem pretty harmless at first glance, but every time you initialize an op, (like tf.assign or even tf.zeros, you are adding new ops instances to the default graph. Instead, as shown in the code, define a particular op for each task (however specialized) just once in the code. Then, during every of your iterations, call sess.run over the required nodes. To check if you are crowding your graph with unnecessary ops, just print out the value of len(graph.get_operations()) during every iteration and see if its is increasing. In fact, sess.run should be the only way you interact with the graph during every iteration.

As visible on lines 138 and 139, you can call sess.run over a list of ops/Variables to return a list of the outputs in the same order.

There are a lot of intricacies of TensorFlow that this code does not go into, such as assigning devices to nodes, Graph collections, dependencies, etc. Thats partially because I am still understanding these aspects one by one. But at first glance, TensorFlow seems to be a pretty powerful and flexible way of doing AI/ML-based computations. I would personally like to explore its applications in developing dependency-based statistical metrics for data – for which I am currently using custom tree-like data structures. Lets hope this gesture by Google does lead to an increase in the applications and research in AI. Cheers!

Generating rudimentary Mind-Maps from Word2Vec models

Mind Maps are notorious for being a very powerful organizational tool for a variety of tasks, such as brainstorming, planning and problem solving. Visual (or rather, graphical) arrangement of ideas aids the thought process, and in fact mimics the way we explore our mental knowledge base while ‘thinking’. There are a lot of online tools available for drawing out mind maps, but none that can generate one. By generate, I mean coming up with the verbal content.

For the longest time (almost 8 months now), I have been tinkering with ways to combine text mining and graph theory into a framework to generate a Mind-Map (given a text document). Ofcourse, the first thing argument would be that there cannot be a single possible Mind-Map for any block of text. And its true! However, having such an automated Map as a reference while building your own, might give you more insights (especially while brainstorming), or help you remember links that you might miss out (for studying). Lets see what a Mind-Map looks like:

laws_mindmap

Two points:

i. A Mind-Map is NOT a tree that divides the overall topic into its subtopics recursively. Its infact more like a graph, that links terms that are semantically related.

ii. Like ‘Night’ might make the word ‘Day’ pop up in your mind, a mind-map may have links between opposite meaning concepts, like Thicker-Thinner in the above example.

There are of course other points like using images to enhance concepts, and so on. But thats not the point of this post (And I suck at designer-style creativity anyways). Heres an article to help you get acquainted with the process of building and using your own Mind-Maps, just in case.

In my last post, I described a method to generate a Word2Vec model from a text document (where I used Wikipedia articles as an example). I will now describe the methodology I followed to generate a rudimentary mind-map from a Wikipedia article’s Word2Vec model model.

Step 1: Figuring out the top n terms from the document

(As I mentioned in my previous post, I only use stemmed unigrams. You can of course use higher-order ngrams, but that makes things a little tricky (but more accurate, if your algorithms for n-gram generation are solid).)

Here, n denotes the number of ‘nodes’ in my Mind-Map. In my trial-and-errors, 50 is usually a good enough number. Too less means too little information, and too much would mean noisy mind-maps. You can obviously play around with different choices of n. I use the co-occurrence based technique described in this paper to list out the top n words in a document. Heres the Python code for it:


def _get_param_matrices(vocabulary, sentence_terms):
    """
    Returns
    =======
    1. Top 300(or lesser, if vocab is short) most frequent terms(list)
    2. co-occurence matrix wrt the most frequent terms(dict)
    3. Dict containing Pg of most-frequent terms(dict)
    4. nw(no of terms affected) of each term(dict)
    """

    #Figure out top n terms with respect to mere occurences
    n = min(300, len(vocabulary))
    topterms = list(vocabulary.keys())
    topterms.sort(key = lambda x: vocabulary[x], reverse = True)
    topterms = topterms[:n]

    #nw maps term to the number of terms it 'affects'
    #(sum of number of terms in all sentences it
    #appears in)
    nw = {}
    #Co-occurence values are wrt top terms only
    co_occur = {}
    #Initially, co-occurence matrix is empty
    for x in vocabulary:
        co_occur[x] = [0 for i in range(len(topterms))]

    #Iterate over list of all sentences' vocabulary dictionaries
    #Build the co-occurence matrix
    for sentence in sentence_terms:
        total_terms = sum(list(sentence.values()))
        #This list contains the indices of all terms from topterms,
        #that are present in this sentence
        top_indices = []
        #Populate top_indices
        top_indices = [topterms.index(x) for x in sentence
                       if x in topterms]
        #Update nw dict, and co-occurence matrix
        for term in sentence:
            nw[term] = nw.get(term, 0) + total_terms
            for index in top_indices:
                co_occur[term][index] += (sentence[term] *
                                          sentence[topterms[index]])

    #Pg is just nw[term]/total vocabulary of text
    Pg = {}
    N = sum(list(vocabulary.values()))
    for x in topterms:
        Pg[x] = float(nw[x])/N

    return topterms, co_occur, Pg, nw


def get_top_n_terms(vocabulary, sentence_terms, n=50):
    """
    Returns the top 'n' terms from a block of text, in the form of a list,
    from most important to least.

    'vocabulary' should be a dict mapping each term to the number
    of its occurences in the entire text.
    'sentence_terms' should be an iterable of dicts, each denoting the
    vocabulary of the corresponding sentence.
    """

    #First compute the matrices
    topterms, co_occur, Pg, nw = _get_param_matrices(vocabulary,
                                                     sentence_terms)

    #This dict will map each term to its weightage with respect to the
    #document
    result = {}

    N = sum(list(vocabulary.values()))
    #Iterates over all terms in vocabulary
    for term in co_occur:
        term = str(term)
        org_term = str(term)
        for x in Pg:
            #expected_cooccur is the expected cooccurence of term with this
            #term, based on nw value of this and Pg value of the other
            expected_cooccur = nw[term] * Pg[x]
            #Result measures the difference(in no of terms) of expected
            #cooccurence and  actual cooccurence
            result[org_term] = ((co_occur[term][topterms.index(x)] -
                                 expected_cooccur)**2/ float(expected_cooccur))

    terms = list(result.keys())
    terms.sort(key=lambda x: result[x],
               reverse=True)

    return terms[:n]

The get_top_n_terms function does the job, and I guess the docstrings and in-line comments explain how (combined with the paper, of course). If you have the patience and time, you can infact just see the entire vocabulary of your Word2Vec model and pick out those terms that you want to see in your Mind-Map. This is likely to give you the best results (but with a lot of efforts).

Step 2: Deciding the Root

The Root would be that term out of your nodes, which denotes the central idea behind your entire Mind-Map. Since the number of nodes is pretty small compared to the vocabulary, its best to pick this one out manually. OR, you could use that term which has the highest occurrence in the vocabulary, among the selected nodes. This step may require some trial and error (But then what part of data science doesn’t?).

Step 3: Generating the graph (Mind-Map)

This is of course the most crucial step, and the one I spent the most time on. First off, let me define what I call the contextual vector of a term.

Say the root of the Mind Map is ‘computer’. It is linked to the term ‘hardware’. ‘hardware’ is in turn linked to ‘keyboard’. The Word2Vec vector of ‘keyboard’ would be obtained as model[keyboard] in the Python/Gensim environment. Lets denote it with v_{keyboard}.

Now consider yourself in the position of someone building a Mind Map. When you think of ‘keyboard’, given the structure of what you have come up with so far, you will be thinking of it in the context of ‘computer’ and ‘hardware’. Thats why you probably won’t link ‘keyboard’ to ‘music’ (atleast not directly). This basically shows that the contextual vector for ‘keyboard’ (lets call it v'_{keyboard}) must be biased in its direction (since we use cosine similarity with Word2Vec models, only directions matter) towards v_{computer} and v_{hardware}. Moreover, intuitively speaking, the influence of v_{hardware} on v'_{keyboard} should be greater than that of v_{computer} – in essence, the influence of the context of a parent reduces as you go further and further away from it. To take this into account, I use what I call the contextual decay factor \alpha. Expressing it mathematically,

v'_{computer} = v_{computer}

v'_{hardware} = (1 - \alpha) v_{hardware} + \alpha v'_{computer}

v'_{keyboard} = (1 - \alpha) v_{keyboard} + \alpha v'_{hardware}

And so on…

Finally, to generate the actual Mind-Map, heres the algorithm I use (I hope the inline comments are enough to let you know what I have done):


from scipy.spatial.distance import cosine
from networkx import Graph

def build_mind_map(model, stemmer, root, nodes, alpha=0.2):
    """
    Returns the Mind-Map in the form of a NetworkX Graph instance.

    'model' should be an instance of gensim.models.Word2Vec
    'nodes' should be a list of terms, included in the vocabulary of
    'model'.
    'root' should be the node that is to be used as the root of the Mind
    Map graph.
    'stemmer' should be an instance of StemmingHelper.
    """

    #This will be the Mind-Map
    g = Graph()

    #Ensure that the every node is in the vocabulary of the Word2Vec
    #model, and that the root itself is included in the given nodes
    for node in nodes:
        if node not in model.vocab:
            raise ValueError(node + " not in model's vocabulary")
    if root not in nodes:
        raise ValueError("root not in nodes")

    ##Containers for algorithm run
    #Initially, all nodes are unvisited
    unvisited_nodes = set(nodes)
    #Initially, no nodes are visited
    visited_nodes = set([])
    #The following will map visited node to its contextual vector
    visited_node_vectors = {}
    #Thw following will map unvisited nodes to (closest_distance, parent)
    #parent will obviously be a visited node
    node_distances = {}

    #Initialization with respect to root
    current_node = root
    visited_node_vectors[root] = model[root]
    unvisited_nodes.remove(root)
    visited_nodes.add(root)

    #Build the Mind-Map in n-1 iterations
    for i in range(1, len(nodes)):
        #For every unvisited node 'x'
        for x in unvisited_nodes:
            #Compute contextual distance between current node and x
            dist_from_current = cosine(visited_node_vectors[current_node],
                                       model[x])
            #Get the least contextual distance to x found until now
            distance = node_distances.get(x, (100, ''))
            #If current node provides a shorter path to x, update x's
            #distance and parent information
            if distance[0] > dist_from_current:
                node_distances[x] = (dist_from_current, current_node)

        #Choose next 'current' as that unvisited node, which has the
        #lowest contextual distance from any of the visited nodes
        next_node = min(unvisited_nodes,
                        key=lambda x: node_distances[x][0])

        ##Update all containers
        parent = node_distances[next_node][1]
        del node_distances[next_node]
        next_node_vect = ((1 - alpha)*model[next_node] +
                          alpha*visited_node_vectors[parent])
        visited_node_vectors[next_node] = next_node_vect
        unvisited_nodes.remove(next_node)
        visited_nodes.add(next_node)

        #Add the link between newly selected node and its parent(from the
        #visited nodes) to the NetworkX Graph instance
        g.add_edge(stemmer.original_form(parent).capitalize(),
                   stemmer.original_form(next_node).capitalize())

        #The new node becomes the current node for the next iteration
        current_node = next_node

    return g

Notes: I use NetworkX’s simple Graph-building infrastructure to do the core job of maintaining the Mind-Map (makes it easier later for visualization too). To compute cosine distance, I use SciPy. Moreover, on lines 74 and 75, I use the StemmingHelper class from my last post to include the stemmed words in their original form in the actual mind-map (instead of their stemmed versions). You can pass the StemmingHelper class directly as the parameter stemmer. On the other hand, if you aren’t using stemming at all, just remove those parts of the code on lines 4, 74, and 75.

If you look at the algorithm, you will realize that its somewhat like Dijkstra’s algorithm for single-source shortest paths, but in a different context.

Example Outputs

Now for the results. (I used PyGraphViz for simple, quick-and-dirty visualization)

Heres the Mind-Map that was generated for the Wikipedia article on Machine Learning:

ml

One on Artificial Intellgence:

aiAnd finally, one on Psychology:

yoyo

The results are similar on all the other topics I tried out. A few things I noticed:

i. I should try and involve bi-grams and trigrams too. I am pretty sure it will make the Word2Vec model itself way stronger, and thus improve the interpretation of terms with respect to the document.

ii. There are some unnecessary terms in the Mind Maps, but given the short length of the texts (compared to most text mining tasks), the Keyword extraction algorithm in the paper I mentioned before, seems really good.

iii. Maybe one could use this for brainstorming. Like you start out with a term(node) of your choice. Then, the framework suggests you terms to connect it to. Once you select one of them, you get further recommendations for it based on the context, etc. – Something like a Mind-Map helper.

Anyways, this was a long post, and thanks a lot if you stuck to the end :-). Cheers!

Generating a Word2Vec model from a block of Text using Gensim (Python)

screen-shot-2015-04-10-at-4-16-00-pm

Word2Vec is a semantic learning framework that uses a shallow neural network to learn the representations of words/phrases in a particular text. Simply put, its an algorithm that takes in all the terms (with repetitions) in a particular document, divided into sentences, and outputs a vectorial form of each. The ‘advantage’ word2vec offers is in its utilization of a neural model in understanding the semantic meaning behind those terms. For example, a document may employ the words ‘dog’ and ‘canine’ to mean the same thing, but never use them together in a sentence. Ideally, Word2Vec would be able to learn the context and place them together in its semantic space. Most applications of Word2Vec using cosine similarity to quantify closeness. This Quora question (or rather its answers) does a good job of explaining the intuition behind it.

You would need to take the following steps to develop a Word2Vec model from a block of text (Usually, documents that are extensive and yet stick to the topic of interest with minimum ambiguity do well):

[I use Gensim’s Word2Vec API in Python to form Word2Vec models of Wikipedia articles.]

1. Obtain the text (obviously)

To obtain the Wikipedia articles, I use the Python wikipedia library. Once installed from the link, here’s how you could use it obtain all the text from an aritcle-


#'title' denotes the exact title of the article to be fetched
title = "Machine learning"
from wikipedia import page
wikipage = page(title)

You could then use wikipage.context to access the entire textual context in the form of a String. Now, incase you don’t have the exact title and want to do a search, you would do:


from wikipedia import search, page
titles = search('machine learning')
wikipage = page(titles[0])

[Tip: Store the content into a file and access it from there. This would provide you a reference later, if needed.]

2. Preprocess the text

In the context of Python, you would require an iterable that yields one iterable for each sentence in the text. The inner iterable would contain the terms in the particular sentence. A ‘term’ could be individual words like ‘machine’, or phrases(n-grams) like ‘machine learning’, or a combination of both. Coming up with appropriate bigrams/trigrams is a tricky task on its own, so I just stick to unigrams.

First of all, I remove all special characters and short lines from the article, to eliminate noise. Then, I use Porter Stemming on my unigrams, using a ‘wrapper’ around Gensim’s stemming API.


from gensim.parsing import PorterStemmer
global_stemmer = PorterStemmer()

class StemmingHelper(object):
    """
    Class to aid the stemming process - from word to stemmed form,
    and vice versa.
    The 'original' form of a stemmed word will be returned as the
    form in which its been used the most number of times in the text.
    """

    #This reverse lookup will remember the original forms of the stemmed
    #words
    word_lookup = {}

    @classmethod
    def stem(cls, word):
        """
        Stems a word and updates the reverse lookup.
        """

        #Stem the word
        stemmed = global_stemmer.stem(word)

        #Update the word lookup
        if stemmed not in cls.word_lookup:
            cls.word_lookup[stemmed] = {}
        cls.word_lookup[stemmed][word] = (
            cls.word_lookup[stemmed].get(word, 0) + 1)

        return stemmed

    @classmethod
    def original_form(cls, word):
        """
        Returns original form of a word given the stemmed version,
        as stored in the word lookup.
        """

        if word in cls.word_lookup:
            return max(cls.word_lookup[word].keys(),
                       key=lambda x: cls.word_lookup[word][x])
        else:
            return word

Refer to the code and docstrings to understand how it works. (Its pretty simple anyways). It can be used as follows-


>>> StemmingHelper.stem('learning')
'learn'
>>> StemmingHelper.original_form('learn')
'learning'

Pre-stemming, you could also use a list of stopwords to eliminate terms that occur frequently in the English language, but don’t carry much semantic meaning.

After your pre-processing, lets assume you come up with an iterable called sentences from your source of text.

3. Figure out the values for your numerical parameters

Gensim’s Word2Vec API requires some parameters for initialization. Ofcourse they do have default values, but you want to define some on your own:

i. size – Denotes the number of dimensions present in the vectorial forms. If you have read the document and have an idea of how many ‘topics’ it has, you can use that number. For sizeable blocks, people use 100-200. I use around 50 for the Wikipedia articles. Usually, you would want to repeat the initialization for different numbers of topics in a certain range, and pick the one that yields the best results (depending on your application – I will be using them to build Mind-Maps, and I usually have to try values from 20-100.). A good heuristic thats frequently used is the square-root of the length of the vocabulary, after pre-processing.

ii. min_count – Terms that occur less than min_count number of times are ignored in the calculations. This reduces noise in the semantic space. I use 2 for Wikipedia. Usually, the bigger and more extensive your text, the higher this number can be.

iii. window – Only terms hat occur within a window-neighbourhood of a term, in a sentence, are associated with it during training. The usual value is 4. Unless your text contains big sentences, leave it at that.

iv. sg – This defines the algorithm. If equal to 1, the skip-gram technique is used. Else, the CBoW method is employed. (Look at the aforementioned Quora answers). I usually use the default(1).

4. Initialize the model and use it

The model can be generated using Gensim’s API, as follows:


from gensim.models import Word2Vec
min_count = 2
size = 50
window = 4

model = Word2Vec(sentences, min_count=min_count, size=size, window=window)

Now that you have the model initialized, you can access all the terms in its vocabulary, using something like list(model.vocab.keys()). To get the vectorial representation of a particular term, use model[term]. If you have used my stemming wrapper, you could find the appropriate original form of the stemmed terms using StemmingHelper.original_form(term). Heres an example, from the Wiki article on Machine learning:


>>> vocab = list(model.vocab.keys())
>>> vocab[:10]
[u'represent', u'concept', u'founder', u'focus', u'invent', u'signific', u'abil', u'implement', u'benevol', u'hierarch']
>>> 'learn' in model.vocab
True
>>> model['learn']
array([  1.23792759e-03,   5.49776992e-03,   2.18261080e-03,
         8.37465748e-03,  -6.10323064e-03,  -6.94877980e-03,
         6.29429379e-03,  -7.06598908e-03,  -7.16267806e-03,
        -2.78065586e-03,   7.40372669e-03,   9.68673080e-03,
        -4.75220988e-03,  -8.34807567e-03,   5.25208283e-03,
         8.43616109e-03,  -1.07231298e-02,  -3.88528360e-03,
        -9.20894090e-03,   4.17305576e-03,   1.90116244e-03,
        -1.92442467e-03,   2.74807960e-03,  -1.01113841e-02,
        -3.71694425e-03,  -6.60350174e-03,  -5.90716442e-03,
         3.90679482e-03,  -5.32188127e-03,   5.63300075e-03,
        -5.52612450e-03,  -5.57334488e-03,  -8.51202477e-03,
        -8.78736563e-03,   6.41061319e-03,   6.64879987e-03,
        -3.55080629e-05,   4.81080823e-03,  -7.11903954e-03,
         9.83678619e-04,   1.60697231e-03,   7.42980337e-04,
        -2.12235347e-04,  -8.05167668e-03,   4.08948492e-03,
        -5.48054813e-04,   8.55423324e-03,  -7.08682090e-03,
         1.57684216e-03,   6.79725129e-03], dtype=float32)
>>> StemmingHelper.original_form('learn')
u'learning'
>>> StemmingHelper.original_form('hierarch')
u'hierarchical'

As you might have guessed, the vectors are NumPy arrays, and support all their functionality. Now, to compute the cosine similarity between two terms, use the similarity method. Cosine similarity is generally bounded by [-1, 1]. The corresponding ‘distance’ can be measured as 1-similarity. To figure out the terms most similar to a particular one, you can use the most_similar method.


>>> model.most_similar(StemmingHelper.stem('classification'))
[(u'spam', 0.25190210342407227), (u'metric', 0.22569453716278076), (u'supervis', 0.19861873984336853), (u'decis', 0.18607790768146515), (u'inform', 0.17607420682907104), (u'artifici', 0.16593246161937714), (u'previous', 0.16366994380950928), (u'train', 0.15940310060977936), (u'network', 0.14765430986881256), (u'term', 0.14321796596050262)]
>>> model.similarity(StemmingHelper.stem('classification'), 'supervis')
0.19861870268896875
>>> model.similarity('unsupervis', 'supervis')
-0.11546791800661522

There’s a ton of other functionality that’s supported by the class, so you should have a look at the API I gave a link to. Happy topic modelling 🙂