Recursively copying elements from one Graph to another in TensorFlow

In my previous post on Google’s TensorFlow, I mentioned the idea of using the library for Genetic Programming applications. In my free time, I tried to map out how such an application would run. The focus obviously wasn’t on building the smartest way to do GP, but rather on exploring whether it was practically possible. One idea that stood out was to have a Graph for each candidate solution, populated with elements crossed over from the parent solutions’ Graphs. This would require a good API for copying computational elements from one Graph to another in TensorFlow. I tried digging around to see if such functionality was available, but couldn’t find any (at least not from a thorough look through the GitHub repo).

I wrote some rudimentary code to accomplish this, and here’s a basic outline of how it works:

Consider the example given on TensorFlow’s Get Started page.

import tensorflow as tf
import numpy as np

# Make 100 phony data points in NumPy.
x_data = np.float32(np.random.rand(2, 100)) # Random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Construct a linear model.
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane.
for step in xrange(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print step, sess.run(W), sess.run(b)

# Learns best fit is W: [[0.100  0.200]], b: [0.300]

Let’s say you want to copy the main training element, train, to another graph, defined here:

to_graph = tf.Graph()

1) You first decide on a namespace inside which all the copied elements will exist in to_graph. This is not strictly required if the Graph you are copying to is empty, but element names matter a lot in how TensorFlow works, so it’s better to define such a namespace to avoid naming conflicts. Basically, what would be called “C” in to_graph‘s namespace would now be called “N/C”, where “N” is the namespace string.

Let’s assume the namespace for the copied elements in our example is “CopiedOps”.

namespace = "CopiedOps"
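To make the naming rule concrete, here is a tiny sketch in plain Python. The helper scoped_name is hypothetical (it is not part of TensorFlow or of the code in this post); it only mirrors the “C” becomes “N/C” convention described above.

```python
def scoped_name(namespace, name):
    """Return the name an element would get inside `namespace`.

    Hypothetical helper, for illustration only.
    """
    # An empty namespace means the element keeps its original name.
    if namespace == "":
        return name
    return namespace + "/" + name


print(scoped_name("CopiedOps", "Variable"))  # CopiedOps/Variable
print(scoped_name("", "Variable"))           # Variable
```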

2) You then copy all the variables first, one by one, using a dedicated function copy_variable_to_graph. Each time, you supply the original instance, the target graph (to_graph in this case), the namespace, and an extra dictionary called copied_variables. This dictionary is useful while copying the computational nodes (Operation instances, we will call them ops) in the next step. Since Variable instances act as inputs for ops, we need a way to keep track of them for later.

I initially wanted to combine this initialization of variables with the function that copies the computational elements, but I found it really tricky to capture the appropriate Variable instances based on their connections to ops. Anyway, since variables are more like parameter-storing units whose values are needed frequently, it’s better to initialize them separately.

The Variable instances in the above example are b and W. Here’s how you would do it with my code:

copied_variables = {}
b1 = copy_variable_to_graph(b, to_graph, namespace, copied_variables)
W1 = copy_variable_to_graph(W, to_graph, namespace, copied_variables)

Of course, if your code has a lot of variables, you could just store them all in a list and run the above function over all of them with a common dictionary for copied variables.
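A minimal sketch of that loop follows. The wrapper copy_all_variables is a hypothetical name, and it takes the copy function as an argument (an assumption made purely so the sketch can be tried without TensorFlow); in practice you would pass copy_variable_to_graph.

```python
def copy_all_variables(variables, to_graph, namespace, copy_fn):
    """Apply copy_fn to every variable, sharing one bookkeeping dict.

    Hypothetical wrapper: copy_fn is expected to have the same signature
    as copy_variable_to_graph (instance, graph, namespace, dict).
    """
    copied_variables = {}
    copies = [copy_fn(v, to_graph, namespace, copied_variables)
              for v in variables]
    return copies, copied_variables


# Intended usage with the code from this post:
# (b1, W1), copied_variables = copy_all_variables(
#     [b, W], to_graph, namespace, copy_variable_to_graph)
```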

3) You then recursively copy all the computational nodes (ops, Placeholders) to the other graph. Now here’s the nice thing about the method: you only need to do it for the topmost computational node. All connected inputs and Tensors are automatically taken care of!

For the above example, the train object (returned by optimizer.minimize(loss)) is the ‘topmost’ node. So copying the whole learner is as simple as:

train_copy = copy_to_graph(train, to_graph, copied_variables, namespace)

That’s it! The other instances like y, optimizer, loss are automatically replicated in to_graph.
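To see why copying only the topmost node is enough, here is a toy version of the recursion in plain Python. Node and copy_node are hypothetical stand-ins (a dict plays the role of the target graph); this is only a sketch of the idea, not the TensorFlow code itself.

```python
class Node(object):
    """A toy stand-in for an op: a name plus a list of input nodes."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def copy_node(node, target, namespace):
    """Recursively copy `node` and everything it depends on into `target`.

    `target` is a plain dict mapping names to copied nodes; it plays the
    role of the destination graph in this sketch.
    """
    new_name = namespace + "/" + node.name if namespace else node.name
    # Reuse an existing copy so that shared inputs are copied only once.
    if new_name in target:
        return target[new_name]
    # Copy the inputs first, then build the new node on top of them.
    new_inputs = [copy_node(i, target, namespace) for i in node.inputs]
    target[new_name] = Node(new_name, new_inputs)
    return target[new_name]


# train depends on loss, which depends on y, which depends on W and b.
W, b = Node("W"), Node("b")
y = Node("y", [W, b])
loss = Node("loss", [y])
train = Node("train", [loss])

target = {}
train_copy = copy_node(train, target, "CopiedOps")
print(sorted(target))
```

A single call on train ends up copying all five nodes, each under the “CopiedOps” prefix.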

There’s also a helper function in case you want to find the equivalent of an element from the original graph in to_graph:

loss_copy = get_copied(loss, to_graph, copied_variables, namespace)

4) You can now run the new node in to_graph. Remember to create a new Session instance linked to the graph, and initialize all Variables. So here’s how you would go about it:

with to_graph.as_default():
    init1 = tf.initialize_all_variables()
    sess1 = tf.Session()
    sess1.run(init1)

    for step in xrange(0, 201):
        sess1.run(train_copy)
        if step % 20 == 0:
            print step, sess1.run(W1), sess1.run(b1)

This produces output similar to what you would get from the original Get Started example.

The Code

import tensorflow as tf
from tensorflow.python.framework import ops
from copy import deepcopy

def copy_variable_to_graph(org_instance, to_graph, namespace,
                           copied_variables={}):
    """
    Copies the Variable instance 'org_instance' into the graph
    'to_graph', under the given namespace.
    The dict 'copied_variables', if provided, will be updated with
    mapping the new variable's name to the instance.
    """

    if not isinstance(org_instance, tf.Variable):
        raise TypeError(str(org_instance) + " is not a Variable")

    #The name of the new variable
    if namespace != '':
        new_name = (namespace + '/' +
                    org_instance.name[:org_instance.name.index(':')])
    else:
        new_name = org_instance.name[:org_instance.name.index(':')]

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace,
    #except the special ones required for variable initialization and
    #training.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if (name == ops.GraphKeys.VARIABLES or
                name == ops.GraphKeys.TRAINABLE_VARIABLES or
                namespace == ''):
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)

    #See if its trainable.
    trainable = (org_instance in org_instance.graph.get_collection(
        ops.GraphKeys.TRAINABLE_VARIABLES))
    #Get the initial value
    with org_instance.graph.as_default():
        temp_session = tf.Session()
        init_value = temp_session.run(org_instance.initialized_value())

    #Initialize the new variable
    with to_graph.as_default():
        new_var = tf.Variable(init_value,
                              trainable,
                              name=new_name,
                              collections=collections,
                              validate_shape=False)

    copied_variables[new_var.name] = new_var

    return new_var

def copy_to_graph(org_instance, to_graph, copied_variables={}, namespace=""):
    """
    Makes a copy of the Operation/Tensor instance 'org_instance'
    for the graph 'to_graph', recursively. Therefore, all required
    structures linked to org_instance will be automatically copied.
    'copied_variables' should be a dict mapping pertinent copied variable
    names to the copied instances.

    The new instances are automatically inserted into the given 'namespace'.
    If namespace='', it is inserted into the graph's global namespace.
    However, to avoid naming conflicts, its better to provide a namespace.
    If the instance(s) happens to be a part of collection(s), they
    are added to the appropriate collections in to_graph as well.
    For example, for collection 'C' which the instance happens to be a
    part of, given a namespace 'N', the new instance will be a part of
    'N/C' in to_graph.

    Returns the corresponding instance with respect to to_graph.

    TODO: Order of insertion into collections is not preserved
    """

    #The name of the new instance
    if namespace != '':
        new_name = namespace + '/' + org_instance.name
    else:
        new_name = org_instance.name

    #If a variable by the new name already exists, return the
    #corresponding tensor that will act as an input
    if new_name in copied_variables:
        return to_graph.get_tensor_by_name(
            copied_variables[new_name].name)

    #If an instance of the same name exists, return appropriately
    try:
        already_present = to_graph.as_graph_element(new_name,
                                                    allow_tensor=True,
                                                    allow_operation=True)
        return already_present
    except:
        pass

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if namespace == '':
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)

    #Take action based on the class of the instance

    if isinstance(org_instance, tf.python.framework.ops.Tensor):

        #If its a Tensor, it is one of the outputs of the underlying
        #op. Therefore, copy the op itself and return the appropriate
        #output.
        op = org_instance.op
        new_op = copy_to_graph(op, to_graph, copied_variables, namespace)
        output_index = op.outputs.index(org_instance)
        new_tensor = new_op.outputs[output_index]
        #Add to collections if any
        for collection in collections:
            to_graph.add_to_collection(collection, new_tensor)

        return new_tensor

    elif isinstance(org_instance, tf.python.framework.ops.Operation):

        op = org_instance

        #If it has an original_op parameter, copy it
        if op._original_op is not None:
            new_original_op = copy_to_graph(op._original_op, to_graph,
                                            copied_variables, namespace)
        else:
            new_original_op = None

        #If it has control inputs, call this function recursively on each.
        new_control_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                            namespace)
                              for x in op.control_inputs]

        #If it has inputs, call this function recursively on each.
        new_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                    namespace)
                      for x in op.inputs]

        #Make a new node_def based on that of the original.
        #An instance of tensorflow.core.framework.graph_pb2.NodeDef, it
        #stores String-based info such as name, device and type of the op.
        #Unique to every Operation instance.
        new_node_def = deepcopy(op._node_def)
        #Change the name
        new_node_def.name = new_name

        #Copy the other inputs needed for initialization
        output_types = op._output_types[:]
        input_types = op._input_types[:]

        #Make a copy of the op_def too.
        #Its unique to every _type_ of Operation.
        op_def = deepcopy(op._op_def)

        #Initialize a new Operation instance
        new_op = tf.python.framework.ops.Operation(new_node_def,
                                                   to_graph,
                                                   new_inputs,
                                                   output_types,
                                                   new_control_inputs,
                                                   input_types,
                                                   new_original_op,
                                                   op_def)
        #Use Graph's hidden methods to add the op
        to_graph._add_op(new_op)
        to_graph._record_op_seen_by_control_dependencies(new_op)
        for device_function in reversed(to_graph._device_function_stack):
            new_op._set_device(device_function(new_op))

        return new_op

    else:
        raise TypeError("Could not copy instance: " + str(org_instance))

def get_copied(original, graph, copied_variables={}, namespace=""):
    """
    Get a copy of the instance 'original', present in 'graph', under
    the given 'namespace'.
    'copied_variables' is a dict mapping pertinent variable names to the
    copy instances.
    """

    #The name of the copied instance
    if namespace != '':
        new_name = namespace + '/' + original.name
    else:
        new_name = original.name

    #If a variable by the name already exists, return it
    if new_name in copied_variables:
        return copied_variables[new_name]

    return graph.as_graph_element(new_name, allow_tensor=True,
                                  allow_operation=True)

Working with feeding is pretty simple too:

>>> x = tf.placeholder("float")
>>> a = tf.constant(3, "float")
>>> y = tf.add(x, a)
>>> namespace = "CopiedOps"
>>> to_graph = tf.Graph()
>>> copied_variables = {}
>>> y1 = copy_to_graph(y, to_graph, copied_variables, namespace)
>>> x1 = get_copied(x, to_graph, copied_variables, namespace)
>>> with to_graph.as_default():
...     sess = tf.Session()
...     print sess.run(y1, feed_dict={x1: 5})
...
8.0

I guess that’s all for my hacking around with TensorFlow for the week. If you intend to use this code, please note that it may not be perfect at doing what it says. I haven’t tried it out with all sorts of TensorFlow data structures yet, so be open to getting an Exception or two that you may have to fix. In fact, do drop me a comment or mail so I can make this code as fool-proof as I can. Cheers!

Generating rudimentary Mind-Maps from Word2Vec models

Mind Maps are well known as a very powerful organizational tool for a variety of tasks, such as brainstorming, planning and problem solving. Visual (or rather, graphical) arrangement of ideas aids the thought process, and in fact mimics the way we explore our mental knowledge base while ‘thinking’. There are a lot of online tools available for drawing mind maps, but none that can generate one. By generate, I mean come up with the verbal content.

For the longest time (almost 8 months now), I have been tinkering with ways to combine text mining and graph theory into a framework to generate a Mind-Map (given a text document). Of course, the first argument would be that there cannot be a single possible Mind-Map for any block of text. And it’s true! However, having such an automated Map as a reference while building your own might give you more insights (especially while brainstorming), or help you remember links that you might otherwise miss (for studying). Let’s see what a Mind-Map looks like:

[Image: an example Mind-Map]

Two points:

i. A Mind-Map is NOT a tree that divides the overall topic into its subtopics recursively. It’s in fact more like a graph that links terms that are semantically related.

ii. Just as ‘Night’ might make the word ‘Day’ pop up in your mind, a mind-map may have links between concepts with opposite meanings, like Thicker-Thinner in the above example.

There are of course other points, like using images to enhance concepts, and so on. But that’s not the point of this post (and I suck at designer-style creativity anyway). Here’s an article to help you get acquainted with the process of building and using your own Mind-Maps, just in case.

In my last post, I described a method to generate a Word2Vec model from a text document (where I used Wikipedia articles as an example). I will now describe the methodology I followed to generate a rudimentary mind-map from a Wikipedia article’s Word2Vec model.

Step 1: Figuring out the top n terms from the document

(As I mentioned in my previous post, I only use stemmed unigrams. You can of course use higher-order n-grams, which makes things a little trickier, but more accurate if your algorithms for n-gram generation are solid.)

Here, n denotes the number of ‘nodes’ in my Mind-Map. In my trial-and-error runs, 50 is usually a good enough number. Too few means too little information, and too many means noisy mind-maps. You can obviously play around with different choices of n. I use the co-occurrence based technique described in this paper to list out the top n words in a document. Here’s the Python code for it:

def _get_param_matrices(vocabulary, sentence_terms):
    """
    Returns
    =======
    1. Top 300(or lesser, if vocab is short) most frequent terms(list)
    2. co-occurence matrix wrt the most frequent terms(dict)
    3. Dict containing Pg of most-frequent terms(dict)
    4. nw(no of terms affected) of each term(dict)
    """

    #Figure out top n terms with respect to mere occurences
    n = min(300, len(vocabulary))
    topterms = list(vocabulary.keys())
    topterms.sort(key = lambda x: vocabulary[x], reverse = True)
    topterms = topterms[:n]

    #nw maps term to the number of terms it 'affects'
    #(sum of number of terms in all sentences it
    #appears in)
    nw = {}
    #Co-occurence values are wrt top terms only
    co_occur = {}
    #Initially, co-occurence matrix is empty
    for x in vocabulary:
        co_occur[x] = [0 for i in range(len(topterms))]

    #Iterate over list of all sentences' vocabulary dictionaries
    #Build the co-occurence matrix
    for sentence in sentence_terms:
        total_terms = sum(list(sentence.values()))
        #This list contains the indices of all terms from topterms,
        #that are present in this sentence
        top_indices = [topterms.index(x) for x in sentence
                       if x in topterms]
        #Update nw dict, and co-occurence matrix
        for term in sentence:
            nw[term] = nw.get(term, 0) + total_terms
            for index in top_indices:
                co_occur[term][index] += (sentence[term] *
                                          sentence[topterms[index]])

    #Pg is just nw[term]/total vocabulary of text
    Pg = {}
    N = sum(list(vocabulary.values()))
    for x in topterms:
        Pg[x] = float(nw[x])/N

    return topterms, co_occur, Pg, nw

def get_top_n_terms(vocabulary, sentence_terms, n=50):
    """
    Returns the top 'n' terms from a block of text, in the form of a list,
    from most important to least.

    'vocabulary' should be a dict mapping each term to the number
    of its occurences in the entire text.
    'sentence_terms' should be an iterable of dicts, each denoting the
    vocabulary of the corresponding sentence.
    """

    #First compute the matrices
    topterms, co_occur, Pg, nw = _get_param_matrices(vocabulary,
                                                     sentence_terms)

    #This dict will map each term to its weightage with respect to the
    #document
    result = {}

    N = sum(list(vocabulary.values()))
    #Iterates over all terms in vocabulary
    for term in co_occur:
        term = str(term)
        org_term = str(term)
        #Accumulate the deviation over all the top terms
        result[org_term] = 0
        for x in Pg:
            #expected_cooccur is the expected cooccurence of term with this
            #term, based on nw value of this and Pg value of the other
            expected_cooccur = nw[term] * Pg[x]
            #Result measures the difference(in no of terms) of expected
            #cooccurence and actual cooccurence
            result[org_term] += ((co_occur[term][topterms.index(x)] -
                                  expected_cooccur)**2/ float(expected_cooccur))

    terms = list(result.keys())
    terms.sort(key=lambda x: result[x],
               reverse=True)

    return terms[:n]

The get_top_n_terms function does the job, and I guess the docstrings and in-line comments explain how (combined with the paper, of course). If you have the patience and time, you can in fact just go through the entire vocabulary of your Word2Vec model and pick out the terms that you want to see in your Mind-Map. This is likely to give you the best results (but with a lot of effort).
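For reference, here is one way the two inputs could be prepared. This is only a sketch: build_term_counts is a hypothetical helper, and it assumes the text has already been split into sentences and tokenized (and stemmed, if you use stemming).

```python
from collections import Counter


def build_term_counts(sentences):
    """Turn tokenized sentences into 'vocabulary' and 'sentence_terms'.

    `sentences` is a list of token lists (already stemmed/cleaned).
    Hypothetical helper, for illustration only.
    """
    # One Counter per sentence: term -> occurrences in that sentence.
    sentence_counts = [Counter(tokens) for tokens in sentences]
    # Document-level counts are the sum of the per-sentence counts.
    vocabulary = Counter()
    for counts in sentence_counts:
        vocabulary.update(counts)
    return dict(vocabulary), [dict(c) for c in sentence_counts]


sentences = [["machin", "learn", "algorithm"],
             ["learn", "data", "learn"]]
vocabulary, sentence_terms = build_term_counts(sentences)
print(vocabulary["learn"])         # 3
print(sentence_terms[1]["learn"])  # 2
```

The resulting vocabulary and sentence_terms have the shapes that the get_top_n_terms docstring asks for.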

Step 2: Deciding the Root

The Root is the term among your nodes that denotes the central idea behind your entire Mind-Map. Since the number of nodes is pretty small compared to the vocabulary, it’s best to pick this one out manually. Or you could use the term with the highest occurrence count in the vocabulary among the selected nodes. This step may require some trial and error (but then what part of data science doesn’t?).
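The automatic option is a one-liner. In this sketch, pick_root is a hypothetical helper and the counts are made up; vocabulary is assumed to be the same term-to-count dict used in Step 1.

```python
def pick_root(nodes, vocabulary):
    """Return the node with the highest occurrence count in `vocabulary`.

    Hypothetical helper, for illustration only.
    """
    return max(nodes, key=lambda term: vocabulary[term])


# Made-up counts; 'the' is frequent but is not one of the selected nodes.
vocabulary = {"learn": 40, "algorithm": 25, "data": 31, "the": 90}
nodes = ["algorithm", "data", "learn"]
print(pick_root(nodes, vocabulary))  # learn
```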

Step 3: Generating the graph (Mind-Map)

This is of course the most crucial step, and the one I spent the most time on. First off, let me define what I call the contextual vector of a term.

Say the root of the Mind Map is ‘computer’. It is linked to the term ‘hardware’. ‘hardware’ is in turn linked to ‘keyboard’. The Word2Vec vector of ‘keyboard’ would be obtained as model['keyboard'] in the Python/Gensim environment. Let’s denote it with $v_{keyboard}$.

Now consider yourself in the position of someone building a Mind Map. When you think of ‘keyboard’, given the structure of what you have come up with so far, you will be thinking of it in the context of ‘computer’ and ‘hardware’. That’s why you probably won’t link ‘keyboard’ to ‘music’ (at least not directly). This basically shows that the contextual vector for ‘keyboard’ (let’s call it $v'_{keyboard}$) must be biased in its direction (since we use cosine similarity with Word2Vec models, only directions matter) towards $v_{computer}$ and $v_{hardware}$. Moreover, intuitively speaking, the influence of $v_{hardware}$ on $v'_{keyboard}$ should be greater than that of $v_{computer}$. In essence, the influence of the context of a parent reduces as you go further and further away from it. To take this into account, I use what I call the contextual decay factor $\alpha$. Expressing it mathematically:

$v'_{computer} = v_{computer}$

$v'_{hardware} = (1 - \alpha) v_{hardware} + \alpha v'_{computer}$

$v'_{keyboard} = (1 - \alpha) v_{keyboard} + \alpha v'_{hardware}$

And so on…
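Here is a small worked example of the recurrence in plain Python. The 2-D vectors are made up purely for illustration (real Word2Vec vectors have many more dimensions), and blend and cosine_similarity are hypothetical helper names.

```python
import math


def blend(own, parent_context, alpha):
    """(1 - alpha) * own vector + alpha * parent's contextual vector."""
    return [(1 - alpha) * o + alpha * p
            for o, p in zip(own, parent_context)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


alpha = 0.2
# Made-up 2-D vectors, only for illustration.
v_computer = [1.0, 0.0]
v_hardware = [0.8, 0.6]
v_keyboard = [0.0, 1.0]

c_computer = v_computer                            # the root keeps its own vector
c_hardware = blend(v_hardware, c_computer, alpha)  # approximately [0.84, 0.48]
c_keyboard = blend(v_keyboard, c_hardware, alpha)  # approximately [0.168, 0.896]

# Unrolling the recurrence gives the same vector:
# (1 - alpha) * v_keyboard + alpha * (1 - alpha) * v_hardware
#     + alpha**2 * v_computer
expanded = [(1 - alpha) * k + alpha * (1 - alpha) * h + alpha ** 2 * c
            for k, h, c in zip(v_keyboard, v_hardware, v_computer)]

print(c_keyboard)
print(expanded)
```

With $\alpha = 0.2$, the unrolled form weights $v_{hardware}$ by $\alpha(1 - \alpha) = 0.16$ and $v_{computer}$ by $\alpha^2 = 0.04$, so the nearer ancestor has the larger influence. The contextual vector for ‘keyboard’ also ends up with a higher cosine similarity to $v_{computer}$ than the raw $v_{keyboard}$ has.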

Finally, to generate the actual Mind-Map, here’s the algorithm I use (I hope the inline comments are enough to let you know what I have done):

from scipy.spatial.distance import cosine
from networkx import Graph

def build_mind_map(model, stemmer, root, nodes, alpha=0.2):
    """
    Returns the Mind-Map in the form of a NetworkX Graph instance.

    'model' should be an instance of gensim.models.Word2Vec
    'nodes' should be a list of terms, included in the vocabulary of
    'model'.
    'root' should be the node that is to be used as the root of the Mind
    Map graph.
    'stemmer' should be an instance of StemmingHelper.
    """

    #This will be the Mind-Map
    g = Graph()

    #Ensure that every node is in the vocabulary of the Word2Vec
    #model, and that the root itself is included in the given nodes
    for node in nodes:
        if node not in model.vocab:
            raise ValueError(node + " not in model's vocabulary")
    if root not in nodes:
        raise ValueError("root not in nodes")

    ##Containers for algorithm run
    #Initially, all nodes are unvisited
    unvisited_nodes = set(nodes)
    #Initially, no nodes are visited
    visited_nodes = set([])
    #The following will map visited node to its contextual vector
    visited_node_vectors = {}
    #The following will map unvisited nodes to (closest_distance, parent)
    #parent will obviously be a visited node
    node_distances = {}

    #Initialization with respect to root
    current_node = root
    visited_node_vectors[root] = model[root]
    unvisited_nodes.remove(root)

    #Build the Mind-Map in n-1 iterations
    for i in range(1, len(nodes)):
        #For every unvisited node 'x'
        for x in unvisited_nodes:
            #Compute contextual distance between current node and x
            dist_from_current = cosine(visited_node_vectors[current_node],
                                       model[x])
            #Get the least contextual distance to x found until now
            distance = node_distances.get(x, (100, ''))
            #If current node provides a shorter path to x, update x's
            #distance and parent information
            if distance[0] > dist_from_current:
                node_distances[x] = (dist_from_current, current_node)

        #Choose next 'current' as that unvisited node, which has the
        #lowest contextual distance from any of the visited nodes
        next_node = min(unvisited_nodes,
                        key=lambda x: node_distances[x])

        ##Update all containers
        parent = node_distances[next_node][1]
        del node_distances[next_node]
        next_node_vect = ((1 - alpha)*model[next_node] +
                          alpha*visited_node_vectors[parent])
        visited_node_vectors[next_node] = next_node_vect
        unvisited_nodes.remove(next_node)

        #Add the link between the newly selected node and its parent (from
        #the visited nodes) to the NetworkX Graph instance
        g.add_edge(stemmer.original_form(parent).capitalize(),
                   stemmer.original_form(next_node).capitalize())

        #The new node becomes the current node for the next iteration
        current_node = next_node

    return g

Notes: I use NetworkX’s simple Graph-building infrastructure to do the core job of maintaining the Mind-Map (it makes visualization easier later, too). To compute cosine distance, I use SciPy. Moreover, when adding edges to the graph, I use the StemmingHelper class from my last post (via stemmer.original_form) to include the words in their original form in the actual mind-map, instead of their stemmed versions. You can pass the StemmingHelper class directly as the parameter stemmer. On the other hand, if you aren’t using stemming at all, just remove the stemmer parameter and the stemmer.original_form calls from the code.

If you look at the algorithm, you will realize that it’s somewhat like Dijkstra’s algorithm for single-source shortest paths, but in a different context.

Example Outputs

Now for the results. (I used PyGraphViz for simple, quick-and-dirty visualization)

Here’s the Mind-Map that was generated for the Wikipedia article on Machine Learning:

[Image: Mind-Map for Machine Learning]

One on Artificial Intelligence:

[Image: Mind-Map for Artificial Intelligence]

And finally, one on Psychology:

[Image: Mind-Map for Psychology]

The results are similar on all the other topics I tried out. A few things I noticed:

i. I should try and involve bigrams and trigrams too. I am pretty sure it will make the Word2Vec model itself way stronger, and thus improve the interpretation of terms with respect to the document.

ii. There are some unnecessary terms in the Mind Maps, but given the short length of the texts (compared to most text mining tasks), the keyword extraction algorithm in the paper I mentioned before seems really good.

iii. Maybe one could use this for brainstorming. You start out with a term (node) of your choice. Then, the framework suggests terms to connect it to. Once you select one of them, you get further recommendations for it based on the context, and so on. Something like a Mind-Map helper.

Anyways, this was a long post, and thanks a lot if you stuck to the end :-). Cheers!