Recursively copying elements from one Graph to another in TensorFlow

In my previous post on Google’s TensorFlow, I mentioned the idea of using the library for Genetic Programming applications. In my free time, I tried to map out how such an application would be run. The focus obviously wasn’t on building the smartest way to do GP, but rather on exploring whether it was practically possible. One of the ideas that stood out was to have a Graph for each candidate solution, which would be populated with elements crossed over from the parent solutions’ Graphs. This would require a good API to copy computational elements from one Graph to another in TensorFlow. I tried digging around to see if such functionality was available, but couldn’t find any (at least from a good look through the GitHub repo).

I wrote some rudimentary code to accomplish this, and here’s a basic outline of how it works:

Consider the example given on TensorFlow’s Get Started page.


import tensorflow as tf
import numpy as np

# Make 100 phony data points in NumPy.
x_data = np.float32(np.random.rand(2, 100)) # Random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Construct a linear model.
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane.
for step in xrange(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print step, sess.run(W), sess.run(b)

# Learns best fit is W: [[0.100  0.200]], b: [0.300]

Let’s say you want to copy the main training element train to another graph, defined here:


to_graph = tf.Graph()

 

1) You first decide on a namespace inside which all the copied elements will exist in to_graph. This is not strictly required if the Graph you are copying to is empty. But it’s important to remember that element names matter a lot in TensorFlow’s workings, so to avoid naming conflicts it’s better to define such a namespace. Basically, what would be called “C” in to_graph’s namespace would now be called “N/C”, where “N” is the namespace string.

Let’s assume the namespace for the copied elements in our example is “CopiedOps”.


namespace = "CopiedOps"
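The renaming rule can be sketched in plain Python, without TensorFlow. The helper name scoped_name is hypothetical and is not part of the code in this post. It only mirrors the string handling that the copy functions perform, assuming names are joined with a single “/”.

```python
def scoped_name(name, namespace=""):
    """Return the name an element would get once copied under `namespace`."""
    # An empty namespace means the copy keeps its original name.
    if namespace == "":
        return name
    return namespace + "/" + name


print(scoped_name("C", "N"))                   # N/C
print(scoped_name("Variable:0", "CopiedOps"))  # CopiedOps/Variable:0
print(scoped_name("MatMul"))                   # MatMul
```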

 

2) You then copy all the variables first, one by one, using a dedicated function copy_variable_to_graph. Each time, you supply the original instance, the target graph (to_graph in this case), the namespace, and an extra dictionary called copied_variables. This dictionary is useful while copying the computational nodes (Operation instances, we will call them ops) in the next step. Since Variable instances act as inputs for ops, we need a way to keep track of them for later.

I initially wanted to combine this initialization of variables with the function that copies the computational elements, but I found it really tricky to capture the appropriate Variable instances based on their connections to ops. Anyway, since variables are more like parameter-storing units whose values are needed frequently, it’s better to initialize them separately.

The Variable instances in the above example are b and W. Here’s how you would do it with my code:


copied_variables = {}
b1 = copy_variable_to_graph(b, to_graph, namespace, copied_variables)
W1 = copy_variable_to_graph(W, to_graph, namespace, copied_variables)

Of course, if your code has a lot of variables, you could just store them all in a list and run the above function over all of them with a common dictionary for copied variables.
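A minimal sketch of that loop follows. Both copy_all_variables and fake_copy are hypothetical names introduced only for illustration. fake_copy stands in for copy_variable_to_graph so the snippet runs without TensorFlow, and it stores strings where the real function would store Variable instances.

```python
def copy_all_variables(variables, copy_fn, to_graph, namespace):
    """Apply `copy_fn` to every variable, sharing one copied_variables dict."""
    copied_variables = {}
    copies = [copy_fn(v, to_graph, namespace, copied_variables)
              for v in variables]
    return copies, copied_variables


def fake_copy(var_name, to_graph, namespace, copied_variables):
    # Stand-in for copy_variable_to_graph: records a name, not a tf.Variable.
    new_name = namespace + "/" + var_name
    copied_variables[new_name + ":0"] = new_name
    return new_name


copies, table = copy_all_variables(["b", "W"], fake_copy, None, "CopiedOps")
print(copies)         # ['CopiedOps/b', 'CopiedOps/W']
print(sorted(table))  # ['CopiedOps/W:0', 'CopiedOps/b:0']
```

With the real function, the call would presumably look like copy_all_variables([b, W], copy_variable_to_graph, to_graph, namespace).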

 

3) You then recursively copy all the computational nodes (ops, Placeholders) to the other graph. Now here’s the nice thing about the method: you only need to do it for the topmost computational node. All connected inputs and Tensors are automatically taken care of!

For the above example, the train object (the result of optimizer.minimize(loss)) is the ‘topmost’ node. So copying the whole learner is as simple as:

train_copy = copy_to_graph(train, to_graph, copied_variables, namespace)

That’s it! The other elements that train depends on, like y, loss and the ops created by optimizer, are automatically replicated in to_graph.

There’s also a helper function in case you want to find the equivalent of an element from the original graph in to_graph:
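To see why copying only the topmost node is enough, here is a toy version of the recursion in plain Python. The Node class and copy_node function are hypothetical stand-ins for TensorFlow’s ops and for copy_to_graph, and a dict plays the role of the target graph. The real code also has to deal with Tensors, control inputs and collections, which this sketch leaves out.

```python
class Node(object):
    """A toy graph node: a name plus the nodes it takes as input."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def copy_node(node, target, namespace):
    """Recursively copy `node` and everything it depends on into `target`."""
    new_name = namespace + "/" + node.name
    # A node reachable along two paths is copied only once.
    if new_name in target:
        return target[new_name]
    # Copy the inputs first, so the new node is wired to the copies.
    new_inputs = [copy_node(i, target, namespace) for i in node.inputs]
    new_node = Node(new_name, new_inputs)
    target[new_name] = new_node
    return new_node


W = Node("W")
b = Node("b")
y = Node("y", [W, b])
loss = Node("loss", [y])
train = Node("train", [loss, W, b])

target = {}
train_copy = copy_node(train, target, "CopiedOps")
print(len(target))  # 5
```

Even though W and b are reachable both directly from train and through loss, each ends up in the target exactly once, and both paths point at the same copy.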


loss_copy = get_copied(loss, to_graph, copied_variables, namespace)

 

4) You can now run the new node in to_graph. Remember to initialize a new Session instance linked to the graph, and initialize all Variables. So here’s how you would go about it:


with to_graph.as_default():
    init1 = tf.initialize_all_variables()
    sess1 = tf.Session()
    sess1.run(init1)

    for step in xrange(0, 201):
        sess1.run(train_copy)
        if step % 20 == 0:
            print step, sess1.run(W1), sess1.run(b1)

This provides an output similar to what you would get from the Get Started original example.

The Code


import tensorflow as tf
from tensorflow.python.framework import ops
from copy import deepcopy


def copy_variable_to_graph(org_instance, to_graph, namespace,
                           copied_variables=None):
    """
    Copies the Variable instance 'org_instance' into the graph
    'to_graph', under the given namespace.
    The dict 'copied_variables', if provided, will be updated with
    a mapping from the new variable's name to the new instance.
    """

    #Avoid sharing one mutable default dict across calls
    if copied_variables is None:
        copied_variables = {}

    if not isinstance(org_instance, tf.Variable):
        raise TypeError(str(org_instance) + " is not a Variable")

    #The name of the new variable
    if namespace != '':
        new_name = (namespace + '/' +
                    org_instance.name[:org_instance.name.index(':')])
    else:
        new_name = org_instance.name[:org_instance.name.index(':')]

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace,
    #except the special ones required for variable initialization and
    #training.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if (name == ops.GraphKeys.VARIABLES or
                name == ops.GraphKeys.TRAINABLE_VARIABLES or
                namespace == ''):
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)

    #See if it's trainable.
    trainable = (org_instance in org_instance.graph.get_collection(
        ops.GraphKeys.TRAINABLE_VARIABLES))
    #Get the initial value; the temporary session is closed afterwards
    with org_instance.graph.as_default():
        with tf.Session() as temp_session:
            init_value = temp_session.run(org_instance.initialized_value())

    #Initialize the new variable
    with to_graph.as_default():
        new_var = tf.Variable(init_value,
                              trainable,
                              name=new_name,
                              collections=collections,
                              validate_shape=False)

    #Add to the copied_variables dict
    copied_variables[new_var.name] = new_var

    return new_var


def copy_to_graph(org_instance, to_graph, copied_variables=None, namespace=""):
    """
    Makes a copy of the Operation/Tensor instance 'org_instance'
    for the graph 'to_graph', recursively. Therefore, all required
    structures linked to org_instance will be automatically copied.
    'copied_variables' should be a dict mapping pertinent copied variable
    names to the copied instances.
    
    The new instances are automatically inserted into the given 'namespace'.
    If namespace='', it is inserted into the graph's global namespace.
    However, to avoid naming conflicts, it's better to provide a namespace.
    If the instance(s) happens to be a part of collection(s), they
    are added to the appropriate collections in to_graph as well.
    For example, for collection 'C' which the instance happens to be a
    part of, given a namespace 'N', the new instance will be a part of
    'N/C' in to_graph.

    Returns the corresponding instance with respect to to_graph.

    TODO: Order of insertion into collections is not preserved
    """

    #Avoid sharing one mutable default dict across calls
    if copied_variables is None:
        copied_variables = {}

    #The name of the new instance
    if namespace != '':
        new_name = namespace + '/' + org_instance.name
    else:
        new_name = org_instance.name

    #If a variable by the new name already exists, return the
    #corresponding tensor that will act as an input
    if new_name in copied_variables:
        return to_graph.get_tensor_by_name(
            copied_variables[new_name].name)

    #If an instance of the same name exists, return appropriately
    try:
        already_present = to_graph.as_graph_element(new_name,
                                                    allow_tensor=True,
                                                    allow_operation=True)
        return already_present
    except Exception:
        pass

    #Get the collections that the new instance needs to be added to.
    #The new collections will also be a part of the given namespace.
    collections = []
    for name, collection in org_instance.graph._collections.items():
        if org_instance in collection:
            if namespace == '':
                collections.append(name)
            else:
                collections.append(namespace + '/' + name)
    
    #Take action based on the class of the instance

    if isinstance(org_instance, ops.Tensor):

        #If it's a Tensor, it is one of the outputs of the underlying
        #op. Therefore, copy the op itself and return the appropriate
        #output.
        op = org_instance.op
        new_op = copy_to_graph(op, to_graph, copied_variables, namespace)
        output_index = op.outputs.index(org_instance)
        new_tensor = new_op.outputs[output_index]
        #Add to collections if any
        for collection in collections:
            to_graph.add_to_collection(collection, new_tensor)

        return new_tensor

    elif isinstance(org_instance, ops.Operation):

        op = org_instance

        #If it has an original_op parameter, copy it
        if op._original_op is not None:
            new_original_op = copy_to_graph(op._original_op, to_graph,
                                            copied_variables, namespace)
        else:
            new_original_op = None

        #If it has control inputs, call this function recursively on each.
        new_control_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                            namespace)
                              for x in op.control_inputs]

        #If it has inputs, call this function recursively on each.
        new_inputs = [copy_to_graph(x, to_graph, copied_variables,
                                    namespace)
                      for x in op.inputs]

        #Make a new node_def based on that of the original.
        #An instance of tensorflow.core.framework.graph_pb2.NodeDef, it
        #stores String-based info such as name, device and type of the op.
        #Unique to every Operation instance.
        new_node_def = deepcopy(op._node_def)
        #Change the name
        new_node_def.name = new_name

        #Copy the other inputs needed for initialization
        output_types = op._output_types[:]
        input_types = op._input_types[:]

        #Make a copy of the op_def too.
        #It's unique to every _type_ of Operation.
        op_def = deepcopy(op._op_def)

        #Initialize a new Operation instance
        new_op = ops.Operation(new_node_def,
                               to_graph,
                               new_inputs,
                               output_types,
                               new_control_inputs,
                               input_types,
                               new_original_op,
                               op_def)
        #Use Graph's hidden methods to add the op
        to_graph._add_op(new_op)
        to_graph._record_op_seen_by_control_dependencies(new_op)
        for device_function in reversed(to_graph._device_function_stack):
            new_op._set_device(device_function(new_op))

        return new_op

    else:
        raise TypeError("Could not copy instance: " + str(org_instance))


def get_copied(original, graph, copied_variables={}, namespace=""):
    """
    Get a copy of the instance 'original', present in 'graph', under
    the given 'namespace'.
    'copied_variables' is a dict mapping pertinent variable names to the
    copy instances.
    """

    #The name of the copied instance
    if namespace != '':
        new_name = namespace + '/' + original.name
    else:
        new_name = original.name

    #If a variable by the name already exists, return it
    if new_name in copied_variables:
        return copied_variables[new_name]

    return graph.as_graph_element(new_name, allow_tensor=True,
                                  allow_operation=True)
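The name handling above leans on TensorFlow’s convention that a tensor is named after the op that produces it, followed by a colon and the output index (for example “Variable:0”). That is why copy_variable_to_graph strips everything from the colon onwards before building the new variable’s name, while the keys of copied_variables keep the “:0” suffix. The helper below is a hypothetical illustration in plain Python, not part of the code above, and it assumes the name it is given really is a tensor name containing a colon.

```python
def split_tensor_name(tensor_name):
    """Split a tensor name such as 'scope/op:0' into ('scope/op', 0)."""
    # rpartition splits on the last colon, so scopes with '/' are unaffected.
    op_name, _, index = tensor_name.rpartition(":")
    return op_name, int(index)


print(split_tensor_name("CopiedOps/Variable:0"))  # ('CopiedOps/Variable', 0)
print(split_tensor_name("Add:1"))                 # ('Add', 1)
```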

Working with feeding is pretty simple too:


>>> x = tf.placeholder("float")
>>> a = tf.constant(3, "float")
>>> y = tf.add(x, a)
>>> namespace = "CopiedOps"
>>> to_graph = tf.Graph()
>>> copied_variables = {}
>>> y1 = copy_to_graph(y, to_graph, copied_variables, namespace)
>>> x1 = get_copied(x, to_graph, copied_variables, namespace)
>>> with to_graph.as_default():
    sess = tf.Session()
    print sess.run(y1, feed_dict={x1: 5})

    
8.0

I guess that’s all for my hacking around with TensorFlow for the week. If you intend to use this code, please note that it may not be perfect at doing what it says. I haven’t tried it out with all sorts of TensorFlow data structures yet, so be open to getting an Exception or two that you may have to fix. In fact, do drop me a comment or mail so I can make this code as fool-proof as I can. Cheers!

6 thoughts on “Recursively copying elements from one Graph to another in TensorFlow”

    1. Interesting! ‘Training’ a graph in TensorFlow essentially means putting the appropriate value(s) into the required Variable instances. So this code will sadly not copy the info gathered after training. However, you can hack it easily to implement this functionality. Look at the line in copy_variable_to_graph that assigns init_value. In place of what’s written there (which copies the initial value of the Variable), write “init_value = temp_session.run(org_instance)”. This will copy the *current value* into the new Variable. With this, you can use the code to copy over trained graphs 🙂

  1. For a certain use case, I am building more than one DNN model based on TensorFlow’s skflow library. I partition my data into minibatches and use partial_fit for fitting. After every cycle of partial_fit, I would like to copy the weights of the first n hidden layers of one TensorFlowDNNClassifier model to another TensorFlowDNNClassifier model, then continue learning/copying using partial_fit. (The topology of the first n hidden layers is identical for both models.)

    I know how to retrieve weights from classifier1:

    classifier1.get_tensor_value('dnn/layer0/Linear/Matrix:0')

    But I don’t know how to copy their values to a classifier2!

    I don’t know how to add ops into the graph to update the weights. Could you please help me do this?

  2. Dear Joscha,
    I found that your code suited another problem, where I had to deep-copy a tensor within the same graph so as to force the tensor to be recalculated after variables changed. As such, I am about to use your code (modified) in my thesis and would also like to publish the result under the GPL. Would you be fine with that? I would certainly state where the function was derived from.

    Thank you!

  3. * Sachin, not Joscha. I somehow mixed the “J” of your second name with Sachin (-;
