I have recently been revisiting my study of Deep Learning, and I thought of doing some experiments with Wave prediction using LSTMs. This is nothing new, just more of a log of some tinkering done using TensorFlow.

**The Problem**

The basic input to the model is a 2-D vector, each number being the current value of one of two waves. Each wave in turn is (a constant + a sine wave + a cosine wave). The waves have different magnitudes, initial phases and frequencies. The goal is to predict the values the waves will attain a fixed number of steps ahead on the curve (I chose 23).

So first off, here's the **wave-generation code**:

```python
##Producing Training/Testing inputs+output
from numpy import array, sin, cos, pi
from random import random

#Random initial angles
angle1 = random()
angle2 = random()

#The total 2*pi cycle would be divided into 'frequency'
#number of steps
frequency1 = 300
frequency2 = 200
#This defines how many steps ahead we are trying to predict
lag = 23

def get_sample():
    """
    Returns a [[sin value, cos value]] input.
    """
    global angle1, angle2
    angle1 += 2*pi/float(frequency1)
    angle2 += 2*pi/float(frequency2)
    angle1 %= 2*pi
    angle2 %= 2*pi
    return array([array([
        5 + 5*sin(angle1) + 10*cos(angle2),
        7 + 7*sin(angle2) + 14*cos(angle1)])])


#Pre-fill with 'lag' samples, so that once get_pair appends one more,
#the first and last entries are exactly 'lag' steps apart
#(pre-filling with lag - 1 would give a gap of only lag - 1 steps)
sliding_window = []
for i in range(lag):
    sliding_window.append(get_sample())


def get_pair():
    """
    Returns an (current, later) pair, where 'later' is 'lag'
    steps ahead of the 'current' on the wave(s) as defined by the
    frequency.
    """
    global sliding_window
    sliding_window.append(get_sample())
    input_value = sliding_window[0]
    output_value = sliding_window[-1]
    sliding_window = sliding_window[1:]
    return input_value, output_value
```
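Written out as formulas, with the constants read straight from the code above, the two waves at step $n$ look as follows. The symbols $\theta_1, \theta_2$ (the two running angles) and $\phi_1, \phi_2$ (the random initial angles) are notation introduced here for illustration only.

```latex
\begin{aligned}
\theta_1(n) &= \left(\phi_1 + \frac{2\pi n}{300}\right) \bmod 2\pi, \qquad
\theta_2(n) = \left(\phi_2 + \frac{2\pi n}{200}\right) \bmod 2\pi, \\
x_1(n) &= 5 + 5\sin\theta_1(n) + 10\cos\theta_2(n), \\
x_2(n) &= 7 + 7\sin\theta_2(n) + 14\cos\theta_1(n).
\end{aligned}
```

Since one angle completes a cycle every 300 steps and the other every 200, both waves repeat every 600 steps.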

Essentially, you just need to call `get_pair` to get an 'input, output' pair, the output being 23 time intervals ahead on the curve. Each has the NumPy shape [1, 2]. The leading '1' is the batch size: we feed one input at a time while training/testing.
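To check the "output is `lag` steps ahead" property in isolation, here is a small standalone sketch of the same sliding-window idea. It uses only the standard library, fixes the start angles at zero so the run is reproducible, and records every sample in a list; `make_pair_source`, `next_pair` and `history` are hypothetical names used for this illustration, not part of the code above.

```python
from collections import deque
from math import sin, cos, pi

def make_pair_source(lag, steps1=300, steps2=200):
    """Return (next_pair, history) for two waves sampled step by step."""
    history = []            # every sample generated so far, in order
    angles = [0.0, 0.0]     # fixed start angles (assumption, for reproducibility)

    def next_sample():
        angles[0] = (angles[0] + 2 * pi / steps1) % (2 * pi)
        angles[1] = (angles[1] + 2 * pi / steps2) % (2 * pi)
        sample = (5 + 5 * sin(angles[0]) + 10 * cos(angles[1]),
                  7 + 7 * sin(angles[1]) + 14 * cos(angles[0]))
        history.append(sample)
        return sample

    # Holding `lag` samples here means that, after one more is appended,
    # the oldest and newest entries are exactly `lag` steps apart.
    window = deque(next_sample() for _ in range(lag))

    def next_pair():
        window.append(next_sample())
        current = window.popleft()
        later = window[-1]
        return current, later

    return next_pair, history

next_pair, history = make_pair_source(lag=23)
pairs = [next_pair() for _ in range(5)]
# The k-th pair should be (sample k, sample k + 23)
lag_holds = all(pairs[k] == (history[k], history[k + 23]) for k in range(5))
```

A `deque` is used so that dropping the oldest sample is cheap; a plain list with slicing behaves the same way for a window this small.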

Now, I don't pass the input directly into the LSTM. I try to improve the LSTM's *understanding* of the input by providing its first and second derivatives as well (strictly speaking, differences between consecutive samples). So, if the input at time *t* is *x(t)*, the derivative is *x'(t) = x(t) - x(t-1)*. By analogy, *x''(t) = x'(t) - x'(t-1)*. Here's the code for that:

```python
#Input Params
input_dim = 2

#To maintain state
last_value = array([0 for i in range(input_dim)])
last_derivative = array([0 for i in range(input_dim)])

def get_total_input_output():
    """
    Returns the overall Input and Output as required by the model.
    The input is a concatenation of the wave values, their first and
    second derivatives.
    """
    global last_value, last_derivative
    raw_i, raw_o = get_pair()
    raw_i = raw_i[0]
    l1 = list(raw_i)
    derivative = raw_i - last_value
    l2 = list(derivative)
    last_value = raw_i
    l3 = list(derivative - last_derivative)
    last_derivative = derivative
    return array([l1 + l2 + l3]), raw_o
```
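To make the differencing concrete, here is a minimal standalone sketch of the same bookkeeping on a sequence whose differences are easy to check by hand (the squares 0, 1, 4, 9, 16). `DifferenceTracker` is a hypothetical helper written for this example; it mirrors the logic above but uses plain lists instead of NumPy.

```python
class DifferenceTracker:
    """Streams values and returns x, x' and x'' concatenated into one list."""

    def __init__(self, dim):
        # Same zero initial state as in the code above
        self.last_value = [0.0] * dim
        self.last_derivative = [0.0] * dim

    def step(self, x):
        derivative = [a - b for a, b in zip(x, self.last_value)]
        second = [a - b for a, b in zip(derivative, self.last_derivative)]
        self.last_value = list(x)
        self.last_derivative = derivative
        return list(x) + derivative + second

tracker = DifferenceTracker(dim=1)
features = [tracker.step([float(t * t)]) for t in range(5)]
# features[3] is [9.0, 5.0, 2.0]: value 9, first difference 9 - 4, second difference 5 - 3
```

One thing this makes visible: because the stored state starts at zero, the first one or two outputs are warm-up artifacts whenever the signal does not itself start at zero (the very first "derivative" is just the first value).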

So the overall input to the model becomes a concatenation of *x(t), x'(t), x''(t)*. The obvious question to ask would be: why not do this in the TensorFlow Graph itself? I did try it, and for some reason (which I don't understand yet), some noise seems to seep into the Variables that act as memory units to maintain state.

But anyway, here's the code for that too:

```python
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim])

##First Order Derivative Layer
#This will store the last recorded value
last_value1 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value1 = tf.sub(input_layer, last_value1)
#Update last recorded value
last_assign_op1 = last_value1.assign(input_layer)

##Second Order Derivative Layer
#This will store the last recorded derivative
last_value2 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value2 = tf.sub(sub_value1, last_value2)
#Update last recorded value
last_assign_op2 = last_value2.assign(sub_value1)

##Overall input to the LSTM
#x and its first and second order derivatives as outputs of
#earlier layers
zero_order = last_assign_op1
first_order = last_assign_op2
second_order = sub_value2
#Concatenated
total_input = tf.concat(1, [zero_order, first_order, second_order])
```
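One candidate explanation, stated as an assumption rather than a verified diagnosis: in a graph like this, nothing forces the subtraction that reads `last_value1` to run before the `assign` that overwrites it. If the assignment happens to execute first on some steps, the subtraction sees the new value and the difference collapses to zero for that step, which would look like noise. The pure-Python sketch below only illustrates that ordering effect; it does not use TensorFlow, and both function names are hypothetical.

```python
def derivative_read_then_assign(values):
    """Intended order: read the stored value, subtract, then overwrite it."""
    last, out = 0.0, []
    for x in values:
        out.append(x - last)   # subtraction reads the OLD stored value
        last = x               # assignment happens afterwards
    return out

def derivative_assign_then_read(values):
    """Unintended order: the assignment runs before the subtraction reads."""
    last, out = 0.0, []
    for x in values:
        last = x               # assignment runs first
        out.append(x - last)   # subtraction now reads the NEW value
    return out

samples = [1.0, 3.0, 6.0]
good = derivative_read_then_assign(samples)   # [1.0, 2.0, 3.0]
bad = derivative_assign_then_read(samples)    # [0.0, 0.0, 0.0]
```

If this is indeed the cause, the thing to try would be making the order explicit, for example by wrapping the assignments in `tf.control_dependencies` so that they run only after the subtractions.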

If you have an idea of what might be going wrong, do leave a comment! In any case, the core model follows.

**The Model**

So here's the **TensorFlow model**:

**1)** The Imports:

```python
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *
```

**2)** Our **input layer**, as always, will be a `Placeholder` instance with the appropriate type and dimensions:

```python
#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim*3])
```

**3)** We then define our **LSTM layer**. If you are new to Recurrent Neural Networks or LSTMs, here are two excellent resources:

- This blog post by Christopher Olah
- This deeplearning.net post. It defines the math behind the LSTM cell pretty succinctly.

If you'd like to see implementation-level details too, then here's the relevant portion of the TensorFlow source for you.

Now the LSTM layer:

```python
##The LSTM Layer-1
#The LSTM Cell initialization
lstm_layer1 = rnn_cell.BasicLSTMCell(input_dim*3)
#The LSTM state as a Variable initialized to zeroes
lstm_state1 = tf.Variable(tf.zeros([1, lstm_layer1.state_size]))
#Connect the input layer and initial LSTM state to the LSTM cell
lstm_output1, lstm_state_output1 = lstm_layer1(input_layer, lstm_state1,
                                              scope="LSTM1")
#The LSTM state will get updated
lstm_update_op1 = lstm_state1.assign(lstm_state_output1)
```

We only use one LSTM layer. Providing a scope to the LSTM layer call (the `scope="LSTM1"` argument) helps avoid variable-scope conflicts if you have multiple LSTM layers.
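For reference, the standard LSTM update that a cell of this kind computes at each step can be written as below. This is the textbook formulation, not a transcription of the TensorFlow source; the per-gate weights $W$ and biases $b$, $\sigma$ for the logistic sigmoid, and $\odot$ for the element-wise product are notation chosen here.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[x_t, h_{t-1}] + b_i\right) \\
f_t &= \sigma\left(W_f\,[x_t, h_{t-1}] + b_f\right) \\
o_t &= \sigma\left(W_o\,[x_t, h_{t-1}] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[x_t, h_{t-1}] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $x_t$ is the 6-dimensional input, and $c_t$ and $h_t$ each have `input_dim*3 = 6` units. As far as I can tell, `lstm_state1` stores the cell state $c_t$ and the hidden state $h_t$ together, which would be why its width comes from `state_size` rather than from the number of units.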

The LSTM layer is followed by a simple linear regression layer, whose output becomes the final output.

```python
##The Regression-Output Layer1
#The Weights and Biases matrices first
output_W1 = tf.Variable(tf.truncated_normal([input_dim*3, input_dim]))
output_b1 = tf.Variable(tf.zeros([input_dim]))
#Compute the output
final_output = tf.matmul(lstm_output1, output_W1) + output_b1
```

We have finished defining the model itself. But now, we need to initialize the **training components**. These help fine-tune the parameters/state of the model to make it ready for deployment. We won’t be using these components post training (ideally).

**4)** First, a `Placeholder` for the **correct output** associated with the input:

```python
##Input for correct output (for training)
correct_output = tf.placeholder(tf.float32, [1, input_dim])
```

Then, the error is computed from the model's final output and the correct output as the *Sum-of-Squares* loss.

```python
##Calculate the Sum-of-Squares Error
error = tf.pow(tf.sub(final_output, correct_output), 2)
```
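As a worked example of the last two steps (the linear output layer and the squared error), here is a tiny pure-Python version with made-up numbers. The vector `h` stands in for the 6-wide LSTM output, and `W`, `b` and `target` are arbitrary values chosen so the arithmetic is easy to follow; none of them come from the trained model.

```python
def linear_output(h, W, b):
    """Compute h @ W + b for a single row vector h."""
    return [sum(h[k] * W[k][j] for k in range(len(h))) + b[j]
            for j in range(len(b))]

def squared_errors(predicted, target):
    """Element-wise squared difference, one entry per wave."""
    return [(p - t) ** 2 for p, t in zip(predicted, target)]

h = [1.0, 0.0, -1.0, 0.5, 0.0, 2.0]            # stand-in LSTM output, 6 values
W = [[0.5, 0.0], [0.0, 0.5], [1.0, 0.0],
     [0.0, 2.0], [0.0, 0.0], [0.25, 0.25]]     # 6 x 2 weight matrix
b = [0.0, 1.0]                                 # one bias per wave
target = [1.0, 2.0]

prediction = linear_output(h, W, b)            # [0.0, 2.5]
errors = squared_errors(prediction, target)    # [1.0, 0.25]
total_error = sum(errors)                      # 1.25
```

Note that the TensorFlow `error` above keeps one squared term per wave rather than adding them up. As I understand it, the optimizer's gradient computation sums over those entries, so minimizing it behaves like minimizing `total_error` here.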

Finally, we initialize an `Optimizer` to adjust the weights for the LSTM layer. I tried Gradient Descent, RMSProp as well as Adam Optimization. Adam works best for this model. Gradient Descent works really badly on LSTMs for some reason (that I can't grasp right now). If you want to read more about **Adam-Optimization**, read this paper. I decided on the learning rate of 0.0006 after a lot of trial-and-error, and it seems to work best for the number of iterations I use (100k).

```python
##The Optimizer
#Adam works best
train_step = tf.train.AdamOptimizer(0.0006).minimize(error)
```
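For intuition about what the optimizer does with that learning rate, here is a minimal sketch of the Adam update rule for a single scalar parameter, applied to a toy quadratic. It is a from-scratch illustration, not TensorFlow's implementation; `adam_minimize` is a hypothetical helper, and the `beta1`, `beta2` and `epsilon` defaults are the commonly used ones from the Adam paper.

```python
from math import sqrt

def adam_minimize(grad_fn, theta, learning_rate=0.0006, steps=1000,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Run `steps` Adam updates on a scalar parameter and return it."""
    m = 0.0   # running average of the gradient
    v = 0.0   # running average of the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (sqrt(v_hat) + epsilon)
    return theta

# Toy problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3)
start = 0.0
end = adam_minimize(lambda th: 2.0 * (th - 3.0), start)
```

Because the update divides by the running gradient magnitude, each step moves the parameter by roughly the learning rate regardless of how large the raw gradient is. With a rate this small, that is one way to see why tens of thousands of iterations are needed.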

**5)** Finally, we initialize the Session and all required Variables as always.

```python
##Session
sess = tf.Session()
#Initialize all Variables
sess.run(tf.initialize_all_variables())
```

**The Training**

Here’s the rudimentary code I used for training the model:

```python
##Training

actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(80000):
    input_v, output_v = get_total_input_output()
    _, _, network_output = sess.run([lstm_update_op1,
                                     train_step,
                                     final_output],
                                    feed_dict = {
                                        input_layer: input_v,
                                        correct_output: output_v})

    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()
```

Training takes almost a minute on my Intel i5 machine.

Consider the first wave. Initially, the network output is far from the correct one (the red curve is the LSTM output):

But by the end, it fits pretty well:

Similar trends are seen for the second wave:

**Testing**

In practical scenarios, the state at which you end training would rarely be the state at which you deploy. Therefore, prior to testing, I 'fast-forward' both the waves first. Then, I flush the contents of the LSTM cell (mind you, the learned matrix parameters for the individual functions don't change).

```python
##Testing

for i in range(200):
    get_total_input_output()

#Flush LSTM state
sess.run(lstm_state1.assign(tf.zeros([1, lstm_layer1.state_size])))
```

And here’s the rest of the testing code:

```python
actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(1000):
    input_v, output_v = get_total_input_output()
    _, network_output = sess.run([lstm_update_op1,
                                  final_output],
                                 feed_dict = {
                                     input_layer: input_v,
                                     correct_output: output_v})

    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()
```

It's pretty similar to the training code, except for one small difference: I don't run the training op anymore. Therefore, those components of the Graph are never executed.

Here’s the correct output with the model’s output for the first wave:

That's all for now! I am not a deep learning expert, and I am still experimenting with RNNs, so do leave comments/suggestions if you have any! Cheers!

First, thank you for providing this well-explained guide on how to use an LSTM on time series data. I'm just about to learn TensorFlow and LSTMs. So, this post is a perfect way to get started, in contrast to the official RNN tutorial on tensorflow.org, which is pretty expandable.

I just wanted to leave one comment on a possible way to improve this. In this paper (https://arxiv.org/abs/1604.08880), they suggest a novel way to regularize an RNN model by resetting the state after a mini-batch with a 50% chance. I know that you are not really using mini-batches (or let's say just a size of 1), but when I do not use the *lstm_update_op1*, but keep the state locally and clear it in every iteration (or clear it with zeros with some probability), I get better results, especially at the peaks of the sin/cos curves.

This was exactly what I was playing around with a few weeks back! Or something on those lines – basically randomly resetting the state once in a while, to (possibly) achieve better results. Will have a look at the paper. Thanks!

thanks for the wonderful tutorial! i found your code very easy to understand. However, for some reasons I am unable to get the results….the network_outputs do not improve at all over time. I literally just copied and pasted your code. any reason why this may be the case?

tracking my output_W1, it does not seem to get updated at all!

oh silly me! i forgot to run the optimiser! lol I thought just copied and pasted 🙂

anyway thanks a lot for this code!