# Predicting Trigonometric Waves few steps ahead with LSTMs in TensorFlow

I have recently been revisiting my study of deep learning, and I thought of doing some experiments with wave prediction using LSTMs. This is nothing new; it's more of a log of some tinkering I did with TensorFlow.

### The Problem

The basic input to the model is a 2-D vector, each number being the value attained by the corresponding wave. Each wave in turn is a constant plus a sine wave plus a cosine wave, and the waves have different magnitudes, initial phases and frequencies. The goal is to predict the values the waves will attain a certain number of steps ahead on the curve (I chose 23).
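
Written out as equations, read off from `get_sample` in the code below (here $t$ counts steps, $N_1 = 300$ and $N_2 = 200$ are the numbers of steps per $2\pi$ cycle, and $\theta_1(0)$, $\theta_2(0)$ are the random initial angles), the two waves and the prediction target are:

$$
\theta_i(t) = \theta_i(0) + \frac{2\pi t}{N_i}, \qquad i \in \{1, 2\}
$$

$$
x_1(t) = 5 + 5\sin\theta_1(t) + 10\cos\theta_2(t), \qquad
x_2(t) = 7 + 7\sin\theta_2(t) + 14\cos\theta_1(t)
$$

$$
\text{target}(t) = \big(x_1(t + 23),\; x_2(t + 23)\big)
$$

The code also wraps the angles into $[0, 2\pi)$, which does not change the sines and cosines. Each wave mixes both angles, so both components carry both frequencies.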

So first off, here's the wave-generation code:

```
##Producing Training/Testing inputs+output
from numpy import array, sin, cos, pi
from random import random

#Random initial angles
angle1 = random()
angle2 = random()

#The total 2*pi cycle would be divided into 'frequency'
#number of steps
frequency1 = 300
frequency2 = 200
#This defines how many steps ahead we are trying to predict
lag = 23

def get_sample():
    """
    Returns a [[wave1 value, wave2 value]] input of shape (1, 2).
    """
    global angle1, angle2
    angle1 += 2*pi/float(frequency1)
    angle2 += 2*pi/float(frequency2)
    angle1 %= 2*pi
    angle2 %= 2*pi
    return array([array([
        5 + 5*sin(angle1) + 10*cos(angle2),
        7 + 7*sin(angle2) + 14*cos(angle1)])])

#Pre-fill the window with 'lag' samples, so that after appending
#one more, its first and last entries are exactly 'lag' steps apart
sliding_window = []

for i in range(lag):
    sliding_window.append(get_sample())

def get_pair():
    """
    Returns a (current, later) pair, where 'later' is 'lag'
    steps ahead of 'current' on the wave(s) as defined by the
    frequency.
    """

    global sliding_window
    sliding_window.append(get_sample())
    #The oldest sample is the input, the newest one is the target
    input_value = sliding_window[0]
    output_value = sliding_window[-1]
    sliding_window = sliding_window[1:]
    return input_value, output_value

```

Essentially, you just need to call `get_pair` to get an (input, output) pair, the output being 23 time steps ahead on the curve. Each has the NumPy shape [1, 2]; the leading 1 is the batch size, since we will feed one input at a time while training/testing.
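
As a sanity check on the lag, here is a minimal, self-contained sketch of the same sliding-window idea using `collections.deque` with `maxlen=lag + 1`. The names `wave_samples` and `make_pair_generator`, and the fixed starting angles, are hypothetical choices for this example rather than part of the original code. Because the window holds `lag + 1` samples, its oldest and newest entries are exactly `lag` steps apart.

```python
from collections import deque

import numpy as np

LAG = 23            # steps ahead, as in the post
N1, N2 = 300, 200   # steps per 2*pi cycle ('frequency1'/'frequency2' in the post)


def wave_samples(angle1, angle2):
    """Endless stream of [[wave1, wave2]] samples of shape (1, 2)."""
    while True:
        angle1 = (angle1 + 2 * np.pi / N1) % (2 * np.pi)
        angle2 = (angle2 + 2 * np.pi / N2) % (2 * np.pi)
        yield np.array([[5 + 5 * np.sin(angle1) + 10 * np.cos(angle2),
                         7 + 7 * np.sin(angle2) + 14 * np.cos(angle1)]])


def make_pair_generator(angle1=0.1, angle2=0.2, lag=LAG):
    """Hypothetical deque-based variant of get_pair: yields (current, later)
    pairs where 'later' is exactly `lag` samples after 'current'."""
    source = wave_samples(angle1, angle2)
    # A window of lag + 1 samples: the oldest is the input, the newest the target.
    window = deque((next(source) for _ in range(lag)), maxlen=lag + 1)
    while True:
        window.append(next(source))
        yield window[0], window[-1]


pairs = make_pair_generator()
current, later = next(pairs)
print(current.shape, later.shape)  # (1, 2) (1, 2)
```

A quick way to confirm the spacing is to check that the target of the first pair reappears as the input of pair number `LAG`.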

Now, I don't pass the input directly into the LSTM. Instead, I try to improve the LSTM's understanding of the input by providing its first and second derivatives as well. If the input at time t is x(t), the (discrete) first derivative is x'(t) = x(t) - x(t-1). By analogy, the second derivative is x''(t) = x'(t) - x'(t-1). Here's the code for that:

```
from numpy import concatenate, zeros

#Input Params
input_dim = 2

#To maintain state (shape (1, input_dim), matching get_sample())
last_value = zeros([1, input_dim])
last_derivative = zeros([1, input_dim])

def get_total_input_output():
    """
    Returns the overall Input and Output as required by the model.
    The input is a concatenation of the wave values, their first and
    second derivatives, with shape (1, 3*input_dim).
    """
    global last_value, last_derivative
    raw_i, raw_o = get_pair()
    #First derivative: x'(t) = x(t) - x(t-1)
    derivative = raw_i - last_value
    #Second derivative: x''(t) = x'(t) - x'(t-1)
    second_derivative = derivative - last_derivative
    last_value = raw_i
    last_derivative = derivative
    return concatenate([raw_i, derivative, second_derivative], axis=1), raw_o

```
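
For offline checking, the same backward differences can be computed for a whole sequence at once with `numpy.diff`. This is a sketch, not part of the original pipeline: `add_derivatives` is a hypothetical helper, and padding with two zero rows mimics the zero-initialised `last_value`/`last_derivative` state, so each output row should correspond to one concatenated [x(t), x'(t), x''(t)] input.

```python
import numpy as np


def add_derivatives(x):
    """Hypothetical helper: x has shape (T, d); returns shape (T, 3*d)
    with rows [x(t), x'(t), x''(t)], using backward differences and
    treating everything before t = 0 as zero."""
    x = np.asarray(x, dtype=float)
    # Two leading zero rows play the role of the zero-initialised state.
    padded = np.vstack([np.zeros((2, x.shape[1])), x])
    d1 = np.diff(padded, axis=0)   # x(t) - x(t-1), shape (T + 1, d)
    d2 = np.diff(d1, axis=0)       # x'(t) - x'(t-1), shape (T, d)
    return np.hstack([x, d1[1:], d2])


# x(t) = t^2: first differences are odd numbers, second differences settle at 2.
features = add_derivatives([[0], [1], [4], [9], [16]])
print(features[:, 1])  # [0. 1. 3. 5. 7.]
print(features[:, 2])  # [0. 1. 2. 2. 2.]
```

The first rows differ from the "true" derivatives only because of the zero start-up state, exactly as in the streaming version.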

So the overall input to the model becomes a concatenation of x(t), x'(t) and x''(t). The obvious question is: why not do this in the TensorFlow graph itself? I did try it, but for some reason (which I don't understand yet), some noise seems to seep into the Variables that act as memory units for maintaining state.

Anyway, here's the code for that too:

```
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim])

##First Order Derivative Layer
#This will store the last recorded value
last_value1 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value1 = tf.sub(input_layer, last_value1)
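#Update last recorded value
#Note: nothing in the graph orders the read of last_value1 (in sub_value1)
#against this assign, so they may run in either order; this is one
#plausible source of the noise. Wrapping the assign in
#tf.control_dependencies([sub_value1]) would force the read to happen first.
last_assign_op1 = last_value1.assign(input_layer)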

##Second Order Derivative Layer
#This will store the last recorded derivative
last_value2 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value2 = tf.sub(sub_value1, last_value2)
#Update last recorded value
last_assign_op2 = last_value2.assign(sub_value1)

##Overall input to the LSTM
#x and its first and second order derivatives as outputs of
#earlier layers
zero_order = last_assign_op1
first_order = last_assign_op2
second_order = sub_value2
#Concatenated
total_input = tf.concat(1, [zero_order, first_order, second_order])

```

If you have an idea of what might be going wrong, do leave a comment! In any case, the core model follows.

### The Model

So here's the TensorFlow model:

1) The Imports:

```
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

```

2) Our input layer, as always, will be a `Placeholder` instance with the appropriate type and dimensions:

```
#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim*3])

```

3) We then define our LSTM layer. If you are new to recurrent neural networks or LSTMs, here are two excellent resources:

1. This blog post by Christopher Olah
2. This deeplearning.net post. It defines the math behind the LSTM cell pretty succinctly.

If you'd like to see implementation-level details too, here's the relevant portion of the TensorFlow source.
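
For reference, here is a minimal NumPy sketch of one step of a standard LSTM cell, following the usual gate equations. It is an illustration, not TensorFlow's code: the function name, weight layout and gate ordering are assumptions for this example. The `forget_bias` default of 1.0 mirrors the default of TensorFlow's `BasicLSTMCell`, and concatenating `[c, h]` gives a state of width `2 * units`, which, as far as I know, is what `lstm_layer1.state_size` corresponds to in the non-tuple state layout used in this post.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, b, forget_bias=1.0):
    """One LSTM step (sketch).
    x: (batch, input_dim); h, c: (batch, units)
    W: (input_dim + units, 4 * units); b: (4 * units,)
    Columns of W and b are assumed to be ordered [input, candidate, forget, output]."""
    z = np.concatenate([x, h], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)
    # Forget part of the old cell state, then add the gated candidate.
    new_c = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(g)
    # The output gate decides how much of the squashed cell state to expose.
    new_h = sigmoid(o) * np.tanh(new_c)
    return new_h, new_c


rng = np.random.default_rng(0)
input_dim, units = 6, 6   # as in the post: input_dim*3 inputs and units
W = rng.normal(scale=0.1, size=(input_dim + units, 4 * units))
b = np.zeros(4 * units)
h = np.zeros((1, units))
c = np.zeros((1, units))
x = rng.normal(size=(1, input_dim))
h, c = lstm_step(x, h, c, W, b)
state = np.concatenate([c, h], axis=1)   # width 2 * units, like lstm_state1
print(state.shape)  # (1, 12)
```

Feeding `h` and `c` back in on the next call is what the post's `lstm_update_op1` achieves by assigning the new state to a Variable.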

Now the LSTM layer:

```
##The LSTM Layer-1
#The LSTM Cell initialization
lstm_layer1 = rnn_cell.BasicLSTMCell(input_dim*3)
#The LSTM state as a non-trainable Variable (so the optimizer leaves it alone), initialized to zeroes
lstm_state1 = tf.Variable(tf.zeros([1, lstm_layer1.state_size]), trainable=False)
#Connect the input layer and initial LSTM state to the LSTM cell
lstm_output1, lstm_state_output1 = lstm_layer1(input_layer, lstm_state1,
                                               scope="LSTM1")
#The LSTM state will get updated
lstm_update_op1 = lstm_state1.assign(lstm_state_output1)

```

We only use one LSTM layer. Providing a scope to the LSTM layer call (the `scope="LSTM1"` argument) helps avoid variable-scope conflicts if you have multiple LSTM layers.

The LSTM layer is followed by a simple linear regression layer, whose output becomes the final output.

```
##The Regression-Output Layer1
#The Weights and Biases matrices first
output_W1 = tf.Variable(tf.truncated_normal([input_dim*3, input_dim]))
output_b1 = tf.Variable(tf.zeros([input_dim]))
#Compute the output
final_output = tf.matmul(lstm_output1, output_W1) + output_b1

```

We have finished defining the model itself. Now we need to set up the training components, which tune the model's parameters to make it ready for deployment. We won't be using these components after training (ideally).

4) First, a `Placeholder` for the correct output associated with the input:

```
##Input for correct output (for training)
correct_output = tf.placeholder(tf.float32, [1, input_dim])

```

Then, the error will be computed using the LSTM output and the correct output as the Sum-of-Squares loss.

```
##Calculate the Sum-of-Squares Error
error = tf.reduce_sum(tf.pow(tf.sub(final_output, correct_output), 2))

```
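
To make the shapes concrete, here is a NumPy sketch of the regression layer and the sum-of-squares error for a single step (batch size 1). The values are made up for illustration; `W` and `b` stand in for `output_W1` and `output_b1`.

```python
import numpy as np

input_dim = 2
lstm_out = np.full((1, input_dim * 3), 0.5)   # stand-in for lstm_output1, shape (1, 6)
W = np.ones((input_dim * 3, input_dim))       # stand-in for output_W1, shape (6, 2)
b = np.array([0.0, 1.0])                      # stand-in for output_b1, shape (2,)

final_output = lstm_out @ W + b               # linear regression layer, shape (1, 2)
correct_output = np.array([[2.0, 2.0]])
error = np.sum((final_output - correct_output) ** 2)   # sum of squared differences
print(final_output)  # [[3. 4.]]
print(error)         # 5.0
```

Each of the two outputs is a weighted sum of all six LSTM outputs plus a bias, and the error adds up the squared misses for both waves.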

Finally, we initialize an `Optimizer` to adjust the model's weights (those of the LSTM layer as well as the regression layer). I tried Gradient Descent, RMSProp and Adam optimization; Adam works best for this model. Gradient Descent works really badly on LSTMs for some reason (that I can't grasp right now). If you want to read more about Adam optimization, read this paper. I settled on a learning rate of 0.0006 after a lot of trial and error, and it seems to work best for the number of iterations I use (100k).

```
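##The Optimizer
#Adam, with the learning rate picked by trial and error
train_step = tf.train.AdamOptimizer(0.0006).minimize(error)
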
```
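
For readers curious what Adam does on each step, here is a NumPy sketch of the update rule from the Adam paper, applied to a toy one-dimensional quadratic. The defaults `beta1=0.9`, `beta2=0.999` and `eps=1e-8` match those of `tf.train.AdamOptimizer`, although TensorFlow folds the bias correction into the step size, so its epsilon enters slightly differently. The function name and the toy problem are illustrative only.

```python
import numpy as np


def adam_minimize(grad_fn, w0, lr=0.0006, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1):
    """Run `steps` Adam updates starting from w0 (sketch)."""
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)   # running mean of gradients
    v = np.zeros_like(w)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2 * (w - 3)


print(adam_minimize(grad, 0.0, steps=1))             # first step moves by about lr
print(adam_minimize(grad, 0.0, lr=0.1, steps=2000))  # ends up close to 3
```

Because each step is divided by a running RMS of the gradient, the first update has magnitude close to `lr` whatever the gradient's scale; the step size is governed mostly by the learning rate rather than by raw gradient magnitudes.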

5) Finally, we initialize the Session and all required Variables as always.

```
##Session
sess = tf.Session()
#Initialize all Variables
sess.run(tf.initialize_all_variables())

```

### The Training

Here’s the rudimentary code I used for training the model:

```
##Training

actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(80000):
    input_v, output_v = get_total_input_output()
    _, _, network_output = sess.run([lstm_update_op1,
                                     train_step,
                                     final_output],
                                    feed_dict = {
                                        input_layer: input_v,
                                        correct_output: output_v})

    #Split the [1, 2] arrays into the two individual waves
    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()

```

Training takes almost a minute on my Intel i5 machine.

Consider the first wave. Initially, the network output (red) is far from the correct output (blue), but by the end of training it fits pretty well. Similar trends are seen for the second wave.
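
Eyeballing the plots works, but a single number can make the start-versus-end comparison concrete. Here is a small sketch that computes the root-mean-square error between two equal-length sequences, assuming the per-wave lists from the training loop (for example `actual_output1` and `network_output1`) hold plain numbers. `rmse` is a hypothetical helper and the data below is made up.

```python
import numpy as np


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up data; with real runs you might compare, e.g.,
# rmse(actual_output1[:1000], network_output1[:1000]) against
# rmse(actual_output1[-1000:], network_output1[-1000:]).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 2.0]
print(rmse(actual, predicted))  # about 1.118
```

Computing it separately for each wave keeps the larger-amplitude second wave from dominating the comparison.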

### Testing

In practical scenarios, the state at which you end training would rarely be the state at which you deploy. Therefore, prior to testing, I first 'fast-forward' both waves. Then I flush the contents of the LSTM cell state (mind you, the learned weight matrices of the cell don't change).

```
##Testing

#Fast-forward both waves by 200 steps
for i in range(200):
    get_total_input_output()

#Flush LSTM state
sess.run(lstm_state1.assign(tf.zeros([1, lstm_layer1.state_size])))

```

And here’s the rest of the testing code:

```
actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(1000):
    input_v, output_v = get_total_input_output()
    _, network_output = sess.run([lstm_update_op1,
                                  final_output],
                                 feed_dict = {
                                     input_layer: input_v,
                                     correct_output: output_v})

    #Split the [1, 2] arrays into the two individual waves
    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt
plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()

```

It's pretty similar to the training code, with one small difference: I don't run the training op anymore, so those components of the graph are never executed.

[Figure: correct output (blue) vs. the model's output (red) for the first wave during testing]

That's all for now! I am not a deep learning expert, and I am still experimenting with RNNs, so do leave comments/suggestions if you have any! Cheers!

## 6 thoughts on “Predicting Trigonometric Waves few steps ahead with LSTMs in TensorFlow”

1. Benjamin Sautermeister says:

First, thank you for providing this well-explained guide on how to use an LSTM on time series data. I'm just about to learn TensorFlow and LSTMs, so this post is a perfect way to get started, in contrast to the official RNN tutorial on tensorflow.org, which is pretty expandable.

I just wanted to leave one comment on a possible way to improve this. In this paper (https://arxiv.org/abs/1604.08880), they suggest a novel way to regularize an RNN model by resetting the state after a mini-batch with a 50% chance. I know that you are not really using mini-batches (or let's say just a size of 1), but when I do not use the *lstm_update_op1*, and instead keep the state locally and clear it in every iteration (or with a chance to clear it with zeros), I get better results, especially at the peaks of the sin/cos curves.

1. srjoglekar246 says:

This was exactly what I was playing around with a few weeks back! Or something on those lines – basically randomly resetting the state once in a while, to (possibly) achieve better results. Will have a look at the paper. Thanks!

2. Jim Ang says:

thanks for the wonderful tutorial! i found your code very easy to understand. However, for some reason I am unable to get the results... the network_outputs do not improve at all over time. I literally just copied and pasted your code. Any reason why this may be the case?

3. Jim Ang says:

tracking my output_W1, it does not seem to get updated at all!

1. Jim Ang says:

oh silly me! i forgot to run the optimiser! lol I thought I'd just copied and pasted 🙂

anyway thanks a lot for this code!

4. Sai Keerthana Krishna Kumar says:

`lstm_output1, lstm_state_output1 = lstm_layer1(input_layer, lstm_state1, scope="LSTM1")`
This LSTM layer does not run for me. When I run it, a "setting an array element with a sequence" error pops up. Any idea how I can solve this?