
Recurrent Neural Networks (RNN)

Recurrent Neural Networks act like short-term memory in the human brain; in this analogy they are represented by the frontal lobe.

By the same analogy, Artificial Neural Networks correspond to the temporal lobe and Convolutional Neural Networks to the occipital lobe.

Brain

An RNN can be thought of as an ANN that has been compressed and given an extra dimension: time. RNNs are drawn in a different layout to normal ANNs, running vertically instead of horizontally, and each circle in the diagram represents a whole layer of nodes, not just one node.

The blue line in the diagram is called a temporal loop. It means that the hidden layer not only produces an output but also feeds that output back into itself. The purpose of this loop is to remember what was in the neuron previously and pass that information on to itself at future time steps.

Recurrent Neural Network
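To make the loop concrete, here is a minimal NumPy sketch of what happens at each time step. The sizes, weights and tanh activation are illustrative choices rather than values taken from the diagram; the point is simply that the previous hidden state is fed back in alongside the new input.

# A minimal sketch of the temporal loop, with illustrative sizes
# (3 input features, 4 hidden units) and a tanh activation.
import numpy as np

np.random.seed(0)
n_inputs, n_hidden = 3, 4
W_x = np.random.randn(n_hidden, n_inputs) * 0.1    # input-to-hidden weights
W_rec = np.random.randn(n_hidden, n_hidden) * 0.1  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                   # hidden state starts empty
sequence = np.random.randn(5, n_inputs)  # 5 time steps of input

for x_t in sequence:
    # the previous hidden state h is fed back in at every step
    h = np.tanh(W_x @ x_t + W_rec @ h + b)
    print(h)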

Some RNN examples are as follows:

  • One to many relationship - one input and multiple outputs. For example: you have one image and the computer describes it. The image would first be fed into a CNN and then into an RNN, which would come up with words to describe the image.

One to Many Relationship

  • Many to one relationship - multiple inputs and one output. For example: you have a lot of text and you need to gauge whether the comment is positive or negative, and how positive or negative it is (a minimal Keras sketch of this kind of model is shown after this list).

Many to One Relationship

  • Many to many relationship - multiple inputs and multiple outputs. For example: machine translation, where some languages are gendered, so the translator must remember the gender used earlier in the sentence in order to output the following words correctly and keep the sentence coherent.

Many to Many Relationship
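As mentioned in the many to one example above, here is a minimal Keras sketch of a sentiment model: a sequence of word indices goes in and a single positive/negative score comes out. The vocabulary size, review length and layer sizes are assumptions made purely for illustration.

# A minimal many-to-one model: a padded sequence of word indices in,
# one sentiment score out. Sizes below are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 10000   # assumed vocabulary size
max_length = 100     # assumed (padded) review length

sentiment_model = Sequential()
sentiment_model.add(Embedding(input_dim = vocab_size, output_dim = 32, input_length = max_length))
sentiment_model.add(LSTM(units = 32))                          # many time steps in ...
sentiment_model.add(Dense(units = 1, activation = 'sigmoid'))  # ... one score out
sentiment_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])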

The Vanishing Gradient Problem

During training, the RNN compares its output to the desired output, giving you an error value that is calculated using the cost function.

The value of the cost function must then be backpropagated through the network, going back through however many time steps the sequence covers.

Vanishing Gradient Problem

The Vanishing Gradient Problem arises when you backpropagate through time. The issue lies with the recurrent weight (Wrec), the weight used to connect the hidden layers to themselves in the unrolled temporal loop.

No matter how many time steps you take, you have to multiply by this weight to go back a step, and that is where the problem arises: the more often you multiply by something small, the faster the value shrinks towards zero.

Recurrent Weight (Wrec)

The lower the gradient, the slower the network updates its weights. If you were to train the network for 1,000 epochs, the neurons furthest back in time would receive only tiny updates on each pass, so those early layers would never be trained properly and the whole network would give inaccurate results. If the recurrent weight is too high instead, the gradients grow at every step, which is called the exploding gradient problem.
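A rough numerical illustration of both problems, with made-up weights: backpropagating through time multiplies the gradient by the recurrent weight once per step, so a weight below 1 shrinks it towards zero and a weight above 1 blows it up.

# Repeatedly multiplying by the recurrent weight: a small weight makes the
# gradient vanish, a large one makes it explode. The weights are illustrative.
for w_rec in (0.5, 1.5):
    gradient = 1.0
    for step in range(1, 21):
        gradient *= w_rec
        if step % 5 == 0:
            print(f"w_rec = {w_rec}: gradient after {step} steps back = {gradient:.6f}")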

We can resolve these problems by doing the following:

  • Exploding Gradient Problem
    • Truncated Backpropagation - you stop backpropagating after a set point.
    • Penalties - the gradient can be artificially reduced.
    • Gradient Clipping - set a maximum limit for the gradient; it never goes over this value, and if it would, it is held at that value instead (a Keras sketch of this is shown after this list).
  • Vanishing Gradient Problem
    • Weight Initialization - setting your initial weights so as to minimise the likelihood of a vanishing gradient.
    • Echo State Networks
    • Long Short-Term Memory Networks (LSTMs)
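As an example of the gradient clipping option above, Keras optimizers accept a clipvalue (or clipnorm) argument; the limit of 1.0 below is an arbitrary choice for illustration.

# Gradient clipping in Keras: any gradient component larger than the chosen
# limit is clipped to that limit before the weights are updated.
from keras.optimizers import Adam

clipped_optimizer = Adam(clipvalue = 1.0)   # clip each gradient component to [-1.0, 1.0]
# clipped_optimizer = Adam(clipnorm = 1.0)  # or clip by the overall gradient norm
# regressor.compile(optimizer = clipped_optimizer, loss = 'mean_squared_error')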

Long Short-Term Memory Network (LSTM)

Long Short-Term Memory Networks tackle the vanishing gradient by, in effect, fixing the recurrent weight Wrec on the memory cell to 1 at each time step, so information can be carried forward without the gradient shrinking.

An excellent article by Colah with more information can be found here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTM
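To see why that helps, here is a conceptual sketch of the LSTM cell state update: the new cell state is mostly the old cell state carried through with a gate value close to 1, plus a gated amount of new information. The gate values and numbers below are made up for illustration rather than learned.

# Conceptual sketch of the LSTM cell-state update:
# new cell state = forget_gate * old cell state + input_gate * candidate.
# With the forget gate close to 1, the old state (and its gradient) is
# carried through almost unchanged. All values below are illustrative.
import numpy as np

c_prev = np.array([0.4, -0.2, 0.7])        # previous cell state
forget_gate = np.array([0.95, 0.99, 0.9])  # how much of the old state to keep (~1)
input_gate = np.array([0.1, 0.05, 0.2])    # how much new information to let in
candidate = np.array([0.3, -0.6, 0.1])     # proposed new information

c_new = forget_gate * c_prev + input_gate * candidate
print(c_new)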

See the code below for an example of an RNN built with Keras.

# Part 2 - Building the RNN

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
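
# X_train and y_train are assumed to come from the data preprocessing step
# (Part 1, not shown here): X_train is a 3D array of shape
# (samples, timesteps, 1) and y_train holds the corresponding target values.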

# Initialising the RNN
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation
"""
50 neurons in our layer, return sequences is used when having additional layers.
Input shape only needs the timesteps and input_dim as the batch_size is taken into account automatically.
"""
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
# Randomly drop 20% of the neurons during training to reduce overfitting
regressor.add(Dropout(0.2))

# Adding the second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding the third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding the fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)