What is a recurrent neural network (RNN)?

This post explains what a recurrent neural network – or RNN model – is in machine learning.

An RNN is a model that works with input sequences. The difference between sequential data and ordinary data is that, in a sequence, order matters. It is similar to a time series, except that time is discrete. This also means the inputs are not independent of each other.

An RNN takes its input as a sequence, for example a sequence of words. At each step it produces a hidden state vector and an output.

The hidden state vector is a function of the previous hidden state $s_{t-1}$ and the current input $x_t$; it acts as the memory of the network. The update is $s_t = f(U x_t + W s_{t-1})$, where $f$ is a nonlinearity such as $\tanh$.

The output is a probability distribution over the vocabulary: $o_t = \text{softmax}(V s_t)$.
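To make these two equations concrete, here is a minimal sketch of a single RNN step in Python with NumPy. It assumes $\tanh$ as the nonlinearity $f$; the function and variable names are illustrative, not from the original post.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, s_prev, U, W, V):
    # hidden state: s_t = f(U x_t + W s_{t-1}), with f = tanh here
    s_t = np.tanh(U @ x_t + W @ s_prev)
    # output: o_t = softmax(V s_t), a probability distribution over the vocabulary
    o_t = softmax(V @ s_t)
    return s_t, o_t
```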

The parameter matrices have the following sizes (see the shape check after the list):

  • $U$ is an $m \times n$ matrix, where $m$ is the size of the hidden state vector and $n$ is the size of the input vector.
  • $W$ is an $m \times m$ matrix, where $m$ is the size of the hidden state vector.
  • $V$ is an $h \times m$ matrix, where $h$ is the size of the output vector and $m$ is the size of the hidden state vector.
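As a quick check of these shapes, the sketch below continues the previous example (it reuses `np` and `rnn_step`) and runs the step function over a toy sequence. The values of $n$, $m$, and $h$ are made-up example sizes.

```python
n, m, h = 4, 8, 4              # input size, hidden size, output (vocabulary) size
rng = np.random.default_rng(0)

U = rng.normal(size=(m, n))    # m x n: input -> hidden
W = rng.normal(size=(m, m))    # m x m: previous hidden -> hidden
V = rng.normal(size=(h, m))    # h x m: hidden -> output scores

xs = [rng.normal(size=n) for _ in range(5)]  # toy sequence of 5 input vectors
s = np.zeros(m)                # initial hidden state s_0

for x_t in xs:
    s, o = rnn_step(x_t, s, U, W, V)
    # o has shape (h,) and sums to 1 at every step
```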

Depending on the application, the output of an RNN can also be a single vector, often called the context vector. This will be discussed in the next post.

Reference:

Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs
