This post explains what a recurrent neural network (RNN) is in machine learning.
An RNN is a model that works with input sequences. The difference between a sequence and ordinary data is that for a sequence, the order matters, similar to a time series but in discrete time. This also means the inputs are not independent of each other.
The RNN takes as input a sequence of words. At each step it produces a hidden state vector and an output.
The hidden state vector is a function of the previous hidden state and the current input $x_t$; it is the memory of the network. The formula is $h_t = \tanh(U x_t + W h_{t-1})$.
The output is a probability distribution over the vocabulary: $y_t = \mathrm{softmax}(V h_t)$.
The parameter sizes are:
- $U$ is a matrix of size $m \times n$, where $m$ is the size of the hidden state vector and $n$ is the size of the input vector.
- $W$ is a matrix of size $m \times m$, where $m$ is the size of the hidden state vector.
- $V$ is a matrix of size $h \times m$, where $h$ is the size of the output vector and $m$ is the size of the hidden state vector.
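The recurrence and output formulas above can be sketched in numpy. This is a minimal illustration, not a trained model: the weights are random, the toy sizes `n`, `m`, `h` and one-hot inputs are assumptions for the example, and the matrix names `U`, `W`, `V` follow the shapes listed above.

```python
import numpy as np

n, m, h = 4, 3, 5             # input, hidden, and output (vocabulary) sizes
rng = np.random.default_rng(0)
U = rng.normal(size=(m, n))   # input-to-hidden weights, m x n
W = rng.normal(size=(m, m))   # hidden-to-hidden weights, m x m
V = rng.normal(size=(h, m))   # hidden-to-output weights, h x m

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev):
    """One step: h_t = tanh(U x_t + W h_{t-1}), y_t = softmax(V h_t)."""
    h_t = np.tanh(U @ x_t + W @ h_prev)
    y_t = softmax(V @ h_t)
    return h_t, y_t

# Unroll over a toy sequence of 3 one-hot input vectors.
sequence = np.eye(n)[:3]
h_state = np.zeros(m)         # initial hidden state, the network's "memory"
for x_t in sequence:
    h_state, y_t = rnn_step(x_t, h_state)

print(y_t.shape)              # output has size h
print(y_t.sum())              # softmax output sums to 1
```

Note how the same matrices are reused at every step; only the hidden state changes, which is what lets the network carry information across the sequence.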
Depending on the context, the output of an RNN can also be a single vector, called the context vector. This will be discussed in the next post.