Explanation of min-char-rnn.py
In this post, we will try to explain the backpropagation performed in the `lossFun` function of Karpathy's min-char-rnn implementation. The source code can be found here: https://gist.github.com/karpathy/d4dee566867f8291f086
There are two parts in the `lossFun` function:
- forward pass
- backward pass
The weight updates for the output weights `Why` are to some extent straightforward. On the other hand, the update of the hidden-to-hidden weights `Whh` is more complicated. One of the reasons is that this weight is shared by all the hidden layers: the same `Whh` is reused at every time step of the unrolled network.
From the implementation code (https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py-L48-L58), we can clearly see some kind of "accumulation of gradient". In the section below, we will explain how to derive the expression of the gradient with respect to `Whh`. For illustration purposes, we will assume there is only one neuron in the hidden layer, so the weight matrix becomes a scalar.
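To make the "accumulation of gradient" concrete, here is a minimal sketch (not Karpathy's code) of a recurrent net with a single scalar hidden state, matching the one-neuron assumption above. The recurrence `h_t = tanh(Whh * h_{t-1} + Wxh * x_t)` and the toy loss `0.5 * sum(h_t^2)` are assumptions chosen for simplicity; the point is that the backward pass must accumulate `dWhh` over all time steps, because the same `Whh` appears at every step of the unrolled network. A numerical gradient check confirms the accumulated value.

```python
import numpy as np

def forward(xs, Whh, Wxh, h0=0.0):
    """Run h_t = tanh(Whh * h_{t-1} + Wxh * x_t) and a toy loss 0.5 * sum h_t^2."""
    hs = {-1: h0}
    loss = 0.0
    for t, x in enumerate(xs):
        hs[t] = np.tanh(Whh * hs[t - 1] + Wxh * x)
        loss += 0.5 * hs[t] ** 2  # toy loss, for illustration only
    return loss, hs

def backward(xs, hs, Whh):
    """Backprop through time; note dWhh is accumulated, not overwritten."""
    dWhh = 0.0
    dhnext = 0.0  # gradient flowing back from step t+1 into h_t
    for t in reversed(range(len(xs))):
        dh = hs[t] + dhnext                # dloss/dh_t plus gradient from the future
        dhraw = (1.0 - hs[t] ** 2) * dh    # backprop through tanh
        dWhh += dhraw * hs[t - 1]          # accumulate: Whh is shared by all time steps
        dhnext = Whh * dhraw               # pass gradient back to h_{t-1}
    return dWhh

xs = [0.5, -0.3, 0.8]
Whh, Wxh = 0.7, 1.1
loss, hs = forward(xs, Whh, Wxh)
dWhh = backward(xs, hs, Whh)

# Numerical check: central finite difference on the same loss
eps = 1e-6
num = (forward(xs, Whh + eps, Wxh)[0] - forward(xs, Whh - eps, Wxh)[0]) / (2 * eps)
print(abs(dWhh - num) < 1e-6)
```

If the `dWhh += ...` line were mistakenly written as `dWhh = ...`, only the last time step's contribution would survive and the numerical check would fail, which is exactly the error the accumulation in min-char-rnn avoids.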