Explanation of min-char-rnn.py
In this post, we will try to explain the `lossFun` function in Karpathy's min-char-rnn implementation. The source code can be found here: https://gist.github.com/karpathy/d4dee566867f8291f086.
There are two parts in the `lossFun` function, sketched below:
- forward pass
- backpropagation
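To fix ideas, here is a simplified, self-contained sketch of the first part. It follows the gist's variable names (`Wxh`, `Whh`, `Why`, `bh`, `by`), but the toy sizes and random initialization are our own assumptions for illustration; it is not a drop-in replacement for the original code.

```python
import numpy as np

# Toy sizes, assumed purely for illustration.
vocab_size, hidden_size = 5, 4
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (shared across time)
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def forward(inputs, targets, hprev):
    """Forward pass over one sequence of character indices; returns the loss and per-step caches."""
    xs, hs, ps = {}, {}, {}
    hs[-1] = np.copy(hprev)
    loss = 0.0
    for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size, 1))        # one-hot encoding of the input character
        xs[t][inputs[t]] = 1
        # the same Whh is applied at every time step
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
        ys = Why @ hs[t] + by                    # unnormalized log-probabilities
        ps[t] = np.exp(ys) / np.sum(np.exp(ys))  # softmax probabilities
        loss += -np.log(ps[t][targets[t], 0])    # cross-entropy loss at this step
    return loss, xs, hs, ps
```

For example, `forward([0, 1, 2], [1, 2, 3], np.zeros((hidden_size, 1)))` runs the sketch on a three-character toy sequence.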
The weight updates for `Wxh` and `Why` are relatively straightforward. On the other hand, the update on `Whh` is more complicated. One of the reasons is that this weight is shared by the hidden layer across all time steps of the unrolled network.
From the implementation code (https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py-L48-L58), we can clearly see this kind of "accumulation of gradient". In the section below, we will explain how to derive the expressions for `dh` and `dhnext`. For illustration purposes, we will assume there is only one neuron in the hidden layer, so that each weight matrix becomes a scalar.