This post looks at the `lossFun` function in Karpathy's min-char-rnn implementation. The source code can be found here: https://gist.github.com/karpathy/d4dee566867f8291f086.

There are two parts in the `lossFun` function:
- the forward pass
- backpropagation
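To make the two-part structure concrete, here is a minimal re-sketch in the spirit of the gist, not the original code: a vocabulary of size 3 and a single hidden neuron are assumed, and the gradient clipping the gist performs is omitted for brevity.

```python
import numpy as np

# Minimal sketch of a lossFun-style function (hypothetical sizes: V=3, H=1;
# the gist's gradient clipping is omitted).  Variable names mirror the gist.
V, H = 3, 1
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((H, V)) * 0.1   # input -> hidden
Whh = rng.standard_normal((H, H)) * 0.1   # hidden -> hidden (shared over time)
Why = rng.standard_normal((V, H)) * 0.1   # hidden -> output
bh, by = np.zeros((H, 1)), np.zeros((V, 1))

def loss_fun(inputs, targets, hprev):
    xs, hs, ps = {}, {-1: np.copy(hprev)}, {}
    loss = 0.0
    # --- part 1: forward pass, unrolled over the sequence ---
    for t in range(len(inputs)):
        xs[t] = np.zeros((V, 1)); xs[t][inputs[t]] = 1        # one-hot input
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)   # hidden state
        y = Why @ hs[t]
        ps[t] = np.exp(y) / np.sum(np.exp(y))                 # softmax
        loss += -np.log(ps[t][targets[t], 0])                 # cross-entropy
    # --- part 2: backpropagation through time ---
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros((H, 1))
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t]); dy[targets[t]] -= 1              # dL/dlogits
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dhnext                              # gradient into h_t
        dhraw = (1 - hs[t] ** 2) * dh                         # back through tanh
        dWxh += dhraw @ xs[t].T; dbh += dhraw
        dWhh += dhraw @ hs[t - 1].T                           # accumulated per step
        dhnext = Whh.T @ dhraw                                # carried to step t-1
    return loss, dWxh, dWhh, dWhy, dbh, dby
```

The forward pass caches every intermediate (`xs`, `hs`, `ps`) precisely so that the backward pass can reuse them step by step.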

The gradient updates for `Wxh` and `Why` are to some extent straightforward. The update for `Whh`, on the other hand, is more complicated. One of the reasons is that this weight is shared by all the unrolled hidden layers, i.e., across every time step.
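The consequence of the sharing is that `dWhh` is a sum of one contribution per time step. A hypothetical scalar toy (not code from the gist, and with a made-up loss `L = 0.5 * sum_t h_t^2`) can verify the accumulated analytic gradient against a finite-difference estimate:

```python
import math

# Hypothetical scalar illustration: because the same w_hh multiplies the
# hidden state at every step, dL/dw_hh is a sum of per-step contributions.
def run(w_hh, xs, h0=0.0):
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_hh * hs[-1] + x))
    loss = 0.5 * sum(h * h for h in hs[1:])   # toy loss
    return hs, loss

def grad_whh(w_hh, xs, h0=0.0):
    hs, _ = run(w_hh, xs, h0)
    dw, dhnext = 0.0, 0.0
    for t in reversed(range(len(xs))):
        dh = hs[t + 1] + dhnext            # dL/dh_t: local term + from the future
        dhraw = (1 - hs[t + 1] ** 2) * dh  # back through tanh
        dw += dhraw * hs[t]                # accumulate: the weight is shared
        dhnext = w_hh * dhraw              # message passed back to step t-1
    return dw

xs = [0.5, -0.3, 0.8]
analytic = grad_whh(0.7, xs)
eps = 1e-6
numeric = (run(0.7 + eps, xs)[1] - run(0.7 - eps, xs)[1]) / (2 * eps)
print(analytic, numeric)  # the two agree closely
```

Dropping any single `dw += ...` term makes the check fail, which is one way to see that every unrolled step really does contribute to the shared weight's gradient.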
From the implementation code (https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py-L48-L58), we can clearly see some kind of "accumulation of gradients". In the section below, we will explain how to derive the expressions for `dh` and `dhnext`. For illustration purposes, we will assume there is only one neuron in the hidden layer, so each weight matrix becomes a scalar.
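As a rough preview of where the derivation lands (using the gist's variable names in this one-neuron setting, with $h_t = \tanh(w_{xh} x_t + w_{hh} h_{t-1} + b_h)$), the scalar recurrences take approximately this form:

```latex
% dhnext_{t+1} is the gradient message passed from step t+1 back to step t;
% at the last time step it is zero.
dh_t = \frac{\partial L_t}{\partial h_t} + dhnext_{t+1},
\qquad
dhnext_{t+1} = w_{hh}\,\bigl(1 - h_{t+1}^{2}\bigr)\, dh_{t+1}
```

The $(1 - h_{t+1}^{2})$ factor is the derivative of $\tanh$, and the $w_{hh}$ factor is what chains the gradient from one time step to the previous one.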

©2019 - 2021 all rights reserved