Explanation of min-char-rnn.py

In this post, we will explain the lossFun function in Karpathy's min-char-rnn implementation. The source code can be found here: https://gist.github.com/karpathy/d4dee566867f8291f086. The lossFun function has two parts (a short sketch of each follows the list):
  1. forward pass
  2. backpropagation
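As a rough orientation, here is a minimal sketch of the forward-pass part. The variable names follow the gist, but the tiny dimensions and the toy input/target sequences are made up purely for illustration:

  import numpy as np

  # Toy sizes chosen for illustration only; the gist derives vocab_size from the data
  # and uses hidden_size = 100.
  vocab_size, hidden_size = 5, 4
  Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input  -> hidden
  Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (shared across time)
  Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
  bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

  inputs, targets = [0, 1, 2], [1, 2, 3]      # toy character indices
  xs, hs, ys, ps = {}, {}, {}, {}
  hs[-1] = np.zeros((hidden_size, 1))         # initial hidden state (hprev in the gist)
  loss = 0
  for t in range(len(inputs)):
      xs[t] = np.zeros((vocab_size, 1))
      xs[t][inputs[t]] = 1                                # one-hot encoding of the input char
      hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t-1] + bh)   # hidden state
      ys[t] = Why @ hs[t] + by                            # unnormalized log-probabilities
      ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))       # softmax over the vocabulary
      loss += -np.log(ps[t][targets[t], 0])               # cross-entropy for the target char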
The weight updates for Wxh and Why are relatively straightforward. The update for Whh, on the other hand, is more involved, mainly because this weight is shared across all time steps of the hidden layer. In the implementation (https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py-L48-L58) we can clearly see this "accumulation of gradient". In the section below, we explain how to derive the expressions for dh and dhnext. For illustration purposes, we assume there is only one neuron in the hidden layer, so the weight matrix Whh becomes a scalar.
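Continuing the sketch above, the backward pass walks over the time steps in reverse. dhnext carries the gradient that reaches h[t] from step t+1, and the += on dWhh is exactly the accumulation mentioned above (this mirrors the linked lines of the gist but is not a verbatim copy; gradient clipping is omitted):

  # Backward pass over the same toy sequence, reusing xs, hs, ps from the forward sketch.
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)
  dhnext = np.zeros_like(hs[0])                 # gradient arriving from step t+1 (zero at the last step)
  for t in reversed(range(len(inputs))):
      dy = np.copy(ps[t])
      dy[targets[t]] -= 1                       # gradient of softmax + cross-entropy
      dWhy += dy @ hs[t].T
      dby += dy
      dh = Why.T @ dy + dhnext                  # h[t] affects the loss at step t AND at step t+1
      dhraw = (1 - hs[t] * hs[t]) * dh          # backprop through tanh
      dbh += dhraw
      dWxh += dhraw @ xs[t].T
      dWhh += dhraw @ hs[t-1].T                 # same Whh at every step, so its gradient accumulates
      dhnext = Whh.T @ dhraw                    # handed to step t-1 on the next iteration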

[Figure: code_img_3.png]


[Figure: min-char-rnn-1.png]
[Figure: min-char-rnn-2.png]
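For reference, the same quantities can be written compactly in the one-neuron case (so w_{hh}, h_t, and dh_t are scalars, while dy_t is still a vector over the vocabulary):

  dh_t       = w_{hy}^T dy_t + dhnext_t            (the loss reaches h_t directly at step t and again through h_{t+1})
  dhnext_t   = w_{hh} (1 - h_{t+1}^2) dh_{t+1}     (contribution flowing back from the next step; zero at the last step)
  dL/dw_{hh} = \sum_t (1 - h_t^2) dh_t h_{t-1}     (one term per time step, hence the accumulation in the code)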

----- END -----

If you have questions about this post, you can find me on Discord.