System Design - Large Scale Distributed Deep Networks


Introduction

This post is a reading note on the paper Large Scale Distributed Deep Networks. It presents the general idea of how to train a large deep network with billions of parameters.

Related Readings

Setup

There are only a few strategies for scaling a system. The most important techniques are partitioning and replication, both of which are special forms of the more general divide-and-conquer strategy. Training a large deep neural network is no exception; the challenge is doing it efficiently and correctly.

There are two types of data in a neural network:

Training data is more static and persistent. It's similar to user data in a web application. We would expect that it's relatively easy to partition training data.

Parameters are the state of the neural network, and they are shared among the "application servers". We cannot store billions of parameters on a single machine, so the parameters are spread across multiple servers.

parameter_servers.png
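
To make the two kinds of partitioning concrete, here is a minimal sketch in plain Python. It assumes a simple modulo rule for assigning parameters to servers and a round-robin split of the training data; the names `shard_for`, `ParameterShard`, and `partition_data` are made up for illustration and do not come from the paper.

```python
def shard_for(param_id: int, num_shards: int) -> int:
    """Map a parameter index to the shard (server) that owns it.

    Assumption: a plain modulo rule; a real system may use another scheme.
    """
    return param_id % num_shards


class ParameterShard:
    """One parameter server holding a slice of the full parameter set."""

    def __init__(self):
        self.values = {}

    def get(self, param_id):
        return self.values[param_id]

    def put(self, param_id, value):
        self.values[param_id] = value


def partition_data(examples, num_workers):
    """Round-robin split of training examples into per-worker subsets."""
    return [examples[i::num_workers] for i in range(num_workers)]


# Spread 10 parameters over 3 parameter servers.
num_shards = 3
shards = [ParameterShard() for _ in range(num_shards)]
for pid in range(10):
    shards[shard_for(pid, num_shards)].put(pid, 0.0)

# Split 8 training examples between 2 training workers.
data_parts = partition_data(list(range(8)), num_workers=2)
```

No single shard ever holds all the parameters, and no worker ever sees all the training data, which is the point of partitioning both.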

In the context of large deep networks, the application servers are the machines that perform the training task. Because the network is too large to fit on a single machine, training is performed by a group of servers. Each server holds a subset of the parameters, and the servers communicate with each other during training so that their teammates receive the parameter updates.

model_data_partition.png
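
The idea of splitting one model across machines can be sketched with a single dense layer. In this toy example each hypothetical `Machine` owns only some of the layer's output units (weight rows), and concatenating the partial results stands in for the network communication between servers. This is only an illustration of the principle, not the partitioning scheme used in the paper.

```python
def matvec(weights, x):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class Machine:
    """Holds only the weight rows (output units) assigned to it."""

    def __init__(self, weight_rows):
        self.weight_rows = weight_rows

    def forward(self, x):
        return matvec(self.weight_rows, x)


# A 4-unit layer whose weights are split evenly across two machines.
full_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
machines = [Machine(full_weights[:2]), Machine(full_weights[2:])]

x = [1.0, -1.0]

# Each machine computes its own slice of the output. The concatenation
# stands in for sending activations over the network.
partitioned_out = [y for m in machines for y in m.forward(x)]

# Reference: the same layer computed on one machine.
single_machine_out = matvec(full_weights, x)
```

The partitioned result matches the single-machine result, but each machine only ever stores half of the weights.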

Recall that training a neural network is an iterative process: we go through the training data multiple times until the model converges. This provides an opportunity to apply the replication strategy. Loosely speaking, a model at two different points in training can be considered two different model instances, so several instances can run at the same time, each on its own subset of the training data, while sharing parameters through the parameter servers. This is the "Model Replicas" concept mentioned in the paper.
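
Here is a toy sketch of model replicas sharing one parameter server, loosely in the spirit of the paper's asynchronous SGD. It makes several simplifying assumptions: the "model" is a single weight `w` fitting `y = w * x`, the replicas take turns in a loop instead of running truly in parallel, and the class names `ParameterServer` and `Replica` are invented for this example.

```python
class ParameterServer:
    """Holds the shared parameter and applies pushed gradients."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def fetch(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad


class Replica:
    """A model copy that trains on its own subset of the data."""

    def __init__(self, server, data):
        self.server = server
        self.data = data

    def step(self, i):
        x, y = self.data[i % len(self.data)]
        w = self.server.fetch()      # may be stale in a truly async system
        grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 w.r.t. w
        self.server.push(grad)


# Training data generated from y = 3x, split between two replicas.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
server = ParameterServer(w=0.0, lr=0.01)
replicas = [Replica(server, data[0::2]), Replica(server, data[1::2])]

# Replicas take turns; a real system would run them concurrently.
for i in range(200):
    for replica in replicas:
        replica.step(i)
```

Each replica only touches its own data subset, yet both contribute to the same shared weight, which ends up close to the true value of 3.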

Put Everything Together

everything_together.png

Summary & Comment

----- END -----
