There are two steps in distributing the incoming traffic:
- Map the incoming data to a numerical value. This can be done by applying a hash function to the input data.
- Map the hash value to server.
It's the second step that requires special attention. The problem that consistent hashing aims to solve is the following: when we add or remove servers from a cluster, we don't want to severely impact the traffic distribution (e.g. mapping from hash value to servers). The naive hashing that uses modular operators cannot achieve this goal. When the number of servers in the cluster changes, we need to redistribute incoming traffic for almost all the data because the mapping between hash values and the server ID changes significantly for all the input.
Consistent hashing is designed to solve the issue. Here are the steps to construct a consistent hashing:
- Select a hash function and make the range of hash values to form a ring.
- Uniformly place virtual nodes on the ring.
- Map virtual nodes to physical servers in a random and uniform way.
In order to route the incoming data, we first calculate the hash value of it and locate it on the ring. Then we go clockwise and find the first virtual node. The physical server that is associated with that virtual node will handle the data.
In the figure below, the ring represents all the hash values. Each small circle represents a virtual node and the color represents a physical server. In this example, we have 12 virtual nodes and 3 physical servers (red, blue and green). The segment of the ring represents a set of hash values and each physical server handles segments that have the same color.
Suppose we want to remove the red server from the cluster. This translates to removing the red circles from the ring. As the figure below illustrates, only hash values in the red segments are redistributed.
----- END -----
©2019 - 2021 all rights reserved