- Paxos Made Simple
- Original Paxos Paper - Lamport
- Paxos Made Live - An Engineering Perspective
- Understanding Paxos Algorithm
Paxos algorithm has three components
These are called agents. A process can be any agent(s).
The goal of Paxos algorithm is to make participants in the cluster reach a consensus on a single value. The value is arbitrary but it should be a value proposed by participants. If we consider Paxos algorithm a process, it has different states in different stages. At the beginning of the process, the state consists of \(N\) arbitrary values proposed by participants, where \(N\) is the number of participants in the cluster. Eventually (or at the end of the process), the state should consist of a single value that all participants agree on.
Now the question is how this state is represented in Paxos algorithm? In fact, the ensemble of the state of the acceptors is the state of the Paxos algorithm.
The state of an acceptor consists of
- The most recently accepted proposal.
- The current position of the acceptor.
According to the paper, a proposal is a pair of proposal number and proposed value. The proposal number is crucial in Paxos algorithm. It introduces a concept of progress. More specifically, if an acceptor acknowledges a prepare mesasge with proposal number \(m\) then it can no longer accept any proposals whose proposal number is strictly less than \(m\). To some extent, the prepare message with proposal number \(m\) "moves" acceptors to "position" \(m\) and acceptors can never go backward.
Now let's have a look at the overall process. For the states of acceptors represent the state of Paxos algorithm process, we only present acceptors in the diagrams below. Learners and proposers are omitted. For the sake of simplicity, we suppose there are 3 acceptors and any town of them forms a majority.
Initially, all acceptors are at position 0.
Acceptors receive prepare messages thus are moved to a new position.
Suppose acceptor 1 receives an accept message with proposal number 4 and a value \(v\). Because acceptor is at position 4 so it can acknowledge this accept message and it should. According to the Paxos algorithm acceptors can acknowledge accept messages whenever they can. The green check in the diagram below indicates that the message Accept(n=4, v) is acknowledged by acceptor 1.
Suppose accept 3 receives an accept message with proposal number 1 and a value \(w\). In this case, it cannot acknowledge the accept message because it's already at position 3 and it cannot react to anything that is behind it.
Meanwhile, accept 2 receives another prepare message and it moves to position 6.
In this round, acceptor 1 receives another prepare message and is moved to position 6. Acceptor 2 receives an accept message but the proposal number is less than its current position so it doesn't do anything and the message is not acknowledged. Acceptor 3 also receives a prepare message and is now moved to position 4.
Suppose acceptor 1 receives another accept message. According to the Paxos protocol, the value included in this accept message should be \(v\) because this is the most recent accepted value and it is sent to the proposer as part of the acknowledgment of the prepare message. (Recall that every accept message has an associated prepare message.)
Acceptor 2 receives an accept message with proposal number equal to 6, which means it's the same message as the one received by acceptor 1. It can happily acknowledge the message.
Suppose acceptor 3 receives an accept message with proposal number equal to 4. This means this is the same message accepted by acceptor 1. As acceptor 3 is already at position 4, it can acknowledge the message and accept the value.
Now, the proposal number 4 with value \(v\) is accepted by a majority of the acceptors (i.e. acceptor 1 and 3) and value \(v\) becomes the consensus value.
Here the induction reasoning mentioned in the paper comes to play. Once the value \(v\) at position 4 becomes the consensus value, it is "locked". According to the Paxos protocol, the proposer needs to collect responses from a majority of acceptors. It follows that the proposer will receive at least one response from the group (acceptor 1, accept 3), of which the most recent accepted value is \(v\). Therefore, by induction, all values in the accept messages from now one contains the value \(v\) and the state of the Paxos algorithm process becomes stable in the sense that a consensus is reached and no further changes will be applied to the value.
Leader Election Algorithm
This is one of the confusing parts of the discussion. The confusion comes from the fact that
- Paxos can be used to select a leader. And
- Paxos algorithm needs a distinguished proposer, which basically is a leader.
So we have a circular dependency here. The problem is that the definition of "leader election algorithm" is not clear. Leader election algorithms can have different flavors. For example, some leader election algorithms may elect multiple leaders at one point in time while Paxos algorithm guarantees that at most one leader is elected at any time. This is a valuable property in practice. For example, some distributed architectures adopt single-writer approach and in order to improve the availability of the system, there may be backup writer nodes. In this case, we could use Paxos algorithm so that servers in the cluster can have consensus on the writer node and at any time there is at most one single writer.
With respect to the distinguished proposer in the Paxos algorithm, all it requires is eventually there is only one distinguished proposer. Having multiple proposers does not break the safety guarantee of the Paxos algorithm but it has an impact on liveness.
To elect a distinguished proposer, we could use timeout-based leader election algorithm. The diagram below illustrates the general idea:
At the beginning, node 1 is the leader and it broadcasts leader messages so that other nodes know node 1 is the leader. Later node 1 fails. Now other nodes can no longer receive the leader message. They wait for a pre-defined period of time (i.e. timeout) and they automatically become leaders. At this point, there are multiple leaders in the cluster and all of them broadcast leader messages to the cluster. In our example, node 2 and node 3 start to broadcast the leader message and at some point, both of them realize there are other leaders in the system. A resolution is needed at this point. All we need is to break the symmetry of nodes. We could use node ID or IP address. After the resolution process, only one leader remains. In our example, node 2 is elected as the leader after the resolution process and only node 2 keeps sending leader messages.
----- END -----
©2019 - 2022 all rights reserved