Distributed Algorithms (Work In Progress...)

  • These notes are not lecture notes!
  • They are notes that summarize the specifications, properties and pseudo-code of the main distributed algorithms used in industry, from a theoretical point of view (we look at the existence of these algorithms, not their efficiency).
  • These notes may help in developping the building blocks of communication in a distributed system.
  • The order in which the algorithms are presented follow the one given in Rachid Guerraoui's Distributed Algorithms course at EPFL.

Assumptions

  • Timing assumptions: In what follows, we consider the system to be fully asynchronous, unless stated explicitly. For example, one cannot make any time bounds assumptions.

  • Failure detection: To detect process crashes we rely on an object called a failure detector. It can be either perfect, or eventually perfect. In practice one cannot have access to a perfect failure detector under a fully asynchronous regime.

    • Perfect Failure Detector:
      • PFD1. Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process.
      • PFD2. Strong accuracy: If a process is detected by any process, then has crashed.
      Module:
          Name: PerfectFailureDetector (P)
      
      Events:
          Indication: <crash, p>: indicates that process p has crashed
      
      Properties:
          PFD1, PFD2
      
    • Eventually Perfect Failure Detector:
      • EPFD1. Strong completeness = PFD1.
      • EPFD2. Eventual Strong Accuracy: Eventually, no correct process is ever suspected by any correct process.
      Module:
          Name: EventualPerfectFailureDetector (EP)
      
      Events:
          Indication: <suspect, p>: suspects that process p has crashed
          Indication: <restore, p>: restores process p as not crashed
      
      Properties:
          EPFD1, EPFD2
      
  • Notation: Processes are denoted as , , , . Messages are denoted as , , , . If a process is correct, then it never crashes.

  • FLL1. Fair Loss: If a correct process sends a message to infinitely often, then delivers an infinite number of times.
  • FLL2. Finite Duplication: If a correct process sends message to a finite number of times, then cannot be delivered an infinite number of times by .
  • FLL3. No Creation: If some process delivers message with sender , then was sent to by .

These are assumptions on the network link we are working with. This can be seen as properties coming from UDP. The following gives the interface of the link:

Module:
    Name: FairLossLink (flp2p)

Events:
    Request: <flp2pSend, dest, m>: requests to send message m to process dest
    Indication: <flp2pDeliver, src, m>: delivers messages m sent by src

Properties:
    FLL1, FLL2, FLL3
  • SL1. Stubborn Delivery: If a correct process sends a message once to correct process , then delivers an infinite number of times.
  • SL2. No Creation: If some process delivers a message with sender , then was previously sent to by .
Module: 
    Name: StubbornLink (sp2p)
Uses:
    FairLossLink (flp2p)

Events:
    Request: <sp2pSend, dest, m>: requests to send message m to dest
    Indication: <sp2pDeliver, src, m>: delivers message m sent by src

Properties:
    SL1, SL2
upon event <sp2pSend, dest, m> do:
    while (true) do:
        trigger <flp2pSend, dest, m>;

upon event <flp2pDeliver, src, m> do:
    trigger <sp2pDeliver, src, m>;

Note that in the above, although the algorithm sends each message an infinite number of times and practically this is extremely inefficient, it still satisfies the properties defined above, so the algorithm is correct. Remember that we concentrate on the existence of algorithms satisfying our properties, not on their performance.

  • PL1. Reliable Delivery: If a correct process sends a message to a correct process , then eventually delivers .
  • PL2. No Duplication: No message is delivered by a process more than once.
  • PL3. No Creation: If some process delivers a message with sender , then was previously sent to by .
Module: 
    Name: PerfectLink (pp2p)
Uses:
    StubbornLink (sp2p)

Events:
    Request: <pp2pSend, dest, m>: requests to send message m to dest
    Indication: <pp2pDeliver, src, m>: delivers message m sent by src

Properties:
    PL1, PL2, PL3
upon event <pp2p, Init> do:
    delivered := ∅;

upon event <pp2pSend, dest, m> do:
    trigger <sp2pSend, dest, m>;

upon event <sp2pDeliver, src, m> do:
    if m ∉ delivered:
        delivered := delivered ∪ {m};
        trigger <pp2pDeliver, src, m>;

Broadcasts

Best-Effort Broadcast (BEB)

  • BEB1. Validity: If and are correct, then every message broadcast by is eventually delivered by
  • BEB2. No Duplication: No message is delivered more than once.
  • BEB3. No Creation: If a process delivers a message with sender , then was previously broadcast by .
Module: 
    Name: BestEffortBroadcast (beb)
Uses: 
    PerfectLink (pp2p)

Events:
    Request: <bebBroadcast, m>: broadcasts a message m to all processes
    Indication: <bebDeliver, src, m>: delivers a message m sent by src

Properties:
    BEB1, BEB2, BEB3
upon event <bebBroadcast, m> do:
    forall q ∈ Π do:
        trigger <pp2pSend, q, m>;

upon event <pp2pDeliver, src, m> do:
    trigger <bebDeliver, src, m>;

Reliable Broadcast (RB)

  • RB1. Validity = BEB1
  • RB2. No Duplication = BEB2
  • RB3. No Creation = BEB3
  • RB4. Agreement: For any message , if a correct process delivers , then every correct process delivers .

Uniform Reliable Broadcast (URB)

  • URB1. Validity = BEB1
  • URB2. No duplication = BEB2
  • URB3. No creation = BEB3
  • URB4. Uniform Agreement: For any message , if a process delivers , then every correct process delivers .

Causal Order Broadcast (CB)

Causal Order

causally precedes (denoted as ) if any of the following properties hold:

  • FIFO Order. Some process broadcasts before broadcasting .
  • Causal Order: Some process delivers and then broadcasts .
  • Transitivity: There is a message such that and .

Properties:

  • CB1. Validity = RB1 = BEB1
  • CB2. No Duplication = RB2 = BEB2
  • CB3. No Creation = RB3 = BEB3
  • CB4. Agreement = RB4
  • CB5. Causal Order: If then any process delivering has already delivered .

No-Waiting Version

Waiting Version

Total Order Broadcast (TOB) (Consensus-Based)

  • TOB1. Validity = RB1 = BEB1
  • TOB2. No Duplication = RB2 = BEB2
  • TOB3. No Creation = RB3 = BEB3
  • (U)TOB4. (Uniform) Agreement = (U)RB4
  • (U)TOB5. (Uniform) Total Order: Let and be any two messages. Let be any (correct) process that delivers without having delivered before. Then no (correct) process delivers before .

Consensus (CONS)

  • C1. Validity: If a value is decided, then it has been proposed.
  • (U)C2. (Uniform) Agreement: No two correct processes decide differently.
  • C3. Termination: Every correct process eventually decides.
  • C4. Integrity: Every process decides at most once.

Fail-Stop Consensus

Fail-stop == when a process fails, it crashes (no byzantine behavior) - it's an assumption

Fail-Stop Uniform Consensus

Fail-Stop Uniform Consensus With Timing Assumptions

Atomic Commit

Non-Blocking Atomic Commit (NBAC)

  • NBAC1. Uniform Agreement: No two processes decide differently.
  • NBAC2. Termination: Every correct process eventually decides.
  • NBAC3. Commit-Validity: 1 can only be decided if all processes propose 1.
  • NBAC4. Abort-Validity: 0 can only be decided if some process crashes or votes 0.

2-Phase Commit (2PC)

Blocking algorithm.

  • 2PC1. Uniform Agreement = NBAC1
  • 2PC2. Weak Termination: If a certain process is correct, then every correct process eventually decides.
  • 2PC3. Commit-Validity = NBAC3
  • 2PC4. Abort-Validity = NBAC4

Terminating Reliable Broadcast (TRB)

  • TRB1. Integrity: If a process delivers a message , then either , or is the message that was broadcast by .
  • TRB2. Validity: If the sender is correct and broadcasts a message , then eventually delivers .
  • (U)TRB3. (Uniform) Agreement: For any message , if any correct process delivers , then every correct process delivers
  • TRB4. Termination: Every correct process eventually delivers exactly one message.

Group Membership (GM)

  • GM1. Local Monotonicity: If a process installs view after , then and (the only reason to change a view is to remove a process from the set when it crashes).
  • GM2. Uniform Agreement: No two processes install views and such that .
  • GM3. Completeness: If a process crashes, then there is an integer such that every correct process installs view in which .
  • GM4. Accuracy: If some process installs a view and , then has crashed.

View-Synchronous Broadcast (VS)

  • VS1. Validity = RB1
  • VS2. No Duplication = RB2
  • VS3. No Creation = RB3
  • VS4. Agreement = RB4
  • VS5. Local Monotonicity = GM1
  • VS6. Uniform Agreement = GM2
  • VS7. Completeness = GM3
  • VS8. Accuracy = GM4
  • VS9. View Inclusion: A message is vsDelivered in the view where it is vsBroadcast.

Shared Memory (SM)

(1,N) Regular Register

  • RR1. Termination: If a correct process invokes an operation, then the operation eventually completes.
  • RR2. Validity:
    • Any read not concurrent with a write returns the last value written.
    • Reads concurrent with a write return the last value written or the value concurrently being written.

(1,N) Atomic Register

  • AR1. Termination = RR1
  • AR2. Validity = RR2
  • AR3. Ordering: If a read returns a value and a subsequent read returns a value , then the write of does not precede the write of .
Last Updated: