How to Make Chord Correct

The Chord distributed hash table (DHT) is well-known and often used to implement peer-to-peer systems. Chord peers find other peers, and access their data, through a ring-shaped pointer structure in a large identifier space. Despite claims of proven correctness, i.e., eventual reachability, "Using lightweight modeling to understand Chord" (Pamela Zave; ACM SIGCOMM Computer Communication Review, 42(2):50-57, April 2012) has shown that the Chord ring-maintenance protocol is not correct under its original operating assumptions. Not one of the seven claimed invariants is actually an invariant.

Until recently it was not known whether Chord could be made correct under the same operating assumptions, and with the same efficiency. A correct and efficient version of Chord is presented in "Reasoning about identifier spaces: How to make Chord correct" (Pamela Zave; IEEE Transactions on Software Engineering, 43(12):1144-1156, December 2017, DOI 10.1109/TSE.2017.2655056). See erratum below. The contributions of the paper include:

Here is a talk about how to make and prove Chord correct.

"Experiences with protocol description" (Pamela Zave; 1st International Workshop on Rigorous Protocol Engineering, Vancouver, Canada, October 2011) explains the pitfalls of designing and documenting protocols without the use of modeling tools. The examples used are Chord and SIP (the IETF Session Initiation Protocol).

This work has also led to "A practical comparison of Alloy and Spin" (Pamela Zave; Formal Aspects of Computing 27: 239-253, 2015).

For those interested in the comparison, chordfull.als is the Alloy model of the original Chord protocol. A Promela model (for the Spin model-checker) of a version similar to the original, but with some bugs fixed, is in chordbestccr.pml and chordbestccr.c, with execution instructions in spindocu.txt.

Erratum

In "Reasoning about identifier spaces: How to make Chord correct", Section VI-B, the proof that Invariant implies OrderedSuccessorLists should read as follows, after the first two paragraphs:

From the picture, the ESL segment [x, ..., y, ..., z] must include r + 1 principal nodes. If it did not, some principal nodes would be skipped, or there would not be sufficient principals.

x cannot be a principal node, because from NoDuplicates it cannot be duplicated in the segment [y, ..., z], so it is skipped by that segment. From similar reasoning, z cannot be a principal node.

Consequently the length of [x, ..., y, ..., z] is at least r + 3. But the length of an entire ESL is r + 1, which yields a contradiction.