Time, clocks, and the ordering of events in a distributed system.

In this post, we are discussing “Time, clocks, and the ordering of events in a distributed system” from Communications of the ACM 21, 7 (July 1978), 558-565 published by Leslie Lamport.

This was a very fundamental paper in the field of distributed systems. It started off as a correction in someone else’s paper but soon Leslie realized that the partial ordering could lead to a total ordering and revolutionize the field of distributed systems. The author himself is known for a lot of works before and after this paper. He was the original developer of LaTex. Following up on this paper, he also authored papers which tackled the problem of snapshots and byzantine failures in distributed systems.

The paper itself has been described as “trivial yet brilliant” at multiple ocassions. This unique property of this paper shows how easy it is to understand the idea behind this paper. Despite seeming trivial, it solved some of the most well known problems in distributed systems. Due to this, the paper won the ACM’s Distributed Computing influential award in 2000. Coming to the issue of paper itself, the paper was about ordering events in a distributed system.

The problem was that there was no good way to order events in a distributed system. One obvious way is to use timestamps but that requires syncing the clocks between different systems and even the widely adapted algorithms could result in a lag of a few millesconds. Lamport’s solution to this was to remove the clocks altogether and use logical timestamps. The idea behind logical timestamps is that we already know the ordering of any two events if they happen in the same process or those two events correspond to the sending and recieving of a message.

Using this logic, we can assign logical timestamps to events and then obtain a partial ordering of the events. Thereafter, an arbitrary ordering of the processes can help us obtain a total ordering. The issue here was that this total ordering was not unique and it depends on what criteria user chooses for ordering the processes. Thereafter, to solidify this concept, we see the concurrency problem being tackled using this proposed ordering algorithm.

Finally, Lamport suggests that we do need real clocks after all to account for dependencies outside the system but suggests how the above described algorithm could be used to minimize the latency between any two clocks. In conclusion, the paper has its fair share of issues as this approach was controversial in the years after. Though, it has been widely adapted now and this approach is even used in systems as big as AWS. Future work on this includes removing the dependency on physical clocks altogether or devising a mechanism to reduce the maximum lag further.

Get the presentation here.

Written on September 16, 2018