Network Requirements for Resource Disaggregation
In this post we discuss the paper: Gao, Peter Xiang, et al. “Network Requirements for Resource Disaggregation.” OSDI ’16, 2016.
If we look back at how modern server systems evolved, it is easy to see their roots in personal computers. Each server in a rack can broadly be seen as a powerful PC with some additional networking hardware. However, this design poses significant technical challenges in two major ways: (1) a homogeneous distribution of compute and memory resources across racks leads to sub-optimal utilization, since tasks vary in their requirements; (2) the components of a single server node do not advance technologically at the same pace, yet replacing them modularly is not possible. This paper looks at an interesting alternative to the current design – disaggregation – to address these difficulties.
In a disaggregated system, all resources are separated and connected by a high-performance interconnect. For example, CPUs and memory would be pooled separately and communicate over the network. As the paper mentions, we are already seeing several implementations of such disaggregated systems in practice, such as HP’s ‘The Machine’, Facebook’s Disaggregated Rack, Intel’s Rack Scale Architecture, AMD’s SeaMicro, and academic efforts such as Berkeley’s FireBox.
The initial impression of such a radical rethinking of the datacentre is that there is no way to find an interconnect that is both high-throughput and low-latency enough to support this kind of communication. This was indeed my impression as soon as I realized what disaggregation means. I believe the most important contribution of this paper is that it presents a systematic study of the requirements of a disaggregated datacentre. The authors find that current interconnection technologies might suffice to build one! Specifically, it is worth noting the final numbers the paper provides for a disaggregated datacentre: network bandwidth in the range of 40-100Gbps, and network latency in the range of 3-5µs.
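To get a feel for what these numbers buy, here is a rough back-of-envelope sketch (my own illustration, not a calculation from the paper) of what fetching a 4KB page of remote memory would cost under these network parameters; the page size and local DRAM latency below are assumptions I chose for the comparison.

```python
# Back-of-envelope cost of fetching one 4KB page of remote memory.
# All parameter values are illustrative assumptions, not the paper's numbers.

PAGE_SIZE_BITS = 4096 * 8      # one 4KB page
LOCAL_DRAM_ACCESS_US = 0.1     # assume ~100ns for a local DRAM access

def remote_page_fetch_us(bandwidth_gbps, network_latency_us):
    """End-to-end time to pull one 4KB page over the disaggregation fabric."""
    serialization_us = PAGE_SIZE_BITS / (bandwidth_gbps * 1e3)  # Gbps -> bits/us
    return network_latency_us + serialization_us

for bw_gbps, lat_us in [(40, 5), (100, 3)]:
    t = remote_page_fetch_us(bw_gbps, lat_us)
    print(f"{bw_gbps}Gbps / {lat_us}us network: remote page in {t:.2f}us "
          f"(~{t / LOCAL_DRAM_ACCESS_US:.0f}x a local DRAM access)")
```

Even at the aggressive end of the range, a remote page costs tens of local DRAM accesses, which is why keeping the fabric within these bounds matters so much for acceptable application slowdowns.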
For a legacy application to run in a disaggregated scenario, the main sources of latency are: the transport layer, the network stack of the OS, the NICs, and the interconnect itself. The paper analyzes the bottlenecks and chalks out the requirements each of these layers needs to meet to hit the above targets, and shows how commodity solutions and ideas can be used to do so. To reduce the latency added by the transport layer, the authors show that current solutions are not sufficient and advocate recently proposed protocols such as pHost and pFabric. To reduce the overhead of the networking stack in the OS, they suggest the use of RDMA. The NIC is another major contributor to latency, and CPU-NIC integration is the suggested technology to reduce it.
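As a toy illustration of why these layer-by-layer fixes matter, the sketch below sums a per-request latency budget for a conventional kernel TCP path against a leaner RDMA path with an integrated NIC. Every component value is a placeholder I picked for illustration, not a measurement from the paper.

```python
# Toy latency budget: where a single remote request spends its time.
# All component values are illustrative placeholders, not measured numbers.

kernel_tcp_path_us = {
    "transport layer":   2.0,  # conventional congestion control and retransmission
    "OS network stack":  3.0,  # syscalls, buffer copies, interrupts
    "PCIe-attached NIC": 1.0,  # DMA setup and PCIe traversal
    "switching + wire":  1.0,  # propagation and switching within the rack
}

lean_path_us = {
    "transport layer":   0.5,  # lean schemes in the spirit of pHost/pFabric
    "OS network stack":  0.5,  # kernel bypass via RDMA
    "integrated NIC":    0.5,  # CPU-NIC integration avoids the PCIe hop
    "switching + wire":  1.0,  # unchanged by host-side optimizations
}

for name, budget in [("kernel TCP path", kernel_tcp_path_us),
                     ("RDMA + integrated NIC", lean_path_us)]:
    print(f"{name}: {sum(budget.values()):.1f}us total")
    for component, us in sorted(budget.items(), key=lambda kv: -kv[1]):
        print(f"  {component:<18} {us:.1f}us")
```

The point of the exercise is only structural: no single fix gets a request under the 3-5µs target on its own, so the transport, the OS stack, and the NIC all have to be attacked together.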
With the above, the authors demonstrate that it is indeed possible to run several chosen real-life applications on a disaggregated datacentre. It is an interesting future: with workloads demanding ever more compute and datacentres growing in size, a radical rethinking of datacentre design is necessary. While there are several open questions – optimizing workloads for disaggregation, scheduling difficulties, security concerns, and so on – when will a disaggregated datacentre become a reality at scale? Only time can answer.
Find the OSDI presentation by the authors here.