Datacenter congestion control: identifying what is essential and making it practical

TitleDatacenter congestion control: identifying what is essential and making it practical
Publication TypeConference Paper
Year of Publication2019
AuthorsMushtaq, A., Mittal R., McCauley J., Alizadeh M., Ratnasamy S., & Shenker S. J.
Published inACM SIGCOMM Computer Communication Review
Date Published11/2019

Recent years have seen a slew of papers on datacenter congestion control mechanisms. In this editorial, we ask whether the bulk of this research is needed for the common case where congestion control involves hosts responding to simple congestion signals from the network and the performance goal is reducing some average measure of flow completion time. We raise this question because we find that, out of all the possible variations one could make in congestion control algorithms, the most essential feature is the switch scheduling algorithm. More specifically, we find that congestion control mechanisms that use Shortest-Remaining-Processing-Time (SRPT) achieve superior performance as long as the rate-setting algorithm at the host is reasonable. We further find that while SRPT's performance is quite robust to host behaviors, the performance of schemes that use scheduling algorithms like FIFO or Fair Queuing depend far more crucially on the rate-setting algorithm, and their performance is typically worse than what can be achieved with SRPT. Given these findings, we then ask whether it is practical to realize SRPT in switches without requiring custom hardware. We observe that approximate and deployable SRPT (ADS) designs exist, which leverage the small number of priority queues supported in almost all commodity switches, and require only software changes in the host and the switches. Our evaluations with one very simple ADS design shows that it can achieve performance close to true SRPT and is significantly better than FIFO. Thus, the answer to our basic question - whether the bulk of recent research on datacenter congestion control algorithms is needed for the common case - is no.