EE Associate Professor Ethan Katz-Bassett, EE Adjunct Assistant Professor Matt Calder Receive Best Short Paper Award at the 22nd ACM Internet Measurement Conference (IMC 2022)

November 29, 2022

EE Associate Professor Ethan Katz-Bassett, EE Adjunct Assistant Professor Matt Calder, CS PhD student Jiangchen Zhu, and their collaborators (former Columbia postdoctoral scholar Kevin Vermeulen and former Columbia visiting professor Italo Cunha) received the Best Short Paper Award at the 22nd ACM Internet Measurement Conference (IMC 2022) for their paper “The Best of Both Worlds: High Availability CDN Routing Without Compromising Control”. IMC is the leading research publication venue focused on understanding the Internet through measurement studies.

Content delivery networks (CDNs) improve the performance of websites and video streaming by replicating content to geographically distributed sets of web servers called sites. For example, if a user is in New York, a website will load faster if parts of the page (e.g., images) are downloaded from a site in New York. The assignment of users to appropriate sites is critical for CDNs to provide good service. However, even though there are several methods for achieving this, each has a serious limitation.

To assign users to appropriate sites, most CDNs use one of two methods: “anycast” or “unicast with DNS-based redirection”. In anycast, all sites are assigned the same IP address, causing the Internet to treat them as different routes to the same destination. In doing so, the CDN delegates site selection to decisions made by other networks using BGP (the protocol that selects Internet routes), and thus compromises its control of user-to-site mapping (and hence performance). In unicast, sites are assigned distinct IP addresses, and the CDN assigns users to specific sites by returning specific IP addresses to users through DNS (the protocol that translates domain names to IP addresses). For example, when Columbia students query for www.google.com, Google might return the IP address of a site in New York. However, using DNS to steer clients to unicast addresses compromises availability when a site fails, because user networks and applications save the IP addresses and frequently continue to use them beyond their DNS TTL values (which specify how long the IP address is valid). Neither of these approaches provides both precise control of user-to-site mapping and high availability in the face of failures, two fundamental goals of CDNs. “CDNs want both control of user-to-site mapping and fast failure recovery, but existing techniques have to trade off one or the other”, says co-author Matt Calder, who worked on Microsoft’s CDN for four years.
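The availability problem with DNS-based redirection described above can be illustrated with a toy simulation. The sketch below is not the paper's method or any CDN's real implementation. The class names, site names, addresses (taken from the 192.0.2.0/24 documentation range), the 20-second TTL, and the 600-second hold time are all hypothetical values chosen for illustration. It models one client that honors the DNS TTL and one that keeps using a cached address long after the TTL has expired.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    ip: str          # unicast address, unique per site
    healthy: bool = True


class CdnDns:
    """Hypothetical CDN DNS: returns the first healthy site in preference order."""

    def __init__(self, sites, ttl):
        self.sites = sites   # ordered from most to least preferred for this user
        self.ttl = ttl       # seconds the answer is supposed to stay valid

    def resolve(self):
        for site in self.sites:
            if site.healthy:
                return site.ip, self.ttl
        raise RuntimeError("no healthy site available")


class Client:
    """Caches the DNS answer; hold_time=None means the TTL is honored."""

    def __init__(self, dns, hold_time=None):
        self.dns = dns
        self.hold_time = hold_time
        self.cached_ip = None
        self.expires_at = -1.0

    def connect(self, now):
        # Re-resolve only when nothing is cached or the cache entry has expired.
        if self.cached_ip is None or now >= self.expires_at:
            ip, ttl = self.dns.resolve()
            lifetime = ttl if self.hold_time is None else self.hold_time
            self.cached_ip = ip
            self.expires_at = now + lifetime
        return self.cached_ip


def reachable(sites, ip):
    """True if the address belongs to a site that is currently healthy."""
    return any(s.ip == ip and s.healthy for s in sites)


def simulate():
    sites = [Site("new-york", "192.0.2.1"), Site("chicago", "192.0.2.2")]
    dns = CdnDns(sites, ttl=20)
    polite = Client(dns)                  # honors the 20 s TTL
    sticky = Client(dns, hold_time=600)   # assumed to reuse the address for 600 s

    polite.connect(0)
    sticky.connect(0)
    sites[0].healthy = False              # the preferred site fails after t=0

    return {
        "polite_t10": reachable(sites, polite.connect(10)),
        "polite_t25": reachable(sites, polite.connect(25)),
        "sticky_t25": reachable(sites, sticky.connect(25)),
        "sticky_t600": reachable(sites, sticky.connect(600)),
    }


if __name__ == "__main__":
    for label, ok in simulate().items():
        print(label, "reachable" if ok else "unreachable")
```

In this toy model, even the well-behaved client cannot reach the CDN until its cached answer expires, and the client that ignores the TTL stays broken for far longer. With anycast, by contrast, there is no per-site address to cache, but which site a user reaches is decided by BGP in other networks rather than by the CDN.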

The team’s paper presented new traffic engineering techniques that the authors developed to allow CDNs to assign users to optimal sites while remaining able to recover quickly after a failure occurs, improving both user experience and CDN revenue. To tackle this problem, the team relied on two unique strengths: real operational experience running a commercial CDN and a deep understanding of Internet routing. The insights behind the team’s solution came from realizing that 1) DNS caches recent results, which improves performance but delays the propagation of updates, and 2) BGP is slow to converge after updates and failures. The team solved this problem by combining variations of existing techniques in a way that maximizes availability and traffic control at the same time.

Demonstrating that such a solution would work on the Internet is challenging. The team used the PEERING testbed, which team members operate and which connects to the Internet through multiple points of presence. Ethan Katz-Bassett reports, “Matt and I had an idea for a technique that combined anycast and unicast that we thought would work great, but Jiangchen demonstrated that, in the wild, it responded to failures much slower than we expected. He then investigated why and came up with new approaches that have much better tradeoffs than any existing technique.” The team implemented each new technique and demonstrated via experiments on the real Internet that these techniques provide both a high level of traffic control and fast failover following site failures, a combination that existing techniques cannot match. “For future work, we are attempting to capture the fundamental trade-offs in CDN operation and test our solutions on real cloud providers,” says Jiangchen Zhu.

The Best of Both Worlds: High Availability CDN Routing Without Compromising Control. Jiangchen Zhu (Columbia University), Kevin Vermeulen (LAAS-CNRS), Italo Cunha (Universidade Federal de Minas Gerais), Ethan Katz-Bassett (Columbia University), Matt Calder (Columbia University).

Jiangchen Zhu is a CS PhD student at Columbia University. He is interested in Internet routing and network measurement. He has recently worked on improving routing performance, reliability, and security for CDNs.

Matt Calder is an adjunct assistant professor in the Columbia EE department and an engineer at Meta, where he works on optimizing performance for Meta's edge network. His research primarily focuses on Internet measurement and data-driven network systems such as content delivery and traffic engineering. He received a Ph.D. in computer science from the University of Southern California.