Timely, Accurate and Scalable Network Management For Data Centers

Date: 10:00am, April 15, 2016
Location: CEPSR 414
Speaker: Masoud Moshref Javadi, Computer Engineering PhD candidate at the University of Southern California

Abstract: Network management is critical to keeping cloud services always available, fast and cheap on a shared network with thousands of unreliable devices. Today, network operators have limited tools that provide a delayed and inaccurate view of the network. As a result, a failure can cause hours of service disruption before it is resolved, along with large service delays, which can cost millions of dollars. The key problem is the design of these tools: they are designed at the device level instead of the network-wide level and are driven by device limitations instead of high-level goals. This forces operators to constantly worry about the limited capabilities (memory, CPU, bandwidth) of each device and to compromise on timeliness and accuracy.

I have built a new class of monitoring systems in which users specify monitoring tasks or events and the desired accuracy at a high level, and the system employs novel algorithms and device-level optimizations to hide resource constraints from users. This talk focuses on two monitoring systems designed in collaboration with Google, one on switches and another at end-hosts. DREAM leverages the diminishing returns in accuracy from additional resources and the dynamic resource requirements of tasks to reach sufficient accuracy for each task and thereby support many accurate tasks. DREAM dynamically multiplexes the resources of OpenFlow switches among monitoring tasks over time and across switches, and can support 80% more accurate tasks than a fixed allocation. Trumpet leverages the programmability of end-hosts to aggregate and filter events in an accurate and timely fashion and is optimized for the cache architecture of CPUs. Trumpet can detect thousands of network-wide events within 10 ms on 10G links and uses only a portion of a CPU core. In the future, I plan to leverage my measurement methods to explore timely and accurate network control to significantly reduce network delay and improve network availability.

Biography: Masoud Moshref Javadi is a Computer Engineering PhD candidate at the University of Southern California. Before that, he completed his bachelor's and master's degrees in the Computer Engineering Department at Sharif University of Technology in Iran. His research interests include building networked systems for managing large networks using new algorithms, abstractions and system optimizations. He has built systems for both hardware switches and end-hosts, with a focus on Software-Defined Networking (SDN). He was awarded a Google PhD Fellowship in Networking in 2015.

Hosted by Professor Predrag Jelenkovic 

500 W. 120th St., Mudd 1310, New York, NY 10027, 212-854-3105