I am the Andrew and Erna Viterbi Early Career Chair in the Computer Science Department at the University of Southern California. With my colleagues Ramesh Govindan and Wyatt Lloyd, I run a networking and systems research group.

My primary interests are in networks and distributed systems. My goal is to improve the reliability and performance of Internet services. To understand the problem space, I look to the needs of operators and providers, and I conduct detailed network measurements. Based on what I learn, I design deployable systems to improve the Internet and the services that run over it. I focus on the two components needed for reliably fast Internet services: (1) the Internet must provide reliable, high-performance routes for traffic, and (2) we need to architect quality services and protocols to use these routes, to take advantage of the Internet's strengths and mask its limitations. These days, I am particularly interested in the problems that affect some of the dominant players on the Internet, including cloud providers, large content providers and content delivery networks, and mobile providers. The properties of these networks allow for tailored solutions, and I want to understand the properties and design the solutions.

Selected Research Projects

Internet Measurement: Our Internet experience depends on the performance and availability of routes that cross multiple networks, but service providers and operators have little visibility into the other networks on which they rely. We have developed a number of techniques to give unprecedented visibility into these routes. Here are some highlights:
Sibyl: A Practical Internet Route Oracle (NSDI 2016): Existing tools support only one query--"what is the path from here (my host) to there (any destination)?" This limited interface makes it difficult to troubleshoot problems. Sibyl supports queries such as "find routes that traverse from Sprint to Level3 in NYC but do not pass through LA." However, most vantage points can only issue measurements at a slow rate, and so Sibyl may have never previously measured a matching path or, even if it did, the path may have changed since. To smartly allocate the constrained measurement budget, Sibyl uses previous measurements and knowledge of Internet routing to reason about which unissued measurements are likely to satisfy queries.
Don't Trust Traceroute (ACM CoNEXT Student Workshop 2013, Best Paper Award): Traceroute is the tool that operators and researchers use most often to measure Internet routes. The common interpretation is that, if two traceroute measurements include different IP addresses, they represent different paths of routers. In fact, we show that two traceroutes frequently show different IP addresses even while measuring the same underlying path, calling into question previous work that used traceroutes to identify path changes and load balancing.
Reverse Traceroute (NSDI 2010, Best Paper Award): Most communication on the Internet is two-way, and most paths are asymmetric, but traceroute and other existing tools only provide the path from the user to a destination, not the path back. We addressed this key limitation of traceroute by building a system to measure the path taken by an arbitrary destination to reach the user, without control of the destination.
PoiRoot (SIGCOMM 2013): It is difficult to identify the cause of an observed routing change. A change results from the complex interplay among opaque policies of multiple autonomous networks and local decisions at many routers. A decision or reconfiguration can cause rippling changes across seemingly unconnected networks. We developed PoiRoot, the first system to definitively locate the source of a route change. PoiRoot infers routing policies from observed routes, an approach we also used to measure real-world routing policies (IMC 2015) and to predict routing decisions (NSDI 2009).

Internet Routing: Despite rapid innovation in many other areas of networking, BGP, the Internet's interdomain routing protocol, has remained nearly unchanged for decades, even though it is known to contribute to a range of problems. I work to improve Internet routing, including:
PEERING: Researchers usually lack easy means to conduct realistic experiments, creating a barrier to impactful routing research. To remedy this problem, we administer a BGP testbed that allows researchers to connect to real ISPs around the world and conduct experiments that exchange routes and traffic on the Internet. We continue to expand the functionality of the testbed, including peering at one of the biggest Internet exchanges in the world and adding the ability to emulate the AS topology of your choice (HotNets 2014).
Are We One Hop Away from a Better Internet? (IMC 2015): The Internet remains hamstrung by known routing problems including failures, circuitous routes, congestion, and hijacks. Proposed improvements stumble on barriers to adoption. We identified a possible foothold for deployable solutions: much of our Internet activity centers on popular content and cloud providers, and they connect directly to networks hosting most end-users. These direct paths open the possibility of solutions that sidestep headaches of Internet-wide deployability.
LIFEGUARD (SIGCOMM 2012): Internet connectivity can be disrupted despite the existence of an underlying valid path, and our measurements show that long-lasting outages contribute significantly to unavailability. We built LIFEGUARD, a system to locate persistent Internet failures, coupled with protocol-compliant BGP techniques to force other networks to reroute around the failure.

Internet Content Delivery: Increasingly, most Internet traffic comes from a small number of content providers, content delivery networks, and cloud providers. We work on a number of projects to understand and improve these services, including:
Anycast Performance (IMC 2015): Content delivery networks (CDNs) host services at locations around the world to try to serve clients from nearby, and they can use a number of mechanisms to map a client to a particular server. One popular mechanism is anycast. We examined the performance implications of using anycast for Bing, which uses a global CDN to deliver its latency-sensitive service. We found that anycast usually performs well, but that it directs 20% of clients to suboptimal servers. We showed that the performance of these clients can be improved using a simple prediction scheme.
Mapping Google (IMC 2013): We developed techniques to locate all Google servers, as well as the mapping between servers and clients. In serendipitous timing, we started mapping daily just as Google embarked on a major change in its serving strategy, and so our ten-month measurement campaign observed a sevenfold increase in the number of Google sites.
SPANStore (SOSP 2013): Many cloud providers offer similar services, but different clients may receive better performance from different providers, and the providers may have different prices for a given workload. With collaborators at UC Riverside, we developed a key-value store that presents a unified view of multiple cloud providers, then distributes an application's data in order to minimize the cost necessary to meet desired performance goals.
Peering at the Internet's Frontier (PAM 2014): While the Internet provides new opportunities in developing regions, performance lags in these regions. The performance to commonly visited destinations is dominated by the network latency, which in turn depends on the connectivity from ISPs in these regions to the locations that host popular sites and content. With collaborators at various institutions, we took a first look at ISP interconnectivity between various regions in Africa and discovered many Internet paths that should remain local but instead detour through Europe.

TCP Performance: TCP is the workhorse of the Internet, delivering most services. Perhaps surprisingly, given how much study it has received, it is still possible to modify the protocol for significant gains. We use measurements to understand TCP problems in modern settings and tailor solutions to those settings. We have a number of ongoing projects in this area. Since loss slows TCP performance, we have developed new techniques to deal with congestion and loss in different settings:
Studying Internet traffic policing: Some ISPs actively manage high-volume video traffic with techniques like policing, which enforces a flow rate by dropping excess traffic. In collaboration with Google (SIGCOMM 2016), we found that loss rates average six times higher when a connection is policed, hurting video playback quality. We showed that alternatives to policing, like pacing and shaping, can achieve traffic management goals while avoiding the deleterious effects of policing. We then analyzed data collected over a six-year period (USC tech report), finding that the use of policers in developing nations has dropped over time, as Internet infrastructure became more widely deployed. Finally, we studied T-Mobile's BingeOn service for cellular users (Workshop on Internet QoE, 2016). We found that, by default, BingeOn throttled all video traffic but only charged user data plans for video from services not participating in BingeOn, that no video- or screen-specific optimizations were being used, and that this policy can have a negative impact on user quality-of-experience. We also found that BingeOn is easily subverted to free-ride on T-Mobile.
Gentle Aggression (SIGCOMM 2013, IETF Applied Networking Research Prize): In collaboration with Google, we designed new TCP loss recovery mechanisms tailored towards the different stages of Google's split TCP architecture, resulting in a 23% average decrease in Google client latency.
DIBS (EuroSys 2014): In collaboration with Microsoft Research, we designed a loss avoidance mechanism for data centers. Since congestion in data centers is generally transient and localized, we propose that switches randomly detour traffic that encounters a hot spot, allowing the congestion to dissipate.
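As a rough illustration of the detouring idea behind DIBS, here is a minimal sketch of a switch that, instead of dropping a packet when the preferred output queue is full, places it on a randomly chosen other port that still has room. This is a toy model and not the design from the paper: the class name `DetourSwitch`, the fixed per-port `queue_capacity`, and the rule of only considering ports with free space are all assumptions made for the example.

```python
import random
from collections import deque


class DetourSwitch:
    """Toy switch with one bounded FIFO queue per output port (hypothetical model)."""

    def __init__(self, num_ports, queue_capacity, rng=None):
        self.queues = [deque() for _ in range(num_ports)]
        self.queue_capacity = queue_capacity  # assumed identical for every port
        self.rng = rng or random.Random()

    def enqueue(self, packet, preferred_port):
        """Queue a packet; return the port used, or None if it had to be dropped."""
        if len(self.queues[preferred_port]) < self.queue_capacity:
            self.queues[preferred_port].append(packet)
            return preferred_port

        # The preferred queue is full (a hot spot): detour to a random
        # other port that still has room, rather than dropping the packet.
        candidates = [
            port
            for port, queue in enumerate(self.queues)
            if port != preferred_port and len(queue) < self.queue_capacity
        ]
        if not candidates:
            return None  # every queue is full, so the packet is dropped
        detour_port = self.rng.choice(candidates)
        self.queues[detour_port].append(packet)
        return detour_port


if __name__ == "__main__":
    switch = DetourSwitch(num_ports=4, queue_capacity=2, rng=random.Random(1))
    for i in range(5):
        print(f"packet {i} -> port {switch.enqueue(f'pkt{i}', preferred_port=0)}")
```

In this sketch the first two packets land on port 0 and later ones are detoured to other ports. In a real network a detoured packet would still have to find its way back toward its destination, which this model does not attempt to capture.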

Mobile Web Performance: As we continue to spend more of our time accessing richer services on the Web from mobile devices, performance from these devices becomes more important and, often, fails to meet expectations.
Making the Mobile Web Fast (with Google): Before joining USC, I worked at Google on a team dedicated to making the Web fast on mobile devices. You should try out the team's data compression proxy for Chrome for Android and iOS.
Path Inflation of Mobile Traffic (PAM 2014): In collaboration with my former team at Google, my students and I classified the causes of circuitous paths between mobile clients and Web content. We now work with the MobiPerf project on ongoing related measurements.
Investigating Proxies in Cellular Networks (PAM 2015): While it is well known that cellular network operators employ middleboxes, the details of their behavior and their impact on Web performance are poorly understood. We developed a methodology to characterize the behavior of proxies deployed in the major US cellular carriers, including their (often negative) impact on performance.



I am currently funded by Google Faculty Research Awards, an M-Lab Network Research Grant, Facebook, a Comcast Innovation Fund Research Grant, an NSF CAREER Award, and the NSF. I am very grateful for their generous support.

Brief Biography

In 2012, I completed my Ph.D. in the Department of Computer Science at the University of Washington, advised by Tom Anderson and Arvind Krishnamurthy. For my dissertation, I built systems that can help service providers improve Internet availability and performance and that are deployable on today's Internet. After that, I worked for half a year at Google's Seattle office, as part of a great team tasked with making the mobile web fast. I greatly enjoyed the opportunity and learned a lot. I joined USC in 2012, and I was named the Andrew and Erna Viterbi Early Career Chair in 2016.