Research Summary

We build and deploy practical systems, bringing theory to bear in real-world settings. Our main research focuses on making (network) systems more sustainable in terms of performance, scalability, and reliability. We pursue these research thrusts not only by building system internals but also by applying appropriate theoretical tools (optimization and machine learning) to inform their design. In particular, our approach is to begin with a problem of practical impact and then design, prototype, and deploy systems that solve it. Over the past several years, we have taken our research results from problem discovery to industry adoption. Our research interests are broad and cover several application areas: fog computing and networking, mobile systems and wireless networks, cloud computing, Internet protocols and multimedia, storage systems, and smart data pricing.

Research Areas

Fog Computing and Networking

Fog computing is an end-to-end horizontal architecture that distributes computing, storage, control, and network functions close to users along the cloud-to-thing continuum. This fog architecture includes the cloud, core, metro, edge, clients, and things. The fog architecture will further enable pooling, orchestrating, managing, and securing the resources and functions that are distributed in the cloud, anywhere along the cloud-to-thing continuum, and on the things or edge devices to support end-to-end services and applications.

Research challenges in fog span a wide range: from computation decomposition over heterogeneous and constrained nodes to defining the cloud-fog interface, from state consistency in dispersive computing to elastic storage over volatile substrates, from pricing and economic incentives to scalable security measures.

Crystal: A Distributed Computing Model for Fog

We have developed Crystal, a simple, loosely coupled distributed computing framework for fog that provides an easy abstraction for fog application development while supporting location transparency, self-healing, autoscaling, and mobility. As a proof-of-concept demonstration, we implemented a MapReduce application on top of Crystal and compared its performance with MapReduce on Spark.

T-Chain: An Incentive Scheme for Fog

While client functions are often more nimble and easier to evolve, individual clients may require additional incentives to participate in fog-based systems. If carefully designed, such incentives may even steer client actions towards ones that globally optimize the network. For this thrust, we recently proposed a simple, distributed, but highly efficient fairness-enforcing incentive mechanism, called triangle chaining (T-Chain), for cooperative computing.

Mobile Systems and Security

The recent PCAST (President’s Council of Advisors on Science and Technology) report on spectrum sharing proposed a new spectrum architecture in which “the norm for spectrum use should be sharing, not exclusivity.” Indeed, successful collaborative wireless networks can fundamentally change how spectrum is managed, increasing spectrum reuse and creating networks that are more resistant to interference. Furthermore, understanding the hidden details of wireless and health signals can enable a wide range of useful applications that were not previously possible.

Distributed Scheduling for Cellular Data Transmissions

We present a fully distributed scheduling framework called CASTLE (Client-side Adaptive Scheduler That minimizes Load and Energy), which jointly optimizes the spectral efficiency of cellular networks and battery consumption of smart devices. Our comprehensive experimental results show that CASTLE’s load estimation is up to 91% accurate, and that CASTLE achieves higher spectral efficiency with less battery consumption, compared to existing centralized scheduling algorithms as well as a distributed CSMA-like protocol.

Spoofing Emergency Alerts in 4G LTE Networks

Modern cell phones are required to receive and display alerts via the Wireless Emergency Alert (WEA) program. These alerts include AMBER alerts, severe weather alerts, and (unblockable) Presidential Alerts, intended to inform the public of imminent threats. In this work, we investigate this system in detail and develop and demonstrate the first practical spoofing attack on Presidential Alerts. We find that with only four malicious portable base stations, each transmitting at just 1 W, almost all of a 50,000-seat stadium can be attacked with a 90% success rate. The true impact of such an attack would depend on the density of cell phones in range; fake alerts in crowded cities or stadiums could potentially result in cascades of panic. Fixing this problem will require a large collaborative effort among carriers, government stakeholders, and cell phone manufacturers.

Towards Resilient Public Safety Networks

In recent years, a range of industries and enterprises have begun deploying custom-designed private cellular networks. For example, FirstNet, the public safety network (PSN) in the U.S., uses the 700 MHz band to build a dedicated wireless (LTE) network, with both fixed and mobile infrastructure, as a modern replacement for the voice-only land mobile radio (LMR) networks used by first responders. Given this high mobility, it is becoming important to coordinate multiple independent cellular networks so that first responders enjoy seamless, reliable, and efficient communication across them. We envision users switching between independent cellular networks just as phones today hand over seamlessly between cell towers within a single cellular network.

Phase Noise Calibration Techniques for COTS WiFi Devices

Human motion and position tracking are core technologies enabling a wide range of useful applications, including health care, smart homes, security, and gaming. As a result, there is a large body of research addressing this problem. Compared to costly dedicated software-defined radio (SDR) devices, WiFi network interface cards (NICs) in commercial off-the-shelf (COTS) devices are much cheaper, more pervasive, and readily available. In our recent work, we developed an effective phase noise calibration technique that is broadly applicable to COTS WiFi-based motion sensing.
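
For context, the following shows a widely used baseline for sanitizing channel state information (CSI) phase on COTS NICs; it is not the calibration technique developed in this work. It unwraps the per-subcarrier phase, then subtracts a linear trend whose slope (taken from the first and last subcarriers) absorbs timing offsets that grow with subcarrier index, and whose constant absorbs a common phase offset. The subcarrier indices and the synthetic distortion in the usage lines are hypothetical.

```python
import math


def unwrap(phases):
    """Undo 2*pi wrap-arounds between consecutive phase samples (radians)."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        jump = cur - prev
        if jump > math.pi:
            offset -= 2 * math.pi
        elif jump < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def remove_linear_phase(raw_phases, subcarrier_idx):
    """Remove the linear phase trend across subcarriers.

    Slope: from the first and last subcarriers (timing-offset term).
    Intercept: chosen so the corrected phases have zero mean (common offset).
    """
    phi = unwrap(raw_phases)
    slope = (phi[-1] - phi[0]) / (subcarrier_idx[-1] - subcarrier_idx[0])
    intercept = sum(p - slope * k for p, k in zip(phi, subcarrier_idx)) / len(phi)
    return [p - slope * k - intercept for p, k in zip(phi, subcarrier_idx)]


# Hypothetical example: 29 subcarriers whose only distortion is linear.
idx = list(range(-28, 29, 2))
raw = [math.atan2(math.sin(0.4 + 0.3 * k), math.cos(0.4 + 0.3 * k)) for k in idx]
clean = remove_linear_phase(raw, idx)
```

Because a purely linear distortion is removed entirely, `clean` is essentially all zeros here; on real CSI, what remains is the motion-dependent phase plus any non-linear noise that this baseline does not address.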

Cloud Computing

Cloud computing has been growing rapidly and is forecast to reach a market size of $112 billion in 2018. One important goal of cloud computing is to let users take advantage of consolidated resources (e.g., computing, networking, and services) without understanding the internal details of how they were built. Our group works on enhancing the key characteristics of cloud computing: performance, scalability and elasticity, availability and reliability, and security.

Improving Cloud Availability Using Overbooking

Ensuring high availability for applications despite unpredictable cloud component failures is a well-known problem in managing cloud infrastructure. An often-proposed solution is replication or redundancy: reserving cloud resources for backup virtual machines (VMs) that can substitute for primary ones when a failure occurs. We propose to overbook the backup VMs, minimizing the loss of cloud resource utilization while still improving application availability. Realizing this solution requires us to address many questions, such as how many backup VMs are needed to guarantee a given application availability and where to place them.
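
The "how many backup VMs" question can be made concrete with a minimal sketch under simplifying assumptions that are ours, not the project's: each of n primaries fails independently with probability p, backups never fail, and any shared backup can stand in for any primary. The smallest shared pool b is then the smallest value with P(at most b primaries fail) ≥ target.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(at most k of n independent primaries fail), each failing w.p. p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_shared_backups(n_primaries, p_fail, target):
    """Smallest number of shared (overbooked) backup VMs such that, with
    probability >= target, every failed primary can be replaced."""
    for b in range(n_primaries + 1):
        if prob_at_most(b, n_primaries, p_fail) >= target:
            return b
    return n_primaries  # fall back to one dedicated backup per primary
```

With 10 primaries that each fail with probability 0.01, `min_shared_backups(10, 0.01, 0.99)` returns 1: a single shared backup covers every failure pattern with probability of about 0.996, whereas 1:1 replication would reserve ten. Correlated failures, backup failures, and placement are outside this sketch.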

FluidMem: Memory as a Service for the Datacenter

Disaggregating resources in data centers is an emerging trend. Recent work has begun to explore memory disaggregation but has limitations, notably overlooking the complexity of cloud deployment, such as heterogeneous hardware and the APIs needed by cloud users and operators. We present FluidMem, a complete system that realizes disaggregated memory in the datacenter. [arXiv] [ElasticOS website]

Internet Protocols and Multimedia

We advance the state of the art in network protocols and algorithms that share limited network resources efficiently and fairly, while pushing the limits of network performance in heterogeneous network environments.

ELEMENT: User-Level TCP Latency Measurement and Control

We present ELEMENT, a latency diagnosis framework that decomposes end-to-end TCP latency into endhost and network delays, without requiring admin privileges. We validate that ELEMENT achieves more than 90% accuracy. To demonstrate ELEMENT’s potential impact on real-world applications, we implement a relatively simple user-level library that uses ELEMENT to minimize delays. We integrate ELEMENT with legacy TCP applications and show that it can reduce latency by up to 10 times while maintaining throughput and fairness.

ExLL: An Extremely Low-Latency Congestion Control for Mobile Cellular Networks

Since the diagnosis of severe bufferbloat in mobile cellular networks, a number of low-latency congestion control algorithms have been proposed. However, because they must continuously probe for bandwidth in dynamic cellular channels, existing mechanisms cyclically overload the network by design. As a result, their latency inevitably deviates from the smallest possible level (i.e., the minimum RTT). To tackle this problem, we propose ExLL, a new low-latency congestion control that adapts to dynamic cellular channels without overloading the network. To do so, we develop two novel techniques that run on the cellular receiver: 1) inferring cellular bandwidth from the downlink packet reception pattern and 2) calibrating the minimum RTT from an inference of the uplink scheduling interval. We then combine these cellular-specific inference techniques with the control framework of FAST, so that ExLL can precisely control its congestion window without unnecessarily overloading the network. Our implementation of ExLL on Android smartphones demonstrates that ExLL keeps latency much closer to the minimum RTT than other low-latency congestion control algorithms, in both static and dynamic channels of LTE networks.
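
ExLL builds on FAST's control framework. For reference, here is the standard FAST TCP window update, w ← min(2w, (1 − γ)w + γ(baseRTT/RTT · w + α)), which steers each flow toward keeping about α of its packets queued in the network. In ExLL the inputs would come from its receiver-side bandwidth and minimum-RTT inference; how exactly they enter this loop is not shown here, and the parameter values are illustrative.

```python
def fast_window_update(w, base_rtt, rtt, alpha, gamma=0.5):
    """One FAST TCP congestion window update.

    w        current congestion window (packets)
    base_rtt minimum observed RTT (propagation delay estimate)
    rtt      current (smoothed) RTT
    alpha    target number of this flow's packets queued in the network
    gamma    smoothing factor in (0, 1]
    """
    target = (base_rtt / rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)


# Simplification: hold the RTT fixed and iterate. The window converges to the
# fixed point w * (1 - base_rtt / rtt) = alpha, i.e. about alpha packets queued.
w = 10.0
for _ in range(50):
    w = fast_window_update(w, base_rtt=50.0, rtt=100.0, alpha=10.0)
```

When the RTT equals the base RTT (no queueing), the window grows by γα per update; as queueing delay builds, the baseRTT/RTT factor pulls it back, which is what lets a FAST-style controller hold latency near the minimum without periodic overload.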

Rate Adaptation Algorithms in HTTP Adaptive Streaming

HTTP-based adaptive streaming (HAS) techniques are widely used in Internet video streaming services, including YouTube and Netflix. However, the rate adaptation algorithms are not part of the standard, and their details are left to vendors. As a result, commercial and open-source players adopt many different algorithms, whose details and performance are poorly understood. In this research, we investigate the detailed operation of these players through code-level analysis and reverse engineering.
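
As an illustration of the kind of logic these players implement (not any specific player's algorithm), here is a minimal throughput-based rule: estimate bandwidth as the harmonic mean of recent segment throughputs, discount it by a safety factor, and pick the highest bitrate that fits. The bitrate ladder and the 0.8 safety factor are assumptions chosen for the example.

```python
from statistics import harmonic_mean


def select_bitrate(ladder_kbps, recent_tput_kbps, safety=0.8):
    """Pick the highest bitrate that fits under a discounted throughput estimate.

    The harmonic mean damps the effect of short throughput spikes; the safety
    factor leaves headroom for estimation error. Falls back to the lowest rung.
    """
    budget = safety * harmonic_mean(recent_tput_kbps)
    feasible = [r for r in ladder_kbps if r <= budget]
    return max(feasible) if feasible else min(ladder_kbps)


ladder = [300, 750, 1200, 2400, 4800]  # hypothetical bitrate ladder (kbps)
choice = select_bitrate(ladder, [1000, 1000, 10000])
```

With recent throughputs of 1000, 1000, and 10000 kbps, the harmonic mean (about 1429 kbps) selects 750 kbps, whereas an arithmetic mean (4000 kbps) would jump to 2400 kbps on a single spike. Real players differ in exactly these choices (estimator, safety margin, buffer-based overrides), which is why their behavior needs code-level study.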

FLARE: A Coordinated HTTP Rate Adaptation

We propose FLARE, a coordinated HTTP rate adaptation approach that incorporates both client- and network-side information (fog-cloud interface) and guarantees coordination between network- and client-side bitrate selection. For ease of deployment, FLARE is developed as a plugin-style module that can be easily embedded in video players.

CUBIC: TCP Congestion Control for High-Speed Networks

CUBIC is a congestion control algorithm for TCP (Transmission Control Protocol) and the current default TCP congestion control algorithm in Linux and Windows. It replaces the linear window growth function of existing TCP standards with a cubic function to improve the scalability of TCP over fast, long-distance networks. It also achieves more equitable bandwidth allocation among flows with different RTTs (round-trip times) by making window growth independent of RTT, so that such flows grow their congestion windows at the same rate.
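
The window growth function from the CUBIC specification (RFC 8312) can be sketched as follows. After a loss, the window is cut to β·W_max and then follows W(t) = C(t − K)^3 + W_max, where K is the cube root of W_max(1 − β)/C: concave while approaching the previous maximum, flat near it, and convex beyond it. Because t is wall-clock time since the last reduction rather than a count of RTTs, flows with different RTTs follow the same curve. This sketch omits CUBIC's TCP-friendly region and fast convergence.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC congestion window (in segments) t seconds after the last reduction.

    w_max: window size just before the last loss
    c:     scaling constant (RFC 8312 default 0.4)
    beta:  multiplicative decrease factor (RFC 8312 default 0.7)
    """
    # K: time for the window to grow from beta * w_max back to w_max.
    k = (w_max * (1 - beta) / c) ** (1.0 / 3.0)
    return c * (t - k) ** 3 + w_max


# Window trajectory (segments) over the first 6 seconds after a loss at w_max = 100.
trajectory = [round(cubic_window(t, 100.0), 1) for t in range(7)]
```

For W_max = 100 segments, the window restarts at 70, climbs quickly at first, levels off around t ≈ 4.2 s near 100, and then accelerates again as it probes for more bandwidth.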

SNN-Cache: A Practical Machine Learning-Based Caching System

An efficient caching algorithm needs to exploit the inter-relationships among requests. We introduce SNN, a practical machine learning-based relation analysis system that can be used in any area requiring analysis of relationships among sequenced data, such as market basket analysis and online recommendation systems. We also present SNN-Cache, which leverages SNN to exploit the inter-relationships among sequenced requests in its caching decisions.
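
SNN's model is not described here, so the following is only a hypothetical stand-in showing why request inter-relationships help caching: an LRU cache that counts which item tends to follow each request and prefetches the most frequent successor. The class name and its frequency-count rule are illustrative, not SNN-Cache's design.

```python
from collections import Counter, OrderedDict, defaultdict


class SuccessorPrefetchCache:
    """LRU cache that also prefetches the most frequent successor of each request.

    Hypothetical stand-in for a learned relation model. Assumes capacity >= 2,
    so a prefetch never evicts the item that was just requested.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()              # cached keys, least recent first
        self.successors = defaultdict(Counter)  # key -> counts of the next key
        self.prev = None
        self.hits = 0
        self.misses = 0

    def _admit(self, key):
        self.store[key] = True
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used

    def request(self, key):
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
        self._admit(key)
        if self.prev is not None:
            self.successors[self.prev][key] += 1  # learn the relation prev -> key
        self.prev = key
        if self.successors[key]:
            likely_next, _ = self.successors[key].most_common(1)[0]
            self._admit(likely_next)             # prefetch the likely next request


cache = SuccessorPrefetchCache(capacity=2)
for item in ["a", "b", "c"] * 4:
    cache.request(item)
```

On this cyclic sequence, a plain LRU cache of size 2 misses every request, while the successor table turns every request after the first four into a hit (8 hits, 4 misses).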

CYRUS: Towards Client-Defined Cloud Storage

We proposed a distributed, client-defined architecture that integrates multiple autonomous cloud storage providers (CSPs) into one unified cloud, allowing individual clients to specify their desired performance levels and share files. We developed CYRUS (Client-defined privacY protected Reliable cloUd Service), a practical system that realizes this architecture. CYRUS ensures user privacy and reliability by scattering files into smaller pieces across multiple CSPs, so that no single CSP can read users’ data.
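
The text above does not specify how CYRUS splits files. As a hypothetical illustration of how scattering can provide both privacy and reliability, here is (t, n) Shamir secret sharing applied byte by byte over the prime field GF(257): any t of the n shares (say, one per CSP) reconstruct the file, while fewer than t reveal nothing about it. Unlike the smaller pieces CYRUS describes, each share here is as large as the file; schemes that produce smaller pieces trade some of this guarantee for storage efficiency.

```python
import secrets

PRIME = 257  # smallest prime above 255, so every byte value is a field element


def _eval_poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo PRIME (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def split(data: bytes, n: int, t: int):
    """Return {x: share} for x = 1..n; any t shares reconstruct `data`.

    Share values lie in 0..256, so shares are stored as int lists, not bytes.
    """
    if not 1 <= t <= n < PRIME:
        raise ValueError("need 1 <= t <= n < 257")
    shares = {x: [] for x in range(1, n + 1)}
    for byte in data:
        # Random polynomial of degree t-1 whose constant term is the secret byte.
        coeffs = [byte] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
        for x in shares:
            shares[x].append(_eval_poly(coeffs, x))
    return shares


def combine(subset):
    """Lagrange-interpolate each byte at x = 0 from a dict {x: share}."""
    xs = list(subset)
    out = bytearray()
    for i in range(len(subset[xs[0]])):
        secret = 0
        for xj in xs:
            num, den = 1, 1
            for xm in xs:
                if xm != xj:
                    num = num * (-xm) % PRIME
                    den = den * (xj - xm) % PRIME
            secret = (secret + subset[xj][i] * num * pow(den, -1, PRIME)) % PRIME
        out.append(secret)
    return bytes(out)


shares = split(b"hello", n=5, t=3)  # e.g., one share per CSP
restored = combine({x: shares[x] for x in (1, 3, 5)})
```

Losing any two of the five hypothetical CSPs still leaves enough shares to rebuild the file, while any two colluding CSPs learn nothing; `pow(den, -1, PRIME)` (modular inverse) requires Python 3.8 or later.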

Smart Data Pricing

Demand for data in both wired and wireless broadband networks is doubling every year, inducing Internet Service Providers (ISPs) to use pricing both as a congestion management tool and as a revenue generation model. This changing landscape is evidenced by the elimination of flat-rate plans in favor of $10/GB usage-based fees in the US and various other countries in Asia and Europe. More recently, new monetization approaches are taking off, such as sponsored data plans from AT&T and other ISPs in Asia, Africa, and South America. Consequently, Smart Data Pricing (SDP) is now playing a major role in the future of mobile, broadband, and content. SDP can refer to many types of pricing plans for Internet data transmission, with the goal of creating less congestion, better quality of experience for users, lower CapEx/OpEx, higher revenue/profit margins, less churn, and more usage and revenue for content/app providers. It requires developing pricing models that capture the interplay between technical and economic factors, interfaces among network providers and content/app providers, field trials, and a combination of smart ideas, smart execution, and smart policy.

AMUSE: A Practical Mobile Data Offloading System

As wireless Internet service providers (ISPs) are increasingly changing their pricing plans and deploying Wi-Fi hotspots to offload their mobile traffic, users face a complex, multi-dimensional tradeoff between cost, throughput, and delay in making their offloading decisions. To navigate this tradeoff, we develop Adaptive bandwidth Management through USer-Empowerment (AMUSE), a functional prototype of a practical, cost-aware Wi-Fi offloading system that takes into account a user’s throughput-delay tradeoffs and cellular budget constraint.

Sponsoring Mobile Data

In January 2014, AT&T introduced sponsored data to the U.S. mobile data market. Sponsored data is a new pricing model that allows content providers (CPs) to subsidize users’ mobile data costs. It thus offers the potential to benefit multiple Internet stakeholders: users can experience lower data costs, CPs can attract more users by subsidizing their data access, and ISPs (Internet service providers) can maintain their revenue flows by charging both users and CPs for data usage. As sponsored data gains traction in industry, it is important to understand its implications. This work considers CPs’ choice of how much content to sponsor and the implications for users, CPs, and ISPs.

TUBE: Time-Dependent Pricing for Mobile Data

TUBE is an end-to-end system for offering day-ahead time-dependent pricing (TDP) to users. The basic idea is to offer lower prices in less congested periods, encouraging users to shift some of their traffic from congested to less congested periods, thus relieving the peak load on ISP networks. TUBE’s architecture takes TDP from an economic theory to a system implementation. TUBE creates a price-based feedback control loop between an ISP and its end users. On the ISP side, it computes TDP prices so as to balance the cost of congestion during peak periods with revenue losses from offering lower prices in less congested periods. On mobile devices, it provides a graphical user interface that allows users to respond to the offered prices either by themselves or using an “autopilot” mode.
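
To make the ISP-side tradeoff concrete, here is a toy two-period version under assumptions that are ours, not TUBE's: a fraction min(1, elasticity × discount) of peak traffic moves off-peak, each unit of peak traffic above capacity costs a fixed amount, and every unit carried off-peak loses the discounted fraction of the price. A grid search then picks the discount that balances the two costs. TUBE's actual user-response model and price computation are not reproduced here.

```python
def isp_cost(discount, peak, offpeak, capacity, price, elasticity, congestion_cost):
    """Toy cost of offering an off-peak discount (fraction of the base price).

    Hypothetical user response: min(1, elasticity * discount) of peak traffic
    shifts to the off-peak period.
    """
    shifted = min(1.0, elasticity * discount) * peak
    peak_load = peak - shifted
    offpeak_load = offpeak + shifted
    overload = max(0.0, peak_load - capacity)       # peak traffic above capacity
    revenue_loss = price * discount * offpeak_load  # vs. charging full price for all
    return congestion_cost * overload + revenue_loss


def best_discount(steps=50, max_discount=0.5, **model):
    """Grid-search the off-peak discount that minimizes isp_cost."""
    grid = [max_discount * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda d: isp_cost(d, **model))


d = best_discount(peak=100.0, offpeak=40.0, capacity=80.0,
                  price=1.0, elasticity=2.0, congestion_cost=5.0)
```

With peak demand 100, off-peak demand 40, capacity 80, elasticity 2, and a congestion cost of 5 per unit, the search settles on a 10% discount: just enough to move the 20 excess units off-peak, since any deeper discount only adds revenue loss.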