A central theme of data-driven networking is answering what-if questions: what would be the impact of changing the design of a networked system, given data obtained from a real-world deployment of an existing system? For instance, trace data from past video streaming sessions may be used to analyze the impact on performance if a new bitrate choice were added (e.g., introducing an 8K resolution), an existing bitrate choice were removed (e.g., during the COVID crisis, many video publishers restricted the maximum bitrate), or a new Adaptive Bitrate (ABR) algorithm were used. With the growing interest in Edge Computing, a network designer may seek to understand the benefits of reducing round-trip time by moving servers closer to end users, given performance traces collected from video servers in existing locations. Answering what-if questions of this nature is also known as causal reasoning: it considers the effect of events that did not occur while the data was being recorded, and is widely used in fields such as epidemiology.
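To make the flavor of such what-if questions concrete, the sketch below replays a hypothetical per-chunk throughput trace under a modified bitrate ladder (here, dropping the top rung). The trace values, ladder, chunk duration, and the simple rate-based selection and buffer model are all illustrative assumptions, not the project's method; naive replay of this kind is exactly the starting point that more careful causal analyses need to improve on.

```python
# what_if_replay.py -- illustrative sketch only, not the project's method.
# Replay a hypothetical video session trace under a modified bitrate ladder
# to illustrate the kind of what-if question discussed above.

CHUNK_SECONDS = 4.0  # media duration per chunk (assumed)

def replay(throughput_mbps, ladder_mbps, startup_buffer_s=8.0):
    """Pick the highest rung sustainable at the observed per-chunk throughput,
    download each chunk, and track buffer occupancy and rebuffering time."""
    buffer_s, rebuffer_s, chosen = startup_buffer_s, 0.0, []
    for tput in throughput_mbps:
        rate = max([r for r in ladder_mbps if r <= tput], default=min(ladder_mbps))
        chosen.append(rate)
        download_s = rate * CHUNK_SECONDS / tput   # time to fetch the chunk
        buffer_s -= download_s                     # buffer drains while fetching
        if buffer_s < 0:                           # stall until the chunk arrives
            rebuffer_s += -buffer_s
            buffer_s = 0.0
        buffer_s += CHUNK_SECONDS                  # chunk adds media to the buffer
    return sum(chosen) / len(chosen), rebuffer_s

# Hypothetical trace (Mbps per chunk) and two ladders: the deployed one, and a
# what-if ladder with the highest rung removed (e.g., capping the top bitrate).
trace = [22.0, 18.5, 6.2, 4.0, 9.8, 25.0, 30.1, 3.5, 12.0, 16.4]
deployed_ladder = [1.0, 2.5, 5.0, 8.0, 16.0, 25.0]
capped_ladder   = [1.0, 2.5, 5.0, 8.0, 16.0]

for name, ladder in [("deployed", deployed_ladder), ("capped", capped_ladder)]:
    avg_rate, rebuf = replay(trace, ladder)
    print(f"{name:9s} avg bitrate {avg_rate:5.2f} Mbps, rebuffering {rebuf:5.2f} s")
```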
Several widely used ML tools are inadequate for causal reasoning. Many approaches (e.g., neural networks) merely capture correlations in the collected data. While they work well for answering questions about existing systems, they suffer from biases when answering causal questions, which pertain to changes in the system design. Other approaches, such as Reinforcement Learning and Randomized Control Trials, allow reasoning about a redesigned system but require active interventions: changing the system and observing its performance among real users, which can be disruptive. In this project, we are investigating causal reasoning approaches that answer what-if questions using data collected from prior deployments of these systems. We are initially focusing our explorations on video streaming, given the importance of the domain, although we believe the issues are more general across networking.
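The bias of purely correlational models can be seen in a small synthetic example, sketched below: a hidden confounder (network capacity) drives both the bitrate an ABR-like policy picks and the resulting quality score, so regressing quality on bitrate alone suggests that forcing higher bitrates helps, even though the assumed structural model says the opposite. All numbers and the linear model are assumptions chosen only to illustrate the point.

```python
# confounding_demo.py -- synthetic illustration of correlational bias.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder: underlying network capacity (Mbps).
capacity = rng.uniform(2.0, 30.0, n)

# Observational policy: an ABR-like rule picks higher bitrates on better networks.
bitrate = 0.6 * capacity + rng.normal(0.0, 1.0, n)

# Assumed structural model: better capacity helps quality, but forcing a higher
# bitrate at a *fixed* capacity hurts it (true causal coefficient is -1.0).
quality = 2.0 * capacity - 1.0 * bitrate + rng.normal(0.0, 1.0, n)

def ols(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:]                                # drop intercept

naive = ols(bitrate.reshape(-1, 1), quality)                 # correlational model
adjusted = ols(np.column_stack([bitrate, capacity]), quality)  # confounder adjusted

print(f"naive effect of bitrate on quality:      {naive[0]:+.2f} (looks beneficial)")
print(f"effect after adjusting for capacity:     {adjusted[0]:+.2f} (true effect is -1.0)")
```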
Some example contributions from the project include (i) Veritas, a system for answering what-if questions related to video streaming without requiring Randomized Control Trial data; (ii) Xatu, a system that uses LSTMs to predict throughput in video streaming systems (a prerequisite for the design of video streaming algorithms), achieving prediction accuracy improvements of over 24% relative to the state of the art; and (iii) Oboe, a system for auto-tuning a wide range of ABR algorithms (a key building block for Internet video) to network conditions, which significantly outperforms state-of-the-art approaches including a reinforcement learning method. We have also released two large-scale datasets of real video sessions to the research community as part of the project.
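As a rough illustration of the kind of model mentioned for Xatu, the following PyTorch sketch applies an LSTM to a window of past per-chunk features to predict near-term throughput. The feature set, architecture, and hyperparameters here are placeholder assumptions and do not reflect Xatu's actual design.

```python
# lstm_throughput_sketch.py -- minimal sketch of an LSTM throughput predictor.
import torch
import torch.nn as nn

class ThroughputLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64, horizon=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, horizon)   # predict future throughput(s)

    def forward(self, x):
        # x: (batch, time, features), e.g. past chunk throughput, size, download time
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # predict from the last time step

# Toy usage with random stand-in data (real inputs would come from session traces).
model = ThroughputLSTM()
past = torch.randn(32, 10, 3)                    # 32 sessions, 10 past chunks, 3 features
target = torch.randn(32, 1)                      # next-chunk throughput (stand-in)
loss = nn.functional.mse_loss(model(past), target)
loss.backward()                                  # one illustrative gradient step (no optimizer)
print(loss.item())
```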