Over the past ten years, the Internet has grown from a small-scale research network to an immense world-wide infrastructure. But its great success is also the source of overwhelming problems; it is very hard to design scalable algorithms for, or analyze and predict the performance of such a large-scale, heterogeneous network.

Traditional methods often yield complex, non-scalable solutions to these problems. To reduce the implementation complexity, one may employ probabilistic algorithms that require only a small random sample of the ``state'' and input. When this random sample has enough information for achieving good performance, the outcome is a large reduction in implementation complexity for a small sacrifice in performance. This thesis uses this approach in two important areas of Internet research: web caching and performance prediction of networks and web farms.

The first part the thesis investigates how to improve the performance and reduce the complexity of web caching. First, a method to randomize any web cache replacement policy is designed, leading to reduced-complexity implementations at a small performance cost. The novelty of the method is a general procedure to improve the quality of a sample without increasing its size. Second, a parsimonious model for generating synthetic web traces is introduced, which provides insights for optimally designing replacement and prefetching algorithms. Third, a scalable scheme for exploiting the redundancy in web documents that change frequently over time is designed.

The second part of the thesis reduces the complexity of performance prediction of large systems by introducing a scalable, widely applicable method called SHRiNK (Small-scale High-fidelity Reproduction of Network Kinetics). The method uses a sample of the actual traffic to drive a small-scale replica of the original system. Then, it exploits a scaling law to extrapolate from the performance of the replica to that of the original system. Network simulations and web farm experiments show that SHRiNK accurately predicts the distribution of a large number of performance measures of the original system by the small-scale replica. A theoretical argument reveals that the method is widely applicable for any network topology, flow transfer protocol, and queue management scheme.

thesis in  ps, ps.gz