Causality Analysis in Large-scale Time Series Data


Yan Liu

Computer Science Department

Viterbi School of Engineering

University of Southern California





In the era of data deluge, we are confronted with large-scale time series data, i.e., sequences of observations of variables of interest over a period of time. For example, terabytes of neural activity time series data are produced to record the collective response of neurons to different stimuli; petabytes of climate and meteorological data, such as temperature, solar radiation, and precipitation, are collected over the years; and exabytes of social media content are generated over time on the Internet.


A major task in time series data analysis is to uncover the temporal causal relationships among the time series. For example, in climatology, we want to identify the factors that impact the climate patterns of certain regions. In social networks, we are interested in identifying the patterns of influence among users and how topics activate or suppress each other. Therefore, developing effective and scalable data mining algorithms to uncover temporal dependency structures between time series and reveal insights from data has become a key problem in machine learning and data mining.


This tutorial aims to provide participants with broad and comprehensive coverage of the foundations of, and recent developments in, causality analysis for large-scale time series data. We will provide both theoretical and practical results as well as illustrative demos. In contrast to previous tutorials on causality analysis, we will focus on the emerging approaches to causality analysis for time series data in the context of scalability and practicability. We will also offer useful and complementary information to members of the CIKM community who are preparing to pursue this research area.


To summarize, we will:

      Present a balanced review of causality analysis for large-scale time series data, covering topics of both practical and theoretical interest

      Describe state-of-the-art and emerging analysis techniques for large-scale time series data in order to identify recent and future trends

      Provide a good starting point for researchers entering this active research area, including tutorial slides, a supplementary survey paper, implementation packages, and a data repository with real application datasets, covering both system- and algorithmic-level developments





Slides [PDF]


Sample dataset #1 [CSV]


Sample dataset #2 [CSV]




The tutorial will consist of two lectures with a break in between:


Lecture 1: Introduction to Granger Causality (90 mins)


Overview for Causality Analysis from Time Series Data (20 mins)

Granger causality (40 mins)


        Identification and learning


Known Issues of Granger causality compared with true causality analysis (30 mins)

        Non-linear extensions

        Latent factors

        Instantaneous causation
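
As a rough illustration of the bivariate Granger causality test covered in Lecture 1, the sketch below compares a restricted autoregression (the target's own past only) with a full one (adding the candidate cause's past) using an F statistic. It is a minimal example under stated assumptions, not the tutorial's implementation: the helper names (lagged_design, rss, granger_f_stat), the lag order of 2, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def lagged_design(target, sources, lag):
    """Build an intercept column plus lags 1..lag of each source series."""
    n = len(target)
    cols = [np.ones(n - lag)]
    for s in sources:
        for k in range(1, lag + 1):
            cols.append(s[lag - k : n - k])  # value of s at time t - k
    return np.column_stack(cols), target[lag:]

def rss(X, t):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    resid = t - X @ beta
    return float(resid @ resid)

def granger_f_stat(x, y, lag=2):
    """F statistic for the hypothesis 'x Granger-causes y' (linear model)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X_r, t = lagged_design(y, [y], lag)      # restricted: y's own past
    X_f, _ = lagged_design(y, [y, x], lag)   # full: adds x's past
    rss_r, rss_f = rss(X_r, t), rss(X_f, t)
    dof = len(t) - X_f.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Simulated data (an assumption for this sketch): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

f_xy = granger_f_stat(x, y)  # expected to be large: x helps predict y
f_yx = granger_f_stat(y, x)  # expected to be small: y does not help predict x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

Under the null hypothesis of no Granger causality, the statistic approximately follows an F distribution with (lag, dof) degrees of freedom, so a large value in one direction only suggests a directed temporal dependency. The known issues listed above (non-linearity, latent factors, instantaneous causation) are not handled by this linear sketch.
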


Break (30 mins)


Lecture 2: Alternative Approaches and New Trends (80 mins)


Practical Issues in Granger causality (30 mins)

        Time lag

        Group effect





Alternative Approaches (30 mins)

        Randomization test

        Auto-correlation and cross-correlation

        Transfer entropy


Illustrative Examples (20 mins)
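
As one possible illustration of the alternative approaches listed for Lecture 2, the sketch below combines a lagged cross-correlation with a randomization (permutation) test. It is a simplified example, not the tutorial's method: the function names (lagged_corr, permutation_pvalue), the number of permutations, and the simulated series are assumptions. Plain shuffling also destroys any autocorrelation in the shuffled series, so it is only a reasonable null model when that series is close to white noise, as it is here by construction.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t], for lag >= 1."""
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

def permutation_pvalue(x, y, lag, n_perm=500, seed=0):
    """Randomization test: shuffle x to break its temporal link with y."""
    rng = np.random.default_rng(seed)
    observed = abs(lagged_corr(x, y, lag))
    null = np.array([
        abs(lagged_corr(rng.permutation(x), y, lag)) for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Simulated data (an assumption for this sketch): y follows x with a delay of 2.
rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
y = np.zeros(n)
y[2:] = 0.9 * x[:-2]
y += 0.3 * rng.standard_normal(n)

corr_lag2 = lagged_corr(x, y, 2)
p_lag2 = permutation_pvalue(x, y, 2)
print(f"corr at lag 2 = {corr_lag2:.2f}, permutation p-value = {p_lag2:.3f}")
```

Scanning several lags with the same test gives a crude estimate of the delay at which one series leads the other. For autocorrelated series, a block-wise or phase-preserving surrogate would be a more appropriate null model than plain shuffling.
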




In addition, we plan to distribute the following materials:

      Lecture slides


      Survey paper for details on the topic

      Implementation packages

      Data repository


Format of tutorial (1/2 day or 1 day)

½ day


Prerequisite knowledge of audience


Linear algebra, basic statistical regression analysis


Relevant references


A general survey paper on the topic: