**Causality Analysis in Large-scale Time Series Data**

Computer Science Department

Viterbi School of Engineering

University of Southern California

**Overview**

In the era of data deluge, we are confronted with large-scale time series data, i.e., sequences of
observations of variables of interest over a period of time. For example,
terabytes of neural activity time series data are produced to record the
collective response of neurons to different stimuli; petabytes of climate and
meteorological data, such as temperature, solar radiation, and precipitation,
are collected over the years; and exabytes of social
media content are generated over time on the Internet.

A major task in time series data analysis is to uncover the
temporal causal relationships among the time series. For example, in
climatology, we want to identify the factors that affect the climate patterns
of certain regions. In social networks, we are interested in identifying
the patterns of influence among users and how topics activate or suppress each
other. Therefore, developing effective and scalable data mining algorithms to
uncover temporal dependency structures between time series and reveal insights
from data has become a key problem in machine learning and data mining.

This tutorial
aims to give participants broad and comprehensive coverage of the
foundations of, and recent developments in, causality analysis for large-scale time
series data. We will provide both theoretical and practical results as well as
illustrative demos. In contrast to previous tutorials on causality analysis, we
will focus on presenting and discussing a broad range of emerging
approaches to causality analysis for time series data in the context of
scalability and practicability. We will also offer useful and complementary
information to members of the CIKM community who are preparing to pursue this research
area.

To summarize, we will:

• Present a balanced review of the area of causality analysis for
large-scale time series data, covering topics of both practical and
theoretical interest

• Describe state-of-the-art and emerging analysis technologies for
large-scale time series data in order to identify recent and future
trends

• Provide a good starting point for researchers entering
this active research area, including tutorial slides, a
supplementary survey paper, implementation packages,
and a data repository with real application datasets, by looking at both system- and algorithmic-level
developments.

**Materials**


Slides [PDF]

Sample dataset #1 [CSV]

Sample dataset #2 [CSV]

**Scope**

The tutorial will consist of two lectures and a break in the
middle:

*Lecture 1: Introduction to Granger Causality (90 mins)*

*• Overview of Causality Analysis from Time Series Data (20 mins)*

*• Granger causality (40 mins)*

– *Definition*

– *Identification and learning*

– *Applications*

*• Known Issues of Granger causality compared with true causality analysis (30 mins)*

– *Non-linear extensions*

– *Latent factors*

– *Instantaneous causation*

*Break (30 mins)*

*Lecture 2: Alternative Approaches and New Trends (80 mins)*

*• Practical Issues in Granger causality (30 mins)*

– *Time lag*

– *Group effect*

– *Non-stationarity*

– *Collinearity*

– *Scalability*

*• Alternative Approaches (30 mins)*

– *Randomization test*

– *Auto-correlation and cross-correlation*

– *Transfer entropy*

*• Illustrative Examples (20 mins)*

– *Demo*
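For readers unfamiliar with the central topic of the outline above, the following is a minimal sketch of a pairwise Granger causality test. It is not the tutorial's demo or implementation package. It assumes the standard linear formulation: x is said to Granger-cause y if adding past values of x to an autoregression of y significantly reduces the residual error, which is measured here with an F statistic. The function names (`solve`, `rss`, `granger_f`), the simulated series, and the coefficients 0.5 and 0.8 are all hypothetical choices made for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y, lag=1):
    """F statistic for the hypothesis 'x Granger-causes y' using `lag` past values."""
    times = range(lag, len(y))
    targets = y[lag:]
    # Restricted model: y_t explained by an intercept and its own past only.
    restricted = [[1.0] + [y[t - j] for j in range(1, lag + 1)] for t in times]
    # Full model: additionally include the past of x.
    full = [r + [x[t - j] for j in range(1, lag + 1)]
            for r, t in zip(restricted, times)]
    rss_r, rss_f = rss(restricted, targets), rss(full, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_f) / lag) / (rss_f / dof)


# Simulated example (assumed setup): x drives y with a one-step delay.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 1))

f_xy = granger_f(x, y)  # tests x -> y
f_yx = granger_f(y, x)  # tests y -> x
print("F statistic, x -> y:", f_xy)
print("F statistic, y -> x:", f_yx)
```

Each statistic would be compared against an F distribution with `lag` and `dof` degrees of freedom; for one lag and several hundred observations the 5% critical value is about 3.9. The sketch ignores the issues listed in the outline (non-stationarity, latent factors, collinearity, lag selection), so it should be read as an illustration of the definition rather than a practical procedure.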

In addition, we plan to distribute the following materials:

– Lecture slides

– Demo

– Survey paper for details on the topic

– Implementation packages

– Data repository

**Format of tutorial (1/2 day or 1 day)**

**½ day**

**Prerequisite knowledge of audience**

Linear algebra, basic statistical regression analysis

**Relevant references**

A general survey paper on the topic: http://www-bcf.usc.edu/~liu32/granger.pdf