USC Time Series Project

A Novel Framework for Knowledge Discovery From Time Series Data In Biology and Climate Science


Dr. Yan Liu

Graduate Student

Dave Kale


Recent advances in technology have made it possible to collect massive amounts of time series data in many scientific domains, creating an urgent need for effective and scalable algorithms that automatically analyze the data and extract insights. This CAREER project aims to develop novel machine learning models based on Granger causality to uncover the complex dependence structures in high-dimensional time series.
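For readers unfamiliar with Granger causality, the sketch below illustrates the classical pairwise test that this line of work starts from: a series x is said to Granger-cause y if past values of x improve the prediction of y beyond what past values of y alone achieve. This is only a textbook illustration, not the project's models; the function names, the lag order, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def lagged_matrix(series, lags):
    """Columns are series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def granger_f_statistic(x, y, lags=2):
    """F-statistic for the hypothesis 'x Granger-causes y' (hypothetical helper).

    Compares an autoregression of y on its own past (restricted model)
    with one that also includes the past of x (unrestricted model).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged_matrix(y, lags)])
    unrestricted = np.hstack([restricted, lagged_matrix(x, lags)])

    def residual_sum_of_squares(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r = residual_sum_of_squares(restricted)
    rss_u = residual_sum_of_squares(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    # A large value means the past of x adds predictive power for y.
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example (assumed data): y is driven by the previous value of x.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_xy = granger_f_statistic(x, y)  # x -> y: expected to be large
f_yx = granger_f_statistic(y, x)  # y -> x: expected to be small
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.1f}")
```

A pairwise test like this only measures predictive dependence between two series; it does not address hidden confounders or high-dimensional settings.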

The project will address three fundamental challenges in the analysis of time series data: (1) data-to-knowledge: theoretical foundations of causality analysis from time series data to quantify the gap between Granger causality and true causality; (2) knowledge-to-model: a unified framework to incorporate different types of domain knowledge in data analysis; and (3) data-and-model: effective solutions to important but usually overlooked practical issues, including irregularities, data snooping, and scalability. The resulting algorithms will be evaluated on two real applications: gene regulatory network discovery in immune systems and climate change attribution.

The project is expected to advance the theoretical foundations of data analytic techniques for time series data and to provide a unified framework that can easily integrate domain knowledge and solve practical challenges. In addition to the core research advances, this CAREER project contributes easy-to-use, workflow-based educational software for teaching practical machine learning to non-CS majors, researchers, and practitioners. It also supports the PI in outreach activities, including developing new interdisciplinary courses, organizing workshops, offering tutorials, and organizing high-school visits. The code of the computational tools and the data sets collected in the climate and biology applications will be freely disseminated to the broader research and educational community. Additional information about the project can be found at:


Under construction

Data sets

Under construction


Under construction

This material is based upon work supported by the National Science Foundation under Grant No. 1254206. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Sponsor: NSF