Yingying Fan  

Associate Professor
Data Sciences and Operations Department
Marshall School of Business
University of Southern California
Los Angeles, CA 90089

Associate Professor of Economics and Computer Science
University of Southern California

Associate Fellow
USC Dornsife Institute for New Economic Thinking (INET)

fanyingy (at) marshall.usc.edu
Office: BRI 307B
Phone: (213) 740-9916
Fax: (213) 740-7313


Short bio [CV]

Yingying Fan is an Associate Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, an Associate Professor in the Departments of Economics and Computer Science at USC, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2007 under the supervision of Professor Jianqing Fan. Her research interests include high-dimensional statistics, big data problems, high-dimensional classification, large-scale inference and false discovery rate control, statistical machine learning, networks, causal inference, nonparametric statistics, financial econometrics and business applications, and deep learning.

Her papers have been published in journals in statistics, economics, and computer science. She serves, or has served, as an associate editor of the Journal of the American Statistical Association (2014-present), the Journal of Econometrics (2015-present), The Econometrics Journal (2012-present), and the Journal of Multivariate Analysis (2013-2016). She is the recipient of the Royal Statistical Society Guy Medal in Bronze (2017), the USC Marshall Dean's Award for Research Excellence (2017 and 2010), the USC Marshall Inaugural Dr. Douglas Basil Award for Junior Business Faculty (2014), the American Statistical Association Noether Young Scholar Award (2013), an NSF Faculty Early Career Development (CAREER) Award (2012), and a Zumberge Individual Award from USC's James H. Zumberge Faculty Research and Innovation Fund (2010). She was also a plenary speaker at the 2011 Institute of Mathematical Statistics Workshop on Finance, Probability, and Statistics held at Columbia University.

Representative Publications
  • Fan, Y., Demirkaya, E., Li, G. and Lv, J. (2017). RANK: large-scale inference with graphical nonlinear knockoffs. Manuscript. [PDF]

    [Power and reproducibility are key to refined scientific discoveries in big data applications with general high-dimensional nonlinear models. The recently introduced general framework of model-free knockoffs provides an effective way of controlling the fraction of false discoveries for high-dimensional nonlinear models. Can the power of feature selection procedures be retained when one intends to ensure reproducibility? How can we establish a robustness theory for knockoffs inference under an unknown covariate distribution? This paper provides some surprising insights into these questions.]

  • Fan, Y., Demirkaya, E. and Lv, J. (2017). Nonuniformity of p-values can occur early in diverging dimensions. Manuscript. [PDF]

    [The tool of p-values is fundamental to statistical inference. Conventional p-values in the Gaussian linear model remain valid even when the dimensionality is a non-vanishing fraction of the sample size, but can break down when the design matrix becomes singular in higher dimensions or when the error is non-Gaussian. When can conventional p-values in generalized linear models become invalid in diverging dimensions? This paper provides some surprising insights into this question.]

  • Candès, E. J., Fan, Y., Janson, L. and Lv, J. (2017). Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. Journal of the Royal Statistical Society Series B, to appear. [PDF]

    [Finding the key causal factors in large-scale applications goes well beyond the task of prediction. Quantifying the variability, reliability, and reproducibility of a set of discovered factors is central to enabling valid and credible scientific discoveries and investigations. How can we design a variable selection procedure for high-dimensional nonlinear models with statistical guarantees that the fraction of false discoveries can be controlled? This paper provides some surprising insights into this open question.]

  • Ren, Z., Kang, Y., Fan, Y. and Lv, J. (2017). Tuning-free heterogeneity pursuit in massive networks. Manuscript. [PDF]

    [Heterogeneity is a major feature of large-scale data sets in the big data era, powering meaningful scientific discoveries through the understanding of important differences among subpopulations of interest. How can we uncover the heterogeneity among a large collection of networks in a tuning-free yet statistically optimal fashion? This paper provides some surprising insights into this question.]

  • Fan, Y., Kong, Y., Li, D. and Lv, J. (2017). Interaction pursuit with feature screening and selection. Manuscript. [PDF]

    [Understanding how features interact with each other is of paramount importance in many scientific discoveries and contemporary applications. To discover important interactions among features in high dimensions, it has been a convention to resort to some structural constraints such as the heredity assumption. Yet some key causal factors can become active only when acting jointly, but not so when acting alone. How can we go beyond such structural assumptions for better flexibility in real applications? This paper provides some surprising insights into this question.]

  • Uematsu, Y., Fan, Y., Chen, K., Lv, J. and Lin, W. (2017). SOFAR: large-scale association network learning. Manuscript. [PDF]

    [How are memory states with different time constants encoded in different brain regions? How can we determine the number of key memory components? Understanding the meaningful associations among a large number of responses and predictors is key to many such contemporary scientific studies and investigations. This paper provides a unified framework that enables us to probe the large-scale response-predictor association networks through different layers of latent factors with interpretability and orthogonality.]

  • Fan, Y. and Lv, J. (2016). Innovated scalable efficient estimation in ultra-large Gaussian graphical models. The Annals of Statistics 44, 2098-2126. [PDF]

    [Large precision matrix estimation has long been perceived as fundamentally different from large covariance matrix estimation. What if we could innovate the data matrix and convert the former into the latter? This paper provides a surprisingly simple procedure for this purpose that comes with extreme scalability and statistical guarantees.]

  • Fan, Y., Kong, Y., Li, D. and Zheng, Z. (2015). Innovated interaction screening for high-dimensional nonlinear classification. The Annals of Statistics 43, 1243–1272. [PDF]

    [Identifying key interactions among features is of fundamental importance to high-dimensional nonlinear classification. It is conventional to construct a scalable quadratic discriminant rule following main-effect screening in high dimensions, implicitly positing the heredity assumption. How can we design an interaction screening procedure for high-dimensional nonlinear classification that is scalable yet free of such a constraint for better flexibility? This paper provides some surprising insights into this question using the idea of innovating the data matrix.]

  • Fan, Y. and Lv, J. (2013). Asymptotic equivalence of regularization methods in thresholded parameter space. Journal of the American Statistical Association 108, 1044–1061. [PDF]

    [There has been a long debate on whether convex or nonconvex regularization methods dominate one another. What if both classes of methods can be close to each other when viewed from a new angle? This paper unveils a surprising small-world phenomenon that sheds light on this question.]

  • Fan, Y. and Tang, C. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society Series B 75, 531–552. [PDF]

    [Tuning parameter selection is crucial to high-dimensional regularization methods. It is well-known that the BIC principle can enjoy model selection consistency in low or moderate dimensions. What if the dimensionality of the feature space becomes very large? This paper provides some surprising insights into this question and unveils a new dimensionality-adaptive model selection principle with a guarantee on model selection consistency in ultra-high dimensions.]

  • Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics 36, 2605–2637. [PDF]

    [The noise accumulation phenomenon has been well known in the regression setting. What are the formal characterizations of this phenomenon in the classification setting? This paper provides some surprising insights into this question and unveils that noise accumulation in high dimensions can render a classifier no more discriminative than flipping a coin, motivating independence learning with feature selection for high-dimensional classification.]

  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics 147, 186–197. [PDF]

    [The simplest framework of low-rank plus sparse structure on the covariance matrix is induced by the use of a factor model. What are the fundamental differences between large covariance matrix estimation and large precision matrix estimation in this context? This paper provides some surprising insights into this question.]