**Optimal Screening and Discovery of Sparse Signals with Applications to Multistage High-throughput Studies**

Tony Cai and Wenguang Sun

Summary. A common feature of large-scale
scientific studies is that signals are sparse, and it is desirable to narrow
the focus sequentially to a much smaller subset. In this paper, we consider
two related data screening problems: one is to find the smallest subset that
contains virtually all signals, and the other is to find the largest subset
that contains essentially only signals. These screening problems are closely
connected to, but distinct from, the more conventional signal detection and
multiple testing problems. We develop data-driven screening procedures that
control the error rates and have near-optimal properties, and we study how to
design experiments efficiently to achieve the goals of data screening. A new
class of phase diagrams is developed to characterize the fundamental
limitations of simultaneous inference. An application to multistage
high-throughput studies illustrates the merits of the proposed screening
methods.
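To make the two screening goals concrete, here is a minimal Python sketch under a simplifying assumption that is not part of the summary: each unit's local false discovery rate (lfdr, the posterior probability that the unit is a null) is known exactly. It is an oracle illustration only, not the authors' data-driven FPR or MDR procedure, and the function names and the thresholds `alpha` and `gamma` are hypothetical. Both rules rank units by lfdr; they differ only in where they stop. The first keeps the largest top-ranked set whose expected fraction of nulls is at most `alpha`, and the second keeps the smallest top-ranked set whose expected share of missed signals is at most `gamma`.

```python
from typing import List, Sequence


def rank_by_lfdr(lfdr: Sequence[float]) -> List[int]:
    """Return unit indices ordered from most to least likely to be a signal."""
    return sorted(range(len(lfdr)), key=lambda i: lfdr[i])


def largest_nearly_pure_subset(lfdr: Sequence[float], alpha: float) -> List[int]:
    """Largest top-k set whose expected fraction of nulls is at most alpha.

    Under the assumed known-lfdr model, the expected number of nulls among
    the selected units is the sum of their lfdr values.
    """
    order = rank_by_lfdr(lfdr)
    best_k, expected_nulls = 0, 0.0
    for k, i in enumerate(order, start=1):
        expected_nulls += lfdr[i]
        if expected_nulls / k <= alpha:
            best_k = k
    return order[:best_k]


def smallest_nearly_complete_subset(lfdr: Sequence[float], gamma: float) -> List[int]:
    """Smallest top-k set whose expected share of missed signals is at most gamma.

    Each unit is a signal with probability 1 - lfdr, so the expected number of
    signals left out is the sum of (1 - lfdr) over the unselected units.
    """
    order = rank_by_lfdr(lfdr)
    expected_signals = sum(1.0 - p for p in lfdr)
    if expected_signals <= 0.0:
        return []  # no signal is expected anywhere, so nothing needs keeping
    expected_missed = expected_signals  # k = 0: every signal is missed
    for k in range(len(order)):
        if expected_missed / expected_signals <= gamma:
            return order[:k]
        expected_missed -= 1.0 - lfdr[order[k]]  # admit the next-ranked unit
    return order  # keeping every unit misses nothing


# Hypothetical lfdr values for six units (illustrative only, not real data).
example_lfdr = [0.01, 0.9, 0.02, 0.5, 0.99, 0.05]
print(largest_nearly_pure_subset(example_lfdr, alpha=0.1))       # [0, 2, 5]
print(smallest_nearly_complete_subset(example_lfdr, gamma=0.1))  # [0, 2, 5, 3]
```

In this toy example the nearly pure set keeps three units and the nearly complete set keeps four, so the first sits inside the second; the units between the two stopping points are the ones whose status remains uncertain.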

The paper and web appendix can be downloaded here.

The R code implementing the proposed FPR and MDR procedures is also
available, together with a description of the code.