#### Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies

We introduced two measures: the rediscovery rate (RDR) and the false discovery rate in a validation population (vFDR).

- The RDR is the expected proportion of findings validated among those declared significant in the training sample.

- The vFDR is the expected proportion of false validated features among all those taken forward in the validation study.

RDR and vFDR are estimated from the training sample alone. When a validation sample is available, they can also be computed from the training and validation samples together; the resulting values are called the observed RDR and the observed vFDR.

In the example (select 'Use example' in STEP 1), all features from the training set with a P-value < 0.001 [c.t. = -log10(P-value) = 3] are taken forward to the validation set. In the validation set, all features with a P-value < 0.1 [c.v. = -log10(P-value) = 1] are declared significant and validated.
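The cutoffs c.t. and c.v. are on the -log10 scale. The two-stage selection can be sketched as follows. This is a minimal illustration, not the app's own code: the P-values are made up, and the names `neg_log10`, `train_p` and `val_p` are hypothetical.

```python
import math

def neg_log10(p):
    """Convert a P-value to the -log10 scale used for c.t. and c.v."""
    return -math.log10(p)

C_T = 3.0  # training cutoff, equivalent to P-value < 0.001
C_V = 1.0  # validation cutoff, equivalent to P-value < 0.1

# Hypothetical P-values for five features (illustrative only).
train_p = {"f1": 1e-5, "f2": 0.02, "f3": 5e-4, "f4": 0.3, "f5": 1e-4}
val_p = {"f1": 0.01, "f2": 0.5, "f3": 0.2, "f4": 0.04, "f5": 0.05}

# Step 1: features taken forward from the training set.
taken_forward = [f for f, p in train_p.items() if neg_log10(p) > C_T]

# Step 2: of those, features declared validated in the validation set.
validated = [f for f in taken_forward if neg_log10(val_p[f]) > C_V]
```

Note that `f4` passes the validation cutoff but is never tested there, because it was not taken forward from the training set.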

With these settings, we expect 80% (RDR=0.80) of the features taken forward to validation to be validated (i.e. to have a P-value < 0.1 in the validation set, or equivalently -log10(P-value) > c.v.). The expected proportion of false positives among the features taken forward to validation approaches 0 (vFDR=0).

Since we collected a validation set and also tested all the features in it, we can calculate the observed RDR and observed vFDR. They are 0.79 and 0, respectively. These values are similar to those estimated using the training sample alone, which suggests that the estimates are reliable.
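When both sets of P-values are available, the observed RDR is a simple proportion. Here is a minimal sketch, assuming paired lists of P-values for the same features; the function name `observed_rdr` and the data are hypothetical and are not taken from the app. The observed vFDR is not covered, since this text does not describe how it is computed.

```python
def observed_rdr(train_p, val_p, train_cut=0.001, val_cut=0.1):
    """Proportion of features selected in training that also pass validation.

    train_p and val_p are P-values for the same features, in the same order.
    Returns None when no feature passes the training cutoff.
    """
    # Indices of features taken forward from the training set.
    selected = [i for i, p in enumerate(train_p) if p < train_cut]
    if not selected:
        return None
    # Count how many of the selected features are validated.
    n_validated = sum(1 for i in selected if val_p[i] < val_cut)
    return n_validated / len(selected)

# Hypothetical P-values for six features (illustrative only).
train_p = [1e-5, 0.02, 5e-4, 0.3, 1e-4, 2e-4]
val_p = [0.01, 0.5, 0.2, 0.04, 0.05, 0.001]

rdr = observed_rdr(train_p, val_p)  # 3 of the 4 selected features validate
```

The default cutoffs match the example settings (P-value < 0.001 in training, P-value < 0.1 in validation).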

The t-mixture approach is NOT well suited for GWAS data or any other data where the proportion of true null hypotheses is close to 1.

*Reference: Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan. Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies, Briefings in Bioinformatics 2014.*
#### RDR and vFDR

#### RDR plot

#### Histogram

If the density function closely follows the histogram, then the distribution of t-statistics in the training set is well described by the t-mixture distribution.