Rediscovery rate

STEP 1. Select the type of analysis

You can upload your own data or use the built-in example (the same data used in the paper). You should also specify whether the outcome is dichotomous or continuous. If you use the example, the data files contain test statistics from a case-control study; the first few lines are shown in the output panel.

Analysis type

Outcome type

STEP 1-a. Upload your data

You can upload data from the training set ONLY or from BOTH the training and the validation set. Uploading the validation dataset allows the calculation of the observed RDR and observed vFDR. The validation set should contain the same number of features as the training set. The data should follow one of these two layouts: A. A single column of t-statistics; B. Two columns, one containing coefficient (beta) values and the other containing standard errors. Columns should be space separated.
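To make the two accepted layouts concrete, here is a minimal Python sketch that reads space-separated text and returns one t-statistic per feature. It is not the application's own code. The function name `read_t_statistics`, the column order in layout B (beta first, standard error second), and the conversion t = beta / standard error are assumptions made for illustration.

```python
def read_t_statistics(text, has_header=False):
    """Return one t-statistic per row of space-separated text.

    Layout A: one column already holding t-statistics.
    Layout B: two columns, assumed here to be beta then standard error,
              converted with t = beta / se (an assumption, see lead-in).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if has_header:
        rows = rows[1:]  # drop the header row

    t_values = []
    for fields in rows:
        if len(fields) == 1:
            t_values.append(float(fields[0]))
        elif len(fields) == 2:
            beta, se = float(fields[0]), float(fields[1])
            t_values.append(beta / se)
        else:
            raise ValueError("expected 1 or 2 columns, got %d" % len(fields))
    return t_values


# Made-up inputs, one per layout.
layout_a = "2.5\n-1.2\n0.3\n"
layout_b = "beta se\n0.50 0.20\n-0.30 0.25\n"

t_a = read_t_statistics(layout_a)
t_b = read_t_statistics(layout_b, has_header=True)
```

Checking that the training and validation files yield lists of the same length is a quick way to confirm that both sets contain the same number of features.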

Training data (.txt File)

Browse...

Validation data (.txt File)

Browse...

STEP 1-b. Select data format

Here you can specify what your data look like:

Header

Data type

STEP 1-c. Determine the number of components for the mixture model

You can select the number of components manually, or check the box 'Select nq automatically' to let the program choose the number of components using the Akaike information criterion (AIC). See also the histogram on the right for more information.

The number of components:

Select nq automatically (using AIC)
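As a rough illustration of AIC-based selection, the sketch below picks the number of components with the smallest AIC = 2k - 2 log L, where k is the number of free parameters and log L the maximised log-likelihood. It does not fit the t-mixture itself. The log-likelihoods and parameter counts are made-up values, and the helper names are hypothetical; how the application counts parameters is not stated here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_n_components(fits):
    """Pick the number of components with the smallest AIC.

    fits maps nq -> (maximised log-likelihood, number of free parameters).
    Returns the chosen nq and the AIC of every candidate.
    """
    scores = {nq: aic(loglik, k) for nq, (loglik, k) in fits.items()}
    best_nq = min(scores, key=scores.get)
    return best_nq, scores


# Made-up fit summaries for nq = 1, 2, 3 (illustration only).
example_fits = {
    1: (-1520.0, 2),
    2: (-1450.0, 5),
    3: (-1448.5, 8),
}

best_nq, aic_scores = select_n_components(example_fits)
```

In this made-up example the third component improves the log-likelihood only slightly, so the penalty for its extra parameters makes nq = 2 the preferred choice.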

STEP 2. Calculate the RDR or the sample size needed for a given RDR

You can fix the sample size and get the expected RDR, or fix the RDR and get the sample size required to reach it.

Calculation type

STEP 2-a. Set the sample sizes

Set the sample sizes in the training and validation sets

Sample size of control group in training study (nt0):

Sample size of case group in training study (nt1):

Sample size of control group in validation study (nv0):

Sample size of case group in validation study (nv1):

Sample size of training study (nt):

Sample size of validation study (nv):

STEP 2-a. Set the sample size and RDR

Set the sample sizes in the training set, the targeted value of RDR, and the case/control ratio in the validation set

Sample size of control group in training study (nt0):

Sample size of case group in training study (nt1):

Targeted value of RDR:

Ratio of case/control in validation study (case/control):

STEP 2-a. Set the sample size and RDR

Set the sample size in the training set and the targeted value of RDR

Sample size of training study (nt):

Targeted value of RDR:

STEP 3. Decide the critical values

Define the significance thresholds (as -log10 P-values) used to select features for the validation study (c.t) and to determine which features are validated (c.v)

Significance threshold in training study (c.t):

Significance threshold in validation study (c.v):
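Because both thresholds are entered on the -log10 scale, it can help to convert from the more familiar P-value cut-offs. The sketch below shows the conversion in both directions; the function names are hypothetical and not part of the application.

```python
import math


def p_to_threshold(p_value):
    """Convert a P-value cut-off to the -log10 scale used for c.t and c.v."""
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


def threshold_to_p(threshold):
    """Convert a -log10 threshold back to a P-value cut-off."""
    return 10 ** (-threshold)


# A training cut-off of P < 0.001 and a validation cut-off of P < 0.1.
c_t = p_to_threshold(0.001)
c_v = p_to_threshold(0.1)
```

With these inputs, c.t is 3 and c.v is 1 up to floating-point rounding, so a larger threshold corresponds to a stricter P-value cut-off.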

STEP 4. Plot

Plot the RDR graph and choose the components to visualize. You can also choose whether to display the measures as a function of -log10(P-value) in the training or the validation set.

Plot: y-variable

RDR

vFDR

STEP 4. Plot

Plot the RDR graph as a function of sample size and choose the components to visualize.

Plot: y-variable

RDR

vFDR

observed RDR

observed vFDR

Plot: x-variable

x-axis

Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies

We introduced two measures: the rediscovery rate (RDR) and the false discovery rate in a validation population (vFDR).

- The RDR is the expected proportion of findings validated among those declared significant in the training sample.

- The vFDR is the expected proportion of false validated features among all those taken forward in the validation study.

RDR and vFDR are estimated using the training sample only. When a validation sample is also available, the measures can be calculated from both samples; in this case they are called the observed RDR and observed vFDR.

In the example (in STEP 1 select 'Use example') we select all the features from the training set with a P-value < 0.001 [c.t = -log10(P-value) = 3] to be taken forward to the validation set. In the validation set, we declare significant and validated all the features with a P-value < 0.1 [c.v = -log10(P-value) = 1].

With these settings we expect 80% (RDR=0.80) of the features taken forward to validation to be validated (i.e. to have a P-value below the validation threshold). The expected proportion of false positives among the features taken forward to validation is close to 0 (vFDR=0).

Since we collected a validation set and tested all the features in it as well, we can calculate the observed RDR and observed vFDR. They are 0.79 and 0, respectively. These values are similar to those estimated using the training sample alone, indicating that the inference is correct.
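One simple way to read the observed RDR is as the fraction of features passing the training threshold that also pass the validation threshold. The Python sketch below implements that reading on made-up -log10 P-values. It is an illustration under that assumption, not the application's code, and it ignores any additional criteria (such as direction of effect) the published method may apply.

```python
def observed_rdr(train_logp, valid_logp, c_t, c_v):
    """Fraction of features selected in training that are also validated.

    train_logp, valid_logp: -log10 P-values for the same features, same order.
    A feature is selected if its training value exceeds c_t and validated
    if its validation value exceeds c_v. Returns None if nothing is selected.
    """
    if len(train_logp) != len(valid_logp):
        raise ValueError("training and validation sets must have the same features")

    selected = [v for t, v in zip(train_logp, valid_logp) if t > c_t]
    if not selected:
        return None
    validated = sum(1 for v in selected if v > c_v)
    return validated / len(selected)


# Made-up -log10 P-values for five features (illustration only).
train = [4.2, 3.5, 1.0, 5.1, 0.2]
valid = [2.0, 0.5, 3.0, 1.4, 0.1]

rdr = observed_rdr(train, valid, c_t=3, c_v=1)
```

Here three features pass the training threshold and two of them pass the validation threshold, so the observed RDR in this toy data is 2/3.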

The t-mixture approach is NOT well suited for GWAS data or any other data where the proportion of true null hypotheses is close to 1.

Reference: Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan. Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies, Briefings in Bioinformatics 2014.

RDR and vFDR

RDR plot

Histogram

If the density function closely follows the histogram, then the distribution of t-statistics in the training set is well fitted by the t-mixture distribution.

Estimates of t-Mixture