rdr-package {rdr}    R Documentation

Rediscovery rate estimation for assessing the validation of significant findings

Description

Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies

Usage

rdr(c.t, c.v, nt0, nt1, nv0, nv1, D, p0, p1, ng)	
rdr.samplesize(RDR, c.t, c.v, nt0, nt1, cc.ratio=1, D, p0, p1, plot.it=TRUE)
rdr.est(tstat, nq, c.t, c.v, nt0, nt1, nv0, nv1)

Arguments

c.t

The critical value in the training study

c.v

The critical value in the validation study

nt0

Sample size of the control group in the training study

nt1

Sample size of the case group in the training study

nv0

Sample size of the control group in the validation study

nv1

Sample size of the case group in the validation study

D

The vector of the non-zero effect sizes, in units of the standard deviation (sigma)

p0

The proportion of true null hypotheses (D=0)

p1

The vector of proportions of the true alternative hypotheses, one per non-zero effect size. The length of p1 must be the same as that of D.

ng

The number of tests

RDR

Pre-specified RDR value

cc.ratio

Case-control ratio in the validation study (=nv1/nv0)

tstat

ng-by-1 vector of t-statistics from the training sample

nq

Number of mixture components

Details

Package: rdr
Type: Package
Version: 1.0
Date: 2013-08-27

The rediscovery rate (RDR) is the expected proportion of significant findings from a training sample that are replicated in a validation sample. rdr computes the RDR as a theoretical function of the significance level and power in both the training and validation studies. rdr.est estimates the RDR from the t-statistics calculated in the training sample, assuming that these t-statistics follow a mixture distribution. rdr.samplesize computes the validation-study sample sizes needed to reach a given RDR value.
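To make the relationship between the RDR, the significance levels and the power concrete, here is a minimal sketch in base R. It is not the package's implementation. The helper name rdr_sketch is hypothetical, and the sketch assumes two-sided, equal-variance two-sample t-tests, independent training and validation samples, and that a finding counts as replicated whenever it is significant in both studies, regardless of the sign of the effect. Under those assumptions the RDR is the probability of being significant in both studies divided by the probability of being significant in the training study.

```r
# Hypothetical helper for illustration only; NOT part of the rdr package.
rdr_sketch <- function(c.t, c.v, nt0, nt1, nv0, nv1, D, p0, p1, ng) {
  stopifnot(length(D) == length(p1))

  # P(|T| > crit) for a two-sample t-test with effect size d (in sigma units).
  # d = 0 gives the significance level; d != 0 gives the power.
  reject_prob <- function(crit, n0, n1, d) {
    df  <- n0 + n1 - 2
    ncp <- d / sqrt(1 / n0 + 1 / n1)  # noncentrality parameter
    pt(-crit, df, ncp = ncp) + pt(crit, df, ncp = ncp, lower.tail = FALSE)
  }

  alpha.t <- reject_prob(c.t, nt0, nt1, 0)
  alpha.v <- reject_prob(c.v, nv0, nv1, 0)
  sens.t  <- reject_prob(c.t, nt0, nt1, D)  # one value per element of D
  sens.v  <- reject_prob(c.v, nv0, nv1, D)

  sig.t    <- p0 * alpha.t + sum(p1 * sens.t)                    # significant in training
  sig.both <- p0 * alpha.t * alpha.v + sum(p1 * sens.t * sens.v) # significant in both

  Vt <- ng * p0 * alpha.t  # expected false positives in training
  Rt <- ng * sig.t         # expected significant findings in training

  list(RDR = sig.both / sig.t, alpha.t = alpha.t, alpha.v = alpha.v,
       sens.t = sens.t, sens.v = sens.v, FDR = Vt / Rt, Vt = Vt, Rt = Rt)
}

res <- rdr_sketch(c.t = 5, c.v = 2, nt0 = 20, nt1 = 20, nv0 = 20, nv1 = 20,
                  D = c(-1, 1), p0 = 0.9, p1 = c(0.05, 0.05), ng = 10000)
res$RDR
```

The sketch returns the power per effect size rather than a single pooled value, which may differ from how rdr reports sens.t and sens.v.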

Value

For rdr and rdr.est,

RDR

Rediscovery rate (RDR)

alpha.t

The significance level in the training study

alpha.v

The significance level in the validation study

sens.t

The sensitivity (power) in the training study

sens.v

The sensitivity (power) in the validation study

FDR

False discovery rate (FDR) in the training study

vFDR

False discovery rate (FDR) in the validation study

Vt

Expected number of false positives in the training study

Vv

Expected number of false positives in the validation study

Rt

Expected number of significant findings in the training study

Rv

Expected number of significant findings in the validation study

Additionally, for rdr.est,

D

The estimates of the non-zero effect sizes

p0

The estimated proportion of true null hypotheses

p1

The estimated proportions of the true alternative hypotheses

For rdr.samplesize,

nv0

Sample size of the control group in the validation study

nv1

Sample size of the case group in the validation study
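As a rough illustration of the question rdr.samplesize answers, the following sketch searches by brute force for the smallest validation control-group size that reaches a target RDR. It is an assumption-laden sketch, not the package's algorithm. The function name validation_size_sketch and the n.max cap are hypothetical, nv1 is taken as ceiling(cc.ratio * nv0), and the RDR is computed under the assumptions of two-sided, equal-variance t-tests and independent training and validation samples.

```r
# Hypothetical helper for illustration only; NOT part of the rdr package.
validation_size_sketch <- function(RDR, c.t, c.v, nt0, nt1, cc.ratio = 1,
                                   D, p0, p1, n.max = 1000) {
  stopifnot(length(D) == length(p1))

  # P(|T| > crit) for a two-sample t-test with effect size d (in sigma units).
  reject_prob <- function(crit, n0, n1, d) {
    df  <- n0 + n1 - 2
    ncp <- d / sqrt(1 / n0 + 1 / n1)
    pt(-crit, df, ncp = ncp) + pt(crit, df, ncp = ncp, lower.tail = FALSE)
  }

  # Training-study quantities do not depend on the validation sample size.
  alpha.t <- reject_prob(c.t, nt0, nt1, 0)
  sens.t  <- reject_prob(c.t, nt0, nt1, D)
  sig.t   <- p0 * alpha.t + sum(p1 * sens.t)

  # Assumed RDR for a given validation control-group size.
  rdr_at <- function(nv0) {
    nv1     <- ceiling(cc.ratio * nv0)
    alpha.v <- reject_prob(c.v, nv0, nv1, 0)
    sens.v  <- reject_prob(c.v, nv0, nv1, D)
    (p0 * alpha.t * alpha.v + sum(p1 * sens.t * sens.v)) / sig.t
  }

  for (nv0 in 2:n.max) {
    achieved <- rdr_at(nv0)
    if (achieved >= RDR) {
      return(list(nv0 = nv0, nv1 = ceiling(cc.ratio * nv0), RDR = achieved))
    }
  }
  NULL  # target RDR not reached within n.max
}

validation_size_sketch(RDR = 0.8, c.t = 5, c.v = 2, nt0 = 20, nt1 = 20,
                       cc.ratio = 1, D = c(-1, 1), p0 = 0.9, p1 = c(0.05, 0.05))
```

A linear scan is used here only for readability; a root finder would be faster for large sample sizes.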

Author(s)

Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan

Maintainer: Donghwan Lee <donghwan.lee@ki.se>

References

Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan. (2013). Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies, manuscript.

Examples

## Examples for rdr R package

library(OCplus)
## Inputs
ng<-10000
nt0<-nt1<-nv0<-nv1<-20
D<-c(-1,1)
p0<-0.9;p1<-rep(0.1/2,2)
c.t<-5; c.v<-2

## Compute the RDR for given Inputs
rdr(c.t=c.t, c.v=c.v, nt0 = nt0, nt1 = nt1, nv0=nv0, nv1=nv1, 
D=D, p0=p0, p1=p1, ng=ng )

## Compute the sample size for given RDR value
rdr.samplesize(RDR=0.8, c.t=c.t, c.v=c.v, nt0 = nt0, nt1 = nt1, cc.ratio=1, 
D=D, p0=p0, p1=p1,plot.it=TRUE)
	
	
####################################
## RDR estimation from the training sample
####################################


## The modification of MAsim in OCplus
## This function allows the sample generation from the mixture model with multiple components. 

MAsimK <- function(ng = 10000, n = 10, nt0 = n, nt1 = n, D = c(1), p0 = 0.9,
                   p1 = rep((1 - p0) / length(D), length(D)), sigma = 1)
{
    nn <- nt0 + nt1

    K <- length(D)
    ## stop() aborts the function; break is only valid inside a loop
    if (K != length(p1)) stop("the lengths of p1 and D should be the same")

    p1 <- p1 / sum(p1) * (1 - p0)  # normalized so that p0 + sum(p1) = 1

    group <- rep(c(1, 0), c(nt1, nt0))
    xdat <- matrix(rnorm(nn * ng, mean = 0, sd = sigma), nrow = ng,
                   ncol = nn)
    ## one row per test: column 1 marks a null, column k+1 marks component k
    fc <- t(rmultinom(ng, 1, c(p0, p1)))

    for (k in seq_len(K)) {
        idx <- which(fc[, k + 1] == 1)
        ## shift the case group by D[k] standard deviations
        xdat[idx, group == 1] <- xdat[idx, group == 1] + D[k] * sigma
        attr(xdat, paste("DE", k, sep = "")) <- idx
    }
    attr(xdat, "DE") <- which(fc[, 1] == 0)
    colnames(xdat) <- as.character(group)

    return(list(x = xdat, group = group, nt0 = nt0, nt1 = nt1))
}

## Inputs
ng<-10000
nt0<-nt1<-nv0<-nv1<-20
D<-c(-1,1)
p0<-0.9;p1<-rep(0.1/2,2)
c.t<-5; c.v<-2

x<-MAsimK(ng=ng,nt0=nt0,nt1=nt1,D=D,p0=p0,p1=p1,sigma=1)	
tstat <- tstatistics(x$x, grp=x$group)$tstat


# Estimation of RDR 

RDR.est<-rdr.est(tstat,nq=(length(D)+1),c.t=c.t,c.v=c.v,nt0=nt0,nt1=nt1,nv0=nv0,nv1=nv1)

RDR.est

## Compute the sample size for given RDR value
rdr.samplesize(RDR=0.8, c.t=c.t, c.v=c.v, nt0 = nt0, nt1 = nt1, cc.ratio=1, 
D=RDR.est$D, p0=RDR.est$p0, p1=RDR.est$p1,plot.it=TRUE)
	

	

[Package rdr version 1.02 Index]