rdr-package {rdr} | R Documentation |
Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies
rdr(c.t, c.v, nt0, nt1, nv0, nv1, D, p0, p1, ng)
rdr.samplesize(RDR, c.t, c.v, nt0, nt1, cc.ratio=1, D, p0, p1, plot.it=TRUE)
rdr.est(tstat, nq, c.t, c.v, nt0, nt1, nv0, nv1)
c.t |
The critical value in the training study |
c.v |
The critical value in the validation study |
nt0 |
Sample size of the control group in the training study |
nt1 |
Sample size of the case group in the training study |
nv0 |
Sample size of the control group in the validation study |
nv1 |
Sample size of the case group in the validation study |
D |
The vector of the non-zero effect sizes, in units of the standard deviation (sigma) |
p0 |
The proportion of the true null hypothesis (D=0) |
p1 |
The vector of the proportions of the true alternative hypothesis. The length of |
ng |
The number of tests |
RDR |
Pre-specified RDR value |
cc.ratio |
case-control ratio (= |
tstat |
The t-statistics calculated in the training sample |
nq |
Number of mixture components |
Package: | rdr |
Type: | Package |
Version: | 1.0 |
Date: | 2013-08-27 |
The rediscovery rate (RDR) quantifies the expected proportion of significant findings from a training sample that are replicated in a validation sample.
rdr computes the RDR as a theoretical function of the significance level and power in both the training and validation studies.
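As a rough illustration of how the RDR can be written in terms of significance level and power, the sketch below computes it for a two-group design. It is not the package's implementation. It assumes a pooled two-sample t-test with two-sided rejection (|t| above the critical value), independence of the two studies given the effect size, and the form RDR = (p0 * alpha.t * alpha.v + sum(p1 * sens.t * sens.v)) / (p0 * alpha.t + sum(p1 * sens.t)). The names `reject_prob` and `rdr_sketch` are hypothetical.

```r
# Hypothetical helper (not part of the rdr package): probability that a pooled
# two-sample t-statistic exceeds `crit` in absolute value when the true effect
# size is D (in units of sigma). Assumes two-sided rejection.
reject_prob <- function(crit, n0, n1, D) {
  df  <- n0 + n1 - 2
  ncp <- D * sqrt(n0 * n1 / (n0 + n1))  # noncentrality parameter
  pt(-crit, df, ncp) + pt(crit, df, ncp, lower.tail = FALSE)
}

# Hypothetical sketch of the RDR as a function of significance level and power.
rdr_sketch <- function(c.t, c.v, nt0, nt1, nv0, nv1, D, p0, p1) {
  alpha.t <- 2 * pt(-c.t, nt0 + nt1 - 2)    # significance level, training
  alpha.v <- 2 * pt(-c.v, nv0 + nv1 - 2)    # significance level, validation
  sens.t  <- reject_prob(c.t, nt0, nt1, D)  # power per effect size, training
  sens.v  <- reject_prob(c.v, nv0, nv1, D)  # power per effect size, validation
  num <- p0 * alpha.t * alpha.v + sum(p1 * sens.t * sens.v)
  den <- p0 * alpha.t + sum(p1 * sens.t)
  num / den
}

r20 <- rdr_sketch(c.t = 5, c.v = 2, nt0 = 20, nt1 = 20, nv0 = 20, nv1 = 20,
                  D = c(-1, 1), p0 = 0.9, p1 = c(0.05, 0.05))
r50 <- rdr_sketch(c.t = 5, c.v = 2, nt0 = 20, nt1 = 20, nv0 = 50, nv1 = 50,
                  D = c(-1, 1), p0 = 0.9, p1 = c(0.05, 0.05))
```

Under these assumptions a larger validation sample raises the power in the validation study, so `r50` should exceed `r20`.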
rdr.est estimates the RDR from the t-statistics calculated in the training sample, assuming that these t-statistics follow a mixture distribution.
rdr.samplesize computes the sample sizes in the validation study for a given RDR value.
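One way to picture a sample size calculation of this kind is a simple grid search over the validation group size. The sketch below is an illustration only and does not reproduce the algorithm used by rdr.samplesize. It assumes equal case and control group sizes in the validation study, a pooled two-sample t-test with two-sided rejection, and independence of the two studies given the effect size. The names `rej`, `rdr_for_nv`, `find_nv` and the argument `n.grid` are hypothetical.

```r
# Hypothetical helper: two-sided rejection probability of a pooled two-sample
# t-test for effect size D (in units of sigma).
rej <- function(crit, n0, n1, D) {
  df  <- n0 + n1 - 2
  ncp <- D * sqrt(n0 * n1 / (n0 + n1))
  pt(-crit, df, ncp) + pt(crit, df, ncp, lower.tail = FALSE)
}

# Hypothetical sketch: RDR when the validation study has nv subjects per group.
rdr_for_nv <- function(nv, c.t, c.v, nt0, nt1, D, p0, p1) {
  a.t <- 2 * pt(-c.t, nt0 + nt1 - 2)  # significance level, training
  a.v <- 2 * pt(-c.v, 2 * nv - 2)     # significance level, validation
  s.t <- rej(c.t, nt0, nt1, D)        # power, training
  s.v <- rej(c.v, nv, nv, D)          # power, validation
  (p0 * a.t * a.v + sum(p1 * s.t * s.v)) / (p0 * a.t + sum(p1 * s.t))
}

# Hypothetical grid search: smallest per-group size on n.grid whose RDR
# reaches the target, or NA if no grid value does.
find_nv <- function(target, c.t, c.v, nt0, nt1, D, p0, p1, n.grid = 2:500) {
  r <- vapply(n.grid, rdr_for_nv, numeric(1),
              c.t = c.t, c.v = c.v, nt0 = nt0, nt1 = nt1,
              D = D, p0 = p0, p1 = p1)
  hit <- which(r >= target)
  if (length(hit) == 0) NA_integer_ else n.grid[hit[1]]
}

nv <- find_nv(0.8, c.t = 5, c.v = 2, nt0 = 20, nt1 = 20,
              D = c(-1, 1), p0 = 0.9, p1 = c(0.05, 0.05))
```

A grid search is chosen here only because it is easy to read; a root finder such as `uniroot` on a continuous relaxation would be another option.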
For rdr and rdr.est,
RDR |
rediscovery rate (RDR) |
alpha.t |
The significance level in the training study |
alpha.v |
The significance level in the validation study |
sens.t |
The sensitivity (power) in the training study |
sens.v |
The sensitivity (power) in the validation study |
FDR |
false discovery rate (FDR) in the training study |
vFDR |
false discovery rate (FDR) in the validation study |
Vt |
Expected number of false positives in the training study |
Vv |
Expected number of false positives in the validation study |
Rt |
Expected number of significant findings in the training study |
Rv |
Expected number of significant findings in the validation study |
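To make the relationship between the training-study quantities concrete, the sketch below derives the expected number of false positives, the expected number of significant findings, and their ratio from a significance level and a power vector. It uses the standard mixture-model definitions and is an assumption about how these values relate, not the package's code. The function name `expected_counts` and the numeric inputs are hypothetical.

```r
# Hypothetical illustration (not part of the rdr package): expected counts in
# a single study under a mixture of null and non-null tests.
expected_counts <- function(ng, p0, p1, alpha, sens) {
  V <- ng * p0 * alpha           # expected false positives
  R <- V + ng * sum(p1 * sens)   # expected significant findings
  list(V = V, R = R, FDR = V / R)
}

# Illustrative inputs only: 10000 tests, 90% true nulls, two non-null groups.
ec <- expected_counts(ng = 10000, p0 = 0.9, p1 = c(0.05, 0.05),
                      alpha = 0.05, sens = c(0.8, 0.8))
# ec$V is 450, ec$R is 1250, ec$FDR is 0.36
```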
Additionally, for rdr.est,
D |
The estimate of the non-zero effect sizes |
p0 |
The estimate of proportion of the true null hypothesis |
p1 |
The estimate of proportions of the true alternative hypothesis |
For rdr.samplesize,
nv0 |
Sample size of the control group in the validation study |
nv1 |
Sample size of the case group in the validation study |
Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan
Maintainer: Donghwan Lee <donghwan.lee@ki.se>
Andrea Ganna, Donghwan Lee, Erik Ingelsson and Yudi Pawitan. (2013). Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies, manuscript.
## Examples for rdr R package
library(rdr)
library(OCplus)  # provides tstatistics()

## Inputs
ng <- 10000
nt0 <- nt1 <- nv0 <- nv1 <- 20
D <- c(-1, 1)
p0 <- 0.9; p1 <- rep(0.1/2, 2)
c.t <- 5; c.v <- 2

## Compute the RDR for given inputs
rdr(c.t=c.t, c.v=c.v, nt0=nt0, nt1=nt1, nv0=nv0, nv1=nv1, D=D, p0=p0, p1=p1, ng=ng)

## Compute the sample size for given RDR value
rdr.samplesize(RDR=0.8, c.t=c.t, c.v=c.v, nt0=nt0, nt1=nt1, cc.ratio=1, D=D, p0=p0, p1=p1, plot.it=TRUE)

####################################
## RDR estimation from the training sample
####################################

## A modification of MAsim in OCplus.
## This function generates samples from a mixture model with multiple components.
MAsimK <- function(ng = 10000, n = 10, nt0 = n, nt1 = n, D = c(1), p0 = 0.9,
                   p1 = rep(c(1 - p0) / length(D), length(D)), sigma = 1)
{
  nn <- nt0 + nt1
  K <- length(D)
  ## stop() replaces the original print/break, which is invalid outside a loop
  if (K != length(p1)) stop("the lengths of p1 and D should be the same")
  p1 <- p1 / sum(p1) * (1 - p0)  # normalized so that p0 + sum(p1) = 1
  group <- rep(c(1, 0), c(nt1, nt0))
  xdat <- matrix(rnorm(nn * ng, mean = 0, sd = sigma), nrow = ng, ncol = nn)
  ## One row per test: column 1 marks a null test, column k+1 marks component k
  fc <- t(rmultinom(ng, 1, c(p0, p1)))
  for (k in 1:K) {
    idx <- which(fc[, (k + 1)] == 1)
    ## Shift the case group by the effect size of component k
    xdat[idx, group == 1] <- xdat[idx, group == 1] + D[k] * sigma
    attr(xdat, paste("DE", k, sep = "")) <- idx
  }
  attr(xdat, "DE") <- which(fc[, 1] == 0)
  colnames(xdat) <- as.character(group)
  return(list(x = xdat, group = group, nt0 = nt0, nt1 = nt1))
}

## Inputs
ng <- 10000
nt0 <- nt1 <- nv0 <- nv1 <- 20
D <- c(-1, 1)
p0 <- 0.9; p1 <- rep(0.1/2, 2)
c.t <- 5; c.v <- 2

x <- MAsimK(ng=ng, nt0=nt0, nt1=nt1, D=D, p0=p0, p1=p1, sigma=1)
tstat <- tstatistics(x$x, grp=x$group)$tstat

## Estimation of RDR
RDR.est <- rdr.est(tstat, nq=(length(D)+1), c.t=c.t, c.v=c.v, nt0=nt0, nt1=nt1, nv0=nv0, nv1=nv1)
RDR.est

## Compute the sample size for given RDR value
rdr.samplesize(RDR=0.8, c.t=c.t, c.v=c.v, nt0=nt0, nt1=nt1, cc.ratio=1, D=RDR.est$D, p0=RDR.est$p0, p1=RDR.est$p1, plot.it=TRUE)