Prof. Yudi Pawitan

Department of Medical Epidemiology and Biostatistics
PO Box 281 Karolinska Institutet
17177 Stockholm, Sweden

Phone: 46-8-5248 3983 
Fax: 46-8-314 975
Email: yudi.pawitan@ki.se

Research interests

  • Statistical genetics, microarrays, family data
  • Biostatistics
  • Likelihood inference

Downloadables:

Software:

·         Subtype (zip and tar.gz, Jan 2012): Windows and Unix files for performing molecular subtyping of cancer.

·         SPLS (zip and tar.gz, Jan 2012): Windows and Unix files for performing sparse partial least squares. Start with ?SPLS in R.

·         Sequgio (tar.gz file, Dec 2011): Unix package file for pre-processing of RNAseq data. There is a vignette in the manual subdirectory. Note that in practice you’d need a parallel or cluster computing system to get results in reasonable time. Sequgio has been extensively tested on a parallel system using the multicore package.

·         NEA (zip and tar.gz file, Oct 2011): package files for the network enrichment analysis.  Start with ?nea in R. Also available: RData file containing the merged network used in the paper by Alexeyenko et al.

·         SCCA (zip and tar.gz file, June 2011): package files for sparse canonical covariance analysis. Built using R 2.12.0.

·         MPSS (zip and tar.gz file, Jan 2011): package files for multi-platform segmentation. Built using R 2.12.0.

·         TDNenv (zip and tar.gz files, Nov 2010): package files for estimation of the true discovery number (TDN) and its confidence bounds from genome-wide association studies.

·         SSPCA (zip and tar.gz files, Sept 2010): package files for the sparse PCA (Lee et al, BMC Bioinformatics 2010).

·         Cnvpack (cnvpack_0.4.6.zip and cnvpack_0.4.6.tar.gz, Nov 2010): Windows binary and Unix source for finding common CNV regions.

·         Slr_0.1.9.zip (17 Feb 2011) Windows binary for performing smoothed logistic regression for CGH data. Unix source slr_0.1.9.tar.gz.

·         Mwt_0.2.6.zip (Oct09) Windows binary for Moderated Welch Test for microarray data (Demissie et al Bioinformatics 2008). Unix source: mwt_0.2.6.tar.gz.

·         FLUSH.LVS.bundle_1.3.1.zip (Sep 09) Windows-binary-installation R package to compute LVS normalization (Calza et al, BMC Bioinformatics 2008) and FLUSH filtering (Calza et al, Nucleic Acids Research 2007). FLUSH has been revised to allow various background corrections. The data example in RData format: FLUSH.RData. The Unix source: FLUSH.LVS.bundle_1.3.1.tar.gz. To see the workflow, type vignette("FLUSH") in R.

·         smoothseg_0.0.4.zip (Feb 2011) Windows-binary-installation R package to compute smooth-segmentation of array CGH data, including the estimation of FDR for comparative studies. The Unix version: smoothseg_0.0.4.tar.gz.

·         OCplus.zip: (version 1.3.5, 5-March 2006) Windows-installed R package to compute theoretical and estimated operating characteristics of microarray data such as FDR, sensitivity etc plus sample size requirements, fitting mixture model, and computing local fdr. To use it,

1.      you first need to download and install R, then install the package from inside R (use the Packages menu).

2.      Then type: library(OCplus) to start it.

3.      Type library(help=OCplus) to see the list of available functions, and type, for example, ?TOC for help on a specific function.

4.      The details are given in Pawitan et al ‘FDR, sensitivity and sample size for microarray studies’ in Bioinformatics 2005. This package is now in Bioconductor.

·         OCplus.tar.gz: (version 1.3.5, 5-March 2006) gzipped file for Unix users.

·         ProSpect (Ver 0.3.6 – Dec 11) and rsmooth (Feb 2010): zip files of Windows R packages for processing of SELDI protein spectra. You need to install both in R for Windows; you also need to install the ‘quadprog’ package. After running library(ProSpect), type ?ProSpect.README for a short description and an example of a complete run. NOTE: names are case sensitive.

·         ProSpect_0.3.6.tar.gz (Dec 11) and rsmooth.tar.gz (Feb 2010): Unix version of ProSpect. See also the Windows note above.

·         FLUSH.zip (version 1.1.0, Aug 2007) Windows-installed R package for gene filtering. Needs packages affy, affyPLM and quantreg. Run demo(FLUSH.tour) to start.

·         FLUSH_1.1.0.tar.gz (version 1.1.0, Aug 2007) gzipped version for Unix users

·         ELF-example.zip (June 2006): R code and dataset to run the estimated latent FDR procedure.

Data:

·         The ProSpect package contains a subset of the spike-in data used in Tan et al (Bioinformatics 2006). This is the complete set:

1.      SpikeInFull.zip: preprocessed by Ciphergen for background correction and total ion normalization (what we used in the paper),

2.      spikein_xml.zip: raw uncorrected data

3.      blank(corrected)_csv.zip and blank_xml.zip: corrected and raw blank scan data

Related to In all likelihood: Statistical modelling and inference using likelihood.  Oxford University Press, June 2001. You can download: 


Publications

 

Books

Pawitan Y: In All Likelihood: Statistical modelling and inference using likelihood. 525 pages. Oxford University Press. 2001.

Lee Y, Nelder J and Pawitan Y: Generalized linear models with random effects. 396 pages, Chapman and Hall, July 2006.

 

Articles

 

  1. Lee W, Gusnanto A, Salim A, Magnusson PK, Perelman E, Sim X, Tai E, Pawitan Y. Estimating the number of true discoveries in genome-wide association studies. Statistics in Medicine. To appear 2011.
  2. Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, and Berri S. Estimating copy number alterations in cancer genomes from clinical samples using next-generation sequencing. Bioinformatics. To appear 2011.
  3. Lee D, Lee W, Lee Y, Pawitan Y. Sparse partial least-squares regression and its applications to high-throughput data analysis. Chemometrics and Intelligent Laboratory Systems. Available online 29 July 2011.
  4. Lee W, Lee D, Lee Y and Pawitan Y. Sparse Canonical Covariance Analysis for High-throughput Data.  Statistical Applications in Genetics and Molecular Biology, 2011, Vol 10: 1, Article 13.
  5. Teo SM, Pawitan Y, Kumar V, Thalamuthu A, Seielstad M, Chia KS, Salim A. Multi-platform segmentation for joint detection of copy number variants. Bioinformatics. 2011 Jun 1;27(11):1555-61.
  6. Ku CS, Teo SM, Naidoo N, Sim X, Teo YY, Pawitan Y, Seielstad M, Chia KS, Salim A.  Copy number polymorphisms in new HapMap III and Singapore populations. J Hum Genet. 2011 Aug;56(8):552-60.
  7. Frisell T, Pawitan Y, Långström N, Lichtenstein P. Heritability, Assortative Mating and Gender Differences in Violent Crime: Results from a Total Population Sample Using Twin, Adoption, and Sibling Models. Behav Genet. 2011 Jul 15. [Epub ahead of print]
  8. Teo SM, Ku CS, Naidoo N, Hall P, Chia KS, Salim A, Pawitan Y. A population-based study of copy number variants and regions of homozygosity in healthy Swedish individuals. J Hum Genet. 2011 Jul;56(7):524-33.
  9. Penney KL, Sinnott JA, Fall K, Pawitan Y, Hoshida Y, Kraft P, Stark JR, Fiorentino M, Perner S, Finn S, Calza S, Flavin R, Freedman ML, Setlur S, Sesso HD, Andersson SO, Martin N, Kantoff PW, Johansson JE, Adami HO, Rubin MA, Loda M, Golub TR, Andrén O, Stampfer MJ, Mucci LA. mRNA expression signature of Gleason grade predicts lethal prostate cancer. J Clin Oncol. 2011 Jun 10;29(17):2391-6.
  10. Ku CS, Naidoo N, Pawitan Y. Revisiting Mendelian disorders through exome sequencing. Hum Genet. 2011 Apr;129(4):351-70. 
  11. Ku CS, Naidoo N, Teo SM, Pawitan Y. Regions of homozygosity and their impact on complex diseases and traits. Hum Genet. 2011 Jan;129(1):1-15.
  12. Suo C, Salim A, Chia KS, Pawitan Y, Calza S. Modified least-variant set normalization for miRNA microarray.  RNA. 2010 Oct 27. [Epub ahead of print] 
  13. Calza S, Pawitan Y. Normalization of gene-expression microarray data. Methods Mol Biol. 2010; 673:37-52.  
  14. Lee D, Lee W, Lee Y, Pawitan Y. Super-sparse principal component analyses for high-throughput genomic data. BMC Bioinformatics. 2010 Jun 2;11:296. 
  15. Mei TS, Salim A, Calza S, Seng KC, Seng CK, Pawitan Y. Identification of recurrent regions of Copy-Number Variants across multiple individuals. BMC Bioinformatics. 2010 Mar 22;11:147. 
  16. Yip BH, Moger TA, Pawitan Y. Genetic analysis of age-at-onset traits based on case-control family data. Statistics in Medicine, to appear in 2010. 
  17. Yip BH, Reilly M, Cnattingius S, Pawitan Y. Matched Ascertainment of Informative Families for Complex Genetic Modelling. Behav Genet. 2009 Dec 24. [Epub ahead of print] 
  18. Pawitan Y, Seng KC, Magnusson PK. How many genetic variants remain to be discovered? PLoS One. 2009 Dec 2;4(12):e7969. PMID: 199565393.
  19. Huang J, Salim A, Lei K, O'Sullivan K, Pawitan Y. Classification of array CGH data using smoothed logistic regression model. Stat Med. 2009 Dec 30;28(30):3798-810.
  20. Ku CS, Pawitan Y, Sim X, Ong RT, Seielstad M, Lee EJ, Teo YY, Chia KS, Salim A. Genomic copy number variations in three Southeast Asian populations. Hum Mutat. 2010 Jul;31(7):851-7.
  21. Ku CS, Loy EY, Salim A, Pawitan Y, Chia KS. The discovery of human genetic variations and their use as disease markers: past, present and future. J Hum Genet. 2010 Jul;55(7):403-15. Epub 2010 May 20.
  22. Ku CS, Loy EY, Pawitan Y, Chia KS. The pursuit of genome-wide association studies: where are we now? J Hum Genet. 2010 Mar 19. [Epub ahead of print]
  23. Sboner A, Demichelis F, Calza S, Pawitan Y, Setlur SR, Hoshida Y, Perner S, Adami HO, Fall K, Mucci LA, Kantoff PW, Stampfer M, Andersson SO, Varenhorst E, Johansson JE, Gerstein MB, Golub TR, Rubin MA, Andren O. Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med Genomics. 2010 Mar 16; 3(1):8. [Epub ahead of print]
  24. Svensson AC, Sandin S, Cnattingius S, Reilly M, Pawitan Y, Hultman CM, Lichtenstein P. Maternal effects for preterm birth: a genetic epidemiologic study of 630,000 families. Am J Epidemiol. 2009 Dec 1;170(11):1365-72. Epub 2009 Oct 23.
  25. Lichtenstein P, Yip BH, Björk C, Pawitan Y, Cannon TD, Sullivan PF, Hultman CM. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet. 2009 Jan 17;373(9659):234-9.
  26. Hong MG, Pawitan Y, Magnusson PK, Prince JA. Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum Genet. 2009 Aug; 126(2):289-301. Epub 2009 May 1.
  27. Tan CS, Salim A, Ploner A, Lehtiö J, Chia KS, Pawitan Y. Correlating gene and protein expression data using Correlated Factor Analysis. BMC Bioinformatics. 2009 Sep 1;10:272.
  28. Weichselbaum RR, Ishwaran H, Yoon T, Nuyten DS, Baker SW, Khodarev N, Su AW, Shaikh AY, Roach P, Kreike B, Roizman B, Bergh J, Pawitan Y, van de Vijver MJ, Minn AJ. An interferon-related gene signature for DNA damage resistance is a predictive marker for chemotherapy and radiation for breast cancer. Proc Natl Acad Sci U S A. 2008 Nov 25;105(47):18490-5. Epub 2008 Nov 10.
  29. Demissie M, Mascialino B, Calza S, Pawitan Y. Unequal group variances in microarray data analyses. Bioinformatics. 2008 May 1;24(9):1168-74. Epub 2008 Mar 14.
  30. Calza S, Valentini D, Pawitan Y. Normalization of oligonucleotide arrays based on the least-variant set of genes. BMC Bioinformatics. 2008 Mar 5;9(1):140 [Epub ahead of print]
  31. Calza S, Raffelsberger W, Ploner A, Sahel J, Leveillard T, Pawitan Y. Filtering genes to improve sensitivity in oligonucleotide microarray data analysis. Nucleic Acids Research. 2007 Aug 15; [Epub ahead of print]
  32. Huang J, Gusnanto A, O'Sullivan K, Staaf J, Borg A, Pawitan Y. Robust smooth segmentation approach for array CGH data analysis. Bioinformatics. 2007 Sep 15;23(18):2463-9. Epub 2007 Jul 27.
  33. Moger TA, Pawitan Y, Borgan O. Case-cohort methods for survival data on families from routine registers. Stat Med. 2008 Mar 30;27(7):1062-74
  34. Yip BH, Bjork C, Lichtenstein P, Hultman CM, Pawitan Y. Covariance component models for multivariate binary traits in family data analysis. Stat Med. 2008 Mar 30;27(7):1086-1095
  35. Gusnanto A, Calza S, Pawitan Y. Identification of differentially expressed genes and false discovery rate in microarray studies. Current Opinion in Lipidology. 2007 Apr;18(2):187-93.
  36. Salim A, Pawitan Y. Model-Based Maximum Covariance Analysis for Irregularly Observed Climatological Data. Journal of Agricultural, Biological & Environmental Statistics 12: 1-24, 2007.
  37. Ha ID, Lee Y, Pawitan Y. Genetic Mixed Linear Models for Twin Survival Data. Behavior Genetics. 2007 Jul;37(4):621-30. Epub 2007 Mar 31.
  38. Perelman E, Ploner A, Calza S, Pawitan Y. Detecting differential expression in microarray data: comparison of optimal procedures. BMC Bioinformatics. 2007 Jan 26; 8:28.
  39. Pawitan Y, Calza S and Ploner A. Estimation of false discovery proportion under general dependence. Bioinformatics 22: 3025 – 3031, 2006
  40. Finding regions of significance in SELDI measurements for identifying protein biomarkers. Bioinformatics (2006): Advance Access, 27 March 2006.
  41. Multidimensional local false discovery rate for microarray studies. Bioinformatics 22: 556-565, 2006.
  42. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects and patient survival. Proceedings of the National Academy of Sciences (PNAS) 2005.
  43. Multi-component variance estimation from binary traits in family-based studies. Genetic Epidemiology 2005.
  44. Gene expression profiling spares early breast cancer patients from adjuvant therapy. Breast Cancer Research 2005.
  45. Bias in the estimation of false discovery rate in microarray studies. Bioinformatics 2005.
  46. Robust ascertainment-adjusted parameter estimation. Genetic Epidemiology  2005.
  47. Using correlations to evaluate low-level analysis procedures for high-density oligonucleotide microarray data. BMC Bioinformatics 2005.
  48. FDR, sensitivity and sample size for microarray studies. Bioinformatics 2005.
  49. NonGaussian smoothing of short transmission scans for PET whole body studies. IEEE Transactions on Medical Imaging. 2005.
  50. Maximal covariance analysis of two spatio-temporal processes. JRSS(C): Applied Statistics 2005.
  51. Modelling infectious disease transmission with complex exposure pattern and sparse outcome data. Statistics in Medicine. 2004.
  52. Estimation of genetic and environmental factors for binary traits using family data. Statistics in Medicine. 2004.
  53. Gene expression profiling for prognosis using Cox regression. Statistics in Medicine. 2004.
  54. Analysis and prediction of BSE in Ireland. Preventive Veterinary Medicine. 2004.
  55. Maternal and paternal contributions in the risk of preeclampsia. American Journal of Medical Genetics 2004.
  56. Improved grading of breast adenocarcinomas based on genomic instability. Cancer Research 2004.
  57. Risk and protective factors for Parkinson's disease: a study in Swedish twins. Annals of Neurology 2004.
  58. Profound alterations in breast cancer incidence may reflect changes into a westernized lifestyle. International Journal of Cancer 2004.
  59. Variable selection in random calibration of near-infrared instruments: ridge regression and partial least squares regression settings. Journal of Chemometrics. 2003.
  60. Extensions of Bartlett-Lewis model for rainfall processes. Statistical Modelling. 2003.
  61. Constrained clustering of irregularly sampled spatial data. Journal of Statistical Computation and Simulation. 2003.

List of older publications

Likelihood Modelling and Inference

  1. In All Likelihood: Statistical modelling and inference using likelihood. 2001. Oxford University Press.
  2. Estimating variance components in generalized linear mixed models using quasi-likelihood. Journal of Statistical Computation and Simulation, 2000.
  3. Computing empirical likelihood from the bootstrap. Statistics and Probability Letters, 2000 
  4. Reminder of the fallibility of Wald statistic: likelihood explanation. American Statistician, 2000

Time series analysis

  1. Quasi-likelihood estimation of non-invertible moving average processes. Scandinavian Journal of Statistics, 2000 
  2. Consistent estimation of noncausal nonGaussian autoregressive processes. Journal of Time Series Analysis, 1999.
  3. Whittle likelihood. Encyclopaedia of Statistical Science, 1999.
  4. Change point problems. Encyclopaedia of Biostatistics, 1999
  5. Seasonal time series. Encyclopaedia of Biostatistics, 1999
  6. Coherence between time series. Encyclopaedia of Biostatistics, 1999
  7. Automatic estimation of coherence of bivariate time series. Biometrika, 1996
  8. Penalized Whittle likelihood estimate of spectral density functions. Journal of the American Statistical Association, 1994
  9. Efficient bias corrected nonparametric spectral estimation. Biometrika, 1991
  10. Spectral estimation and deconvolution for a linear time series model. Journal of Time Series Analysis, 1989
  11. Modelling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 1988

Statistical methods in medical imaging

  1. Mixed inverse problems arising in the estimation of PET calibration factors. Journal of the Royal Statistical Society, Series C, 1998
  2. PET system calibration and attenuation correction. IEEE Transactions on Nuclear Science, 1997.
  3. Bandwidth selection for indirect density estimation. Journal of the American Statistical Association, 1996.
  4. Multivariate density estimation by tomography. Journal of the Royal Statistical Society, Series B, 1993. 
  5. Data dependent bandwidth selection for emission computed tomography. IEEE Transactions on Medical Imaging, 1993.
  6. Reducing negativity artifacts in emission tomography. IEEE Transactions on Medical Imaging, 1993. 
  7. Discussion of "From image deblurring to optimal investment: maximum likelihood solutions for positive linear inverse problems" by Y. Vardi and D. Lee. Journal of the Royal Statistical Society, Series B, 1993

Biostatistics: methods and applications

  1. Association between ease of suppression of ventricular arrhythmia and survival. Circulation, 1995. Note: Comment in: Circulation 91(1): 245-7, 1995.
  2. Modelling disease markers in acquired immunodeficiency syndrome. Journal of the American Statistical Association, 1993.
  3. Identification of secondary peak in myocardial infarction onset 11 and 12 hours after awakening. Journal of the American College of Cardiology, 1993.
  4. Methods for assessing quality of life in the Cardiac Arrhythmia Suppression Trial. Quality of Life Research, 1992.
  5. Effects of advancing age on the efficacy and side effects of antiarrhythmic drugs. Journal of the American Geriatrics Society, 1992.
  6. Modeling a marker of disease progression and onset of disease. AIDS Epidemiology: Methodological Issues, 1992.
  7. Congestive heart failure with preserved left ventricular function. Journal of the American College of Cardiology, 1991.
  8. Events in Cardiac Arrhythmia Suppression Trial: Analysis of the placebo group. Journal of the American College of Cardiology, 1991.
  9. Prevalence, characteristics and significance of ventricular arrhythmia in the Cardiac Arrhythmia Suppression Trial. American Journal of Cardiology, 1991. 
  10. Increased risk of deaths and cardiac arrests from encainide and flecainide in patients after non-Q-wave myocardial infarction. American Journal of Cardiology, 1991. 
  11. Statistical interim monitoring of the Cardiac Arrhythmia Suppression Trial. Statistics in Medicine 1990. 
  12. Effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. New England Journal of Medicine, 1989. 

General

  1. Selecting random numbers for the lotto. Journal of Statistics Education, 1999.
  2. Two-sided P-values from discrete asymmetric distributions. Statistician: Journal of the Royal Statistical Society, Series D, 1997
