Welcome Page

This is the wiki page for bioinformatics tools developed by our group, consisting of Stefano Calza, Trung Nghia Vu, Xia Shen, Zheng Ning, Wenjiang Deng and Yudi Pawitan.

Support: We acknowledge with thanks ongoing support from the Swedish Research Council and Cancer Fonden. In particular, most NGS data analyses are enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) in Uppsala, which is partially funded by the Swedish Research Council through grant agreement no. 2016-07213.

  1. MAX: quantification of mutant-allele expression in cancer from RNA sequencing data
  2. Circall: A fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data
  3. XAEM: a pipeline to estimate isoform expression from RNA-seq data. XAEM implements the design matrix and alternating EM (AEM) algorithm, which allows fast and accurate quantification of isoform expression using multiple samples. In differential expression (DE) analysis of single-cell data, XAEM substantially outperforms other methods. See Deng et al., Bioinformatics 2019.
  4. FuSeq: a fast method to discover fusion genes from paired-end RNA sequencing data. FuSeq uses quasi-mapping to map the reads quickly, extracts initial candidates from split reads and fusion equivalence classes of mapped reads, and finally applies multiple filters and statistical tests to obtain the final candidates. See Vu et al., BMC Genomics 2018.
  5. SCmut: a robust statistical method for cell-level somatic mutation detection from single-cell RNA-sequencing. SCmut requires RNA-sequencing data of single cells and bulk-cell DNA-sequencing data (e.g., whole-exome sequencing, WES) of matched samples (tumor and normal). If the DNA-sequencing data are not available, a list of somatic mutations can be used instead. See Vu et al., Bioinformatics 2019.
  6. BPSC: a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is bimodality in the cellular distribution, even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard Poisson and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. The BPSC model is integrated into the generalized linear model (GLM) framework in order to perform differential expression analyses. See Vu et al., Bioinformatics 2018.
  7. ISOP (ISOform Patterns): a mixture model to characterize the expression patterns of pairs of isoforms from the same gene and determine whether isoform-level expression patterns are random or signify biological effects. The method makes it possible to investigate single-cell isoform preference and commitment, and to assess heterogeneity at the level of isoform expression. It also provides a way to assess biological effects in single-cell RNA-seq data through the isoform patterns and to discover differential-pattern genes (DP genes). See Vu et al., Bioinformatics 2018.
  8. Driver Genes
    Pipeline to identify potential driver genes using integrated genomic and transcriptomic profiles of tumor and matched-normal tissue. See Suo et al., Bioinformatics 2015 for an application in breast cancer, and Suo et al., Biology Direct 2018 for an application in neuroblastoma.
  9. Sequgio (Old, deprecated)
    Pipeline to estimate isoform expression from RNA-seq data based on a model that
    does not assume a uniform distribution of counts within transcripts.
  10. How to Use UPPMAX
    UPPMAX is a high-performance computing (HPC) system that our project uses to run heavy computations requiring many processors and/or large amounts of memory. This page explains some basic rules and commands for UPPMAX.
  11. How to Use CINECA HPC
    CINECA is a high-performance computing (HPC) system that our project uses to run heavy computations requiring many processors and/or large amounts of memory. This page explains some basic rules and commands for the CINECA servers.