Biostat Wiki | Wiki for programs and pipelines for sequencing data analysis

This is the wiki page for bioinformatics tools developed by our group, consisting of Stefano Calza, Trung Nghia Vu, Xia Shen, Zheng Ning, Wenjiang Deng, Quang Thinh Trac, and Yudi Pawitan.

Support: We acknowledge with thanks the ongoing support from the Swedish Research Council and Cancer Fonden. In particular, most NGS data analyses are enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

DIPx: Pathway activation model for personalized prediction of drug synergy
MDREAM: A web-based tool for Monotherapy Drug Response prediction in acute myeloid leukemia patients
FuSeq_WES: A method for detection of fusion genes from DNA-sequencing data
DCSP: Discovery of Druggable Cancer-Specific Pathways
MAX: quantification of mutant-allele expression in cancer from RNA sequencing data
Scasa: A method for quantifying isoform expression from single-cell RNA-seq data generated by tag-based sequencing methods such as 10x Genomics
Circall: A fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data
CircNetVis: A tool to analyze circular RNAs through circRNA:miRNA:mRNA interaction and circRNA:RNA-binding-protein (RBP) interaction.
AMLSubtypeSpecificDiscovery: An R-shiny tool for discovery of subtype-specific genes in Acute Myeloid Leukemia.
XAEM: a pipeline to estimate isoform expression from RNA-seq data. XAEM implements the design matrix and alternating EM (AEM) algorithm, which allows fast and accurate quantification of isoform expression using multiple samples. In differential expression (DE) analysis of single-cell data, XAEM substantially outperforms other methods. See Deng et al., Bioinformatics 2019.
FuSeq: a fast method to discover fusion genes from paired-end RNA sequencing data. FuSeq uses quasi-mapping to quickly map the reads, extracts initial candidates from split reads and fusion equivalence classes of mapped reads, and finally applies multiple filters and statistical tests to obtain the final candidates. See Vu et al., BMC Genomics 2018.
SCmut: a robust statistical method for cell-level somatic mutation detection from single-cell RNA-sequencing. SCmut requires RNA-sequencing data of single cells and bulk-cell DNA-sequencing data (e.g., whole-exome sequencing, WES) of matched samples (tumor and normal). If the DNA-sequencing data are not available, a list of somatic mutations can be used instead. See Vu et al., Bioinformatics 2019.
BPSC: a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA sequencing are not able to capture this property. The BPSC model is integrated into the generalized linear model (GLM) framework to perform differential expression analyses. See Vu et al., Bioinformatics 2018.
ISOP (ISOform Patterns): a mixture model to characterize the expression patterns of pairs of isoforms from the same gene and determine whether isoform-level expression patterns are random or signify biological effects. The method allows investigation of single-cell isoform preference and commitment, and assesses heterogeneity at the level of isoform expression. It also provides a way to assess biological effects in single-cell RNA-seq data through the isoform patterns, and then discovers differential-pattern genes (DP genes). See Vu et al., Bioinformatics 2018.
BRCAsubtypes: An interactive webpage to discover subtype-specific isoforms in breast cancer data.
Driver Genes
Pipeline to identify potential driver genes using integrated genomic and transcriptomic profiles of tumor and matched-normal tissue. See Suo et al., Bioinformatics 2015 for an application in breast cancer, and Suo et al., Biology Direct 2018 for an application in neuroblastoma.
Sequgio (Old, deprecated)
Pipeline to estimate isoform expression from RNA-seq data based on a model that does not assume a uniform distribution of read counts within transcripts.
How to Use UPPMAX
UPPMAX is a high-performance computing (HPC) system that our project uses to run heavy computations requiring many processors and/or large amounts of memory. This page explains some basic rules and commands for working on UPPMAX.
How to Use CINECA HPC
CINECA is a high-performance computing (HPC) system that our project uses to run heavy computations requiring many processors and/or large amounts of memory. This page explains some basic rules and commands for working on CINECA servers.