{"id":742,"date":"2018-04-19T14:48:54","date_gmt":"2018-04-19T14:48:54","guid":{"rendered":"https:\/\/jira-test.meb.ki.se\/wpsites\/biostatwiki\/?p=742"},"modified":"2023-04-17T05:56:45","modified_gmt":"2023-04-17T05:56:45","slug":"xaem","status":"publish","type":"post","link":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/xaem\/","title":{"rendered":"XAEM"},"content":{"rendered":"<h1>Contents<\/h1>\n<p style=\"padding-left: 30px\"><a href=\"#sec1\">1. Introduction<\/a><br \/>\n<a href=\"#sec2\">2. Download and installation<\/a><br \/>\n<a href=\"#sec3\">3. XAEM: step by step\u00a0instruction and explanation<\/a><br \/>\n<a href=\"#sec3_1\">3.1 Preparation for the annotation reference<\/a><br \/>\n<a href=\"#sec3_2\">3.2 Quantification of transcripts<\/a><br \/>\n<a href=\"#sec4\">4. A practical copy-paste example of running XAEM<\/a><br \/>\n<a href=\"#sec5\">5. Dataset for differential expression (DE) analysis<\/a><\/p>\n<h1 id=\"sec1\">1. Introduction<\/h1>\n<p>This document shows how\u00a0to use\u00a0XAEM [Deng et al., 2019] to\u00a0quantify isoform expression for <strong>multiple samples<\/strong>.<\/p>\n<p><strong>What is new in version 0.1.2<\/strong><\/p>\n<ul>\n<li>Improve the speed of building the CRP and fix a bug so that it works with complex annotations such as GENCODE and Ensembl, which usually have &gt;200,000 isoforms for hg38. 
The X-matrix for human Ensembl GRCh38.95 can be downloaded here: <a href=\"https:\/\/www.dropbox.com\/s\/x6a693v1y7must0\/X_matrix.RData\">X_matrix.RData<\/a><\/li>\n<\/ul>\n<p><strong>What is new in version 0.1.1<\/strong><\/p>\n<ul>\n<li>Add standard errors for the estimates<\/li>\n<li>Fix a small bug when separating a CRP into more than one CRP due to H_thres<\/li>\n<li>Fix a small bug in the function crpcount() to avoid an error when there is only one CRP<\/li>\n<\/ul>\n<p><strong>Older versions<\/strong><\/p>\n<ul>\n<li>Code, data and instructions for most XAEM versions are available on the <span style=\"color: #ff0000\"><strong><a style=\"color: #ff0000\" href=\"https:\/\/github.com\/WenjiangDeng\/XAEM\">XAEM GitHub site<\/a><\/strong><\/span><\/li>\n<li>Webpage of XAEM version 0.1.1: <a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/xaem_v011\/\">click here to get there<\/a><\/li>\n<li>Webpage of XAEM version 0.1.0: <a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/xaem-v0-1-0\/\">click here to get there<\/a><\/li>\n<\/ul>\n<p><strong>Software requirements<\/strong> for XAEM:<\/p>\n<ul>\n<li>R version 3.3.0 or later with the following packages installed:<strong> foreach<\/strong> and <strong>doParallel<\/strong><\/li>\n<li>C++11 compliant compiler (g++ &gt;= 4.7)<\/li>\n<li>XAEM is currently tested in a Linux OS environment<\/li>\n<\/ul>\n<p><strong>Annotation reference:<\/strong> XAEM requires a fasta file of transcript sequences and a gtf file of transcript annotation. 
XAEM supports any reference and annotation, for any species.<\/p>\n<p>The pre-built X-matrix for GRCh38.95 can be downloaded here: <a href=\"https:\/\/www.dropbox.com\/s\/x6a693v1y7must0\/X_matrix.RData\">X_matrix.RData<\/a><\/p>\n<p>In the XAEM paper, we used the UCSC hg19 annotation:<\/p>\n<ul>\n<li>Download the sequences of transcripts: <a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/transcripts.fa.gz\">transcripts.fa.gz<\/a><\/li>\n<li>Download the annotation of transcripts:\u00a0<a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/genes_annotation.gtf.gz\">genes_annotation.gtf.gz<\/a><\/li>\n<li>Download the design matrix X of this annotation:\u00a0<a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2022\/09\/X_matrix.rdata\">X_matrix.RData<\/a>\u00a0(the X matrix is an essential object for bias correction and isoform quantification; see Section 3.1.2 for more details)<\/li>\n<\/ul>\n<pre>wget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/transcripts.fa.gz\ngunzip transcripts.fa.gz\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/genes_annotation.gtf.gz\ngunzip genes_annotation.gtf.gz\nwget -O X_matrix.RData https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2022\/09\/X_matrix.rdata --no-check-certificate\n<\/pre>\n<h1 id=\"sec2\">2. 
Download and installation<\/h1>\n<p><strong>If you use the binary version of XAEM (recommended):<\/strong><\/p>\n<ul>\n<li>Download the latest binary version\u00a0from XAEM website:<\/li>\n<\/ul>\n<pre>wget https:\/\/github.com\/WenjiangDeng\/XAEM\/releases\/download\/v0.1.2\/XAEM-binary-0.1.2.tar.gz<\/pre>\n<ul>\n<li>Uncompress to folder<\/li>\n<\/ul>\n<pre>tar -xzvf XAEM-binary-0.1.2.tar.gz<\/pre>\n<ul>\n<li>Move to the <em><strong>XAEM_home<\/strong><\/em> directory and do the configuration for\u00a0XAEM<\/li>\n<\/ul>\n<pre>cd XAEM-binary-0.1.2\nbash configure.sh\n<\/pre>\n<ul>\n<li>Add paths of lib folder and bin folder to LD_LIBRARY_PATH and PATH<\/li>\n<\/ul>\n<pre>export LD_LIBRARY_PATH=\/path\/to\/XAEM-binary-0.1.2\/lib:$LD_LIBRARY_PATH\nexport PATH=\/path\/to\/XAEM-binary-0.1.2\/bin:$PATH<\/pre>\n<p><strong>If you want to build XAEM\u00a0from sources:<\/strong><\/p>\n<ul>\n<li>Download XAEM\u00a0\u00a0and move to\u00a0XAEM<em>_home<\/em>\u00a0directory<\/li>\n<\/ul>\n<pre>wget https:\/\/github.com\/WenjiangDeng\/XAEM\/releases\/download\/v0.1.2\/XAEM-source-0.1.2.tar.gz\ntar -xzvf XAEM-source-0.1.2.tar.gz\ncd XAEM-source-0.1.2\nbash configure.sh<\/pre>\n<ul>\n<li>XAEM requires information of flags from Sailfish including DFETCH_BOOST, DBOOST_ROOT, DTBB_INSTALL_DIR and DCMAKE_INSTALL_PREFIX. 
Please refer to the <a href=\"https:\/\/sailfish.readthedocs.io\/en\/master\/building.html#installation\">Sailfish website<\/a> for more details of these flags.<\/li>\n<li>Run the installation with the following command:<\/li>\n<\/ul>\n<pre>DBOOST_ROOT=\/path\/to\/boostDir\/ DTBB_INSTALL_DIR=\/path\/to\/tbbDir\/ DCMAKE_INSTALL_PREFIX=\/path\/to\/expectedBuildDir bash install.sh<\/pre>\n<ul>\n<li>After the installation is finished, remember to add the paths of the lib folder and bin folder to LD_LIBRARY_PATH and PATH<\/li>\n<\/ul>\n<pre>export LD_LIBRARY_PATH=\/path\/to\/expectedBuildDir\/lib:$LD_LIBRARY_PATH\nexport PATH=\/path\/to\/expectedBuildDir\/bin:$PATH<\/pre>\n<p><strong><em>Do not forget to replace &#8220;\/path\/to\/&#8221; with your local path.<\/em><\/strong><\/p>\n<h1 id=\"sec3\">3. XAEM: step by step\u00a0instruction and explanation<\/h1>\n<p>XAEM mainly contains the following steps:<\/p>\n<ul>\n<li><em>Preparation for the annotation reference:<\/em>\u00a0 to process the annotation of transcripts to get essential information for transcript quantification. This step includes 1) indexing the transcript sequences and 2) constructing the design matrix X.<\/li>\n<li><em>Quantification of transcripts:<\/em>\u00a0 to take input from multiple RNA-seq samples, do quasi-mapping, and generate data for quantifying transcript expression.\u00a0This step consists of 1) generating the equivalence class table; 2) creating the Y count matrix and 3) using the AEM algorithm to update the X matrix and estimate the transcript (isoform) expression.<\/li>\n<\/ul>\n<h3 id=\"sec3_1\">3.1 Preparation for the annotation reference<\/h3>\n<h4 id=\"sec3_1_1\">3.1.1 Indexing transcripts<\/h4>\n<p>Use TxIndexer to index the transcript sequences in the reference file (transcripts.fa). 
For example:<\/p>\n<pre>wget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/transcripts.fa.gz\ngunzip transcripts.fa.gz\nTxIndexer -t \/path\/to\/transcripts.fa -o \/path\/to\/TxIndexer_idx<\/pre>\n<h4 id=\"sec3_1_2\">3.1.2 Construction of the X\u00a0matrix (design matrix)<\/h4>\n<p>This step constructs the X matrix required\u00a0by the XAEM pipeline. For users working with the human\u00a0UCSC hg19 annotation, the X matrix can be downloaded here: <a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2022\/09\/X_matrix.rdata\">X_matrix.rdata<\/a> (the file needs to be renamed to X_matrix.RData).<\/p>\n<p>Given a file\u00a0<strong><em>transcripts.fa<\/em><\/strong> containing the transcript sequences of an annotation reference, we construct the design matrix as follows.<\/p>\n<ul>\n<li>a) Generate simulated RNA-seq data using the R package &#8220;polyester&#8221;<\/li>\n<\/ul>\n<pre>## R packages \"polyester\" and \"Biostrings\" are required\nRscript XAEM_home\/R\/genPolyesterSimulation.R \/path\/to\/transcripts.fa \/path\/to\/design_matrix<\/pre>\n<ul>\n<li>b) Run <em>GenTC<\/em> to generate the Transcript Clusters (TC) using the simulated data. GenTC\u00a0will generate an eqClass.txt file as the input for the next step.<\/li>\n<\/ul>\n<pre>GenTC -i \/path\/to\/TxIndexer_idx -l IU -1 \/path\/to\/design_matrix\/sample_01_1.fasta -2 \/path\/to\/design_matrix\/sample_01_2.fasta -p 8 -o \/path\/to\/design_matrix<\/pre>\n<ul>\n<li>c) Create a design matrix using\u00a0buildCRP.R. The parameter settings for this script are as follows.\n<ul>\n<li><em><strong>in<\/strong><\/em>: the input file (eqClass.txt) obtained from the previous step.<\/li>\n<li><em><strong>out<\/strong><\/em>: the output file name (*.RData) in which the design matrix will be saved.<\/li>\n<li><em><strong>H<\/strong><\/em>: (default H=0.025) the threshold used to filter false positive neighbors in each X matrix. 
(Please refer to the XAEM paper, Section 2.2.1)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<pre>Rscript XAEM_home\/R\/buildCRP.R in=\/path\/to\/design_matrix\/eqClass.txt out=\/path\/to\/design_matrix\/X_matrix.RData H=0.025<\/pre>\n<h3 id=\"sec3_2\">3.2 Quantification of transcripts<\/h3>\n<p>Suppose we have already created a working directory &#8220;<em><strong>XAEM_project<\/strong><\/em>&#8221; (\/path\/to\/XAEM_project\/) for quantification of transcripts.<\/p>\n<h4 id=\"sec3_2_1\">3.2.1 Generating the equivalence class table<\/h4>\n<p>The command to generate the equivalence class table for each sample is similar to &#8220;sailfish quant&#8221;. \u00a0<em>For example, to run XAEM for sample1 and sample2 with 4\u00a0CPUs:<\/em><\/p>\n<pre>XAEM -i \/path\/to\/TxIndexer_idx -l IU -1 s1_read1.fasta -2 s1_read2.fasta -p 4 -o \/path\/to\/XAEM_project\/sample1\nXAEM -i \/path\/to\/TxIndexer_idx -l IU -1 s2_read1.fasta -2 s2_read2.fasta -p 4 -o \/path\/to\/XAEM_project\/sample2\n<\/pre>\n<ul>\n<li>If the data are compressed in gz format, we can combine the command with gunzip for on-the-fly decompression:<\/li>\n<\/ul>\n<pre>XAEM -i \/path\/to\/TxIndexer_idx -l IU -1 &lt;(gunzip -c s1_read1.gz) -2 &lt;(gunzip -c s1_read2.gz) -p 4 -o \/path\/to\/XAEM_project\/sample1\nXAEM -i \/path\/to\/TxIndexer_idx -l IU -1 &lt;(gunzip -c s2_read1.gz) -2 &lt;(gunzip -c s2_read2.gz) -p 4 -o \/path\/to\/XAEM_project\/sample2\n<\/pre>\n<h4 id=\"sec3_2_2\">3.2.2 Creating Y count matrix<\/h4>\n<p>After running XAEM, an equivalence class table is available for each of the <strong>multiple\u00a0samples<\/strong>. 
We then create the Y count matrix.\u00a0For example, to run this step in parallel using\u00a08 cores, the command is:<\/p>\n<pre>Rscript Create_count_matrix.R workdir=\/path\/to\/XAEM_project core=8<\/pre>\n<h4 id=\"sec3_2_3\">3.2.3 Updating the X matrix and transcript expression using AEM algorithm<\/h4>\n<p>After constructing the Y count matrix, we use the AEM algorithm to update the X matrix.\u00a0The updated X matrix is then used to estimate the transcript (isoform) expression. The command is as follows.<\/p>\n<pre>Rscript AEM_update_X_beta.R workdir=\/path\/to\/XAEM_project core=8 design.matrix=X_matrix.RData isoform.out=XAEM_isoform_expression.RData paralog.out=XAEM_paralog_expression.RData merge.paralogs=FALSE isoform.method=average remove.ycount=TRUE<\/pre>\n<p><strong>Parameter setting<\/strong><\/p>\n<ul>\n<li><em><strong>workdir<\/strong><\/em>: the path to the working directory<\/li>\n<li><em><strong>core<\/strong><\/em>: the number of CPU cores for parallel computing<\/li>\n<li><em><strong>design.matrix<\/strong><\/em>: the path to the design matrix<\/li>\n<li><em><strong>isoform.out<\/strong><\/em> (default=XAEM_isoform_expression.RData):\u00a0 the output contains the estimated expression of <strong>individual transcripts, where the paralogs are split into separate isoforms<\/strong>. This file contains two objects:\u00a0<strong>isoform_count<\/strong> and <strong>isoform_tpm<\/strong> for estimated counts and normalized values (TPM). 
The expression of the individual isoforms is calculated according to the setting of the parameter &#8220;<em>isoform.method<\/em>&#8221; below.<\/li>\n<li><em><strong>isoform.method<\/strong><\/em> (default=average):\u00a0 to report the expression of the individual members of a paralog as the average\u00a0or total expression of the paralog set (value=average\/total).<\/li>\n<li><em><strong>paralog.out<\/strong><\/em> (default=XAEM_paralog_expression.RData): the output contains the estimated expression of<strong>\u00a0merged paralogs<\/strong>. This file consists of two objects: <strong>XAEM_count<\/strong> and <strong>XAEM_tpm<\/strong>\u00a0 for the estimated counts and normalized values (TPM). The standard error of the estimate is supplied in the object <strong>XAEM_se<\/strong> stored in <em>*.standard_error.RData.<\/em><\/li>\n<li><em><strong>merge.paralogs<\/strong><\/em> (default=TRUE) <strong>(*)<\/strong>: the parameter to turn on\/off (value=TRUE\/FALSE) the paralog merging in XAEM. Please see the details of how to use this parameter in the note at the end of this section.<\/li>\n<li><em><strong>remove.ycount<\/strong><\/em> (default=TRUE): to clean all data of Ycount after use.<\/li>\n<\/ul>\n<p>The output of this step is saved in XAEM_isoform_expression.RData, which contains the TPM values and raw read counts of the multiple samples.<\/p>\n<p><strong>Note: (*)\u00a0<\/strong>In the XAEM pipeline we provide this parameter (merge.paralogs) to merge or not merge the paralogs within the updated X matrix (please see the XAEM\u00a0paper, Section 2.2.3 and Section 2.3).\u00a0 <em><strong>Turning on (default) the paralog merging step produces a more accurate estimation. Turning off this step can produce the same sets of isoforms between different projects.<\/strong><\/em><\/p>\n<h1 id=\"sec4\">4. A practical copy-paste example of running XAEM<\/h1>\n<p>This section presents a tutorial for running the XAEM pipeline with a toy example. 
Suppose that the input data contain\u00a0<strong><em>two RNA-seq samples<\/em><\/strong> and the server supplies <strong><em>4<strong>\u00a0<\/strong>CPUs<\/em><\/strong> for computation. We can test XAEM by simply\u00a0copying and pasting the example commands.<\/p>\n<ul>\n<li>Download the binary version of XAEM and do configuration<\/li>\n<\/ul>\n<pre># Create a working folder\nmkdir XAEM_example\ncd XAEM_example\n# Download the binary version of XAEM\nwget https:\/\/github.com\/WenjiangDeng\/XAEM\/releases\/download\/v0.1.2\/XAEM-binary-0.1.2.tar.gz\n\n# Configure the tool\ntar -xzvf XAEM-binary-0.1.2.tar.gz\ncd XAEM-binary-0.1.2\nbash configure.sh\n\n# Add the paths to system\nexport LD_LIBRARY_PATH=$PWD\/lib:$LD_LIBRARY_PATH\nexport PATH=$PWD\/bin:$PATH\ncd ..\n<\/pre>\n<ul>\n<li>Download the annotation files and index the transcripts<\/li>\n<\/ul>\n<pre>## download annotation files\n# Download the design matrix for the human UCSC hg19 annotation \nwget -O X_matrix.RData https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2022\/09\/X_matrix.rdata --no-check-certificate\n\n# Download the fasta of transcripts in the human UCSC hg19 annotation \nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/transcripts.fa.gz\ngunzip transcripts.fa.gz\n\n## Run XAEM indexer\nTxIndexer -t transcripts.fa -o TxIndexer_idx\n<\/pre>\n<ul>\n<li>If using GRCh38.95, download the corresponding annotation files (Homo_sapiens.GRCh38.95.cdna.all.fa and Homo_sapiens.GRCh38.95.gtf) from Ensembl (http:\/\/jan2019.archive.ensembl.org\/Homo_sapiens\/Info\/Index) and the X-matrix of GRCh38.95 from here: https:\/\/www.dropbox.com\/s\/x6a693v1y7must0\/X_matrix.RData<\/li>\n<\/ul>\n<ul>\n<li>Download the RNA-seq data of two samples: sample1 and sample2<\/li>\n<\/ul>\n<pre>## Download input RNA-seq samples\n# Create a XAEM project to save the data\nmkdir XAEM_project\ncd XAEM_project\n\n# Download the RNA-seq data\nwget 
https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/sample1_read1.fasta.gz\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/sample1_read2.fasta.gz\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/sample2_read1.fasta.gz\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/sample2_read2.fasta.gz\ncd ..\n<\/pre>\n<ul>\n<li>Generate the equivalence class tables for these samples<\/li>\n<\/ul>\n<pre># Number of CPUs\nCPUNUM=4\n\n# Process for sample 1\nXAEM -i TxIndexer_idx -l IU -1 &lt;(gunzip -c XAEM_project\/sample1_read1.fasta.gz) -2 &lt;(gunzip -c XAEM_project\/sample1_read2.fasta.gz) -p $CPUNUM -o XAEM_project\/sample1\n\n# Process for sample 2\nXAEM -i TxIndexer_idx -l IU -1 &lt;(gunzip -c XAEM_project\/sample2_read1.fasta.gz) -2 &lt;(gunzip -c XAEM_project\/sample2_read2.fasta.gz) -p $CPUNUM -o XAEM_project\/sample2\n<\/pre>\n<ul>\n<li>Create Y count matrix<\/li>\n<\/ul>\n<pre><strong># Note: R packages \"foreach\"\u00a0and\u00a0\"doParallel\" are required for parallel computing<\/strong>\nRscript $PWD\/XAEM-binary-0.1.2\/R\/Create_count_matrix.R workdir=$PWD\/XAEM_project core=$CPUNUM design.matrix=$PWD\/X_matrix.RData\n<\/pre>\n<ul>\n<li>Estimate isoform expression using AEM algorithm<\/li>\n<\/ul>\n<pre>Rscript $PWD\/XAEM-binary-0.1.2\/R\/AEM_update_X_beta.R workdir=$PWD\/XAEM_project core=$CPUNUM design.matrix=$PWD\/X_matrix.RData isoform.out=XAEM_isoform_expression.RData paralog.out=XAEM_paralog_expression.RData\n<\/pre>\n<p>The outputs are stored in the folder of &#8220;XAEM_project&#8221; including<em>\u00a0XAEM_isoform_expression.RData<\/em> and\u00a0<em>XAEM_paralog_expression.RData<\/em>.<\/p>\n<h1 id=\"sec5\">5. 
Dataset for differential expression (DE) analysis<\/h1>\n<p>In the XAEM paper we used the RNA-seq data from the breast cancer cell line (MDA-MB-231) for DE analysis. Since the original data were generated by our collaborators and are not yet published, we provide the equivalence class tables obtained by running the read-alignment tool RapMap, which is the mapper used by Salmon and is completely independent of the XAEM algorithm. We also provide the R scripts and a guide to replicate the DE analysis results in the paper.<\/p>\n<p>In this section, we give instructions for downloading the data and running the scripts. We have tried to keep the pipeline copy-paste friendly in the shell, but the R scripts must be run in the R console.<\/p>\n<h2>5.1 Download the R-scripts and the design matrix<\/h2>\n<p>This step downloads the R-scripts, changes directory to the folder containing the R-scripts and downloads the design matrix.<\/p>\n<pre># Download R-scripts\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/brca_singlecell\/RDR_brca_singlecell.zip\nunzip RDR_brca_singlecell.zip\ncd RDR_brca_singlecell\n\n# Download the design matrix\nwget -O X_matrix.RData https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2022\/09\/X_matrix.rdata --no-check-certificate\n<\/pre>\n<h2>5.2 Run XAEM from the\u00a0equivalence class tables which are the output of read-alignment tool RapMap<\/h2>\n<p>Download the data of\u00a0equivalence classes<\/p>\n<pre># Download the table of equivalence classes of the single cells which are the output of read-alignment tool RapMap\n\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/brca_singlecell\/brca_singlecell_eqclassDir.zip\nunzip brca_singlecell_eqclassDir.zip\n<\/pre>\n<p>Run XAEM with the input from the equivalence class table using the R code below.\u00a0<em><strong>Note:<\/strong><\/em>\u00a0\u00a0This step takes about 2 hours using a 
personal computer with 4 CPUs. Users can consider skipping this step and downloading the available XAEM results for the downstream analysis.<\/p>\n<pre># set the project path\nprojPath=getwd();\nsetwd(projPath)\nsource(\"collectDataOfXAEM.R\")\n<\/pre>\n<p>To download the available XAEM results instead:<\/p>\n<pre># Download the available results of XAEM\n\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/brca_singlecell\/XAEM_results.zip\nunzip XAEM_results.zip\n<\/pre>\n<h2>5.3\u00a0Differential-expression analysis of XAEM and other methods<\/h2>\n<p>Download the data of Cufflinks and Salmon. These files contain the read-count data of the methods with and without bias correction.<\/p>\n<pre># Download the results of Cufflinks\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/brca_singlecell\/cufflinks_results.zip\nunzip cufflinks_results.zip\n\n# Download the results of Salmon\nwget https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/XAEM_datasources\/brca_singlecell\/salmon_results.zip\nunzip salmon_results.zip<\/pre>\n<p>Run the code below in R to do normalization and differential expression analysis.<\/p>\n<pre># set the project path\nprojPath=getwd();\nsetwd(projPath)\n\n# Normalize the data of three methods XAEM, Salmon and Cufflinks\nsource(\"Isoform_Expression_CPM_Normalization.R\")\n\n# Do DE analysis and plot figures\nsource(\"DEanalysis_plots.R\")\n\n# output: DE_Analysis.png<\/pre>\n<p>The results of the\u00a0differential expression analysis (Figure 1 below) are the plots (<em>DE_Analysis.png<\/em>) reproducing Figure 3 of the XAEM paper. 
Note that\u00a0due to the randomness across the 50 runs, the figure might be slightly different from the figure in the paper.<\/p>\n<div id=\"attachment_1144\" style=\"width: 635px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1144\" class=\"wp-image-1144 size-large\" src=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-1024x1024.png\" alt=\"\" width=\"625\" height=\"625\" srcset=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-1024x1024.png 1024w, https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-150x150.png 150w, https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-300x300.png 300w, https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-768x768.png 768w, https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA-624x624.png 624w, https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2019\/06\/DE_Analysis_brca_sc_RDA.png 1400w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><p id=\"caption-attachment-1144\" class=\"wp-caption-text\"><strong>Figure 1.<\/strong> Detection and validation of differentially expressed (DE) isoforms using the MDA-MB-231 scRNA-seq dataset. XAEM, Salmon and Cufflinks are presented in blue-solid, red-dashed and grey-dotted lines, respectively. The x-axis shows the number of top DE isoforms in the training set; the y-axis is the proportion of rediscovery in the validation set. 
The rediscovery rate (RDR) is calculated by comparing the top 100, 500 and 1000 DE isoforms from the training set with all the significant DE isoforms from the validation set. The boxplots show the RDR from 50 runs. (a) Both training set and validation set are constructed using cells from batch 1. The quantification of XAEM, Salmon and Cufflinks is performed without bias correction. (b) The quantification from the three methods is bias-corrected. (c) The training set is constructed using cells from batch 1, while the validation set uses cells from batch 2. The RDR is calculated for only singleton isoforms. (d) The training set is constructed using cells from batch 1, and the validation set using cells from batch 2. The RDR is calculated using only non-paralogs.<\/p><\/div>\n<p><strong>References:\u00a0<\/strong><\/p>\n<ol>\n<li>Deng, Wenjiang, Tian Mou, Nifang Niu, Liewei Wang, Yudi Pawitan, and Trung Nghia Vu. 2019. \u201cAlternating EM Algorithm for a Bilinear Model in Isoform Quantification from RNA-Seq Data.\u201d Bioinformatics.\u00a0 https:\/\/doi.org\/10.1093\/bioinformatics\/btz640.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Contents 1. Introduction 2. Download and installation 3. XAEM: step by step\u00a0instruction and explanation 3.1 Preparation for the annotation reference 3.2 Quantification of transcripts 4. A practical copy-paste example of running XAEM 5. Dataset for differential expression (DE) analysis 1. 
Introduction This document shows how\u00a0to use\u00a0XAEM [Deng et al., 2019] to\u00a0quantify isoform expression for multiple [&hellip;]<\/p>\n","protected":false},"author":20,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-742","post","type-post","status-publish","format-standard","hentry","category-rna-seq"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/742","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/comments?post=742"}],"version-history":[{"count":348,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/742\/revisions"}],"predecessor-version":[{"id":1371,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/742\/revisions\/1371"}],"wp:attachment":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/media?parent=742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/categories?post=742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/tags?post=742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}