{"id":358,"date":"2014-08-18T04:39:02","date_gmt":"2014-08-18T04:39:02","guid":{"rendered":"https:\/\/jira-test.meb.ki.se\/wpsites\/biostatwiki\/?p=358"},"modified":"2018-04-19T14:37:47","modified_gmt":"2018-04-19T14:37:47","slug":"rna-seq-analysis","status":"publish","type":"post","link":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/rna-seq-analysis\/","title":{"rendered":"RNA-seq Analysis Using Old Sequgio (deprecated)"},"content":{"rendered":"<h2 style=\"color: #333333;\"><span style=\"color: #000000;\"><span style=\"font-size: medium;\">Welcome<\/span><\/span><\/h2>\n<div>\n<hr \/>\n<\/div>\n<p>This site describes how to analyze RNA-seq data using TopHat, Cufflinks and Sequgio. Most examples use the UPPMAX (<a style=\"color: #ff8f00 !important;\" href=\"http:\/\/www.uppmax.uu.se\/\" rel=\"nofollow\">http:\/\/www.uppmax.uu.se\/<\/a>)\u00a0facilities.<\/p>\n<h2 style=\"color: #333333;\"><a style=\"color: #0000cc;\" name=\"TOC-Data-Preparation\"><\/a><span style=\"font-size: medium;\">Data Preparation<\/span><\/h2>\n<div>\n<hr \/>\n<\/div>\n<div><\/div>\n<div>The following data\/files should be provided:<\/div>\n<div>\n<ul>\n<li>FASTQ files<\/li>\n<li>Human reference genome (FASTA file)<\/li>\n<li>Human\u00a0reference genome annotation\u00a0database\u00a0(B37 from Ensembl, or hg19) (GTF file)<\/li>\n<\/ul>\n<\/div>\n<div><\/div>\n<div>Useful information on the RNA-seq pipeline can be found here:\u00a0<a style=\"color: #ff8f00 !important;\" href=\"http:\/\/nestor.uppnex.se\/twiki\/bin\/view\/Courses\/CM1209\/TranscriptomeMappingFirst\" rel=\"nofollow\">http:\/\/nestor.uppnex.se\/twiki\/bin\/view\/Courses\/CM1209\/TranscriptomeMappingFirst<\/a><\/div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<div>\n<div>\n<div>\n<h2 style=\"color: #333333;\"><span style=\"font-size: medium;\">Alignment<\/span><\/h2>\n<\/div>\n<div>\n<hr \/>\n<\/div>\n<div><span style=\"color: #555555;\">For alignment, we use TopHat, which aligns RNA-Seq reads to a genome in order to identify 
exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie. The TopHat manual can be found here:\u00a0<\/span><a style=\"color: #ff8f00 !important;\" href=\"http:\/\/tophat.cbcb.umd.edu\/manual.shtml\" rel=\"nofollow\">http:\/\/tophat.cbcb.umd.edu\/manual.shtml<\/a>. It is highly recommended to read the TopHat manual before running the following examples.<\/div>\n<div><\/div>\n<div><\/div>\n<div><span style=\"font-size: medium;\">Using\u00a0<b>TopHat<\/b>:<\/span><\/div>\n<div><span style=\"font-size: medium;\">\u00a0<\/span><\/div>\n<div><span style=\"font-size: medium;\">Example shell script:<\/span><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">#!\/bin\/bash -l<\/span><\/div>\n<\/blockquote>\n<div>\n<blockquote>\n<div>#SBATCH -A b2012036<\/div>\n<div>#SBATCH -p node -n 8<\/div>\n<div>#SBATCH -t 40:00:00<\/div>\n<div>#SBATCH --mail-user=user@ki.se<\/div>\n<div>#SBATCH --mail-type=ALL<\/div>\n<div>#SBATCH -J tophat<\/div>\n<div><\/div>\n<div>module load bioinfo-tools<\/div>\n<div>module load tophat\/1.4.0<\/div>\n<div><\/div>\n<div>tophat -o INBOX\/BRCA\/batch1\/tophat.output.SRR327626 -p 8 --no-novel-juncs --library-type=fr-unstranded -G reference\/genes.gtf reference\/BowtieIndex\/genome INBOX\/BRCA\/batch1\/SRR327626_1.fastq.gz INBOX\/BRCA\/batch1\/SRR327626_2.fastq.gz<\/div>\n<\/blockquote>\n<\/div>\n<\/div>\n<div><span style=\"font-size: medium;\">\u00a0<\/span><\/div>\n<div><span style=\"font-size: medium;\">\u00a0<\/span><\/div>\n<div><span style=\"font-weight: bold; color: #333333;\">Expression Quantification\u00a0<\/span><\/div>\n<div>\n<hr \/>\n<p><span style=\"color: #505050;\"><b>Cufflinks\u00a0<\/b><\/span><\/p>\n<\/div>\n<\/div>\n<div><\/div>\n<div><span style=\"color: #505050; font-family: Verdana, Arial, sans-serif;\">Cufflinks assembles transcripts, estimates their abundances, and tests for 
differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. More details and the Cufflinks user manual can be found here:\u00a0<\/span><a style=\"color: #ff8f00 !important;\" href=\"http:\/\/cufflinks.cbcb.umd.edu\/\" rel=\"nofollow\">http:\/\/cufflinks.cbcb.umd.edu\/<\/a><\/div>\n<div><span style=\"color: #505050; font-family: Verdana, Arial, sans-serif;\"><b>\u00a0<\/b><\/span><\/div>\n<div>Using Cufflinks:<\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div style=\"color: #006000;\">module load bioinfo-tools<\/div>\n<div style=\"color: #006000;\">module load cufflinks\/2.0.2<\/div>\n<div style=\"color: #006000;\"><\/div>\n<div style=\"color: #006000;\">cufflinks -o INBOX\/BRCA\/batch1\/Cuffres.SRR327626 -p 8 -G reference\/genes.gtf -b reference\/BowtieIndex\/genome.fa INBOX\/BRCA\/batch1\/tophat.output.SRR327626\/accepted_hits.bam<\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<div><span style=\"color: #006000;\">-G\u00a0<\/span>restricts quantification to the supplied annotation;\u00a0<span style=\"color: #555555;\">no novel transcripts\u00a0<\/span><span style=\"color: #555555;\">are assembled.<\/span><\/div>\n<div><\/div>\n<div><span style=\"color: #555555;\">\u00a0<\/span><\/div>\n<div><b>TopHat\u00a0<\/b>and\u00a0<b>Cufflinks\u00a0<\/b>can also be run together in a single job. 
Example for batch 7 (file:\u00a0<code style=\"color: #006000;\">topcuf_batch7sc.txt<\/code>):<\/div>\n<div><\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000; font-family: monospace;\">#!\/bin\/bash -l<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH -A b2012036<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH -p node -n 8<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH -t 60:00:00<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH --mail-user=user@ki.se<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH --mail-type=ALL<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">#SBATCH -J TCB7sc<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">module load bioinfo-tools<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">module load tophat\/1.4.0\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">module load cufflinks\/2.0.2<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">tophat -o \/scratch\/tophat.outputsc.$1 \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0-p 8 --library-type=fr-unstranded \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0-G reference\/genes.gtf reference\/BowtieIndex\/genome \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: 
monospace;\">\u00a0INBOX\/BRCA\/batch7\/$1_1.fastq.gz \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0INBOX\/BRCA\/batch7\/$1_2.fastq.gz<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">cp -r \/scratch\/tophat.outputsc.$1 INBOX\/BRCA\/batch7\/<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">cufflinks -o \/scratch\/CuffresGsc.$1 \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0-p 8 -G reference\/genes.gtf \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0-b reference\/BowtieIndex\/genome.fa \\<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0INBOX\/BRCA\/batch7\/tophat.outputsc.$1\/accepted_hits.bam<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">cp -r \/scratch\/CuffresGsc.$1 INBOX\/BRCA\/batch7\/<\/span><\/div>\n<\/blockquote>\n<div><\/div>\n<\/div>\n<div><\/div>\n<div>We input \u00a0sample names\u00a0<span style=\"color: #006000;\">$1<\/span><span style=\"font-size: small;\">\u00a0 \u00a0in the batch submission:<\/span><\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote><p><span style=\"color: #006000;\"><code>sbatch topcuf_batch7sc.txt\u00a0<\/code><\/span><span style=\"color: 
#006000;\"><code>SRR327626<\/code><\/span><\/p>\n<div><\/div>\n<\/blockquote>\n<\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"font-size: small;\">Note that the output of each step (<b>TopHat<\/b>\u00a0and\u00a0<b>Cufflinks<\/b>) is stored\u00a0temporarily\u00a0on the node-local disk before being copied to the\u00a0project\u00a0<\/span>directory<span style=\"font-size: small;\">. Please read the explanation of SCRATCH here:\u00a0<\/span><a style=\"color: #ff8f00 !important;\" href=\"http:\/\/www.uppmax.uu.se\/disk-storage-guide\" rel=\"nofollow\">http:\/\/www.uppmax.uu.se\/disk-storage-guide<\/a><\/div>\n<div><\/div>\n<div><\/div>\n<div>To submit all samples, we use R to submit one job per sample, so that the samples run in parallel:<\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>#################################<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>## Run Tophat and Cufflinks \u00a0 ###<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>## allowing novel transcripts ###<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>## \u00a0BATCH 7<\/code> <code>###<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>################################# \u00a0 \u00a0 \u00a0<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>setwd('\/lynx\/cvol\/v25\/b2012036\/INBOX\/BRCA\/')<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>f.long = dir(recursive=TRUE)<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>ffastq &lt;- f.long[grep(\".fastq.gz\", f.long)]<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>SRR.batch7 = unique(substr(ffastq 
,1,9))<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>write.csv2(SRR.batch7,file=\"SRR.batch7.csv\")<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>setwd('<\/code><\/span><span style=\"color: #006000;\">\/lynx\/cvol\/v25\/b2012036\/INBOX\/BRCA\/<\/span><span style=\"color: #006000;\"><code>')<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>topcufMC7sc = function(f){<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>\u00a0cmd =paste('sbatch topcuf_batch7sc.txt', f,sep=' ')<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>\u00a0system(cmd)<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\"><code>}<\/code><\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\"><code>for (i in 1:length(<\/code><\/span><span style=\"color: #006000;\">SRR.batch7<\/span><span style=\"color: #006000;\">) \u00a0) \u00a0topcufMC7sc(SRR.batch7[i])<\/span><\/div>\n<\/blockquote>\n<div style=\"color: #006000;\"><\/div>\n<\/div>\n<div style=\"color: #006000;\"><\/div>\n<div>More R code are in<span style=\"color: #006000;\">: \u00a0\u00a0<\/span><span style=\"color: #006000;\">run paralel.R<\/span><\/div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<\/div>\n<div><b><span style=\"color: #505050; font-family: Verdana, Arial, sans-serif; font-size: medium;\">Sequgio<\/span><\/b><\/div>\n<div>\n<hr \/>\n<\/div>\n<div><b>Getting TXDB<\/b>\u00a0(<code style=\"color: #006000;\">runReshape.R<\/code>)<\/div>\n<div><\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">library(Sequgio)<\/code><\/div>\n<div><code style=\"color: 
#006000;\">dbfile &lt;- \"GRCh37.69.sqlite\"<\/code><\/div>\n<div><code style=\"color: #006000;\">mybio &lt;- loadDb(dbfile)<\/code><\/div>\n<div><code style=\"color: #006000;\">mparam &lt;- MulticoreParam(8)<\/code><\/div>\n<div><code style=\"color: #006000;\">txdb &lt;- reshapeTxDb(mybio,probelen=50L,with.junctions=T,mcpar=mparam)<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">save(txdb,file= \"txdb37.RData\")<\/code><\/div>\n<\/blockquote>\n<\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<div><b>Make design matrix<\/b>\u00a0(<code style=\"color: #006000;\">makeDesign.R<\/code>):<\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">mparam &lt;- MulticoreParam(16)<\/span><\/div>\n<\/blockquote>\n<div>\n<blockquote>\n<div><code style=\"color: #006000;\">attr(txdb,\"probelen\") = 50L<\/code><\/div>\n<div><code style=\"color: #006000;\">Design &lt;- makeXmatrix(txdb,method=\"PE\",mulen=200,sdlen=80,mcpar=mparam)<\/code><\/div>\n<div><code style=\"color: #006000;\">save(Design, file=\"Design.RData\")<\/code><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<div>\n<div><b>* For each bamfile, fix the qname. 
Different header types need different regex parameters (-r and -s) for fixQNAME.py.<\/b> For example: if the header pattern is UNC9-SN296_240:1:1101:10000:104941\/1 and\u00a0UNC9-SN296_240:1:1101:10000:104941\/2,\u00a0then regex = -r \"\/\\d+$\" -s \"\"<\/div>\n<div>If the header pattern is\u00a0SRR039629.1000004\u00a0and\u00a0SRR039628.1000004,\u00a0then regex = -r\u00a0\"(SRR)(\\d+)(\\.\\d+)$\" -s \"\\g&lt;1&gt;12829\\g&lt;3&gt;\"<\/div>\n<div><b>\u00a0<\/b><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<div><\/div>\n<div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<blockquote>\n<div><span style=\"color: #006000;\">export PYTHONPATH=\/home\/dhany\/pysam-0.7.5\/lib64\/python2.6\/site-packages\/<\/span><\/div>\n<\/blockquote>\n<\/div>\n<blockquote>\n<div><span style=\"color: #006000; font-family: monospace;\">cd \/home\/dhany\/pysam-0.7.5\/<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">python setup.py install --prefix \/home\/dhany\/pysam-0.7.5<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">date; python \/home\/dhany\/fixQNAME.py -i yourfile.bam -o yourfile.fixed.bam -r \"\/\\d+$\" -s \"\"; date<\/span><\/div>\n<\/blockquote>\n<div><\/div>\n<\/div>\n<div><\/div>\n<\/div>\n<p><b>Get counts<\/b>\u00a0(<code style=\"color: #006000;\">runGetcountsBatch7.R<\/code>):<\/p>\n<div><\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\"><code>library(Sequgio)<\/code><\/span><\/div>\n<div><code style=\"color: #006000;\">mparam &lt;- MulticoreParam(8)<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">load( \"txdb37.RData\")<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">loc &lt;- \u00a0\"\/pica\/h1\/setia\/BRCA\/INBOX\/BRCA\/batch7\/\"<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: 
#006000;\">files &lt;- dir(path=loc )<\/code><\/div>\n<div><code style=\"color: #006000;\">files &lt;- files[grep(\"tophat.outputsc\",files)]<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">args=(commandArgs(TRUE))<\/code><\/div>\n<div><code style=\"color: #006000;\">args<\/code><\/div>\n<div><code style=\"color: #006000;\">args[[1]]<\/code><\/div>\n<div><code style=\"color: #006000;\">if(length(args)==0)<\/code><\/div>\n<div><code style=\"color: #006000;\">\u00a0 \u00a0 stop(\"No chromosome supplied.\")<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">eval(parse(text=args[[1]]))<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">samples &lt;- substr(files[i],17,25)<\/code><\/div>\n<div><code style=\"color: #006000;\">samples\u00a0<\/code><\/div>\n<div><code style=\"color: #006000;\">target &lt;- data.frame(filenames= paste(loc, \"tophat.outputsc.\", samples, \"\/accepted_hits.bam\", sep=\"\") ,\u00a0<\/code><\/div>\n<div><code style=\"color: #006000;\">samplenames=samples ,<\/code><\/div>\n<div><code style=\"color: #006000;\">index=paste(loc, \"tophat.outputsc.\", samples, \"\/accepted_hits.bam.bai\", sep=\"\"),stringsAsFactors=FALSE)<\/code><\/div>\n<div><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">allCounts.bigM &lt;- getCounts(target,txdb ,mcpar=mparam,mapq.filter= 30,use.samtools=T)<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">allCounts &lt;- as.matrix(allCounts.bigM[,,drop=FALSE])<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">save(allCounts, file= paste(\"\/bubo\/home\/h1\/setia\/BRCA\/Sequgio\/batch7\/allCounts.\",samples,\".RData\", sep=\"\") )<\/code><\/div>\n<\/blockquote>\n<div><\/div>\n<\/div>\n<div>The input of that code is an index\u00a0<code style=\"color: #006000;\">i<\/code>\u00a0for a sample defined in (<code style=\"color: #006000;\">Getcountbatch7.txt<\/code>):<\/div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<div 
class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">#!\/bin\/bash -l<\/span><\/div>\n<\/blockquote>\n<div>\n<blockquote>\n<div><code style=\"color: #006000;\">#SBATCH -A b2012036<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -p node -n 8<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -t 5:00:00<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -C mem72GB<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH --mail-user=setia.pramana@ki.se<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH --mail-type=ALL<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -J CountB7<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">module load bioinfo-tools<\/code><\/div>\n<div><code style=\"color: #006000;\">module load GATK<\/code><\/div>\n<div><code style=\"color: #006000;\">module load samtools\/0.1.18<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">R &lt; runGetcountsBatch7.R --no-save $1<\/code><\/div>\n<\/blockquote>\n<\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<div>For example, for the first sample we can run:<\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">sbatch Getcountbatch7.txt '--args i=1'<\/code><\/div>\n<div><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<div>To submit multiple samples, use the following sbatch script (<code style=\"color: #006000;\">MultsubmitBatch7.txt<\/code>):<\/div>\n<div><\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">#!\/bin\/bash -l<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -A b2012036<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -p core -n 5<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH -t 15:00 --qos=short<\/code><\/div>\n<div><code 
style=\"color: #006000;\">#SBATCH -J submit<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH --mail-user=setia.pramana@ki.se<\/code><\/div>\n<div><code style=\"color: #006000;\">#SBATCH --mail-type=ALL<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">##################<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">p=\"--args\"<\/code><\/div>\n<div><code style=\"color: #006000;\">u=\" i=\"<\/code><\/div>\n<div><code style=\"color: #006000;\">v=\"sbatch Getcountbatch7.txt\"<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\"># sbatch MultsubmitBatch7.txt #<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">for i in {1..40}<\/code><\/div>\n<div><code style=\"color: #006000;\">do<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">echo $v \\'$p$u$i\\'<\/code><\/div>\n<div><code style=\"color: #006000;\">eval $v \\'$p$u$i\\'<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">done<\/code><\/div>\n<\/blockquote>\n<\/div>\n<\/div>\n<div><\/div>\n<div>Note that 40 is the number of samples in that batch.<\/div>\n<div><\/div>\n<div>\n<blockquote>\n<div class=\"sites-codeblock sites-codesnippet-block\"><span style=\"color: #006000;\"><code>sbatch\u00a0<\/code><\/span><span style=\"color: #006000;\"><code>MultsubmitBatch7.txt<\/code><\/span><\/div>\n<\/blockquote>\n<\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><span style=\"color: #006000;\">\u00a0<\/span><\/div>\n<div><b>Model fitting<\/b>:\u00a0<code style=\"color: #006000;\">r<\/code><code style=\"color: #006000;\">unGetcountsBatch7.R<\/code><\/div>\n<\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">library(Sequgio)<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: 
#006000;\">setwd('\/pica\/h1\/setia\/BRCA\/INBOX\/Sequgio_TCGA\/')<\/code><\/div>\n<div><code style=\"color: #006000;\">load('\/pica\/h1\/setia\/BRCA\/INBOX\/BRCA\/txdb37.RData')<\/code><\/div>\n<div><code style=\"color: #006000;\">loc &lt;- \u00a0\"\/pica\/h1\/setia\/BRCA\/INBOX\/Sequgio_TCGA\/batch7\/\"<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">files &lt;- dir(path=loc )<\/code><\/div>\n<div><code style=\"color: #006000;\">files &lt;- files[grep(\"allCounts\",files)]<\/code><\/div>\n<div><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">allCountsMat &lt;- NULL<\/code><\/div>\n<div><code style=\"color: #006000;\">for (i in 1:length(files)) {<\/code><\/div>\n<div><code style=\"color: #006000;\">load(paste(loc ,files[i],sep=\"\" ) \u00a0 \u00a0)<\/code><\/div>\n<div><code style=\"color: #006000;\">allCountsMat &lt;- cbind(allCountsMat, allCounts )<\/code><\/div>\n<div><code style=\"color: #006000;\">\u00a0 \u00a0 \u00a0 rm(allCounts )<\/code><\/div>\n<div><code style=\"color: #006000;\">cat(i)<\/code><\/div>\n<div><code style=\"color: #006000;\">}<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">load('\/pica\/h1\/setia\/BRCA\/Design37.RData')<\/code><\/div>\n<div><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">## Fit Models ##<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">\u00a0library(parallel)<\/code><\/div>\n<div><code style=\"color: #006000;\">\u00a0 \u00a0 \u00a0 \u00a0gNames &lt;- sapply(Design, function(x) strsplit(attributes(x)$dimnames[[2]][1], '__')[[1]][2])<\/code><\/div>\n<div><code style=\"color: #006000;\">\u00a0 \u00a0 \u00a0 \u00a0 names(gNames) &lt;- gNames<\/code><\/div>\n<div><code style=\"color: #006000;\">\u00a0 \u00a0 \u00a0 \u00a0 names(Design) &lt;- gNames<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">Thetas &lt;- mclapply(gNames,fitModels,design=Design,counts=allCountsMat ,maxit=20,verbose=T, useC=F, ls.start=F, 
Q1=0.9)<\/code><\/div>\n<div><\/div>\n<div><code style=\"color: #006000;\">save(Thetas , file='Thetas.batch7.RData')<\/code><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<div>The result of Sequgio is located in:<\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">\/lynx\/cvol\/v25\/b2012036\/INBOX\/Sequgio_TCGA<\/code><\/div>\n<div><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<div>\n<div><b><span style=\"color: #505050; font-family: Verdana, Arial, sans-serif; font-size: medium;\">Sequgio with Python (alternative way)<\/span><\/b><\/div>\n<div>\n<hr \/>\n<\/div>\n<div><b>\u00a0<\/b><\/div>\n<div><b>\u00a0<\/b><\/div>\n<div><b>Creating TXDB<\/b>\u00a0&#8211;&gt; the 5th line took 9 hours using 8 cores for GRCh reference.<\/div>\n<div><\/div>\n<div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><code style=\"color: #006000;\">library(Sequgio)<\/code><\/div>\n<div><code style=\"color: #006000;\"><span style=\"font-size: medium;\">dbfile &lt;- \"<\/span>\/proj\/b2012036\/GRCh37.69.sqlite<span style=\"font-size: medium;\">\"<\/span><\/code><\/div>\n<div><code style=\"color: #006000;\">mybio &lt;- loadDb(dbfile)<\/code><\/div>\n<div><code style=\"color: #006000;\">mparam &lt;- MulticoreParam(8)<\/code><\/div>\n<div><code style=\"color: #006000;\">txdb &lt;- reshapeTxDb(mybio,probelen=50L,with.junctions=T,mcpar=mparam)<\/code><\/div>\n<div><code style=\"color: #006000;\">save(txdb,file=\"\/proj\/b2012036\/Dhany\/Sequgio\/txdbgrch.RData\")<\/code><\/div>\n<div><code style=\"color: #006000;\">write.table(as.data.frame(txdb@unlistData), \"\/proj\/b2012036\/Dhany\/Sequgio\/txdb.sql\", sep=\"\\t\")<\/code><\/div>\n<div>db &lt;- dbConnect(SQLite(), dbname=&#8221;\/proj\/b2012036\/Dhany\/Sequgio\/grch3769.sqlite&#8221;)<\/div>\n<div>\n<div>dbWriteTable(conn=db, name=&#8221;humangenome&#8221;, 
value=&#8221;txdb.sql&#8221;, row.names=FALSE, header=FALSE, sep=&#8221;\\t&#8221;)<\/div>\n<\/div>\n<div><\/div>\n<\/blockquote>\n<div><\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<div><b>Make design matrix<\/b>\u00a0&#8211;&gt; the 4th line took 7 hours 40 min using 16 cores for\u00a0GRCh reference.<\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">library(Sequgio)<\/span><\/div>\n<div><span style=\"color: #006000;\">load(<\/span><span style=\"color: #006000;\">&#8220;<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/Dhany\/Sequgio\/<\/span><span style=\"color: #006000;\">txdbgrch.RData&#8221;)<\/span><\/div>\n<div><span style=\"color: #006000;\">mparam &lt;- MulticoreParam(16)<\/span><\/div>\n<\/blockquote>\n<div>\n<blockquote>\n<div><code style=\"color: #006000;\">attr(txdb,\"probelen\") = 50L<\/code><\/div>\n<div><code style=\"color: #006000;\">Design &lt;- makeXmatrix(txdb,method=\"PE\",mulen=200,sdlen=80, mcpar=mparam)<\/code><\/div>\n<div><code style=\"color: #006000;\">save(Design, file=\"\/proj\/b2012036\/Dhany\/Sequgio\/DesignGrch.RData\")<\/code><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<\/div>\n<div><\/div>\n<\/div>\n<div><\/div>\n<div>\n<div><b>Get python count (run in bash, 8 nodes).\u00a0<\/b><span style=\"color: #ff0000;\"><b>28 min<\/b><\/span>\u00a0preprocess (=6 min filtering + 10 min sorting + 12 min sorting multiple aln),\u00a0<b><span style=\"color: #ff0000;\">1.5 min<\/span><\/b>\u00a0separating into chr (line 2-5),\u00a0<b><span style=\"color: #ff0000;\">3 min<\/span><\/b>\u00a0for python getCounts (line 6-9) for a 6 GB bamfile.<\/div>\n<div>PS: You need to download\u00a0<a style=\"color: #ff8f00 !important;\" href=\"https:\/\/github.com\/dhanysaputra\/getExonCount\/blob\/master\/preprocess.sh\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">preprocess.sh<\/a>\u00a0and\u00a0<a style=\"color: #ff8f00 !important;\" 
href=\"https:\/\/github.com\/dhanysaputra\/getExonCount\/blob\/master\/getPairCounts.py\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">getPairCount.py<\/a>. You can experiment with a value of parallel= other than 12 and a number of nodes other than 8 to find a faster setting for your own data.<\/div>\n<div><\/div>\n<div>\n<div><\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">cd\u00a0<\/span><span style=\"color: #006000; font-family: monospace;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/tophat.outputsc.SRR328008\/<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace; font-size: small;\">bash \/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/preprocess.sh<\/span><span style=\"color: #006000; font-family: monospace; font-size: small;\">\u00a0file=accepted_hits.bam<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">for (( i=1; i&lt;=22; i++ )); do awk -v j=$i '{ if($3==j) print $0 }' accepted_hits.bam.sortMA &gt; accepted_hits.$i &amp; done<\/span><\/div>\n<div>\n<div>awk '{ if($3==\"X\") print $0 }' accepted_hits.bam.sortMA &gt; accepted_hits.X &amp;<\/div>\n<div>awk '{ if($3==\"Y\") print $0 }' accepted_hits.bam.sortMA &gt; accepted_hits.Y &amp;<\/div>\n<div>wait<\/div>\n<div>for (( i=1; i&lt;=22; i++ )); do python\u00a0<span style=\"color: #006000; font-family: monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>getPairCount.py accepted_hits output.$i\u00a0<span style=\"color: #006000; font-family: monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>grch3769.sqlite $i &amp; done<\/div>\n<div>python\u00a0<span style=\"color: #006000; font-family: 
monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>getPairCount.py accepted_hits output.X\u00a0<span style=\"color: #006000; font-family: monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>grch3769.sqlite X &amp;<\/div>\n<div>python\u00a0<span style=\"color: #006000; font-family: monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>getPairCount.py accepted_hits output.Y\u00a0<span style=\"color: #006000; font-family: monospace; font-size: small;\">\/<\/span><span style=\"color: #006000; font-family: monospace;\">proj\/b2012036\/Dhany\/Sequgio\/<\/span>grch3769.sqlite Y &amp;<\/div>\n<div>wait<\/div>\n<div>cat output.* &gt; outputfinal.txt<\/div>\n<\/div>\n<div>\n<div><span style=\"color: #006000;\">rm -f output.*<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">for (( i=1; i&lt;=22; i++ )); do rm -f accepted_hits.$i &amp; done<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">rm -f accepted_hits.X<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">rm -f accepted_hits.Y<\/span><\/div>\n<\/div>\n<div><\/div>\n<\/blockquote>\n<div><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<div>\n<div><b>Writing list of possible exon pairs (in R)<\/b><\/div>\n<div><\/div>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<blockquote>\n<div><span style=\"color: #006000;\">library(Sequgio)<\/span><\/div>\n<div><span style=\"color: #006000;\">load(<\/span><span style=\"color: #006000;\">&#8220;txdbgrch.RData&#8221;)<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">ex_list &lt;- split(values(txdb@unlistData)$exon_name,values(txdb@unlistData)$tx_name)<\/span><\/div>\n<div><span 
style=\"color: #006000; font-family: monospace;\">reg_vec &lt;- sapply(split(values(<\/span><span style=\"color: #006000;\">txdb@unlistData<\/span><span style=\"color: #006000; font-family: monospace;\">)$region_id,values(<\/span><span style=\"color: #006000;\">txdb@unlistData<\/span><span style=\"color: #006000; font-family: monospace;\">)$tx_name),function(x) x[1])<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">sizes_ex_list &lt;- sapply(ex_list,length)<\/span><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">n.exons &lt;- sum((sizes_ex_list^2+sizes_ex_list)\/2)<\/span><\/div>\n<div>\n<div><code style=\"color: #006000;\">exons.names &lt;- unique(.Call(\"makeExNames\",ex_list,reg_vec,as.integer(n.exons)))<\/code><\/div>\n<div><span style=\"color: #006000; font-family: monospace;\">write.table(exons.names, &quot;exons19.txt&quot;, row.names=FALSE, col.names=FALSE, quote=FALSE, sep=&quot;\\t&quot;)<\/span><\/div>\n<\/div>\n<div><\/div>\n<\/blockquote>\n<\/div>\n<\/div>\n<div><b>\u00a0<\/b><\/div>\n<div>\n<div><b>\u00a0<\/b><\/div>\n<div><b>Joining counts of different samples into 1 file (in bash):<\/b><\/div>\n<div><\/div>\n<div><\/div>\n<blockquote>\n<div class=\"sites-codeblock sites-codesnippet-block\"><span style=\"color: #006000;\">python\u00a0<\/span><span style=\"color: #006000; font-family: monospace;\">\/proj\/b2012036\/Dhany\/Sequgio\/makeAllcountMatrix.py\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/Dhany\/Sequgio\/<\/span><span style=\"color: #006000;\">exons.txt 5\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/<\/span><span style=\"color: #006000;\">tophat.output.SRR327626\/output.txt\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/<\/span><span style=\"color: #006000;\">tophat.output.SRR327734\/outputfinal.txt\u00a0<\/span><span style=\"color: 
#006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/<\/span><span style=\"color: #006000;\">tophat.output.SRR327735\/outputfinal.txt\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/<\/span><span style=\"color: #006000;\">tophat.output.SRR327736\/outputfinal.txt\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/<\/span><span style=\"color: #006000;\">tophat.output.SRR327737\/outputfinal.txt SRR327626 SRR327734 SRR327735 SRR327736 SRR327737 &gt; sample.counts;<\/span><\/div>\n<\/blockquote>\n<\/div>\n<blockquote>\n<div><b>\u00a0<\/b><\/div>\n<\/blockquote>\n<div><b>\u00a0<\/b><\/div>\n<div><b>Model fitting:<\/b><\/div>\n<div><b>\u00a0<\/b><\/div>\n<div>Command:<\/div>\n<div><\/div>\n<blockquote>\n<div>bash \/proj\/b2012036\/Dhany\/Sequgio\/batch_fastfitting.sh &lt;size of Design matrix&gt; &lt;Design matrix.RData&gt; &lt;allCounts&gt; &lt;proj number&gt; &lt;emails for slurm status&gt; &lt;number of cores per job&gt; &lt;number of Design matrix size to process per batch&gt; &lt;your uppmax username&gt;<\/div>\n<div>Adjust &lt;number of cores per job&gt; and &lt;number of Design matrix size to process per batch&gt; to get better performance.<\/div>\n<div><b>\u00a0<\/b><\/div>\n<\/blockquote>\n<div>\n<blockquote>\n<div><\/div>\n<div class=\"sites-codeblock sites-codesnippet-block\">\n<div><code style=\"color: #006000;\">bash\u00a0<\/code><span style=\"color: #006000;\">\/proj\/b2012036\/Dhany\/Sequgio\/batch_fastfitting.sh <\/span><span style=\"color: #006000; font-family: monospace;\">31700\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/Dhany\/Sequgio\/DesignGrch.RData\u00a0<\/span><span style=\"color: #006000;\">\/proj\/b2012036\/INBOX\/Dhany\/newdata\/sample.count.grch\u00a0<\/span><span style=\"color: #006000; font-family: monospace;\">b2012036\u00a0dhany.saputra@ki.se 4 500 dhany<\/span><\/div>\n<div><span style=\"color: #006000;\">cat myR.* &gt; 
myfpkm.txt<\/span><\/div>\n<\/div>\n<div><\/div>\n<\/blockquote>\n<\/div>\n<div><\/div>\n<div><strong>Required Files<\/strong><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2014\/08\/Getcountbatch7.txt\">Getcountbatch7<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2014\/08\/MultsubmitBatch7.txt\">MultsubmitBatch7<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/sites.google.com\/site\/biostatinfocore\/rnaseq\/fitsequgio.batch7.R\">fitsequgio.batch7.R<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/sites.google.com\/site\/biostatinfocore\/rnaseq\/makeDesign.R?attredirects=0&amp;d=1\">makeDesign.R<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/sites.google.com\/site\/biostatinfocore\/rnaseq\/run%20paralel.R\">run.paralel.R<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/sites.google.com\/site\/biostatinfocore\/rnaseq\/runGetcountsBatch7.R\">runGetcountsBatch7.R<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/sites.google.com\/site\/biostatinfocore\/rnaseq\/runReshape.R\">runReshape.R<\/a><\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-content\/uploads\/sites\/4\/2014\/08\/topcuf_batch7sc.txt\">topcuf_batch7sc<\/a><\/div>\n<div><\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Welcome This site is for analyzing RNA-sequence using tophat, cufflinks and sequgio. Mostly the examples are given using UPPMAX (http:\/\/www.uppmax.uu.se\/)\u00a0facilities. 
Data Preparation The following data\/files should be provided: Fastq files Human Reference Genome (Fasta file) human\u00a0reference genome annotation\u00a0database\u00a0(B37 from EnsEMBL or hg19) (gtf file) Useful information on the RNA-seq pipeline can be found here:\u00a0http:\/\/nestor.uppnex.se\/twiki\/bin\/view\/Courses\/CM1209\/TranscriptomeMappingFirst Alignment For [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[3],"class_list":["post-358","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-rnaseq"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/358","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/comments?post=358"}],"version-history":[{"count":10,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/358\/revisions"}],"predecessor-version":[{"id":739,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/posts\/358\/revisions\/739"}],"wp:attachment":[{"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/media?parent=358"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/categories?post=358"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.meb.ki.se\/sites\/biostatwiki\/wp-json\/wp\/v2\/tags?post=358"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{r
el}","templated":true}]}}