0.3 RNA-seq Data Mapping & Gene Quantification
Connect to linux server
Open a terminal and type
ssh username@ieng6-###.ucsd.edu
TOPHAT-CUFFLINK Pipeline
First let's create some target directories with the following commands
Then we can use TOPHAT to align the reads to the genome with the following template command
tophat -p 1 -G /path/to/genes.gtf -o out/dir path/to/genome/index path/to/reads_R1.fastq path/to/reads_R1.fastq
The genome index used with TOPHAT should be bowtie2 index files. To align all the fastq files from the example data at the same time, we can create a shell script
Next we quit and save the script by typing: :wq
Then we run the script by typing: bash alignment.sh
After the alignment step is finished, we use Cufflink to quantify the gene expressions
A template Cufflink command is like the following
cufflink -p 1 -G path/to/genes.gtf -o path/to/outdir path/to/accepted_hits.bam
We can also write a shell script to execute the files all at once
We quit and save the script by typing: :wq
Then we run the script by typing: bash fpkm.sh
Here, genes.fpkm_tracking and isoforms.fpkm_tracking contains gene expression values (measured as FPKM) at the gene and transcript levels.
STAR-Kallisto Pipeline
We can also use STAR to align the reads to the genome. We need to first build index files that is compatible with STAR prior to the alignment step. To build the index, we can run the following template command
STAR --runMode genomeGenerate --genomeDir path/to/starIndex --genomeFastaFiles path/to/genome.fa
Then, we'll be ablt to execute the alignment step with the following template command
STAR --genomeDir path/to/starIndex/ --readFilesIn path/to/read1 path/to/read2 --outFileNamePrefix output/
After the mapping is finished, the mapping statistics can be viewed as Log.final.out
and the detailed mapping results can be viewed at Align.out.sam
Besides Cufflinks, we can also use Kallisto to quantify the gene expressions directly from the raw fastq files. To do this, we need to build the index for Kallisto first with the following template command
kallisto index -i path/to/output.index path/to/transcriptome.fa
With index file built, we are able to quantify the gene expressions with the following template command
kallisto quant -i path/to/output.index -o path/to/outDic path/to/read1.fastq path/to/read2.fastq
The results can be viewed at abundance.tsv
where gene expressions are quantified in terms of TPM values.
Last updated