
# 0.3 RNA-seq Data Mapping & Gene Quantification

### Connect to the Linux server

Open a terminal and type: \
`ssh username@ieng6-###.ucsd.edu`

### TopHat-Cufflinks Pipeline

First, let's create some target directories with the following commands:

```
mkdir geneExpression
cd geneExpression
mkdir alignments 
mkdir fpkm       
mkdir diff
```

Then we can use TopHat to align the reads to the genome with the following template command: \
`tophat -p 1 -G /path/to/genes.gtf -o out/dir path/to/genome/index path/to/reads_R1.fastq path/to/reads_R2.fastq` \
The genome index used with TopHat should be a set of Bowtie 2 index files. To align all the fastq files from the example data at the same time, we can create a shell script:

```
cd alignments
vi alignment.sh   # open the editor, then enter the script lines below
reads=/home/linux/ieng6/be183f/public/bengTutorial/fastq    # directory where the fastq files are stored
genes=/home/linux/ieng6/be183f/public/bengTutorial/index_gtf/genes4.gtf   # GTF annotation: positions of genes on the genome
# TopHat needs to know where the reference genome index is, so we create soft links
# to the index files (base name "4", as used in the tophat commands) in the current folder
for file in /home/linux/ieng6/be183f/public/bengTutorial/index_gtf/4*; do
       ln -s "$file" .
done

# each alignment runs as a background job, so the four samples are processed in parallel
tophat -p 1 -G $genes -o C1_R1 4 ${reads}/GSM794483_C1_R1_1.ss.fq ${reads}/GSM794483_C1_R1_2.ss.fq &
tophat -p 1 -G $genes -o C1_R2 4 ${reads}/GSM794484_C1_R2_1.ss.fq ${reads}/GSM794484_C1_R2_2.ss.fq &
tophat -p 1 -G $genes -o C2_R1 4 ${reads}/GSM794486_C2_R1_1.ss.fq ${reads}/GSM794486_C2_R1_2.ss.fq &
tophat -p 1 -G $genes -o C2_R2 4 ${reads}/GSM794487_C2_R2_1.ss.fq ${reads}/GSM794487_C2_R2_2.ss.fq &
wait   # keep the script running until all background alignments have finished
```
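As an optional variant of the script above, the four TopHat commands can be generated from a small sample table instead of being typed out one by one. The following is only a sketch: `DRY_RUN`, `samples`, `tophat_cmd`, and `run_all` are names introduced here for illustration and are not part of TopHat. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, which makes it easy to check the file names before launching anything.

```shell
#!/usr/bin/env bash
# Sketch: build the TopHat commands from a sample table (paths copied from the script above).
reads=/home/linux/ieng6/be183f/public/bengTutorial/fastq
genes=/home/linux/ieng6/be183f/public/bengTutorial/index_gtf/genes4.gtf
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands, 0 = run them

# "sample name:GEO accession" pairs, taken from the fastq file names used above
samples="C1_R1:GSM794483 C1_R2:GSM794484 C2_R1:GSM794486 C2_R2:GSM794487"

tophat_cmd() {
    # $1 = sample name (also the output directory), $2 = GEO accession prefix
    echo "tophat -p 1 -G $genes -o $1 4 ${reads}/${2}_${1}_1.ss.fq ${reads}/${2}_${1}_2.ss.fq"
}

run_all() {
    for entry in $samples; do
        name=${entry%%:*}   # text before the colon
        gsm=${entry##*:}    # text after the colon
        cmd=$(tophat_cmd "$name" "$gsm")
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            $cmd &          # launch the alignment as a background job
        fi
    done
    wait                    # returns once all background jobs (if any) have finished
}

run_all
```

Running it as `DRY_RUN=0 bash script.sh` would launch the alignments, assuming the index soft links have already been created in the current folder as in `alignment.sh`.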

Next we save `alignment.sh` and quit `vi` by typing `:wq`. Then we run the script by typing `bash alignment.sh`. \
After the alignment step is finished, we use Cufflinks to quantify gene expression. \
A template Cufflinks command looks like the following: \
`cufflinks -p 1 -G path/to/genes.gtf -o path/to/outdir path/to/accepted_hits.bam` \
We can also write a shell script to process all the alignments at once:

```
cd ../fpkm  # move from the alignments folder into the fpkm folder
vi fpkm.sh  # open the editor, then enter the script lines below
genes=/home/linux/ieng6/be183f/public/bengTutorial/index_gtf/genes4.gtf
alignments=../alignments

# quantify each condition/replicate pair in turn
for condition in C1 C2; do
    for replicate in R1 R2; do
        echo ${condition}_${replicate}
        cufflinks -p 1 -G $genes -o ${condition}_${replicate} ${alignments}/${condition}_${replicate}/accepted_hits.bam
    done
done
```

We save `fpkm.sh` and quit `vi` by typing `:wq`. Then we run the script by typing `bash fpkm.sh`. \
In each output folder, `genes.fpkm_tracking` and `isoforms.fpkm_tracking` contain expression values (measured as FPKM) at the gene and transcript levels, respectively.
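To take a quick look at the quantification results, the gene ID and FPKM columns can be pulled out of a tracking file. The sketch below looks the columns up by their header names (`gene_id` and `FPKM`) rather than by position. The function name `fpkm_table` is hypothetical, and the small input file is made-up toy data with only a subset of the real columns, used purely so the example runs on its own.

```shell
#!/usr/bin/env bash
# Sketch: print gene_id and FPKM from a Cufflinks tracking file, locating columns by header name.
fpkm_table() {
    # $1 = path to a tab-separated tracking file whose first line is the header
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header name -> column index
        { print $col["gene_id"] "\t" $col["FPKM"] }
    ' "$1"
}

# Toy input with made-up values; in practice pass e.g. C1_R1/genes.fpkm_tracking instead.
demo=$(mktemp)
printf 'tracking_id\tgene_id\tlocus\tFPKM\tFPKM_status\n' >  "$demo"
printf 'geneA\tgeneA\t4:100-900\t12.5\tOK\n'               >> "$demo"
printf 'geneB\tgeneB\t4:2000-5000\t0\tOK\n'                >> "$demo"

# list genes from highest to lowest FPKM
fpkm_table "$demo" | sort -t "$(printf '\t')" -k2,2nr
rm -f "$demo"
```

Looking columns up by name keeps the sketch independent of the exact column order in the file.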

### STAR-Kallisto Pipeline

We can also use STAR to align the reads to the genome. Before the alignment step, we first need to build index files that are compatible with STAR. To build the index, we can run the following template command: \
`STAR --runMode genomeGenerate --genomeDir path/to/starIndex --genomeFastaFiles path/to/genome.fa` \
Then we'll be able to execute the alignment step with the following template command: \
`STAR --genomeDir path/to/starIndex/ --readFilesIn path/to/read1 path/to/read2 --outFileNamePrefix output/` \
After the mapping is finished, the mapping statistics can be viewed in `Log.final.out` and the detailed mapping results in `Aligned.out.sam`, both written under the output prefix. \
Besides Cufflinks, we can also use Kallisto to quantify expression directly from the raw fastq files, without a genome alignment. To do this, we first need to build the Kallisto index from a transcriptome FASTA file with the following template command: \
`kallisto index -i path/to/output.index path/to/transcriptome.fa` \
With the index file built, we can quantify expression with the following template command: \
`kallisto quant -i path/to/output.index -o path/to/outDir path/to/read1.fastq path/to/read2.fastq` \
The results can be viewed in `abundance.tsv` in the output folder, where transcript-level expression is reported as TPM values.
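As a quick way to inspect the Kallisto output, the most highly expressed transcripts can be listed from `abundance.tsv`. The sketch below assumes the usual tab-separated layout with a header containing `target_id` and `tpm`. The function name `top_tpm` is hypothetical, and the input file is made-up toy data so the example is self-contained. It uses `sort -g` (general numeric sort, available in GNU and BSD `sort`) so that values written in scientific notation are ordered correctly.

```shell
#!/usr/bin/env bash
# Sketch: list the transcripts with the highest TPM from a kallisto abundance.tsv file.
top_tpm() {
    # $1 = path to abundance.tsv, $2 = number of rows to show
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header name -> column index
        { print $col["target_id"] "\t" $col["tpm"] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2gr | head -n "$2"
}

# Toy input with made-up values; in practice pass path/to/outDir/abundance.tsv instead.
demo=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  "$demo"
printf 'tx1\t1000\t800\t10\t250000\n'                      >> "$demo"
printf 'tx2\t2000\t1800\t60\t750000\n'                     >> "$demo"

top_tpm "$demo" 1
rm -f "$demo"
```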

