RNASeq Pipeline
RNA-seq data are valuable because they allow RNA expression levels to be measured as a transcriptional readout and RNA structures to be studied. Together these help explain how RNA-based mechanisms affect gene regulation and, in turn, disease and phenotypic variation (ENCODE Project).
rRNA databases:
Reference build:
Steps in data processing:
Step | Software/Module | Input | Output |
---|---|---|---|
Assess Data Quality | FastQC | *.fastq.gz files | websummary.html |
Adapter and Quality Trimming of Reads | Trim Galore! | *.fastq.gz files | Trimmed *.fastq.gz files |
Removal of Ribosomal RNA | SortMeRNA | Trimmed *.fastq.gz files; rRNA databases | Ribosomal RNA removed and trimmed *.fastq.gz files |
Alignment to the Genome | STAR | Ribosomal RNA removed and trimmed *.fastq.gz files; reference build | *.bam files |
Sort and index alignment | SAMtools | *.bam files | Sorted .bam files and .bai files |
Duplicate Read Marking | Picard MarkDuplicates | Sorted .bam files and .bai files | .markDups.bam files and .markDups.bai files |
Quality control | MultiQC | Output summaries from RSeQC, Qualimap, dupRadar, Preseq, edgeR | websummary.html |
Expression quantification | featureCounts | .markDups.bam files and .markDups.bai files; reference build | *.featureCounts.txt files |
Differential Expression | DESeq2 | *.featureCounts.txt files; sample metadata (names, groups, contrasts) | deseq2.results.txt files and *.deseq2.plots.pdf files |
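
To show how each step's output feeds the next, the per-sample steps in the table (FastQC through featureCounts) can be sketched as a list of command lines. This is a minimal dry-run sketch, not the pipeline's actual implementation. It assumes single-end reads, a `picard` wrapper script on the `PATH` (as installed by Bioconda), and SortMeRNA 4.x option names. The function name `build_commands`, the `results/` directory layout, and the input names `rRNA_db.fasta`, `star_index`, and `genes.gtf` are hypothetical placeholders. The file that SortMeRNA writes for non-rRNA reads varies by version, so that path is a placeholder too.

```python
import shlex
from pathlib import Path


def build_commands(sample, fastq, rrna_db, star_index, gtf, outdir="results"):
    """Return one sample's command lines, in the order of the table above."""
    out = Path(outdir)

    # Assumption: Trim Galore! names single-end output <input stem>_trimmed.fq.gz.
    name = Path(fastq).name
    stem = name[: -len(".fastq.gz")] if name.endswith(".fastq.gz") else name
    trimmed = out / "trimmed" / f"{stem}_trimmed.fq.gz"

    # Placeholder: the exact non-rRNA output filename depends on the SortMeRNA version.
    non_rrna_prefix = out / "sortmerna" / f"{sample}.non_rRNA"
    non_rrna = Path(f"{non_rrna_prefix}.fq.gz")

    # STAR appends "Aligned.out.bam" to the prefix it is given.
    star_prefix = f"{out / 'star' / sample}."
    aligned = Path(f"{star_prefix}Aligned.out.bam")

    sorted_bam = out / "bam" / f"{sample}.sorted.bam"
    dedup_bam = out / "bam" / f"{sample}.markDups.bam"
    metrics = out / "bam" / f"{sample}.markDups.metrics.txt"
    counts = out / "counts" / f"{sample}.featureCounts.txt"

    return [
        ["fastqc", "-o", str(out / "fastqc"), fastq],
        ["trim_galore", "-o", str(out / "trimmed"), fastq],
        ["sortmerna", "--ref", rrna_db, "--reads", str(trimmed),
         "--workdir", str(out / "sortmerna" / sample),
         "--fastx", "--other", str(non_rrna_prefix)],
        ["STAR", "--genomeDir", star_index,
         "--readFilesIn", str(non_rrna), "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "Unsorted",
         "--outFileNamePrefix", star_prefix],
        ["samtools", "sort", "-o", str(sorted_bam), str(aligned)],
        ["samtools", "index", str(sorted_bam)],
        # CREATE_INDEX=true asks Picard to write a .bai next to the output BAM.
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
         f"M={metrics}", "CREATE_INDEX=true"],
        ["featureCounts", "-a", gtf, "-o", str(counts), str(dedup_bam)],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands("S1", "S1.fastq.gz", "rRNA_db.fasta",
                              "star_index", "genes.gtf"):
        print(shlex.join(cmd))
```

The sketch only prints commands. To run them, the output directories would need to exist first, and each list could be passed to `subprocess.run(cmd, check=True)`. MultiQC and DESeq2 are left out because they operate on all samples together, not on one sample at a time.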