CCBB BulkRNASeq Pipeline Usage Instructions
Please follow the instructions below to get access to our analysis pipeline and run it on your RNAseq data:
Step1:
Fill out the form available at CCBB website.
Step2:
As you will access our pipelines on Garibaldi, please get your account on Garibaldi by emailing HPC (if you do not have one already).
Step3:
Setting up Workflow, an interface to access CCBB pipelines on Garibaldi:
Allow us to contact the technical support staff to help you with Workflow set up on your computer.
Step4:
On Garibaldi, create a folder with your sample fastqs (e.g. 01sample_S1_R1_001.fastq.gz)
- If you have lane-level fastqs for each sample, please concatenate them such that you have one fastq file per sample for single-end dataset and two fastqs per sample for paired-end dataset.
- SampleID should be unique for each sample with no special characters (space, #, _ etc.)
- Sample fastqs need to be gzipped with extension .fastq.gz or fq.gz.
Step5:
Now you are ready to run the CCBB bulk RNASeq analysis!
Follow the instructions below to begin:
Open Workflow on your browser (Refer to the image at the end of this document for details)
- Click on “Experiments”
- To select a workflow click on “My WorkFlows” (on the top right corner), then click on “New Experiment”
- Click on “CCBB_BULKRNASeq_STEP1_WF”
- Choose Garibaldi from the dropdown menu for “Execution profile:” (top of the page)
The first workflow “CCBB_BULKRNASeq_STEP1_WF” is used to create the samplesheet which is needed to run the RNASeq pipeline. This needs two inputs as defined below.\
- Click on the “Input Data” tab and (a) type in the full path to your folder on Garibaldi with sample fastqs (Step4 above), (b) type in the full path to where you want the samplesheet.csv to be generated (this can be same path as for the fastqs), then click the button
to run “CCBB_BULKRNASeq_STEP1_WF”
Upon completion, check to see the samplesheet.csv file on Garibaldi at the location you specified.
Open Workflow on your browser
- Click on “Experiments”
- To select a workflow click on “My WorkFlows” (on the top right corner), then click on “New Experiment”
- Click on “CCBB_BULKRNASeq_STEP2_WF”
- Choose Garibaldi from the dropdown menu for “Execution profile:” (top of the page)
The second workflow “CCBB_BULKRNASeq_STEP2_WF” is used to run the RNASeq pipeline. This needs three inputs as defined below.
- Click on the “Input Data” tab and (a) type in the full path to your folder on Garibaldi with samplesheet.csv, (b) type in the reference genome code for your samples (e.g. HU0 (zero not the letter O) for human, MM0 for mouse, RN0 for rat and CE0 for C.elegans), (c ) type in the full path to where you want the results of the bulk RNASeq analysis to be generated on Garibaldi.
IMPORTANT NOTE on genomes:
Currently the pipeline will work only for Human, Mouse, Rat and C.elegans.
If you have any other custom species please contact CCBB
Follow the details to understand the pipeline outcome for your data.
IMPORTANT NOTE on library strandedness:
In your browser, review the following summary file (generated by the pipeline in your work directory): results/multiqc/star_salmon/multiqc_report.html
Click “Sample status checks” on the left hand side menu of this multiqc_report.html file. If there are “fail” flags, it indicates discrepancies in library strandedness assignments for those samples between what is inferred from RSeQC and Salmon methods, you will need to manually edit your samplesheet.csv file and change from “auto” status (in the last column) to the correct strandedness (must be one of “unstranded”, “forward”, “reverse”) for the samples as inferred by RSeQC method. Then re-run the pipeline following instructions above (in STEP5) for the second workflow “CCBB_BULKRNASeq_STEP2_WF” to run the RNASeq pipeline.
Step6:
Cleanup After the successful completion of the RNASeq pipeline, please remove the “work” sub-folder (that holds the intermediate files/folders generated by the pipeline) in your workflow setup folder on Garibaldi.
DISCLAIMER:
CCBB is providing pipelines on Garibaldi, the institute’s shared linux cluster, on “as is” and “as available” terms, solely for the benefit of the scientific community at Scripps Research.
CCBB reserves the right to continue/discontinue any of its pipelines on Garibaldi.