Example Workflow ---------------- .. _`Line4 snakemake/jobscript.sh`: https://github.com/saketkc/riboraptor/blob/47c8a50753c2bcc96b57d43b525a47bb8fde2d04/snakemake/jobscript.sh#L4 .. _`Line6 snakemake/cluster.yaml`: https://github.com/saketkc/riboraptor/blob/47c8a50753c2bcc96b57d43b525a47bb8fde2d04/snakemake/cluster.yaml#L6 .. _`Line7 snakemake/cluster.yaml`: https://github.com/saketkc/riboraptor/blob/47c8a50753c2bcc96b57d43b525a47bb8fde2d04/snakemake/cluster.yaml#L7 .. _`GSE37744`: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37744 .. _`GSE13750`: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13750 .. _`both`: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP000637 We will be working with the first published Ribo-seq dataset `GSE13750`_ from Ingolia et al. (2009) which has samples for `both`_ mRNA-seq and Ribo-seq from **Yeast** grown in starved and nutrient rich media. At this point, we assume you have already completed all the steps under "Installing dependencies" section of the README. Step 1: Downloading datasets ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We will download all SRA files corresponding to GSE13750. .. code-block:: bash cd riboraptor download_sra_data --sradb=../riboraptor-data/SRAmetadb.sqlite \ --geodb=../riboraptor-data/GEOmetadb.sqlite GSE13750 GEO IDs are automatatiicaly converted to corresponding SRP IDs. GSE13750 corresponds to SRP000637. There are 6 experiments in total (SRX003184-SRX003191), but we will be working with only two: `SRX003187`and `SRX003191` one of which is mRNA-seq while other is Ribo-seq. (We will figure out which is which later.) You can delete are SRX directories except the above two. We will be using `sacCerR64` as our reference. We will now use Snakemake to run all the downstream steps. Here is what the overall workflow looks like: .. figure:: images/dag.svg :scale: 30% Step 2: Copy template ~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash cd snakemake cp configs/SRP000637.py.sample configs/SRP000637.py Edit the paths inside `SRP000637.py` to point to your RAW data, GTF and BED files. BED files for a lot of assemblies are inbuilt into riboraptor. In most cases, those should suffice. However, you stil need to provide paths to fasta, chromosome sizes and the GTF. The BED files are created from a particular version of the GTF, so if you are using your own GTF, you should ideally be using your own BED files too. An example of a config would be: .. code-block:: python ## Path to SRP directory RAWDATA_DIR = '/staging/as/skchoudh/SRA_datasets/SRP000637' ## Output directory (will be created if does not exist) OUT_DIR = '/staging/as/skchoudh/riboraptor-analysis/SRP000637' ## Genome fasta location GENOME_FASTA = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/fasta/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa' ## Chromosome sizes location CHROM_SIZES = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/fasta/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.sizes' ## Path to STAR index (will be generated if does not exist) STAR_INDEX = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/star_annotated' ## GTF path GTF = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/annotation/Saccharomyces_cerevisiae.R64-1-1.91.gtf' ## Path to bed file containing Intron coordinates INTRON_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/intron.bed' ## Path to bed file containing CDS coordinates CDS_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/cds.bed' ## Path to bed file containing 5'UTR coordinates UTR5_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/utr5.bed' ## Path to bed file containing 3'UTR coordinates UTR3_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/utr3.bed' Step 3 : Change your miniconda path ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Edit `Line4 snakemake/jobscript.sh`_ pointing to your conda root directory. An example path would be: .. code-block:: bash export PATH="/home/cmb-panasas2/wenzhenl/miniconda3/bin:$PATH" Step 4: Edit snakemake/cluster.yaml ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Edit `Line6 snakemake/cluster.yaml`_ and `Line7 snakemake/cluster.yaml`_ to point to your log directory error log file. An example path would be: .. code-block:: yaml logout: '/home/cmb-06/as/skchoudh/logs/{rule}.{wildcards}.out' logerror: '/home/cmb-06/as/skchoudh/logs/{rule}.{wildcards}.err' You would want to just edit the directory path leading to `/home/cmb-06/as/skchoudh/logs/` and leave the rest as it is. Step 5: Submit job ~~~~~~~~~~~~~~~~~~ .. code-block:: bash bash submitall.sh SRP000637 The `submitall.sh` looks for a file named `SRP000637.py` in the configs directory, so make sure `SRP000637.py` exists inside `configs/` directory. Visualizing Results ~~~~~~~~~~~~~~~~~~~~ When the entire pipeline as run, it will create an html file `riboraptor_report.html` as output. You can copy it locally to visualize metagene profiles and read length distribution for all samples.