
Command-line Download

This notebook demonstrates how to use pysradb from the command line to download SRA data.

[ ]:
# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
[1]:
# Alternatively, install or upgrade the released version from PyPI:
# pip install -U pysradb
[2]:
!pysradb --version
pysradb 2.4.1

Get metadata for an experiment (SRX), including its runs (SRR) and sample (SRS)

[3]:
!pysradb srx-to-srr SRX4720625
experiment_accession    run_accession   study_accession study_title     experiment_title        experiment_desc organism_taxid  organism_name   library_name    library_strategy        library_source  library_selection       library_layout  sample_accession        sample_title    biosample       bioproject      instrument      instrument_model        instrument_model_desc   total_spots     total_size      run_total_spots run_total_bases
SRX4720625      SRR7882015      SRP162234       Transcriptomic profile of zebrafish cardiomyocytes throughout heart development GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq  GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq  7955    Danio rerio             RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS3805811              SAMN10095723    PRJNA492280     NextSeq 500     NextSeq 500     ILLUMINA        47867961        3470385670      47867961        7230485009

Get detailed metadata

[4]:
!pysradb srx-to-srr SRX4720625 --detailed
experiment_accession    run_accession   study_accession study_title     experiment_title        experiment_desc organism_taxid  organism_name   library_name    library_strategy        library_source  library_selection       library_layout  sample_accession        sample_title    biosample       bioproject      instrument      instrument_model        instrument_model_desc   total_spots     total_size      run_total_spots run_total_bases run_alias       public_filename public_size     public_date     public_md5      public_version  public_semantic_name    public_supertype        public_sratoolkit       aws_url aws_free_egress aws_access_type public_url      ncbi_url        ncbi_free_egress        ncbi_access_type        gcp_url gcp_free_egress gcp_access_type experiment_alias        source_name     tissue  developmental stage     gfp status      genetic background      ena_fastq_http  ena_fastq_http_1        ena_fastq_http_2        ena_fastq_ftp   ena_fastq_ftp_1 ena_fastq_ftp_2
SRX4720625      SRR7882015      SRP162234       Transcriptomic profile of zebrafish cardiomyocytes throughout heart development GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq  GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq  7955    Danio rerio     <NA>      RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS3805811      <NA>      SAMN10095723    PRJNA492280     NextSeq 500     NextSeq 500     ILLUMINA        47867961        3470385670      47867961        7230485009      GSM3396533_r1   SRR7882015.sralite      1881003321      2020-06-14 12:02:25     8161154ca4e9cf674e3f0e4af74c8455        1       SRA Lite        Primary ETL     1       s3://sra-pub-zq-8/SRR7882015/SRR7882015.sralite.1       s3.us-east-1    aws identity    https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1      https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1      worldwide       anonymous       gs://sra-pub-zq-107/SRR7882015/SRR7882015.zq.1  gs.us-east1     gcp identity    GSM3396533      FACS-sorted embryo cells        FACS-sorted embryo cells        24 hpf  GFP positive    wild type       <NA>      http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz <NA>      era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz      era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz
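The detailed table is wide, so it can help to pull out only the columns needed downstream. The sketch below assumes the output is tab-separated and that missing values are printed as `<NA>`, as in the listing above. `SAMPLE_TSV` is a trimmed stand-in that keeps four of the columns, and `fastq_urls` is a hypothetical helper, not part of pysradb.

```python
import csv
import io

# Trimmed stand-in for the tab-separated output of
# `pysradb srx-to-srr SRX4720625 --detailed` (only four columns kept).
SAMPLE_TSV = (
    "experiment_accession\trun_accession\tena_fastq_http_1\tena_fastq_http_2\n"
    "SRX4720625\tSRR7882015\t"
    "http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz\t"
    "http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz\n"
)


def fastq_urls(tsv_text):
    """Map each run accession to its non-missing ENA FASTQ URLs."""
    urls = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        found = []
        for column in ("ena_fastq_http_1", "ena_fastq_http_2"):
            value = (row.get(column) or "").strip()
            # Skip empty cells and the "<NA>" placeholder used for missing values.
            if value and value != "<NA>":
                found.append(value)
        urls[row["run_accession"]] = found
    return urls


urls_by_run = fastq_urls(SAMPLE_TSV)
for run, urls in urls_by_run.items():
    print(run, *urls, sep="\n  ")
```

To use it on real output, the text of the command could be captured first (for example with `output = !pysradb srx-to-srr SRX4720625 --detailed` in a notebook, then `"\n".join(output)`) and passed to the helper in place of `SAMPLE_TSV`.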

Download all runs for a particular experiment

[5]:
!pysradb srx-to-srr SRX4720625 --detailed | pysradb download
Checking download URLs
The following files will be downloaded:

experiment_accession run_accession study_accession public_url                                                                                                 download_url                                                                                          out_dir                                          filesize
SRX4720625           SRR7882015    SRP162234       https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882015/SRR7882015.sra /data/github/pysradb/notebooks/pysradb_downloads 1.9 GB


Total size: 1.9 GB


  0%|                                                     | 0/1 [00:00<?, ?it/s]^C
  0%|                                                     | 0/1 [03:01<?, ?it/s]

Get metadata for an entire project

[ ]:
!pysradb metadata SRP162234 --detailed

Download an entire project!

[7]:
!pysradb download -p SRP162234
Checking download URLs
The following files will be downloaded:

run_accession study_accession experiment_accession public_url                                                                                                 download_url                                                                                          out_dir                                          filesize
SRR7882014    SRP162234       SRX4720624           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882014/SRR7882014.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882014/SRR7882014.sra /data/github/pysradb/notebooks/pysradb_downloads 843.8 MB
SRR7882015    SRP162234       SRX4720625           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882015/SRR7882015.sra /data/github/pysradb/notebooks/pysradb_downloads   1.9 GB
SRR7882016    SRP162234       SRX4720626           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882016/SRR7882016.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882016/SRR7882016.sra /data/github/pysradb/notebooks/pysradb_downloads   1.8 GB
SRR7882017    SRP162234       SRX4720627                     https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos9/sra-pub-zq-922/SRR007/882/SRR7882017.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882017/SRR7882017.sra /data/github/pysradb/notebooks/pysradb_downloads 991.8 MB
SRR7882018    SRP162234       SRX4720628           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882018/SRR7882018.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882018/SRR7882018.sra /data/github/pysradb/notebooks/pysradb_downloads   2.7 GB
SRR7882019    SRP162234       SRX4720629           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882019/SRR7882019.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882019/SRR7882019.sra /data/github/pysradb/notebooks/pysradb_downloads   2.9 GB
SRR7882020    SRP162234       SRX4720630                      https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-14/SRR007/882/SRR7882020.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882020/SRR7882020.sra /data/github/pysradb/notebooks/pysradb_downloads 693.3 MB
SRR7882021    SRP162234       SRX4720631           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882021/SRR7882021.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882021/SRR7882021.sra /data/github/pysradb/notebooks/pysradb_downloads   2.5 GB
SRR7882022    SRP162234       SRX4720632           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882022/SRR7882022.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882022/SRR7882022.sra /data/github/pysradb/notebooks/pysradb_downloads   2.6 GB
SRR7882023    SRP162234       SRX4720633                     https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos9/sra-pub-zq-922/SRR007/882/SRR7882023.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882023/SRR7882023.sra /data/github/pysradb/notebooks/pysradb_downloads   1.1 GB
SRR7882024    SRP162234       SRX4720634           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882024/SRR7882024.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882024/SRR7882024.sra /data/github/pysradb/notebooks/pysradb_downloads   2.2 GB
SRR7882025    SRP162234       SRX4720635           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882025/SRR7882025.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882025/SRR7882025.sra /data/github/pysradb/notebooks/pysradb_downloads   2.4 GB
SRR7882026    SRP162234       SRX4720636           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882026/SRR7882026.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882026/SRR7882026.sra /data/github/pysradb/notebooks/pysradb_downloads   1.9 GB
SRR7882027    SRP162234       SRX4720637           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882027/SRR7882027.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882027/SRR7882027.sra /data/github/pysradb/notebooks/pysradb_downloads   3.8 GB
SRR7882028    SRP162234       SRX4720638           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882028/SRR7882028.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882028/SRR7882028.sra /data/github/pysradb/notebooks/pysradb_downloads   2.5 GB
SRR7882029    SRP162234       SRX4720639           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882029/SRR7882029.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882029/SRR7882029.sra /data/github/pysradb/notebooks/pysradb_downloads   1.2 GB
SRR7882030    SRP162234       SRX4720640           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882030/SRR7882030.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882030/SRR7882030.sra /data/github/pysradb/notebooks/pysradb_downloads   2.5 GB
SRR7882031    SRP162234       SRX4720641           https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882031/SRR7882031.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882031/SRR7882031.sra /data/github/pysradb/notebooks/pysradb_downloads   3.1 GB


Total size: 37.5 GB




Start download?  [Y/n]: ^C
Traceback (most recent call last):
  File "/home/saket/github/2025_iGEM/localcolabfold/colabfold-conda/bin/pysradb", line 7, in <module>
    sys.exit(parse_args())
  File "/data/github/pysradb/pysradb/cli.py", line 1215, in parse_args
    download(
  File "/data/github/pysradb/pysradb/cli.py", line 111, in download
    sradb.download(
  File "/data/github/pysradb/pysradb/sradb.py", line 1543, in download
    if not confirm("Start download? "):
  File "/data/github/pysradb/pysradb/utils.py", line 269, in confirm
    choice = input("{} [Y/n]: ".format(preceeding_text)).lower()
KeyboardInterrupt
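The run above stopped at the interactive `Start download? [Y/n]` prompt, which is awkward in a notebook or a batch job. A minimal sketch of assembling a non-interactive command follows. It assumes that the installed version of `pysradb download` accepts `-y` (skip confirmation) and `--out-dir`; neither flag appears in this notebook, so check `pysradb download --help` before relying on them. `build_download_command` is a hypothetical helper that only builds the argument list and does not start a download.

```python
import shlex


def build_download_command(project, out_dir=None, skip_confirmation=True):
    """Assemble a `pysradb download` command line as a list of arguments."""
    command = ["pysradb", "download"]
    if skip_confirmation:
        # Assumed flag: -y / --skip-confirmation (verify with --help).
        command.append("-y")
    if out_dir is not None:
        # Assumed flag: --out-dir (verify with --help).
        command += ["--out-dir", out_dir]
    # -p selects the project, as in the cell above.
    command += ["-p", project]
    return command


command = build_download_command("SRP162234", out_dir="pysradb_downloads")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Because the whole project totals about 37.5 GB, it is worth confirming free disk space before skipping the prompt.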