Command-line Download
This notebook demonstrates how to use pysradb from the command line to fetch metadata for SRA experiments and projects and to download their runs.
[ ]:
# Install pysradb if not already installed
try:
    import pysradb
    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys
    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
[1]:
# pip install -U pysradb
[2]:
!pysradb --version
pysradb 2.4.1
Get metadata for an SRX (SRRs, SRSs, etc.)
[3]:
!pysradb srx-to-srr SRX4720625
experiment_accession run_accession study_accession study_title experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases
SRX4720625 SRR7882015 SRP162234 Transcriptomic profile of zebrafish cardiomyocytes throughout heart development GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq 7955 Danio rerio RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS3805811 SAMN10095723 PRJNA492280 NextSeq 500 NextSeq 500 ILLUMINA 47867961 3470385670 47867961 7230485009
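To work with the run accessions in Python rather than reading them off the screen, the table can be parsed with the standard library. This is a minimal sketch assuming the CLI writes tab-separated columns with a header row (the rendering above has collapsed the separators, so check your own output); `runs_by_experiment` is a hypothetical helper, not part of pysradb.

```python
import csv
import io
from collections import defaultdict


def runs_by_experiment(tsv_text: str) -> dict:
    """Group run accessions (SRR) under their experiment accession (SRX).

    Assumes tab-separated text whose header row contains the
    'experiment_accession' and 'run_accession' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["experiment_accession"]].append(row["run_accession"])
    return dict(groups)


# Hand-made sample: three columns copied from the output above, joined with tabs.
sample = (
    "experiment_accession\trun_accession\tstudy_accession\n"
    "SRX4720625\tSRR7882015\tSRP162234\n"
)
print(runs_by_experiment(sample))  # {'SRX4720625': ['SRR7882015']}
```

To feed it live output, one option is `subprocess.run(["pysradb", "srx-to-srr", "SRX4720625"], capture_output=True, text=True, check=True).stdout`, again assuming tab-separated output.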
Get detailed metadata
[4]:
!pysradb srx-to-srr SRX4720625 --detailed
experiment_accession run_accession study_accession study_title experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases run_alias public_filename public_size public_date public_md5 public_version public_semantic_name public_supertype public_sratoolkit aws_url aws_free_egress aws_access_type public_url ncbi_url ncbi_free_egress ncbi_access_type gcp_url gcp_free_egress gcp_access_type experiment_alias source_name tissue developmental stage gfp status genetic background ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
SRX4720625 SRR7882015 SRP162234 Transcriptomic profile of zebrafish cardiomyocytes throughout heart development GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq 7955 Danio rerio <NA> RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS3805811 <NA> SAMN10095723 PRJNA492280 NextSeq 500 NextSeq 500 ILLUMINA 47867961 3470385670 47867961 7230485009 GSM3396533_r1 SRR7882015.sralite 1881003321 2020-06-14 12:02:25 8161154ca4e9cf674e3f0e4af74c8455 1 SRA Lite Primary ETL 1 s3://sra-pub-zq-8/SRR7882015/SRR7882015.sralite.1 s3.us-east-1 aws identity https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 worldwide anonymous gs://sra-pub-zq-107/SRR7882015/SRR7882015.zq.1 gs.us-east1 gcp identity GSM3396533 FACS-sorted embryo cells FACS-sorted embryo cells 24 hpf GFP positive wild type <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz
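The `ena_fastq_http_1` and `ena_fastq_http_2` columns above follow a predictable path on the ENA server. The sketch below rebuilds them from the run accession alone, assuming ENA's `vol1/fastq` layout (a directory named after the first six characters of the accession, then a zero-padded subdirectory for accessions longer than nine characters) and assuming paired-end runs use `_1`/`_2` suffixes. `ena_fastq_urls` is a hypothetical helper; in practice prefer the URLs that `--detailed` already reports, since ENA may not mirror every run.

```python
def ena_fastq_urls(run: str, paired: bool = True) -> list[str]:
    """Build ENA HTTP FASTQ URLs for a run accession such as SRR7882015.

    Assumes ENA's vol1/fastq layout and a three-letter prefix (SRR/ERR/DRR).
    """
    top = run[:6]              # e.g. "SRR788"
    n_digits = len(run) - 3    # digits after the three-letter prefix
    if n_digits <= 6:
        sub = ""               # 9-character accessions have no extra level
    else:
        # 7 digits -> "00" + last digit, 8 -> "0" + last two, 9 -> last three
        sub = run[-(n_digits - 6):].zfill(3) + "/"
    base = f"http://ftp.sra.ebi.ac.uk/vol1/fastq/{top}/{sub}{run}/"
    if paired:
        return [f"{base}{run}_1.fastq.gz", f"{base}{run}_2.fastq.gz"]
    return [f"{base}{run}.fastq.gz"]


for url in ena_fastq_urls("SRR7882015"):
    print(url)
```

For SRR7882015 this reproduces the two `ena_fastq_http_*` URLs in the detailed output.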
Download all runs for a particular experiment
[5]:
!pysradb srx-to-srr SRX4720625 --detailed | pysradb download
Checking download URLs
The following files will be downloaded:
experiment_accession run_accession study_accession public_url download_url out_dir filesize
SRX4720625 SRR7882015 SRP162234 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882015/SRR7882015.sra /data/github/pysradb/notebooks/pysradb_downloads 1.9 GB
Total size: 1.9 GB
0%| | 0/1 [00:00<?, ?it/s]^C
0%| | 0/1 [03:01<?, ?it/s]
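The detailed metadata also carries `public_md5` and `public_size` for each run's file, so a finished download can be checked with `hashlib`. This is a sketch assuming the file on disk is the same object `public_md5` describes (the `.sralite` file at `public_url`); the download table above lists an FTP `.sra` URL as `download_url`, so confirm which file you actually received before trusting the comparison. `md5_matches` and the example path are hypothetical.

```python
import hashlib


def md5_matches(path: str, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its MD5 hex digest to expected_md5."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read chunk_size bytes at a time (1 MiB by default) so that
        # multi-gigabyte files are never held in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()


# Hypothetical usage: adjust the path to wherever the file actually landed.
# md5_matches("pysradb_downloads/SRR7882015.sralite.1", "8161154ca4e9cf674e3f0e4af74c8455")
```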
Get metadata for an entire project
[ ]:
!pysradb metadata SRP162234 --detailed
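The basic (non-detailed) output has no per-sample attribute columns, but the experiment title shown earlier, `GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq`, appears to pack genotype, GFP sorting status, stage and replicate into one underscore-separated label (the `--detailed` output lists the same information as `genetic background`, `gfp status` and `developmental stage`). The sketch below splits such titles when building a sample sheet. The field names are hypothetical, and it assumes the other samples in SRP162234 follow the same four-part pattern; titles that don't are returned with only the raw label.

```python
import re

# Hypothetical field names, inferred from a single title in this notebook.
LABEL_FIELDS = ("genotype", "gfp_status", "stage", "replicate")
TITLE_RE = re.compile(r"^(GSM\d+):\s*([^;]+);")


def parse_experiment_title(title: str) -> dict:
    """Split 'GSMxxx: label; organism; strategy' into a GSM id and label fields."""
    match = TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"unrecognised experiment title: {title!r}")
    gsm, label = match.group(1), match.group(2).strip()
    record = {"gsm": gsm, "label": label}
    parts = label.split("_")
    # Only name the parts when the label has the expected four pieces.
    if len(parts) == len(LABEL_FIELDS):
        record.update(zip(LABEL_FIELDS, parts))
    return record


print(parse_experiment_title("GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq"))
```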
Download an entire project!
[7]:
!pysradb download -p SRP162234
Checking download URLs
The following files will be downloaded:
run_accession study_accession experiment_accession public_url download_url out_dir filesize
SRR7882014 SRP162234 SRX4720624 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882014/SRR7882014.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882014/SRR7882014.sra /data/github/pysradb/notebooks/pysradb_downloads 843.8 MB
SRR7882015 SRP162234 SRX4720625 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882015/SRR7882015.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882015/SRR7882015.sra /data/github/pysradb/notebooks/pysradb_downloads 1.9 GB
SRR7882016 SRP162234 SRX4720626 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882016/SRR7882016.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882016/SRR7882016.sra /data/github/pysradb/notebooks/pysradb_downloads 1.8 GB
SRR7882017 SRP162234 SRX4720627 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos9/sra-pub-zq-922/SRR007/882/SRR7882017.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882017/SRR7882017.sra /data/github/pysradb/notebooks/pysradb_downloads 991.8 MB
SRR7882018 SRP162234 SRX4720628 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882018/SRR7882018.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882018/SRR7882018.sra /data/github/pysradb/notebooks/pysradb_downloads 2.7 GB
SRR7882019 SRP162234 SRX4720629 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882019/SRR7882019.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882019/SRR7882019.sra /data/github/pysradb/notebooks/pysradb_downloads 2.9 GB
SRR7882020 SRP162234 SRX4720630 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-14/SRR007/882/SRR7882020.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882020/SRR7882020.sra /data/github/pysradb/notebooks/pysradb_downloads 693.3 MB
SRR7882021 SRP162234 SRX4720631 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882021/SRR7882021.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882021/SRR7882021.sra /data/github/pysradb/notebooks/pysradb_downloads 2.5 GB
SRR7882022 SRP162234 SRX4720632 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882022/SRR7882022.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882022/SRR7882022.sra /data/github/pysradb/notebooks/pysradb_downloads 2.6 GB
SRR7882023 SRP162234 SRX4720633 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos9/sra-pub-zq-922/SRR007/882/SRR7882023.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882023/SRR7882023.sra /data/github/pysradb/notebooks/pysradb_downloads 1.1 GB
SRR7882024 SRP162234 SRX4720634 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882024/SRR7882024.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882024/SRR7882024.sra /data/github/pysradb/notebooks/pysradb_downloads 2.2 GB
SRR7882025 SRP162234 SRX4720635 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882025/SRR7882025.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882025/SRR7882025.sra /data/github/pysradb/notebooks/pysradb_downloads 2.4 GB
SRR7882026 SRP162234 SRX4720636 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882026/SRR7882026.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882026/SRR7882026.sra /data/github/pysradb/notebooks/pysradb_downloads 1.9 GB
SRR7882027 SRP162234 SRX4720637 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882027/SRR7882027.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882027/SRR7882027.sra /data/github/pysradb/notebooks/pysradb_downloads 3.8 GB
SRR7882028 SRP162234 SRX4720638 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882028/SRR7882028.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882028/SRR7882028.sra /data/github/pysradb/notebooks/pysradb_downloads 2.5 GB
SRR7882029 SRP162234 SRX4720639 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882029/SRR7882029.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882029/SRR7882029.sra /data/github/pysradb/notebooks/pysradb_downloads 1.2 GB
SRR7882030 SRP162234 SRX4720640 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882030/SRR7882030.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882030/SRR7882030.sra /data/github/pysradb/notebooks/pysradb_downloads 2.5 GB
SRR7882031 SRP162234 SRX4720641 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR007/882/SRR7882031/SRR7882031.sralite.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR788/SRR7882031/SRR7882031.sra /data/github/pysradb/notebooks/pysradb_downloads 3.1 GB
Total size: 37.5 GB
Start download? [Y/n]: ^C
Traceback (most recent call last):
File "/home/saket/github/2025_iGEM/localcolabfold/colabfold-conda/bin/pysradb", line 7, in <module>
sys.exit(parse_args())
File "/data/github/pysradb/pysradb/cli.py", line 1215, in parse_args
download(
File "/data/github/pysradb/pysradb/cli.py", line 111, in download
sradb.download(
File "/data/github/pysradb/pysradb/sradb.py", line 1543, in download
if not confirm("Start download? "):
File "/data/github/pysradb/pysradb/utils.py", line 269, in confirm
choice = input("{} [Y/n]: ".format(preceeding_text)).lower()
KeyboardInterrupt
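The project totals 37.5 GB, and the confirmation prompt is the last chance to stop before filling the disk. A quick free-space pre-check with `shutil.disk_usage` is sketched below. The unit conversion assumes pysradb reports decimal units (1 GB = 10^9 bytes), which is consistent with SRR7882015's `public_size` of 1881003321 bytes appearing as 1.9 GB; `parse_size`, `has_room_for` and the 10% safety margin are hypothetical choices.

```python
import shutil

# Decimal units, inferred from this notebook's output; treat as an assumption.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a size string such as '37.5 GB' into a byte count."""
    value, unit = text.split()
    # round() rather than int() so values like 843.8 MB don't truncate down.
    return round(float(value) * _UNITS[unit.upper()])


def has_room_for(total: str, out_dir: str = ".", margin: float = 1.1) -> bool:
    """True if the filesystem holding out_dir has `total` free, plus a safety margin."""
    free_bytes = shutil.disk_usage(out_dir).free
    return free_bytes >= parse_size(total) * margin


print(has_room_for("37.5 GB"))
```

`shutil.disk_usage` needs an existing path. Downloads appear to go to a `pysradb_downloads` folder under the working directory, so checking `"."` works before that folder exists.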