
Python API Demo

This notebook demonstrates the core functionality of the pysradb Python API for querying SRA metadata.

[1]:
# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
pysradb 3.0.0.dev0 is already installed
[2]:
# pip install git+https://github.com/saketkc/pysradb
[3]:
!pysradb --version
pysradb 3.0.0.dev0
[4]:
from pysradb.sraweb import SRAweb
[5]:
db = SRAweb()

Get metadata of one project

[6]:
df = db.sra_metadata("SRP016501")
df
[6]:
study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source ... biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
133 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196264 GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766814 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 87264604 5927043102 SRR594393 87264604 8726460400
132 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196265 GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766815 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 101816491 6835402318 SRR594394 101816491 10181649100
131 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196266 GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766816 PRJNA177791 Illumina Genome Analyzer IIx Illumina Genome Analyzer IIx ILLUMINA 35175982 1502674440 SRR594395 35175982 2532670704
130 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196267 GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766817 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 119274786 7555854784 SRR594396 119274786 11927478600
129 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196268 GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766818 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 116292478 7481554926 SRR594397 116292478 11629247800
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766943 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 18978066 562367072 SRR594522 18978066 1366420752
3 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766944 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26604280 931417024 SRR594523 26604280 1862299600
2 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766945 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 25606436 986287075 SRR594524 25606436 1792450520
1 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766946 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 24401708 1201671888 SRR594525 24401708 1756922976
0 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766947 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 37423394 1980545796 SRR594526 37423394 2993871520

134 rows × 24 columns
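The run_total_spots and run_total_bases columns can be combined into a rough per-spot length. The following is a minimal sketch using three rows copied from the output above; plain tuples stand in for the DataFrame, and `bases_per_spot` is a hypothetical helper, not part of pysradb. For paired-end runs a spot presumably covers both mates, so the value is not necessarily the length of a single read.

```python
# (run_accession, run_total_spots, run_total_bases) copied from the output above.
runs = [
    ("SRR594393", 87264604, 8726460400),
    ("SRR594394", 101816491, 10181649100),
    ("SRR594395", 35175982, 2532670704),
]


def bases_per_spot(total_spots, total_bases):
    """Average number of sequenced bases per spot (hypothetical helper)."""
    if total_spots == 0:
        raise ValueError("run has no spots")
    return total_bases / total_spots


# Map each run accession to its average bases per spot.
spot_lengths = {acc: bases_per_spot(spots, bases) for acc, spots, bases in runs}

for acc, length in spot_lengths.items():
    print(f"{acc}: {length:.0f} bases per spot")
```

With the real DataFrame, the same ratio could be computed column-wise as `df["run_total_bases"] / df["run_total_spots"]`, assuming both columns hold numeric values (they may need converting first).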

Get detailed metadata

[7]:
df = db.sra_metadata("SRP016501", detailed=True)
df
[7]:
run_accession study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy ... tissue strain ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2 study_geo_accession experiment_geo_accession
0 SRR594393 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196264 GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq 10090 Mus musculus NaN RNA-Seq ... brain DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020640
1 SRR594394 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196265 GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq 10090 Mus musculus NaN RNA-Seq ... colon DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020641
2 SRR594395 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196266 GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq 10090 Mus musculus NaN RNA-Seq ... heart DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020642
3 SRR594396 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196267 GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq 10090 Mus musculus NaN RNA-Seq ... kidney DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020643
4 SRR594397 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196268 GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq 10090 Mus musculus NaN RNA-Seq ... liver DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020644
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129 SRR594522 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus NaN RNA-Seq ... liver NaN <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020769
130 SRR594523 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus NaN RNA-Seq ... lung NaN <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020770
131 SRR594524 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus NaN RNA-Seq ... skeletal muscle NaN <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020771
132 SRR594525 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus NaN RNA-Seq ... spleen NaN <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020772
133 SRR594526 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus NaN RNA-Seq ... testes NaN <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... GSE41637 GSM1020773

134 rows × 55 columns
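For this GEO-linked project, the experiment_title values shown above appear to follow a "GSM accession: sample label; organism; strategy" layout. The sketch below splits such a title into its parts. The layout is an assumption drawn from the visible rows only, and `parse_geo_title` is a hypothetical helper, not a pysradb function. Titles that do not match return None.

```python
def parse_geo_title(title):
    """Split 'GSM...: label; organism; strategy' into a dict (assumed layout)."""
    accession, sep, rest = title.partition(": ")
    if not sep or not accession.startswith("GSM"):
        return None  # title does not follow the assumed GEO layout
    parts = [part.strip() for part in rest.split(";")]
    if len(parts) != 3:
        return None  # unexpected number of fields
    label, organism, strategy = parts
    return {
        "gsm": accession,
        "label": label,
        "organism": organism,
        "strategy": strategy,
    }


# A title copied from the first row of the output above.
parsed = parse_geo_title("GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq")
print(parsed)

# A title without a GSM prefix is rejected.
unparsed = parse_geo_title("RNA-Seq of Rhinolophus affinis:Fecal swab")
print(unparsed)
```

The parsed accession can be cross-checked against the experiment_geo_accession column, which holds GSM1020640 for the same row.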

Get metadata of multiple projects

[8]:
df = db.sra_metadata(["SRP016501", "SRP098789"])
df
[8]:
study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source ... biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
25 SRP098789 Selective stalling of human translation throug... SRX2536403 GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293487 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 42082855 916745706 SRR5227288 42082855 2104142750
24 SRP098789 Selective stalling of human translation throug... SRX2536404 GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293486 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 41657461 1360366732 SRR5227289 41657461 2082873050
23 SRP098789 Selective stalling of human translation throug... SRX2536405 GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293485 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 40462973 1287284933 SRR5227290 40462973 2023148650
22 SRP098789 Selective stalling of human translation throug... SRX2536406 GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293484 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 41143319 843881081 SRR5227291 41143319 2057165950
21 SRP098789 Selective stalling of human translation throug... SRX2536407 GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293483 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 60552437 1875910244 SRR5227292 60552437 3027621850
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
30 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766943 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 18978066 562367072 SRR594522 18978066 1366420752
29 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766944 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26604280 931417024 SRR594523 26604280 1862299600
28 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766945 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 25606436 986287075 SRR594524 25606436 1792450520
27 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766946 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 24401708 1201671888 SRR594525 24401708 1756922976
26 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766947 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 37423394 1980545796 SRR594526 37423394 2993871520

160 rows × 24 columns
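When several projects are queried at once, it can help to check how many runs each study contributed. The sketch below counts runs per study for the ten rows visible in the output above; the tuples are a stand-in for the DataFrame and cover only those displayed rows, not all 160.

```python
from collections import Counter

# (study_accession, run_accession) for the ten rows visible in the output above.
visible_rows = [
    ("SRP098789", "SRR5227288"),
    ("SRP098789", "SRR5227289"),
    ("SRP098789", "SRR5227290"),
    ("SRP098789", "SRR5227291"),
    ("SRP098789", "SRR5227292"),
    ("SRP016501", "SRR594522"),
    ("SRP016501", "SRR594523"),
    ("SRP016501", "SRR594524"),
    ("SRP016501", "SRR594525"),
    ("SRP016501", "SRR594526"),
]

# Count how many runs belong to each study.
runs_per_study = Counter(study for study, _run in visible_rows)

for study, count in sorted(runs_per_study.items()):
    print(f"{study}: {count} runs")
```

On the full DataFrame, `df["study_accession"].value_counts()` gives the equivalent tally. Since SRP016501 alone returned 134 rows earlier, the remaining 26 of the 160 rows presumably belong to SRP098789.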

Get metadata of a run

[9]:
df = db.sra_metadata("SRR11085797", detailed=True)
df
No results found for ['SRP249482'] | Obtained result: {}
No results found for ['SRX7724752'] | Obtained result: {}
[9]:
run_accession study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy ... lat_lon biosamplemodel ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2 study_geo_accession experiment_geo_accession
0 SRR11085797 SRP249482 Bat coronavirus RaTG13 Genome sequencing SRX7724752 RNA-Seq of Rhinolophus affinis:Fecal swab RNA-Seq of Rhinolophus affinis:Fecal swab 694135 unidentified coronavirus RaTG13 RNA-Seq ... not collected Pathogen.cl <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/... <NA> <NA>

1 rows × 62 columns
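The accessions in this notebook differ by prefix: SRP for a study, SRX for an experiment, SRR for a run, and so on. The log lines above mention SRP249482 and SRX7724752, which are the study and experiment of the queried run. The sketch below classifies an accession by its prefix. `accession_kind` and the `ACCESSION_KINDS` table are hypothetical and cover only the prefixes that appear in this notebook; they are not part of pysradb.

```python
import re

# Prefixes seen in this notebook; other archives use further prefixes.
ACCESSION_KINDS = {
    "SRP": "study",
    "SRX": "experiment",
    "SRR": "run",
    "GSE": "GEO series",
    "GSM": "GEO sample",
    "PRJNA": "BioProject",
    "SAMN": "BioSample",
}

# An accession is an uppercase prefix followed by digits.
_ACCESSION_RE = re.compile(r"^([A-Z]+)(\d+)$")


def accession_kind(accession):
    """Return the record type implied by the accession prefix, or 'unknown'."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return "unknown"
    return ACCESSION_KINDS.get(match.group(1), "unknown")


for acc in ["SRR11085797", "SRP249482", "SRX7724752"]:
    print(acc, "->", accession_kind(acc))
```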

SRX to GSM

[10]:
df = db.srx_to_gsm("SRX1254413")
df
[10]:
experiment_accession experiment_alias
0 SRX1254413 GSM1887643
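The two-column result above lends itself to a plain lookup table. The sketch below builds a dict from records shaped like the output; the list of dicts is a stand-in for what `df.to_dict("records")` would return, and `gsm_for` is a hypothetical helper.

```python
# One record copied from the output above.
records = [
    {"experiment_accession": "SRX1254413", "experiment_alias": "GSM1887643"},
]

# Map each SRX accession to its GSM alias.
gsm_by_srx = {
    record["experiment_accession"]: record["experiment_alias"]
    for record in records
}


def gsm_for(srx, mapping=gsm_by_srx):
    """Return the GSM alias for an SRX accession, or None if it is absent."""
    return mapping.get(srx)


print(gsm_for("SRX1254413"))
print(gsm_for("SRX0000000"))  # not in the table, so None
```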