
Python API Demo

This notebook demonstrates the core functionality of the pysradb Python API for querying metadata from the NCBI Sequence Read Archive (SRA).

[1]:
# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
pysradb 3.0.0.dev0 is already installed
[2]:
# pip install git+https://github.com/saketkc/pysradb
[3]:
!pysradb --version
pysradb 3.0.0.dev0
[4]:
from pysradb.sraweb import SRAweb
[5]:
db = SRAweb()

Get metadata of one project

[6]:
df = db.sra_metadata("SRP016501")
df
[6]:
study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source ... biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
133 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196264 GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766814 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 87264604 5927043102 SRR594393 87264604 8726460400
132 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196265 GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766815 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 101816491 6835402318 SRR594394 101816491 10181649100
131 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196266 GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766816 PRJNA177791 Illumina Genome Analyzer IIx Illumina Genome Analyzer IIx ILLUMINA 35175982 1502674440 SRR594395 35175982 2532670704
130 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196267 GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766817 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 119274786 7555854784 SRR594396 119274786 11927478600
129 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196268 GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC ... SAMN01766818 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 116292478 7481554926 SRR594397 116292478 11629247800
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766943 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 18978066 562367072 SRR594522 18978066 1366420752
3 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766944 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26604280 931417024 SRR594523 26604280 1862299600
2 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766945 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 25606436 986287075 SRR594524 25606436 1792450520
1 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766946 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 24401708 1201671888 SRR594525 24401708 1756922976
0 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766947 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 37423394 1980545796 SRR594526 37423394 2993871520

134 rows × 24 columns
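
The `run_total_spots` and `run_total_bases` columns can be combined into the number of bases sequenced per spot, which is a quick sanity check on read length. The sketch below is not part of pysradb: it uses plain Python on three value pairs copied from the output above, and `bases_per_spot` is a hypothetical helper name.

```python
# Values copied from the table above:
# run_accession -> (run_total_spots, run_total_bases)
runs = {
    "SRR594393": (87264604, 8726460400),
    "SRR594395": (35175982, 2532670704),
    "SRR594522": (18978066, 1366420752),
}


def bases_per_spot(total_spots, total_bases):
    """Return the average number of bases per spot (hypothetical helper)."""
    if total_spots == 0:
        raise ValueError("total_spots must be non-zero")
    return total_bases / total_spots


for run, (spots, bases) in runs.items():
    print(run, bases_per_spot(spots, bases))
```

For paired-end libraries a spot contains both mates, so the value is the combined length of the two reads rather than the length of a single read.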

Get detailed metadata

[7]:
df = db.sra_metadata("SRP016501", detailed=True)
df
[7]:
run_accession study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy ... experiment_alias source_name tissue strain ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
0 SRR594393 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196264 GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq 10090 Mus musculus <NA> RNA-Seq ... GSM1020640_1 mouse_brain brain DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
1 SRR594394 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196265 GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq 10090 Mus musculus <NA> RNA-Seq ... GSM1020641_1 mouse_colon colon DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
2 SRR594395 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196266 GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq 10090 Mus musculus <NA> RNA-Seq ... GSM1020642_1 mouse_heart heart DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
3 SRR594396 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196267 GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq 10090 Mus musculus <NA> RNA-Seq ... GSM1020643_1 mouse_kidney kidney DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
4 SRR594397 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196268 GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq 10090 Mus musculus <NA> RNA-Seq ... GSM1020644_1 mouse_liver liver DBA/2J <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129 SRR594522 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus <NA> RNA-Seq ... GSM1020769_1 chicken_liver liver <NA> <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
130 SRR594523 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus <NA> RNA-Seq ... GSM1020770_1 chicken_lung lung <NA> <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
131 SRR594524 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus <NA> RNA-Seq ... GSM1020771_1 chicken_skm skeletal muscle <NA> <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
132 SRR594525 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus <NA> RNA-Seq ... GSM1020772_1 chicken_spleen spleen <NA> <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...
133 SRR594526 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus <NA> RNA-Seq ... GSM1020773_1 chicken_testes testes <NA> <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/...

134 rows × 53 columns
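
With `detailed=True` the result gains sample attributes such as `source_name`, `tissue`, and `strain`, so runs can be grouped by sample annotation. A minimal sketch in plain Python follows, assuming the rows have been converted to dictionaries (for example with pandas' `df.to_dict("records")`). The records are hand-copied from the output above, and `runs_by_tissue` is a hypothetical helper, not a pysradb function.

```python
from collections import defaultdict

# A few rows copied from the detailed output above, reduced to three columns.
records = [
    {"run_accession": "SRR594393", "organism_name": "Mus musculus", "tissue": "brain"},
    {"run_accession": "SRR594397", "organism_name": "Mus musculus", "tissue": "liver"},
    {"run_accession": "SRR594522", "organism_name": "Gallus gallus", "tissue": "liver"},
    {"run_accession": "SRR594523", "organism_name": "Gallus gallus", "tissue": "lung"},
]


def runs_by_tissue(rows):
    """Group run accessions by their 'tissue' value (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["tissue"]].append(row["run_accession"])
    return dict(grouped)


# Liver runs from both species end up in the same group.
print(runs_by_tissue(records)["liver"])
```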

Get metadata of multiple projects

[8]:
df = db.sra_metadata(["SRP016501", "SRP098789"])
df
[8]:
study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source ... biosample bioproject instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
25 SRP098789 Selective stalling of human translation throug... SRX2536403 GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293487 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 42082855 916745706 SRR5227288 42082855 2104142750
24 SRP098789 Selective stalling of human translation throug... SRX2536404 GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293486 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 41657461 1360366732 SRR5227289 41657461 2082873050
23 SRP098789 Selective stalling of human translation throug... SRX2536405 GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293485 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 40462973 1287284933 SRR5227290 40462973 2023148650
22 SRP098789 Selective stalling of human translation throug... SRX2536406 GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293484 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 41143319 843881081 SRR5227291 41143319 2057165950
21 SRP098789 Selective stalling of human translation throug... SRX2536407 GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... 9606 Homo sapiens OTHER TRANSCRIPTOMIC ... SAMN06293483 PRJNA369742 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 60552437 1875910244 SRR5227292 60552437 3027621850
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
30 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196393 GSM1020769: chicken_c_liver; Gallus gallus; RN... GSM1020769: chicken_c_liver; Gallus gallus; RN... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766943 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 18978066 562367072 SRR594522 18978066 1366420752
29 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196394 GSM1020770: chicken_c_lung; Gallus gallus; RNA... GSM1020770: chicken_c_lung; Gallus gallus; RNA... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766944 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26604280 931417024 SRR594523 26604280 1862299600
28 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196395 GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766945 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 25606436 986287075 SRR594524 25606436 1792450520
27 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196396 GSM1020772: chicken_c_spleen; Gallus gallus; R... GSM1020772: chicken_c_spleen; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766946 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 24401708 1201671888 SRR594525 24401708 1756922976
26 SRP016501 Evolutionary dynamics of gene and isoform regu... SRX196397 GSM1020773: chicken_c_testes; Gallus gallus; R... GSM1020773: chicken_c_testes; Gallus gallus; R... 9031 Gallus gallus RNA-Seq TRANSCRIPTOMIC ... SAMN01766947 PRJNA177791 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 37423394 1980545796 SRR594526 37423394 2993871520

160 rows × 24 columns

Get metadata of a Run

[9]:
df = db.sra_metadata("SRR11085797", detailed=True)
df
[9]:
run_accession study_accession study_title experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy ... host_disease isolation_source lat_lon biosamplemodel ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
0 SRR11085797 SRP249482 Bat coronavirus RaTG13 Genome sequencing SRX7724752 RNA-Seq of Rhinolophus affinis:Fecal swab RNA-Seq of Rhinolophus affinis:Fecal swab 694135 unidentified coronavirus RaTG13 RNA-Seq ... not applicable fecal swab not collected Pathogen.cl <NA> http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... <NA> era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/... era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/...

1 rows × 60 columns
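
The calls above pass both a project accession (`SRP016501`) and a run accession (`SRR11085797`) to `sra_metadata`. When accessions come from user input, it can help to check what kind of identifier each one is before querying. The sketch below is independent of pysradb: `accession_kind` is a hypothetical helper that relies only on the usual SRA/ENA/DDBJ and GEO prefix conventions, and it does not verify that an accession actually exists.

```python
import re

# Fourth letter of an SRA/ENA/DDBJ accession encodes the record type.
_KINDS = {"P": "study", "X": "experiment", "R": "run", "S": "sample"}
_SRA_PATTERN = re.compile(r"^[SED]R([PXRS])\d+$")
_GEO_PATTERN = re.compile(r"^GS([ME])\d+$")


def accession_kind(accession):
    """Classify an accession by its prefix (hypothetical helper)."""
    match = _SRA_PATTERN.match(accession)
    if match:
        return _KINDS[match.group(1)]
    match = _GEO_PATTERN.match(accession)
    if match:
        return "geo_sample" if match.group(1) == "M" else "geo_series"
    return "unknown"


for acc in ["SRP016501", "SRR11085797", "SRX1254413", "GSM1887643"]:
    print(acc, accession_kind(acc))
```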

SRX to GSM

[10]:
df = db.srx_to_gsm("SRX1254413")
df
[10]:
experiment_accession experiment_alias
0 SRX1254413 GSM1887643