Python API Demo
This notebook demonstrates the core functionality of the pysradb Python API for querying metadata from the NCBI Sequence Read Archive (SRA).
[1]:
# Install pysradb if not already installed
try:
    import pysradb
    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys
    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
pysradb 3.0.0.dev0 is already installed
/data/github/pysradb/pysradb/utils.py:16: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm
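To check whether pysradb is installed without importing it, the standard library's `importlib.metadata` (Python 3.8+) can report the installed distribution version. This is a minimal sketch; `installed_version` is a hypothetical helper name, not part of pysradb.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string such as "3.0.0.dev0", or None if pysradb is not installed.
print(installed_version("pysradb"))
```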
[2]:
# pip install git+https://github.com/saketkc/pysradb
[3]:
!pysradb --version
pysradb 3.0.0.dev0
[4]:
from pysradb.sraweb import SRAweb
[5]:
db = SRAweb()
Get metadata of one project
[6]:
df = db.sra_metadata("SRP016501")
df
[6]:
| | study_accession | study_title | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | library_source | ... | biosample | bioproject | instrument | instrument_model | instrument_model_desc | total_spots | total_size | run_accession | run_total_spots | run_total_bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 133 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196264 | GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq | GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq | 10090 | Mus musculus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766814 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 87264604 | 5927043102 | SRR594393 | 87264604 | 8726460400 |
| 132 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196265 | GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq | GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq | 10090 | Mus musculus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766815 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 101816491 | 6835402318 | SRR594394 | 101816491 | 10181649100 |
| 131 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196266 | GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq | GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq | 10090 | Mus musculus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766816 | PRJNA177791 | Illumina Genome Analyzer IIx | Illumina Genome Analyzer IIx | ILLUMINA | 35175982 | 1502674440 | SRR594395 | 35175982 | 2532670704 |
| 130 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196267 | GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq | GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq | 10090 | Mus musculus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766817 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 119274786 | 7555854784 | SRR594396 | 119274786 | 11927478600 |
| 129 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196268 | GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq | GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq | 10090 | Mus musculus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766818 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 116292478 | 7481554926 | SRR594397 | 116292478 | 11629247800 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196393 | GSM1020769: chicken_c_liver; Gallus gallus; RN... | GSM1020769: chicken_c_liver; Gallus gallus; RN... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766943 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 18978066 | 562367072 | SRR594522 | 18978066 | 1366420752 |
| 3 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196394 | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766944 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 26604280 | 931417024 | SRR594523 | 26604280 | 1862299600 |
| 2 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196395 | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766945 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 25606436 | 986287075 | SRR594524 | 25606436 | 1792450520 |
| 1 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196396 | GSM1020772: chicken_c_spleen; Gallus gallus; R... | GSM1020772: chicken_c_spleen; Gallus gallus; R... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766946 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 24401708 | 1201671888 | SRR594525 | 24401708 | 1756922976 |
| 0 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196397 | GSM1020773: chicken_c_testes; Gallus gallus; R... | GSM1020773: chicken_c_testes; Gallus gallus; R... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766947 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 37423394 | 1980545796 | SRR594526 | 37423394 | 2993871520 |
134 rows × 24 columns
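The result is a pandas DataFrame, so ordinary pandas operations apply. As a sketch, the snippet below rebuilds four rows from the output above and derives bases per spot (`run_total_bases / run_total_spots`) plus a per-organism run count. It assumes the count columns might come back as strings, so it coerces them with `pd.to_numeric`; check `df.dtypes` on real output.

```python
import pandas as pd

# Four rows copied from the SRP016501 output above (counts kept as strings on purpose).
df = pd.DataFrame(
    {
        "run_accession": ["SRR594393", "SRR594395", "SRR594522", "SRR594526"],
        "organism_name": ["Mus musculus", "Mus musculus", "Gallus gallus", "Gallus gallus"],
        "instrument_model": [
            "Illumina HiSeq 2000",
            "Illumina Genome Analyzer IIx",
            "Illumina HiSeq 2000",
            "Illumina HiSeq 2000",
        ],
        "run_total_spots": ["87264604", "35175982", "18978066", "37423394"],
        "run_total_bases": ["8726460400", "2532670704", "1366420752", "2993871520"],
    }
)

# Coerce the count columns to numbers before doing arithmetic on them.
for col in ("run_total_spots", "run_total_bases"):
    df[col] = pd.to_numeric(df[col])

# Bases per spot: the combined length of all reads in one spot.
df["bases_per_spot"] = df["run_total_bases"] / df["run_total_spots"]

# Number of runs per organism.
runs_per_organism = df.groupby("organism_name")["run_accession"].count()

print(df[["run_accession", "instrument_model", "bases_per_spot"]])
print(runs_per_organism)
```

On these rows the ratio is 100, 72, 72 and 80 bases per spot. For paired-end runs a spot holds both mates, so the ratio is the combined read length rather than the length of a single read.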
Get detailed metadata
[7]:
df = db.sra_metadata("SRP016501", detailed=True)
df
[7]:
| | run_accession | study_accession | study_title | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | ... | experiment_alias | source_name | tissue | strain | ena_fastq_http | ena_fastq_http_1 | ena_fastq_http_2 | ena_fastq_ftp | ena_fastq_ftp_1 | ena_fastq_ftp_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SRR594393 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196264 | GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq | GSM1020640: mouse_a_brain; Mus musculus; RNA-Seq | 10090 | Mus musculus | <NA> | RNA-Seq | ... | GSM1020640_1 | mouse_brain | brain | DBA/2J | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 1 | SRR594394 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196265 | GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq | GSM1020641: mouse_a_colon; Mus musculus; RNA-Seq | 10090 | Mus musculus | <NA> | RNA-Seq | ... | GSM1020641_1 | mouse_colon | colon | DBA/2J | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 2 | SRR594395 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196266 | GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq | GSM1020642: mouse_a_heart; Mus musculus; RNA-Seq | 10090 | Mus musculus | <NA> | RNA-Seq | ... | GSM1020642_1 | mouse_heart | heart | DBA/2J | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 3 | SRR594396 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196267 | GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq | GSM1020643: mouse_a_kidney; Mus musculus; RNA-Seq | 10090 | Mus musculus | <NA> | RNA-Seq | ... | GSM1020643_1 | mouse_kidney | kidney | DBA/2J | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 4 | SRR594397 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196268 | GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq | GSM1020644: mouse_a_liver; Mus musculus; RNA-Seq | 10090 | Mus musculus | <NA> | RNA-Seq | ... | GSM1020644_1 | mouse_liver | liver | DBA/2J | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | SRR594522 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196393 | GSM1020769: chicken_c_liver; Gallus gallus; RN... | GSM1020769: chicken_c_liver; Gallus gallus; RN... | 9031 | Gallus gallus | <NA> | RNA-Seq | ... | GSM1020769_1 | chicken_liver | liver | <NA> | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 130 | SRR594523 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196394 | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | 9031 | Gallus gallus | <NA> | RNA-Seq | ... | GSM1020770_1 | chicken_lung | lung | <NA> | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 131 | SRR594524 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196395 | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | 9031 | Gallus gallus | <NA> | RNA-Seq | ... | GSM1020771_1 | chicken_skm | skeletal muscle | <NA> | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 132 | SRR594525 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196396 | GSM1020772: chicken_c_spleen; Gallus gallus; R... | GSM1020772: chicken_c_spleen; Gallus gallus; R... | 9031 | Gallus gallus | <NA> | RNA-Seq | ... | GSM1020772_1 | chicken_spleen | spleen | <NA> | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
| 133 | SRR594526 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196397 | GSM1020773: chicken_c_testes; Gallus gallus; R... | GSM1020773: chicken_c_testes; Gallus gallus; R... | 9031 | Gallus gallus | <NA> | RNA-Seq | ... | GSM1020773_1 | chicken_testes | testes | <NA> | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR594/SRR... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR594/... |
134 rows × 53 columns
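With `detailed=True` the frame grows from 24 to 53 columns, adding sample attributes (such as `tissue` and `strain`) and ENA download columns. In the rows above, `ena_fastq_http` is `<NA>` while `ena_fastq_http_1` and `ena_fastq_http_2` are filled, which is consistent with paired-end data split into `_1` and `_2` files. The sketch below collects whichever of these URL columns are present for each run. `fastq_urls` is a hypothetical helper, and the example URLs are placeholders because the real ones are truncated in the printed output.

```python
import pandas as pd

# ENA HTTP FASTQ columns seen in the detailed output above.
URL_COLUMNS = ["ena_fastq_http", "ena_fastq_http_1", "ena_fastq_http_2"]


def fastq_urls(df: pd.DataFrame) -> dict[str, list[str]]:
    """Map each run_accession to its non-missing ENA HTTP FASTQ URLs."""
    urls = {}
    for _, row in df.iterrows():
        present = [row[c] for c in URL_COLUMNS if c in df.columns and pd.notna(row[c])]
        urls[row["run_accession"]] = present
    return urls


# Hypothetical placeholder URLs; the real values are truncated in the table above.
demo = pd.DataFrame(
    {
        "run_accession": ["SRR594393"],
        "ena_fastq_http": [pd.NA],
        "ena_fastq_http_1": ["http://example.org/SRR594393_1.fastq.gz"],
        "ena_fastq_http_2": ["http://example.org/SRR594393_2.fastq.gz"],
    }
)
print(fastq_urls(demo))
```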
Get metadata of multiple projects
[8]:
df = db.sra_metadata(["SRP016501", "SRP098789"])
df
[8]:
| | study_accession | study_title | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | library_source | ... | biosample | bioproject | instrument | instrument_model | instrument_model_desc | total_spots | total_size | run_accession | run_total_spots | run_total_bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | SRP098789 | Selective stalling of human translation throug... | SRX2536403 | GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... | GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1... | 9606 | Homo sapiens | | OTHER | TRANSCRIPTOMIC | ... | SAMN06293487 | PRJNA369742 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 42082855 | 916745706 | SRR5227288 | 42082855 | 2104142750 |
| 24 | SRP098789 | Selective stalling of human translation throug... | SRX2536404 | GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... | GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2... | 9606 | Homo sapiens | | OTHER | TRANSCRIPTOMIC | ... | SAMN06293486 | PRJNA369742 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 41657461 | 1360366732 | SRR5227289 | 41657461 | 2082873050 |
| 23 | SRP098789 | Selective stalling of human translation throug... | SRX2536405 | GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... | GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3... | 9606 | Homo sapiens | | OTHER | TRANSCRIPTOMIC | ... | SAMN06293485 | PRJNA369742 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 40462973 | 1287284933 | SRR5227290 | 40462973 | 2023148650 |
| 22 | SRP098789 | Selective stalling of human translation throug... | SRX2536406 | GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... | GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1... | 9606 | Homo sapiens | | OTHER | TRANSCRIPTOMIC | ... | SAMN06293484 | PRJNA369742 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 41143319 | 843881081 | SRR5227291 | 41143319 | 2057165950 |
| 21 | SRP098789 | Selective stalling of human translation throug... | SRX2536407 | GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... | GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2... | 9606 | Homo sapiens | | OTHER | TRANSCRIPTOMIC | ... | SAMN06293483 | PRJNA369742 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 60552437 | 1875910244 | SRR5227292 | 60552437 | 3027621850 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196393 | GSM1020769: chicken_c_liver; Gallus gallus; RN... | GSM1020769: chicken_c_liver; Gallus gallus; RN... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766943 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 18978066 | 562367072 | SRR594522 | 18978066 | 1366420752 |
| 29 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196394 | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | GSM1020770: chicken_c_lung; Gallus gallus; RNA... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766944 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 26604280 | 931417024 | SRR594523 | 26604280 | 1862299600 |
| 28 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196395 | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | GSM1020771: chicken_c_skm; Gallus gallus; RNA-Seq | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766945 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 25606436 | 986287075 | SRR594524 | 25606436 | 1792450520 |
| 27 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196396 | GSM1020772: chicken_c_spleen; Gallus gallus; R... | GSM1020772: chicken_c_spleen; Gallus gallus; R... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766946 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 24401708 | 1201671888 | SRR594525 | 24401708 | 1756922976 |
| 26 | SRP016501 | Evolutionary dynamics of gene and isoform regu... | SRX196397 | GSM1020773: chicken_c_testes; Gallus gallus; R... | GSM1020773: chicken_c_testes; Gallus gallus; R... | 9031 | Gallus gallus | | RNA-Seq | TRANSCRIPTOMIC | ... | SAMN01766947 | PRJNA177791 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | ILLUMINA | 37423394 | 1980545796 | SRR594526 | 37423394 | 2993871520 |
160 rows × 24 columns
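When several accessions are passed, the results are stacked into one DataFrame: 160 rows here, of which 134 belong to SRP016501 (per the single-project query), leaving 26 for SRP098789. Grouping on `study_accession` recovers per-study summaries or splits the frame back apart. This is a sketch on a handful of rows copied from the output above; the `pd.to_numeric` call again assumes counts might arrive as strings.

```python
import pandas as pd

# A few rows copied from the combined SRP098789 + SRP016501 output above.
rows = [
    ("SRP098789", "SRR5227288", "Homo sapiens", "2104142750"),
    ("SRP098789", "SRR5227289", "Homo sapiens", "2082873050"),
    ("SRP016501", "SRR594522", "Gallus gallus", "1366420752"),
    ("SRP016501", "SRR594523", "Gallus gallus", "1862299600"),
    ("SRP016501", "SRR594524", "Gallus gallus", "1792450520"),
]
df = pd.DataFrame(
    rows, columns=["study_accession", "run_accession", "organism_name", "run_total_bases"]
)
df["run_total_bases"] = pd.to_numeric(df["run_total_bases"])

# Per-study run count and sequenced gigabases (named aggregation).
per_study = df.groupby("study_accession").agg(
    runs=("run_accession", "count"),
    gigabases=("run_total_bases", lambda s: s.sum() / 1e9),
)

# Split the combined frame back into one DataFrame per study.
by_study = {acc: sub for acc, sub in df.groupby("study_accession")}

print(per_study)
```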
Get metadata of a run
[9]:
df = db.sra_metadata("SRR11085797", detailed=True)
df
[9]:
| | run_accession | study_accession | study_title | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | ... | host_disease | isolation_source | lat_lon | biosamplemodel | ena_fastq_http | ena_fastq_http_1 | ena_fastq_http_2 | ena_fastq_ftp | ena_fastq_ftp_1 | ena_fastq_ftp_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SRR11085797 | SRP249482 | Bat coronavirus RaTG13 Genome sequencing | SRX7724752 | RNA-Seq of Rhinolophus affinis:Fecal swab | RNA-Seq of Rhinolophus affinis:Fecal swab | 694135 | unidentified coronavirus | RaTG13 | RNA-Seq | ... | not applicable | fecal swab | not collected | Pathogen.cl | <NA> | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/097... | <NA> | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/... | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR110/... |
1 rows × 60 columns
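The same `sra_metadata` call accepted a study accession (SRP prefix) earlier and a run accession (SRR prefix) here. If you need to tell accession types apart before querying, the prefix is enough: the first letter names the archive (S for NCBI SRA, E for ENA, D for DDBJ) and the third letter names the level (P study, X experiment, S sample, R run). `accession_level` is a hypothetical helper for illustration, not part of pysradb.

```python
import re

# Third letter of an SRA/ENA/DDBJ accession -> metadata level.
_LEVELS = {"P": "study", "X": "experiment", "S": "sample", "R": "run"}
# S = NCBI SRA, E = ENA, D = DDBJ; second letter is always R; then the level letter and digits.
_PATTERN = re.compile(r"^([SED])R([PXSR])\d+$")


def accession_level(accession: str) -> str:
    """Classify an SRA/ENA/DDBJ accession as study, experiment, sample or run."""
    match = _PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not an SRA/ENA/DDBJ accession: {accession!r}")
    return _LEVELS[match.group(2)]


for acc in ["SRP016501", "SRX196264", "SRR11085797"]:
    print(acc, accession_level(acc))
```

GEO identifiers such as GSM1887643 do not match the pattern and raise `ValueError`; converting them is what helpers like `srx_to_gsm` are for.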
SRX to GSM
[10]:
df = db.srx_to_gsm("SRX1254413")
df
[10]:
| | experiment_accession | experiment_alias |
|---|---|---|
| 0 | SRX1254413 | GSM1887643 |
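`srx_to_gsm` returns a small DataFrame pairing each `experiment_accession` with its GEO sample ID in `experiment_alias`. To use it as a lookup table, zip the two columns into a dict. This sketch rebuilds the one row shown above instead of calling the web service; if an experiment ever appeared with several aliases, the dict would keep only the last one.

```python
import pandas as pd

# The single row returned by db.srx_to_gsm("SRX1254413") above.
df = pd.DataFrame(
    {"experiment_accession": ["SRX1254413"], "experiment_alias": ["GSM1887643"]}
)

# Forward (SRX -> GSM) and reverse (GSM -> SRX) lookup tables.
srx_to_gsm_map = dict(zip(df["experiment_accession"], df["experiment_alias"]))
gsm_to_srx_map = {gsm: srx for srx, gsm in srx_to_gsm_map.items()}

print(srx_to_gsm_map["SRX1254413"])  # GSM1887643
```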