Query and Search
This notebook demonstrates advanced search capabilities to find SRA studies based on specific criteria.
[1]:
# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")
pysradb 3.0.0.dev0 is already installed
/home/runner/work/pysradb/pysradb/pysradb/download.py:15: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm
pysradb search
The pysradb search module supports querying the Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA) databases for sequencing data. The module also includes several built-in flags that can be used to fine-tune a search query.
[2]:
%%html
<style>
th {font-size: 16px;}
td {font-size: 14px;}
td:first-child {font-size: 15px; font-weight: 500;}
</style>
Terminal flags for the pysradb search module:

| Flags | Explanation |
|---|---|
| -h, --help | Displays the help message |
| --saveto | Saves the result in the file specified by the user. Supported file types: txt, tsv, csv |
| --db | Selects the database (SRA, ENA, or both SRA and GEO DataSets) to query. Default database is SRA. Accepted inputs: sra, ena, geo |
| -v, --verbosity | Determines how much detail is retrieved and shown in the search result. 0: run_accession only; 1: run_accession and experiment_description only; 2 (default): study_accession, experiment_accession, experiment_title, description, tax_id, scientific_name, library_strategy, library_source, library_selection, sample_accession, sample_title, instrument_model, run_accession, read_count, base_count; 3: everything in verbosity level 2, followed by all other retrievable information from the database |
| -m, --max | Maximum number of returned entries. Default number is 20. Note: if the maximum is large, querying the SRA and GEO DataSets databases will take significantly longer due to API limits |
| -q, --query | The main query string. Note: if this flag is not used, at least one of the following flags must be supplied: |
| --accession | A relevant study / experiment / sample / run accession number |
| --organism | Scientific name of the sample organism |
| --layout | Library layout. Accepted inputs: single, paired |
| --mbases | Size of the sample rounded to the nearest megabase |
| --publication-date | The publication date of the run in the format dd-mm-yyyy. If a date range is desired, enter the start date followed by the end date, separated by a colon ':', in the format dd-mm-yyyy:dd-mm-yyyy. Example: 01-01-2010:31-12-2010 |
| --platform | Sequencing platform used for the run. Possible inputs: illumina, ion torrent, oxford nanopore |
| --selection | Library selection. Possible inputs: cdna, chip, dnase, pcr, polya |
| --source | Library source. Possible inputs: genomic, metagenomic, transcriptomic |
| --strategy | Library preparation strategy. Possible inputs: wgs, amplicon, rna seq |
| --title | Title of the experiment associated with the run |
| --geo-query | The main query string to be sent to GEO DataSets |
| --geo-dataset-type | Dataset type. Possible inputs: expression profiling by array, expression profiling by high throughput sequencing, non coding rna profiling by high throughput sequencing |
| --geo-entry-type | Entry type. Accepted inputs: gds, gpl, gse, gsm |
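Because the publication date is passed as a plain string in the dd-mm-yyyy or dd-mm-yyyy:dd-mm-yyyy format described above, it can be built from `datetime.date` objects rather than typed by hand. The following is a minimal sketch; `publication_date_range` is a hypothetical helper name and is not part of pysradb.

```python
from datetime import date
from typing import Optional


def publication_date_range(start: date, end: Optional[date] = None) -> str:
    """Format a single date, or a start:end range, as dd-mm-yyyy strings.

    Hypothetical helper (not part of pysradb); it only follows the
    format documented for the publication date flag.
    """
    fmt = "%d-%m-%Y"
    if end is None:
        return start.strftime(fmt)
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}"


# Reproduces the example range from the flag description.
print(publication_date_range(date(2010, 1, 1), date(2010, 12, 31)))
# prints 01-01-2010:31-12-2010
```

The returned string can then be supplied wherever a publication date is expected, on the terminal or in Python.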
Using pysradb search in Python:
pysradb search organises each search query as an instance of the SraSearch, EnaSearch or GeoSearch class. These classes take the following parameters in their constructors:
SraSearch (verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False,)
EnaSearch (verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False,)
GeoSearch (verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, geo_query=None, geo_dataset_type=None, geo_entry_type=None, suppress_validation=False,)
| Parameters | Explanations |
|---|---|
| verbosity | Determines how much detail is retrieved and shown in the search result (default=2). Same as -v / --verbosity on terminal |
| return_max | Maximum number of returned entries (default=20). Same as -m / --max on terminal |
| suppress_validation | Defaults to False. If this is set to True, the user input format checks will be skipped. Setting this to True may cause the program to behave in unexpected ways, but allows the user to search queries that do not pass the format check. |
Other parameters match the command line flags of the same name.
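The naming rule can be illustrated with a small sketch that turns a long command line flag into the matching constructor parameter name. `flag_to_parameter`, `RENAMED` and `CLI_ONLY` are hypothetical names introduced here for illustration, not pysradb APIs, and the list of terminal-only flags is an assumption based on the constructor signatures shown above and is not exhaustive.

```python
# Flags whose constructor parameter has a different name (see the table above).
RENAMED = {"max": "return_max"}

# Flags that are absent from the constructor signatures shown above.
# Assumption: this list is illustrative, not exhaustive.
CLI_ONLY = {"help", "saveto", "db"}


def flag_to_parameter(flag: str) -> str:
    """Map a long flag such as '--publication-date' to a parameter name."""
    name = flag.lstrip("-").replace("-", "_")
    if name in CLI_ONLY:
        raise ValueError(f"{flag} has no matching constructor parameter")
    return RENAMED.get(name, name)


for flag in ("--publication-date", "--geo-dataset-type", "--max"):
    print(flag, "->", flag_to_parameter(flag))
```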
To query the SRA database for "ribosome profiling" at verbosity level 2 and return at most 5 entries:
[3]:
from pysradb.search import SraSearch
instance = SraSearch(2, 5, query="ribosome profiling")
instance.search()
df = instance.get_df()
print(df)
study_accession experiment_accession \
0 ERP160955 ERX12614938
1 ERP160955 ERX12614944
2 ERP160955 ERX12614964
3 ERP160955 ERX12614936
4 ERP160955 ERX12614973
experiment_title sample_taxon_id \
0 Ribosome profiling and RNA-seq of WT and TLR4-... 10090
1 Ribosome profiling and RNA-seq of WT and TLR4-... 10090
2 Ribosome profiling and RNA-seq of WT and TLR4-... 10090
3 Ribosome profiling and RNA-seq of WT and TLR4-... 10090
4 Ribosome profiling and RNA-seq of WT and TLR4-... 10090
sample_scientific_name experiment_library_strategy \
0 Mus musculus OTHER
1 Mus musculus OTHER
2 Mus musculus OTHER
3 Mus musculus OTHER
4 Mus musculus OTHER
experiment_library_source experiment_library_selection sample_accession \
0 OTHER other ERS20223304
1 OTHER other ERS20223310
2 OTHER other ERS20223330
3 OTHER other ERS20223302
4 OTHER other ERS20223339
sample_alias experiment_instrument_model pool_member_spots run_1_size \
0 SAMEA115714354 NextSeq 500 43532799 497485999
1 SAMEA115714360 NextSeq 500 32118169 549095273
2 SAMEA115714380 NextSeq 500 42636195 509511924
3 SAMEA115714352 NextSeq 500 37191015 528109302
4 SAMEA115714389 NextSeq 500 56142869 670283869
run_1_accession run_1_total_spots run_1_total_bases
0 ERR13244198 13641868 1036781968
1 ERR13244208 14311503 1087674228
2 ERR13244265 13154367 999731892
3 ERR13244192 13804716 1049158416
4 ERR13244289 18233119 1385717044
Quickstart
To query ENA instead, replace the SraSearch class with the EnaSearch class:
[4]:
from pysradb.search import EnaSearch
instance = EnaSearch(2, 5, "ribosome profiling")
instance.search()
df = instance.get_df()
print(df)
Empty DataFrame
Columns: []
Index: []
To query GEO DataSets instead and retrieve the metadata of linked entries in SRA:
[5]:
from pysradb.search import GeoSearch
instance = GeoSearch(2, 5, geo_query="ribosome profiling")
instance.search()
df = instance.get_df()
print(df)
study_accession experiment_accession \
0 SRP580273 SRX33329449
1 SRP580273 SRX33329448
2 SRP580273 SRX33329447
3 SRP580273 SRX33329446
4 SRP580273 SRX33329445
experiment_title sample_taxon_id \
0 GSM9730942: SeRP, total translatome, in GFP-SE... 4932
1 GSM9730941: SeRP, SRP-bound translatome, in GF... 4932
2 GSM9730940: SeRP, total translatome, in GFP-SR... 4932
3 GSM9730939: SeRP, SRP-bound translatome, in GF... 4932
4 GSM9730938: SeRP, total translatome, in GFP-SR... 4932
sample_scientific_name experiment_library_strategy \
0 Saccharomyces cerevisiae OTHER
1 Saccharomyces cerevisiae OTHER
2 Saccharomyces cerevisiae OTHER
3 Saccharomyces cerevisiae OTHER
4 Saccharomyces cerevisiae OTHER
experiment_library_source experiment_library_selection sample_accession \
0 TRANSCRIPTOMIC other SRS29128554
1 TRANSCRIPTOMIC other SRS29128553
2 TRANSCRIPTOMIC other SRS29128552
3 TRANSCRIPTOMIC other SRS29128551
4 TRANSCRIPTOMIC other SRS29128550
sample_alias experiment_instrument_model run_1_size run_1_accession \
0 GSM9730942 NextSeq 2000 451856729 SRR38521764
1 GSM9730941 NextSeq 2000 452611233 SRR38521766
2 GSM9730940 NextSeq 2000 484692839 SRR38521768
3 GSM9730939 NextSeq 2000 582040670 SRR38521770
4 GSM9730938 NextSeq 2000 656583855 SRR38521772
run_1_total_spots run_1_total_bases
0 13413862 1354800062
1 13267812 1340049012
2 13917774 1405695174
3 17120756 1729196356
4 18371310 1873873620
7. Querying GEO DataSets with a publication_date filter and displaying publication dates in the results:
[6]:
from pysradb.search import GeoSearch
# Search for RNA-Seq datasets published in September 2024
# Using verbosity=3 to get all available fields including publication_date
instance = GeoSearch(
    verbosity=3,
    return_max=5,
    geo_query="RNA-Seq",
    publication_date="01-09-2024:30-09-2024",
)
try:
    instance.search()
    df = instance.get_df()
    # Display select columns including publication_date
    if not df.empty and "publication_date" in df.columns:
        cols_to_show = [
            "study_accession",
            "experiment_accession",
            "sample_scientific_name",
            "experiment_library_strategy",
            "publication_date",
        ]
        available_cols = [c for c in cols_to_show if c in df.columns]
        print(df[available_cols])
    else:
        print(df)
except Exception as exc:
    print(f"GEO search example skipped because the live service returned: {type(exc).__name__}")
No results found for the following search query:
SRA: {'query': 'sra gds[Filter]', 'accession': None, 'organism': None, 'layout': None, 'mbases': None, 'publication_date': '01-09-2024:30-09-2024', 'platform': None, 'selection': None, 'source': None, 'strategy': None, 'title': None}
GEO DataSets: {'query': 'RNA-Seq AND gds sra[Filter]', 'dataset_type': None, 'entry_type': None, 'publication_date': '01-09-2024:30-09-2024', 'organism': None}
Empty DataFrame
Columns: []
Index: []
Error Handling¶
When suppress_validation is not set to True, query fields with invalid entries raise IncorrectFieldException, which lists the acceptable inputs for the field in question (for example, "selection"):
[7]:
# 1. Invalid query entered for "selection"
try:
    SraSearch(selection="Mudkip")
except Exception as exc:
    print(type(exc).__name__, exc)
IncorrectFieldException Incorrect selection: Mudkip
--selection must be one of the following:
5-methylcytidine antibody, CAGE, ChIP, ChIP-Seq, DNase, HMPR, Hybrid Selection,
Inverse rRNA, Inverse rRNA selection, MBD2 protein methyl-CpG binding domain,
MDA, MF, MNase, MSLL, Oligo-dT, PCR, PolyA, RACE, RANDOM, RANDOM PCR, RT-PCR,
Reduced Representation, Restriction Digest, cDNA, cDNA_oligo_dT, cDNA_randomPriming
other, padlock probes capture method, repeat fractionation, size fractionation,
unspecified
[8]:
# 2. Ambiguous query entered for "source":
try:
    EnaSearch(source="metagenomic viral rna ")
except Exception as exc:
    print(type(exc).__name__, exc)
IncorrectFieldException Multiple potential matches have been identified for metagenomic viral rna :
['METAGENOMIC', 'VIRAL RNA']
Please check your input.
Usage Examples:
1. Checking the help message on terminal:
[9]:
!pysradb search -h
usage: pysradb search [-h] [-o SAVETO] [-s] [-g [GRAPHS]] [-d {ena,geo,sra}]
[-v {0,1,2,3}] [--run-description] [--detailed] [-m MAX]
[-q QUERY [QUERY ...]] [-A ACCESSION]
[-O ORGANISM [ORGANISM ...]] [-L {SINGLE,PAIRED}]
[-M MBASES] [-D PUBLICATION_DATE]
[-P PLATFORM [PLATFORM ...]]
[-E SELECTION [SELECTION ...]] [-C SOURCE [SOURCE ...]]
[-S STRATEGY [STRATEGY ...]] [-T TITLE [TITLE ...]] [-I]
[-G GEO_QUERY [GEO_QUERY ...]]
[-Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]]
[-Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]]
options:
-h, --help show this help message and exit
-o SAVETO, --saveto SAVETO
Save search result dataframe to file
-s, --stats Displays some useful statistics for the search
results.
-g [GRAPHS], --graphs [GRAPHS]
Generates graphs to illustrate the search result. By
default all graphs are generated. Alternatively,
select a subset from the options below in a space-
separated string: daterange, organism, source,
selection, platform, basecount
-d {ena,geo,sra}, --db {ena,geo,sra}
Select the db API (sra, ena, or geo) to query, default
= sra. Note: pysradb search works slightly differently
when db = geo. Please refer to 'pysradb search --geo-
info' for more details.
-v {0,1,2,3}, --verbosity {0,1,2,3}
Level of search result details (0, 1, 2 or 3), default
= 2 0: run accession only 1: run accession and
experiment title 2: accession numbers, titles and
sequencing information 3: records in 2 and other
information such as download url, sample attributes,
etc
--run-description Displays run accessions and descriptions only.
Equivalent to --verbosity 1
--detailed Displays detailed search results. Equivalent to
--verbosity 3.
-m MAX, --max MAX Maximum number of entries to return, default = 20
-q QUERY [QUERY ...], --query QUERY [QUERY ...]
Main query string. Note that if no query is supplied,
at least one of the following flags must be present:
-A ACCESSION, --accession ACCESSION
Accession number
-O ORGANISM [ORGANISM ...], --organism ORGANISM [ORGANISM ...]
Scientific name of the sample organism
-L {SINGLE,PAIRED}, --layout {SINGLE,PAIRED}
Library layout. Accepts either SINGLE or PAIRED
-M MBASES, --mbases MBASES
Size of the sample rounded to the nearest megabase
-D PUBLICATION_DATE, --publication-date PUBLICATION_DATE
Publication date of the run in the format dd-mm-yyyy.
If a date range is desired, enter the start date,
followed by end date, separated by a colon ':'.
Example: 01-01-2010:31-12-2010
-P PLATFORM [PLATFORM ...], --platform PLATFORM [PLATFORM ...]
Sequencing platform
-E SELECTION [SELECTION ...], --selection SELECTION [SELECTION ...]
Library selection
-C SOURCE [SOURCE ...], --source SOURCE [SOURCE ...]
Library source
-S STRATEGY [STRATEGY ...], --strategy STRATEGY [STRATEGY ...]
Library preparation strategy
-T TITLE [TITLE ...], --title TITLE [TITLE ...]
Experiment title
-I, --geo-info Displays information on how to query GEO DataSets via
'pysradb search --db geo ...', including accepted
inputs for -G/--geo-query, -Y/--geo-dataset-type and
-Z/--geo-entry-type.
-G GEO_QUERY [GEO_QUERY ...], --geo-query GEO_QUERY [GEO_QUERY ...]
Main query string for GEO DataSet. This flag is only
used when db is set to be geo.Please refer to 'pysradb
search --geo-info' for more details.
-Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...], --geo-dataset-type GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]
GEO DataSet Type. This flag is only used when --db is
set to be geo.Please refer to 'pysradb search --geo-
info' for more details.
-Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...], --geo-entry-type GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]
GEO Entry Type. This flag is only used when --db is
set to be geo.Please refer to 'pysradb search --geo-
info' for more details.
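When the command is driven from a script rather than typed, the argument list can be assembled programmatically from the flags listed in the help message. The following is a minimal sketch: `build_search_command` is a hypothetical helper, not something pysradb provides, and it assumes every filter is given by its long flag name with underscores in place of hyphens.

```python
import shlex


def build_search_command(db="sra", max_results=20, verbosity=2, **filters):
    """Assemble an argv list for `pysradb search` (hypothetical helper).

    Keyword names are long flag names with underscores, e.g.
    publication_date becomes --publication-date.
    """
    argv = [
        "pysradb", "search",
        "--db", db,
        "--max", str(max_results),
        "--verbosity", str(verbosity),
    ]
    for name, value in filters.items():
        argv.append("--" + name.replace("_", "-"))
        # Flags such as --organism accept several words, so multi-word
        # values are split into separate arguments.
        argv.extend(str(value).split())
    return argv


argv = build_search_command(
    db="ena", organism="Escherichia coli", layout="PAIRED", strategy="wgs"
)
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv)`; building a list rather than a single string avoids shell quoting issues.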
5. More complex example:
[13]:
from pysradb.search import EnaSearch
instance = EnaSearch(
    3,
    20,
    organism="Escherichia coli",
    layout="Paired",
    mbases=10,
    publication_date="01-01-2019:31-12-2021",
    platform="Illumina",
    selection="random",
    source="Genomic",
    strategy="WGS",
)
try:
    instance.search()
    df = instance.get_df()
    df
except Exception as exc:
    print(f"ENA search example skipped because the live service returned: {type(exc).__name__}")
[14]:
sorted(df.columns)
[14]:
['accession',
'age',
'aligned',
'altitude',
'assembly_quality',
'assembly_software',
'bam_aspera',
'bam_bytes',
'bam_file_role',
'bam_ftp',
'bam_galaxy',
'bam_md5',
'base_count',
'binning_software',
'bio_material',
'bisulfite_protocol',
'broad_scale_environmental_context',
'broker_name',
'cage_protocol',
'cell_line',
'cell_type',
'center_name',
'checklist',
'chip_ab_provider',
'chip_protocol',
'chip_target',
'collected_by',
'collection_date',
'collection_date_end',
'collection_date_start',
'completeness_score',
'contamination_score',
'control_experiment',
'country',
'cultivar',
'culture_collection',
'datahub',
'depth',
'description',
'dev_stage',
'disease',
'dnase_protocol',
'ecotype',
'elevation',
'environment_biome',
'environment_feature',
'environment_material',
'environmental_medium',
'environmental_sample',
'experiment_accession',
'experiment_alias',
'experiment_target',
'experiment_title',
'experimental_factor',
'experimental_protocol',
'extraction_protocol',
'faang_library_selection',
'fastq_aspera',
'fastq_bytes',
'fastq_file_role',
'fastq_ftp',
'fastq_galaxy',
'fastq_md5',
'file_location',
'first_created',
'first_public',
'germline',
'hi_c_protocol',
'host',
'host_body_site',
'host_genotype',
'host_gravidity',
'host_growth_conditions',
'host_phenotype',
'host_scientific_name',
'host_sex',
'host_status',
'host_tax_id',
'identified_by',
'instrument_model',
'instrument_platform',
'investigation_type',
'isolate',
'isolation_source',
'last_updated',
'lat',
'library_construction_protocol',
'library_gen_protocol',
'library_layout',
'library_max_fragment_size',
'library_min_fragment_size',
'library_name',
'library_pcr_isolation_protocol',
'library_prep_date',
'library_prep_date_format',
'library_prep_latitude',
'library_prep_location',
'library_prep_longitude',
'library_selection',
'library_source',
'library_strategy',
'local_environmental_context',
'location',
'location_end',
'location_start',
'lon',
'marine_region',
'mating_type',
'ncbi_reporting_standard',
'nominal_length',
'nominal_sdev',
'pcr_isolation_protocol',
'ph',
'project_name',
'protocol_label',
'read_count',
'read_strand',
'restriction_enzyme',
'restriction_enzyme_target_sequence',
'restriction_site',
'rna_integrity_num',
'rna_prep_3_protocol',
'rna_prep_5_protocol',
'rna_purity_230_ratio',
'rna_purity_280_ratio',
'rt_prep_protocol',
'run_accession',
'run_alias',
'run_date',
'salinity',
'sample_accession',
'sample_alias',
'sample_capture_status',
'sample_collection',
'sample_description',
'sample_material',
'sample_prep_interval',
'sample_prep_interval_units',
'sample_storage',
'sample_storage_processing',
'sample_title',
'sampling_campaign',
'sampling_platform',
'sampling_site',
'scientific_name',
'secondary_project',
'secondary_sample_accession',
'secondary_study_accession',
'sequencing_date',
'sequencing_date_format',
'sequencing_location',
'sequencing_longitude',
'sequencing_method',
'sequencing_primer_catalog',
'sequencing_primer_lot',
'sequencing_primer_provider',
'serotype',
'serovar',
'sex',
'specimen_voucher',
'sra_aspera',
'sra_bytes',
'sra_file_role',
'sra_ftp',
'sra_galaxy',
'sra_md5',
'status',
'strain',
'study_accession',
'study_alias',
'study_title',
'sub_species',
'sub_strain',
'submission_accession',
'submission_tool',
'submitted_aspera',
'submitted_bytes',
'submitted_file_role',
'submitted_format',
'submitted_ftp',
'submitted_galaxy',
'submitted_host_sex',
'submitted_md5',
'submitted_read_type',
'surveillance_target',
'tag',
'target_gene',
'tax_id',
'tax_lineage',
'taxonomic_classification',
'taxonomic_identity_marker',
'temperature',
'tissue_lib',
'tissue_type',
'transposase_protocol',
'variety']
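With this many columns, it is often easier to pick out a related group by prefix than to scan the full list. The sketch below works on a plain list holding a few of the column names shown above; `columns_with_prefix` is a hypothetical helper introduced for illustration.

```python
def columns_with_prefix(columns, prefix):
    """Return the column names that start with the given prefix."""
    return [name for name in columns if name.startswith(prefix)]


# A small sample of the column names listed above.
sample_columns = [
    "base_count",
    "fastq_aspera",
    "fastq_bytes",
    "fastq_ftp",
    "fastq_md5",
    "run_accession",
    "sra_ftp",
    "study_accession",
]

fastq_columns = columns_with_prefix(sample_columns, "fastq_")
print(fastq_columns)
# prints ['fastq_aspera', 'fastq_bytes', 'fastq_ftp', 'fastq_md5']
```

Assuming `df` is the DataFrame returned by the search, the same helper should accept `df.columns`, for example `df[["run_accession"] + columns_with_prefix(df.columns, "fastq_")]`.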
[15]:
# https://github.com/saketkc/pysradb/issues/221
instance = GeoSearch(
    publication_date="05-09-2024:06-09-2024", return_max=100, verbosity=3
)
instance.search()
df = instance.get_df()
df
[15]:
| | study_accession | experiment_accession | experiment_title | sample_taxon_id | sample_scientific_name | experiment_library_strategy | experiment_library_source | experiment_library_selection | sample_accession | sample_alias | ... | study_link_1_type | study_link_1_value_1 | study_link_1_value_2 | study_study_abstract | study_study_title | study_study_type_existing_study_type | submission_accession | submission_alias | submission_center_name | submission_lab_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SRP531137 | SRX25997822 | GSM8501051: HEK293, Prm1 negative, Replicate 3... | 9606 | Homo sapiens | Hi-C | GENOMIC | other | SRS22571584 | GSM8501051 | ... | NaN | NaN | NaN | Although the spatial organization of the genom... | Large-scale manipulation of radial positioning... | Other | SRA1964865 | SUB14711595 | Technion - Israel Institute of Technology | NaN |
| 1 | SRP531137 | SRX25997821 | GSM8501050: HEK293, Prm1 negative, Replicate 2... | 9606 | Homo sapiens | Hi-C | GENOMIC | other | SRS22571583 | GSM8501050 | ... | NaN | NaN | NaN | Although the spatial organization of the genom... | Large-scale manipulation of radial positioning... | Other | SRA1964865 | SUB14711595 | Technion - Israel Institute of Technology | NaN |
| 2 | SRP531137 | SRX25997820 | GSM8501049: HEK293, Prm1 negative, Replicate 1... | 9606 | Homo sapiens | Hi-C | GENOMIC | other | SRS22571582 | GSM8501049 | ... | NaN | NaN | NaN | Although the spatial organization of the genom... | Large-scale manipulation of radial positioning... | Other | SRA1964865 | SUB14711595 | Technion - Israel Institute of Technology | NaN |
| 3 | SRP531137 | SRX25997819 | GSM8501048: HEK293, Prm1 positive DAPI low, Hi... | 9606 | Homo sapiens | Hi-C | GENOMIC | other | SRS22571581 | GSM8501048 | ... | NaN | NaN | NaN | Although the spatial organization of the genom... | Large-scale manipulation of radial positioning... | Other | SRA1964865 | SUB14711595 | Technion - Israel Institute of Technology | NaN |
| 4 | SRP531137 | SRX25997818 | GSM8501047: HEK293, Prm1 positive DAPI high, H... | 9606 | Homo sapiens | Hi-C | GENOMIC | other | SRS22571580 | GSM8501047 | ... | NaN | NaN | NaN | Although the spatial organization of the genom... | Large-scale manipulation of radial positioning... | Other | SRA1964865 | SUB14711595 | Technion - Israel Institute of Technology | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | SRP521398 | SRX25413002 | GSM8411330: mouse_mirmOTSTEO_sample_76_b (cont... | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC SINGLE CELL | cDNA | SRS22073596 | GSM8411330 | ... | XREF_LINK | DB: pubmed | ID: 39367016 | Our current understanding of the molecular cir... | An atlas of small non-coding RNAs in Human Pre... | Transcriptome Analysis | SRA1930214 | SUB14617928 | Department of Clinical Science, Karolinska ins... | NaN |
| 96 | SRP521398 | SRX25413001 | GSM8411329: mouse_mirmOTSTEO_sample_76_a (miR3... | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC SINGLE CELL | cDNA | SRS22073594 | GSM8411329 | ... | XREF_LINK | DB: pubmed | ID: 39367016 | Our current understanding of the molecular cir... | An atlas of small non-coding RNAs in Human Pre... | Transcriptome Analysis | SRA1930214 | SUB14617928 | Department of Clinical Science, Karolinska ins... | NaN |
| 97 | SRP521398 | SRX25413000 | GSM8411328: mouse_mirmOTSTEO_sample_75_b (cont... | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC SINGLE CELL | cDNA | SRS22073595 | GSM8411328 | ... | XREF_LINK | DB: pubmed | ID: 39367016 | Our current understanding of the molecular cir... | An atlas of small non-coding RNAs in Human Pre... | Transcriptome Analysis | SRA1930214 | SUB14617928 | Department of Clinical Science, Karolinska ins... | NaN |
| 98 | SRP521398 | SRX25412999 | GSM8411327: mouse_mirmOTSTEO_sample_75_a (miR3... | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC SINGLE CELL | cDNA | SRS22073593 | GSM8411327 | ... | XREF_LINK | DB: pubmed | ID: 39367016 | Our current understanding of the molecular cir... | An atlas of small non-coding RNAs in Human Pre... | Transcriptome Analysis | SRA1930214 | SUB14617928 | Department of Clinical Science, Karolinska ins... | NaN |
| 99 | SRP521398 | SRX25412998 | GSM8411326: mouse_mirmOTSTEO_sample_74_b (cont... | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC SINGLE CELL | cDNA | SRS22073592 | GSM8411326 | ... | XREF_LINK | DB: pubmed | ID: 39367016 | Our current understanding of the molecular cir... | An atlas of small non-coding RNAs in Human Pre... | Transcriptome Analysis | SRA1930214 | SUB14617928 | Department of Clinical Science, Karolinska ins... | NaN |
100 rows × 630 columns
[16]:
instance = GeoSearch(
    publication_date="04-09-2024:06-09-2024", return_max=1000, verbosity=3
)
try:
    instance.search()
    df = instance.get_df()
    print(df["study_alias"].unique())
except Exception as exc:
    print(f"GEO search example skipped because the live service returned: {type(exc).__name__}")
<StringArray>
['GSE276554', 'GSE276553', 'GSE276185', 'GSE276153', 'GSE275211', 'GSE273844',
'GSE273813', 'GSE272793', 'GSE268837']
Length: 9, dtype: str
6. Corresponding terminal command example, with max set to 20:
[17]:
!pysradb search --db ena -m 20 -v 3 --organism Escherichia coli --layout Paired --mbases 100 --publication-date 01-01-2019:31-12-2019 --platform illumina --selection random --source Genomic --strategy wgs
Displaying 196 columns in chunks of 5
Columns 1-5:
study_a experim
ccessio ent_acc
n ession experiment_title description tax_id
PRJEB34 ERX3552 Illumina MiSeq paired Illumina MiSeq paired 562
285 114 end sequencing: Raw end sequencing: Raw
reads: p51_OXA-plasmid reads: p51_OXA-plasmid
PRJEB34 ERX3552 Illumina HiSeq 4000 Illumina HiSeq 4000 562
513 075 paired end sequencing paired end sequencing
PRJNA54 SRX5991 NextSeq 500 sequencing: NextSeq 500 sequencing: 562
4527 194 WGS of Escherichia coli WGS of Escherichia coli
BIOML-A288 BIOML-A288
PRJNA51 SRX5306 Illumina MiSeq Illumina MiSeq 562
7654 330 sequencing: Sequencing sequencing: Sequencing
of environmental of environmental samples
samples of E. coli of E. coli collected
collected across across Nottinghamshire
Nottinghamshire during during 2015
2015
PRJNA51 SRX5308 Illumina HiSeq 2000 Illumina HiSeq 2000 562
7527 971 sequencing: Chemostat sequencing: Chemostat 1,
1, Heneration 450, Heneration 450, clone
clone H10 H10
PRJNA54 SRX5801 Illumina MiSeq Illumina MiSeq 562
1504 137 sequencing: Adapterama sequencing: Adapterama I
I E. coli E. coli
PRJNA54 SRX5883 Illumina MiSeq Illumina MiSeq 562
1983 127 sequencing: Whole sequencing: Whole genome
genome Illumina MiSeq Illumina MiSeq sequence
sequence of Escherichia of Escherichia coli
coli
PRJNA54 SRX5990 NextSeq 500 sequencing: NextSeq 500 sequencing: 562
4527 327 WGS of Escherichia coli WGS of Escherichia coli
BIOML-A341 BIOML-A341
PRJEB33 ERX3417 NextSeq 500 paired end NextSeq 500 paired end 562
169 194 sequencing sequencing
Columns 6-10:
scientific_nam library_strate library_sourc library_select sample_access
e gy e ion ion
Escherichia WGS GENOMIC RANDOM SAMEA5957593
coli
Escherichia WGS GENOMIC RANDOM SAMEA5789900
coli
Escherichia WGS GENOMIC RANDOM SAMN11848885
coli
Escherichia WGS GENOMIC RANDOM SAMN10840046
coli
Escherichia WGS GENOMIC RANDOM SAMN10836213
coli
Escherichia WGS GENOMIC RANDOM SAMN11586387
coli
Escherichia WGS GENOMIC RANDOM SAMN11660032
coli
Escherichia WGS GENOMIC RANDOM SAMN11848938
coli
Escherichia WGS GENOMIC RANDOM SAMEA5732361
coli
Columns 11-15:
instrume run_acces read_cou base_coun
sample_title nt_model sion nt t
p51_OXA-plasmid Illumina ERR353551 219933 100397176
MiSeq 6
AMRIL_7 Illumina ERR353547 530243 99574284
HiSeq 7
4000
MIGS Cultured Bacterial/Archaeal NextSeq SRR922059 365343 100440883
sample from Escherichia coli 500 7
T1-3 Illumina SRR850233 219871 100338769
MiSeq 8
Microbe sample from Escherichia Illumina SRR850515 495205 100031410
coli HiSeq 1
2000
Microbe sample from Escherichia Illumina SRR902326 199252 100024504
coli MiSeq 2
Pathogen: environmental/food/other Illumina SRR910862 202951 100260246
sample from Escherichia coli MiSeq 1
MIGS Cultured Bacterial/Archaeal NextSeq SRR922012 352524 99932805
sample from Escherichia coli 500 8
- NextSeq ERR339333 335367 99674857
500 0
Columns 16-20:
accession age aligned altitude assembly_quality
ERR3535516 - - - -
ERR3535477 - - - -
SRR9220597 - - - -
SRR8502338 - - - -
SRR8505151 - - - -
SRR9023262 - - - -
SRR9108621 - - - -
SRR9220128 - - - -
ERR3393330 - - - -
Columns 21-25:
assembly_software bam_aspera bam_bytes bam_file_role bam_ftp
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 26-30:
binning_softwar bisulfite_protoco
bam_galaxy bam_md5 e bio_material l
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 31-35:
broad_scale_environmental_conte broker_na cage_prot
xt me ocol cell_line cell_type
- - - - -
- - - - -
bodily fluid material biome - - - -
[ENVO:02000019]
- - - - -
- - - - -
- - - - -
- - - - -
bodily fluid material biome - - - -
[ENVO:02000019]
- DTU-GE - - -
Columns 36-40:
chip_ab chip_p
checkli _provid rotoco chip_ta
center_name st er l rget
UMC Utrecht ERC0000 - - -
11
LONDON SCHOOL OF HYGIENE AND TROPICAL ERC0000 - - -
MEDICINE 28
SUB5744285 - - - -
SUB5096574 - - - -
SUB5048450 - - - -
SUB5588746 - - - -
SUB5658670 - - - -
SUB5744285 - - - -
Centre for Genomic Epidemiology;National ERC0000 - - -
Food Institute;Technical University of 29
Denmark (DTU);Denmark;DTU-GE
Columns 41-45:
collection_ collection_ collection_ completenes
collected_by date date_end date_start s_score
- - - - -
- 2016 2016-12-31 2016-01-01 -
- 2016-01-15 2016-01-15 2016-01-15 -
- 2015-10-26 2015-10-26 2015-10-26 -
- missing - - -
- not - - -
applicable
Mohamed Ezzat El 2018-05-24 2018-05-24 2018-05-23 -
Zowalaty Laboratory
(VMID)
- 2015-12-09 2015-12-09 2015-12-09 -
DTU 2017 2017-12-31 2017-01-01 -
Columns 46-50:
contamination control_exper culture_coll
_score iment country cultivar ection
- - - - -
- - United Kingdom - -
- - USA:Boston - -
- - United Kingdom - -
- - missing - -
- - - - -
- - South Africa: - -
Eastern Cape
- - USA:Boston - -
- - Portugal - -
Columns 51-55:
datahub depth dev_stage disease dnase_protocol
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare - - - -
dcc_compare;dcc_br - - - -
omhead
Columns 56-60:
environment_fea environment_m
ecotype elevation environment_biome ture aterial
- - - - -
- - - - -
- - bodily fluid material excreta fecal
biome [ENVO:02000019] material material
[ENVO:02000022] [ENVO:0000200
3]
- - - - -
- - - - -
- - - - -
- - - - -
- - bodily fluid material excreta fecal
biome [ENVO:02000019] material material
[ENVO:02000022] [ENVO:0000200
3]
- - - - -
Columns 61-65:
environme environme experimen
ntal_medi ntal_samp experiment tal_facto
um le experiment_alias _target r
- - webin-reads-p51_OXA-plasmid - -
- - ena-EXPERIMENT-LONDON SCHOOL OF - -
HYGIENE AND TROPICAL
MEDICINE-20-09-2019-14:10:16:49
6-7
fecal - an_0080_0058_g10 - -
material
[ENVO:000
02003]
- - T1-3 - -
- - Clonal_chem1_H10 - -
- - iNext_07 - -
- - Nextera XT library SEQ000089458 - -
fecal - bf_0095_0059_a9 - -
material
[ENVO:000
02003]
- false Exp_2019_6_24_8_4_22_293 - -
Columns 66-70:
faang_l
experim extract ibrary_
ental_p ion_pro selecti fastq_b
rotocol tocol on fastq_aspera ytes
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/ERR353/00 3252929
6/ERR3535516/ERR3535516_1.fastq.gz;fasp. 9;36908
sra.ebi.ac.uk:/vol1/fastq/ERR353/006/ERR 444
3535516/ERR3535516_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/ERR353/00 3507740
7/ERR3535477/ERR3535477_1.fastq.gz;fasp. 1;35433
sra.ebi.ac.uk:/vol1/fastq/ERR353/007/ERR 727
3535477/ERR3535477_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR922/00 2902200
7/SRR9220597/SRR9220597_1.fastq.gz;fasp. 6;29050
sra.ebi.ac.uk:/vol1/fastq/SRR922/007/SRR 948
9220597/SRR9220597_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR850/00 3090210
8/SRR8502338/SRR8502338_1.fastq.gz;fasp. 6;35363
sra.ebi.ac.uk:/vol1/fastq/SRR850/008/SRR 087
8502338/SRR8502338_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR850/00 4624917
1/SRR8505151/SRR8505151_1.fastq.gz;fasp. 7;46229
sra.ebi.ac.uk:/vol1/fastq/SRR850/001/SRR 911
8505151/SRR8505151_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR902/00 3272464
2/SRR9023262/SRR9023262_1.fastq.gz;fasp. 9;37407
sra.ebi.ac.uk:/vol1/fastq/SRR902/002/SRR 901
9023262/SRR9023262_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR910/00 3435059
1/SRR9108621/SRR9108621_1.fastq.gz;fasp. 5;40093
sra.ebi.ac.uk:/vol1/fastq/SRR910/001/SRR 732
9108621/SRR9108621_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/SRR922/00 2819205
8/SRR9220128/SRR9220128_1.fastq.gz;fasp. 3;28009
sra.ebi.ac.uk:/vol1/fastq/SRR922/008/SRR 156
9220128/SRR9220128_2.fastq.gz
- - - fasp.sra.ebi.ac.uk:/vol1/fastq/ERR339/00 3281153
0/ERR3393330/ERR3393330_1.fastq.gz;fasp. 5;33340
sra.ebi.ac.uk:/vol1/fastq/ERR339/000/ERR 844
3393330/ERR3393330_2.fastq.gz
Columns 71-75:
fastq_fil fastq_md file_loca
e_role fastq_ftp fastq_galaxy 5 tion
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol ee425499 -
_FILE;GEN 1/fastq/ERR353/006/ER 1/fastq/ERR353/006/ER fba579cf
ERATED_FI R3535516/ERR3535516_1 R3535516/ERR3535516_1 12fe9028
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi e9a9c28d
.ac.uk/vol1/fastq/ERR .ac.uk/vol1/fastq/ERR ;f523ce7
353/006/ERR3535516/ER 353/006/ERR3535516/ER 7e4a1e4a
R3535516_2.fastq.gz R3535516_2.fastq.gz 27c46402
913255c5
0
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol e5fb3426 -
_FILE;GEN 1/fastq/ERR353/007/ER 1/fastq/ERR353/007/ER 86609929
ERATED_FI R3535477/ERR3535477_1 R3535477/ERR3535477_1 865025f6
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi 5b07a0a6
.ac.uk/vol1/fastq/ERR .ac.uk/vol1/fastq/ERR ;1ab3f17
353/007/ERR3535477/ER 353/007/ERR3535477/ER 124ecdf8
R3535477_2.fastq.gz R3535477_2.fastq.gz 408dcf47
71d2fcd3
a
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol 7c4a1ec4 -
_FILE;GEN 1/fastq/SRR922/007/SR 1/fastq/SRR922/007/SR e1dfcfea
ERATED_FI R9220597/SRR9220597_1 R9220597/SRR9220597_1 8bfa411a
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi db6efec0
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;9d0bc10
922/007/SRR9220597/SR 922/007/SRR9220597/SR 52f9a19b
R9220597_2.fastq.gz R9220597_2.fastq.gz 1d636994
5d223ec0
7
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol 7e50c3f5 -
_FILE;GEN 1/fastq/SRR850/008/SR 1/fastq/SRR850/008/SR d03d8b7a
ERATED_FI R8502338/SRR8502338_1 R8502338/SRR8502338_1 a03c8316
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi 41e1ecc3
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;0e87334
850/008/SRR8502338/SR 850/008/SRR8502338/SR ede312a4
R8502338_2.fastq.gz R8502338_2.fastq.gz 0552551f
b6331146
9
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol a2715241 -
_FILE;GEN 1/fastq/SRR850/001/SR 1/fastq/SRR850/001/SR 220079bb
ERATED_FI R8505151/SRR8505151_1 R8505151/SRR8505151_1 283e2602
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi ad0b2909
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;6bc9650
850/001/SRR8505151/SR 850/001/SRR8505151/SR 0bf70549
R8505151_2.fastq.gz R8505151_2.fastq.gz 5fb6ae4f
8f8feb85
2
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol 93205309 -
_FILE;GEN 1/fastq/SRR902/002/SR 1/fastq/SRR902/002/SR 0feb8c08
ERATED_FI R9023262/SRR9023262_1 R9023262/SRR9023262_1 e0db953d
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi d463598c
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;e60592d
902/002/SRR9023262/SR 902/002/SRR9023262/SR eb6a7a4f
R9023262_2.fastq.gz R9023262_2.fastq.gz 33cd5163
a925291e
9
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol f74ebb9b -
_FILE;GEN 1/fastq/SRR910/001/SR 1/fastq/SRR910/001/SR 7f3f98c1
ERATED_FI R9108621/SRR9108621_1 R9108621/SRR9108621_1 0ce97fec
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi 379b8a19
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;11afabe
910/001/SRR9108621/SR 910/001/SRR9108621/SR ac3bbd09
R9108621_2.fastq.gz R9108621_2.fastq.gz b8932e8b
0aa9a1ea
4
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol 2bbf6a59 -
_FILE;GEN 1/fastq/SRR922/008/SR 1/fastq/SRR922/008/SR 84b6a73a
ERATED_FI R9220128/SRR9220128_1 R9220128/SRR9220128_1 02975a17
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi fc39cce4
.ac.uk/vol1/fastq/SRR .ac.uk/vol1/fastq/SRR ;60ea299
922/008/SRR9220128/SR 922/008/SRR9220128/SR 84e452fb
R9220128_2.fastq.gz R9220128_2.fastq.gz 8c4dedf1
9d2878f9
6
GENERATED ftp.sra.ebi.ac.uk/vol ftp.sra.ebi.ac.uk/vol a5f88dd1 -
_FILE;GEN 1/fastq/ERR339/000/ER 1/fastq/ERR339/000/ER 66b70616
ERATED_FI R3393330/ERR3393330_1 R3393330/ERR3393330_1 76a3ec98
LE .fastq.gz;ftp.sra.ebi .fastq.gz;ftp.sra.ebi d06b4ee9
.ac.uk/vol1/fastq/ERR .ac.uk/vol1/fastq/ERR ;64d9cd6
339/000/ERR3393330/ER 339/000/ERR3393330/ER fcfa9f1f
R3393330_2.fastq.gz R3393330_2.fastq.gz abf589b9
82d18408
7
Columns 76-80:
first_created first_public germline hi_c_protocol host
2019-09-20 2019-09-20 - - -
2019-09-20 2019-09-23 - - Homo sapiens
2019-08-01 2019-08-01 - - Homo sapiens
2019-01-30 2019-01-30 - - -
2019-01-30 2019-01-30 - - missing
2019-05-12 2019-05-12 - - -
2019-05-24 2019-05-24 - - cow
2019-08-01 2019-08-01 - - Homo sapiens
2019-06-24 2021-01-19 - - Bos taurus
Columns 81-85:
host_body_sit host_genotyp host_gravidi host_growth_conditi host_phenoty
e e ty ons pe
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 86-90:
host_scientific_nam
e host_sex host_status host_tax_id identified_by
- - - - -
Homo sapiens - diseased 9606 -
Homo sapiens - - 9606 -
- - - - -
- - - - -
- - - - -
- - - - -
Homo sapiens - - 9606 -
Bos taurus - - 9913 -
Columns 91-95:
instrument_plat investigation_t isolation_sou
form ype isolate rce last_updated
ILLUMINA - - - 2019-09-25
ILLUMINA - Isolate 5 Stool 2019-09-25
ILLUMINA - - - 2019-08-01
ILLUMINA - T1-3 Retail 2019-01-30
chicken
ILLUMINA - - missing 2019-01-30
ILLUMINA - - - 2019-05-12
ILLUMINA - - oral 2019-05-24
ILLUMINA - - - 2019-08-01
ILLUMINA - PAT-17-35506/ meat 2019-06-24
ECX
Columns 96-100:
library_constructi library_gen_p library_la library_max_frag
lat on_protocol rotocol yout ment_size
- - - PAIRED -
- - - PAIRED -
42.36 - - PAIRED -
- - - PAIRED -
- - - PAIRED -
- - - PAIRED -
- - - PAIRED -
42.36 - - PAIRED -
39.399872 - - PAIRED -
Columns 101-105:
library_pre
library_min_f library_pcr_isol library_prep p_date_form
ragment_size library_name ation_protocol _date at
- - - - -
- unspecified - - -
- an_0080_0058_g10 - - -
- T1-3 - - -
- Clonal_chem1_H10 - - -
- iNext_07 - - -
- Nextera XT - - -
library
SEQ000089458
- bf_0095_0059_a9 - - -
- unspecified - - -
Columns 106-110:
library_prep_ library_prep_ library_prep local_environmenta
latitude location _longitude l_context location
- - - - -
- - - - -
- - - excreta material 42.36 N
[ENVO:02000022] 71.06 W
- - - - -
- - - - -
- - - - -
- - - - -
- - - excreta material 42.36 N
[ENVO:02000022] 71.06 W
- - - - 39.399872 N
8.224454 W
Columns 111-115:
marine_reg
location_end location_start lon ion mating_type
- - - - -
- - - - -
42.36 N 71.06 W 42.36 N 71.06 W -71.06 - -
- - - - -
- - - - -
- - - - -
- - - - -
42.36 N 71.06 W 42.36 N 71.06 W -71.06 - -
39.399872 N 39.399872 N -8.224454 - -
8.224454 W 8.224454 W
Columns 116-120:
nominal_le nominal_s pcr_isolation_p
ncbi_reporting_standard ngth dev rotocol ph
- 600 - - -
- 500 - - -
MIGS.ba;MIGS/MIMS/MIMARKS - - - -
.human-gut
Microbe, viral or - - - -
environmental
Microbe, viral or - - - -
environmental
Microbe, viral or - - - -
environmental
Pathogen.env - - - -
MIGS.ba;MIGS/MIMS/MIMARKS - - - -
.human-gut
- 300 - - -
Columns 121-125:
restric
tion_en
restrict zyme_ta
protocol read_st ion_enzy rget_se
project_name _label rand me quence
OXA-48 dissemination during an - - - -
outbreak
Comparing antimicrobial resistance - - - -
prediction pipelines from bacterial
whole genome sequencing data: An
inter-laboratory study
- - - - -
- - - - -
- - - - -
- - - - -
Escherichia coli - - - -
- - - - -
- - - - -
Columns 126-130:
restriction_s rna_integrity rna_prep_3_pro rna_prep_5_pro rna_purity_230
ite _num tocol tocol _ratio
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 131-135:
rna_puri
ty_280_r rt_prep_ salinit
atio protocol run_alias run_date y
- - webin-reads-p51_OXA-plasmid - -
- - ena-RUN-LONDON SCHOOL OF HYGIENE AND - -
TROPICAL
MEDICINE-20-09-2019-14:10:16:496-7
- - an_0080_0058_g10R1.concat.trim.fastq. - -
gz
- - T1-3_S42_L001_R1_001.fastq - -
- - 140203_MONK_0337_BC3CVWACXX_L7_CGAGGC - -
TG-CTAAGCCT_H10_1_pf.fastq.gz
- - inext7_S32_L001_R1_001.fastq.gz - -
- - MEZEC42_S22_L001_R1_001.fastq.gz - -
- - bf_0095_0059_a9R1.concat.trim.fastq.g - -
z
- - Run_2019_6_24_8_4_22_run293 - -
Columns 136-140:
sample
_captu sample_ sample_
sample re_sta collect materia
_alias tus ion sample_description l
p51_Il - - Sequencing data of the OXA-48 plasmid -
lumina (tranformed into and isolated from E. coli
dh10-beta competent cells)
E - - AMRIL_7 -
an_008 - - Keywords: GSC:MIxS MIGS:5.0 -
0_0058
_g10
TR103 - - T1-3 -
Clonal - - Chemostat 1 (generation 450) Clone H10 -
_chem1
_H10
iNext_ - - Microbe sample from Escherichia coli -
07
Escher - - Whole genome sequencing of cultured E. -
ichia coli as part of Prof. ME El Zowalaty's One
coli health Zoonosis surveillance project for
strain the rapid detection of outbreaks of
MEZEC4 foodborne illnesses and antimicrobial
2 resistance.
bf_009 - - Keywords: GSC:MIxS MIGS:5.0 -
5_0059
_a9
PAT-17 - - - -
-35506
/ECX
Columns 141-145:
sample_prep_i sample_prep_inter sample_stor sample_storage_p sampling_ca
nterval val_units age rocessing mpaign
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 146-150:
sampling_pl sampling_sit secondary_p secondary_sample_a secondary_study_
atform e roject ccession accession
- - - ERS3746069 ERP117164
- - - ERS3760037 ERP117428
- - - SRS4896365 SRP200548
- - - SRS4303892 SRP182798
- - - SRS4306477 SRP182873
- - - SRS4731159 SRP195771
- - - SRS4805681 SRP197605
- - - SRS4895497 SRP200548
- - - ERS3535664 ERP115939
Columns 151-155:
sequencing_da sequencing_date_ sequencing_lo sequencing_lon sequencing_m
te format cation gitude ethod
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 156-160:
sequencing_primer sequencing_pri sequencing_primer_
_catalog mer_lot provider serotype serovar
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 161-165:
sex specimen_voucher sra_aspera sra_bytes sra_file_role
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - -
Columns 166-170:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.11.15/x64/bin/pysradb", line 6, in <module>
sys.exit(parse_args())
^^^^^^^^^^^^
File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 1859, in parse_args
search(
File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 598, in search
_print_save_df(instance.get_df(), saveto)
File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 216, in _print_save_df
pretty_print_df(df, enriched_cols=enriched_cols)
File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 96, in pretty_print_df
_create_table(
File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 165, in _create_table
console.print(table)
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1731, in print
extend(render(renderable, render_options))
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
for render_output in iter_render:
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/table.py", line 515, in __rich_console__
yield from self._render(console, render_options, widths)
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/table.py", line 838, in _render
lines = console.render_lines(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1379, in render_lines
lines = list(
^^^^^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 333, in split_and_crop_lines
for segment in segments:
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 208, in <genexpr>
result_segments = (
^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
for render_output in iter_render:
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/padding.py", line 97, in __rich_console__
lines = console.render_lines(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1379, in render_lines
lines = list(
^^^^^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 333, in split_and_crop_lines
for segment in segments:
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
for render_output in iter_render:
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 696, in __rich_console__
lines = self.wrap(
^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 1244, in wrap
new_lines.justify(
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/containers.py", line 131, in justify
line.truncate(width, overflow=overflow, pad=True)
File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 883, in truncate
self._text = [f"{self.plain}{' ' * spaces}"]
~~~~^~~~~~~~
TypeError: can't multiply sequence by non-int of type 'numpy.float64'
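The traceback above ends in rich's `text.py`, where a cell is padded with `' ' * spaces` and `spaces` turns out to be a `numpy.float64` instead of a Python `int`. The following is a minimal sketch of that failure mode, not the pysradb or rich code itself. It assumes a width computed with NumPy reaches the padding step; the traceback does not show where the value originates. The names `pad_text` and `width` are hypothetical and exist only for illustration.

```python
import numpy as np


def pad_text(text, spaces):
    """Right-pad text with `spaces` blanks, mirroring the f-string in the traceback."""
    return f"{text}{' ' * spaces}"


# Hypothetical column width that came out of a NumPy computation.
width = np.float64(12.0)

# Repeating a string by a NumPy float raises TypeError, as in the traceback.
error_message = None
try:
    pad_text("abc", width - len("abc"))
except TypeError as exc:
    error_message = str(exc)

# Converting the width to a built-in int first makes the padding work.
padded = pad_text("abc", int(width) - len("abc"))

print(error_message)
print(repr(padded))
```

In this sketch, casting to `int` before the value reaches the renderer avoids the error. Whether that is the right fix for the CLI depends on where the width is actually computed, which this output alone does not reveal.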