Query and Search

This notebook demonstrates advanced search capabilities to find SRA studies based on specific criteria.

[1]:

# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")

pysradb 3.0.0.dev0 is already installed

/home/runner/work/pysradb/pysradb/pysradb/download.py:15: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm

pysradb search

The pysradb search module supports querying the Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA) databases for sequencing data. The module also includes several built-in flags that can be used to fine-tune a search query.

[2]:

%%html
<style>
th {font-size: 16px;}
td {font-size: 14px;}
td:first-child {font-size: 15px; font-weight: 500;}
</style>

Terminal flags for the pysradb search module:

Flags	Explanation
-h, --help	Displays the help message
--saveto	Saves the result to the file specified by the user. Supported file types: txt, tsv, csv
--db	Selects the database (SRA, ENA, or both SRA and GEO DataSets) to query. Default database is SRA. Accepted inputs: sra, ena, geo
-v, --verbosity	Determines how much detail is retrieved and shown in the search result: 0: run_accession only; 1: run_accession and experiment_description only; 2 (default): study_accession, experiment_accession, experiment_title, description, tax_id, scientific_name, library_strategy, library_source, library_selection, sample_accession, sample_title, instrument_model, run_accession, read_count, base_count; 3: everything in verbosity level 2, followed by all other retrievable information from the database
-m, --max	Maximum number of returned entries. Default is 20. Note: if the maximum is large, querying the SRA and GEO DataSets databases will take significantly longer due to API limits
-q, --query	The main query string. Note: if this flag is not used, at least one of the following flags must be supplied:
--accession	A relevant study / experiment / sample / run accession number
--organism	Scientific name of the sample organism
--layout	Library layout. Accepted inputs: single, paired
--mbases	Size of the sample rounded to the nearest megabase
--publication-date	The publication date of the run in the format dd-mm-yyyy. If a date range is desired, enter the start date followed by the end date, separated by a colon ‘:’, in the format dd-mm-yyyy:dd-mm-yyyy. Example: 01-01-2010:31-12-2010
--platform	Sequencing platform used for the run. Possible inputs: illumina, ion torrent, oxford nanopore
--selection	Library selection. Possible inputs: cdna, chip, dnase, pcr, polya
--source	Library source. Possible inputs: genomic, metagenomic, transcriptomic
--strategy	Library preparation strategy. Possible inputs: wgs, amplicon, rna seq
--title	Title of the experiment associated with the run
--geo-query	The main query string to be sent to GEO DataSets
--geo-dataset-type	Dataset type. Possible inputs: expression profiling by array, expression profiling by high throughput sequencing, non coding rna profiling by high throughput sequencing
--geo-entry-type	Entry type. Accepted inputs: gds, gpl, gse, gsm
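
As an illustration of how these flags combine, the sketch below assembles a pysradb search command line from a dictionary of filters. The helper name build_search_command is hypothetical (it is not part of pysradb), and only flags listed in the table above are used.

```python
import shlex


def build_search_command(filters, db="sra", max_entries=20):
    """Assemble a `pysradb search` command from flag/value pairs.

    `filters` maps long flag names from the table above (without the
    leading dashes, e.g. "query" or "publication-date") to their values.
    This helper is illustrative only and is not part of pysradb.
    """
    argv = ["pysradb", "search", "--db", db, "--max", str(max_entries)]
    for flag, value in filters.items():
        if value is None:
            continue  # skip filters that were left unset
        argv.extend([f"--{flag}", str(value)])
    return argv


argv = build_search_command(
    {"query": "covid19", "platform": "illumina", "layout": None},
    db="ena",
    max_entries=5,
)
# shlex.join quotes any argument that contains spaces for safe shell use.
print(shlex.join(argv))
```

Keeping the command as a list of arguments until the last step avoids quoting mistakes when a value such as an organism name contains spaces.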

Using pysradb search in Python:

pysradb search organises each search query as an instance of the SraSearch, EnaSearch, or GeoSearch class. These classes take the following parameters in their constructors:

SraSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)

EnaSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)

GeoSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, geo_query=None, geo_dataset_type=None, geo_entry_type=None, suppress_validation=False)

Parameters	Explanations
verbosity	Determines how much detail is retrieved and shown in the search result (default=2). Same as -v / --verbosity on the terminal
return_max	Maximum number of returned entries (default=20). Same as -m / --max on the terminal
suppress_validation	Defaults to False. If set to True, the user input format checks are skipped. This may cause the program to behave in unexpected ways, but allows the user to run queries that do not pass the format check.

Other parameters match the command line flags of the same name.
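
Because the remaining constructor parameters mirror the command line flags, a set of filters can be collected in a dictionary and passed as keyword arguments. The sketch below assumes the SraSearch signature shown above. The helper names search_kwargs and run_sra_search are hypothetical, and the pysradb import is deferred so that the dictionary can be prepared and inspected without network access.

```python
def search_kwargs(verbosity=2, return_max=20, **filters):
    """Return constructor keyword arguments, omitting unset (None) filters."""
    kwargs = {"verbosity": verbosity, "return_max": return_max}
    kwargs.update({k: v for k, v in filters.items() if v is not None})
    return kwargs


def run_sra_search(kwargs):
    """Run the query against the live SRA service and return a DataFrame."""
    # Imported lazily: requires pysradb to be installed.
    from pysradb.search import SraSearch

    instance = SraSearch(**kwargs)
    instance.search()
    return instance.get_df()


kwargs = search_kwargs(return_max=5, query="ribosome profiling", organism=None)
print(kwargs)
# df = run_sra_search(kwargs)  # uncomment to query the live service
```

Naming every argument also makes a call easier to read than positional values such as 2 and 5.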

To query the SRA database for ribosome profiling, expecting an output of verbosity level 2, and returning at most 5 entries, we can do the following:

[3]:

from pysradb.search import SraSearch

instance = SraSearch(2, 5, query="ribosome profiling")
instance.search()
df = instance.get_df()
print(df)

  study_accession experiment_accession  \
0       ERP194994          ERX16775590
1       ERP194994          ERX16775596
2       ERP194994          ERX16775593
3       ERP194994          ERX16775591
4       ERP194994          ERX16775587

                                    experiment_title sample_taxon_id  \
0  Ribo-seq of wild-type and feronia mutant tomat...            4081
1  Ribo-seq of wild-type and feronia mutant tomat...            4081
2  Ribo-seq of wild-type and feronia mutant tomat...            4081
3  Ribo-seq of wild-type and feronia mutant tomat...            4081
4  Ribo-seq of wild-type and feronia mutant tomat...            4081

  sample_scientific_name experiment_library_strategy  \
0   Solanum lycopersicum                       OTHER
1   Solanum lycopersicum                       OTHER
2   Solanum lycopersicum                       OTHER
3   Solanum lycopersicum                       OTHER
4   Solanum lycopersicum                       OTHER

  experiment_library_source experiment_library_selection sample_accession  \
0                     OTHER           size fractionation      ERS30508994
1                     OTHER           size fractionation      ERS30509000
2                     OTHER           size fractionation      ERS30508997
3                     OTHER           size fractionation      ERS30508995
4                     OTHER           size fractionation      ERS30508991

     sample_alias experiment_instrument_model pool_member_spots   run_1_size  \
0  SAMEA122821969            Illumina HiSeq X         134949006  18939008582
1  SAMEA122821975            Illumina HiSeq X         149002072  20671941213
2  SAMEA122821972            Illumina HiSeq X         122667204  16900044793
3  SAMEA122821970            Illumina HiSeq X         112735318  15334280969
4  SAMEA122821966            Illumina HiSeq X         149093385  20676582290

  run_1_accession run_1_total_spots run_1_total_bases
0     ERR17386225         134949006       40484701800
1     ERR17386231         149002072       44700621600
2     ERR17386228         122667204       36800161200
3     ERR17386226         112735318       33820595400
4     ERR17386222         149093385       44728015500

Quickstart

To query ENA instead, replace the SraSearch class with the EnaSearch class:

[4]:

from pysradb.search import EnaSearch

instance = EnaSearch(2, 5, "ribosome profiling")
instance.search()
df = instance.get_df()
print(df)

Empty DataFrame
Columns: []
Index: []
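
A live query can legitimately return an empty DataFrame, as the ENA example above did when this notebook was run. A minimal sketch of guarding downstream code follows. The helper first_non_empty is hypothetical, and it relies only on the empty attribute that pandas DataFrames provide.

```python
def first_non_empty(fetchers):
    """Call each zero-argument fetcher in turn; return the first non-empty result.

    Each fetcher is expected to return a pandas DataFrame (any object with
    an `empty` attribute works). Returns None if every result is empty.
    """
    for fetch in fetchers:
        df = fetch()
        if not df.empty:
            return df
    return None
```

Each fetcher could wrap the search() and get_df() calls shown above for a different search class, so that a script falls back to another database when the first returns nothing.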

To query GEO DataSets instead and retrieve the metadata of linked entries in SRA:

[5]:

from pysradb.search import GeoSearch

instance = GeoSearch(2, 5, geo_query="ribosome profiling")
instance.search()
df = instance.get_df()
print(df)

No results found for the following search query:
 SRA: {'query': None, 'accession': None, 'organism': None, 'layout': None, 'mbases': None, 'publication_date': None, 'platform': None, 'selection': None, 'source': None, 'strategy': None, 'title': None}
GEO DataSets: {'query': 'ribosome profiling AND gds sra[Filter]', 'dataset_type': None, 'entry_type': None, 'publication_date': None, 'organism': None}
Empty DataFrame
Columns: []
Index: []

7. Querying GEO DataSets with a publication_date filter and displaying publication dates in the results:

[6]:

from pysradb.search import GeoSearch

# Search for RNA-Seq datasets published in September 2024
# Using verbosity=3 to get all available fields including publication_date
instance = GeoSearch(
    verbosity=3,
    return_max=5,
    geo_query="RNA-Seq",
    publication_date="01-09-2024:30-09-2024",
)
try:
    instance.search()
    df = instance.get_df()

    # Display select columns including publication_date
    if not df.empty and "publication_date" in df.columns:
        cols_to_show = [
            "study_accession",
            "experiment_accession",
            "sample_scientific_name",
            "experiment_library_strategy",
            "publication_date",
        ]
        available_cols = [c for c in cols_to_show if c in df.columns]
        print(df[available_cols])
    else:
        print(df)
except Exception as exc:
    print(
        f"GEO search example skipped because the live service returned: {type(exc).__name__}"
    )

  study_accession experiment_accession  \
0       SRP525424          SRX25655372
1       SRP525424          SRX25655371
2       SRP525424          SRX25655370
3       SRP525424          SRX25655369
4       SRP525424          SRX25655368

                                    experiment_title sample_taxon_id  \
0  GSM8449964: mm202C-sgPcgf1_RNA_4-4_steadystate...           10090
1  GSM8449963: mm202C-sgPcgf1_RNA_4-4_steadystate...           10090
2  GSM8449962: mm202C-sgPcgf1_RNA_4-4_steadystate...           10090
3  GSM8449961: mm202C-sgLuc_RNA_steadystate_REP3_...           10090
4  GSM8449960: mm202C-sgLuc_RNA_steadystate_REP2_...           10090

  sample_scientific_name experiment_library_strategy  \
0           Mus musculus                     RNA-Seq
1           Mus musculus                     RNA-Seq
2           Mus musculus                     RNA-Seq
3           Mus musculus                     RNA-Seq
4           Mus musculus                     RNA-Seq

  experiment_library_source experiment_library_selection sample_accession  \
0            TRANSCRIPTOMIC                         cDNA      SRS22298113
1            TRANSCRIPTOMIC                         cDNA      SRS22298112
2            TRANSCRIPTOMIC                         cDNA      SRS22298111
3            TRANSCRIPTOMIC                         cDNA      SRS22298110
4            TRANSCRIPTOMIC                         cDNA      SRS22298109

  sample_alias  ... study_link_1_type study_link_1_value_1  \
0   GSM8449964  ...         XREF_LINK           DB: pubmed
1   GSM8449963  ...         XREF_LINK           DB: pubmed
2   GSM8449962  ...         XREF_LINK           DB: pubmed
3   GSM8449961  ...         XREF_LINK           DB: pubmed
4   GSM8449960  ...         XREF_LINK           DB: pubmed

  study_link_1_value_2                               study_study_abstract  \
0         ID: 39475509  Translocations involving the Nucleoporin 98 (N...
1         ID: 39475509  Translocations involving the Nucleoporin 98 (N...
2         ID: 39475509  Translocations involving the Nucleoporin 98 (N...
3         ID: 39475509  Translocations involving the Nucleoporin 98 (N...
4         ID: 39475509  Translocations involving the Nucleoporin 98 (N...

                                   study_study_title  \
0  Non-Canonical PRC1.1 is required for the activ...
1  Non-Canonical PRC1.1 is required for the activ...
2  Non-Canonical PRC1.1 is required for the activ...
3  Non-Canonical PRC1.1 is required for the activ...
4  Non-Canonical PRC1.1 is required for the activ...

  study_study_type_existing_study_type submission_accession submission_alias  \
0               Transcriptome Analysis           SRA2184187      SUB15505098
1               Transcriptome Analysis           SRA2184187      SUB15505098
2               Transcriptome Analysis           SRA2184187      SUB15505098
3               Transcriptome Analysis           SRA2184187      SUB15505098
4               Transcriptome Analysis           SRA2184187      SUB15505098

                              submission_center_name submission_lab_name
0  Armstrong lab, pediatric oncology, Dana Farber...                 NaN
1  Armstrong lab, pediatric oncology, Dana Farber...                 NaN
2  Armstrong lab, pediatric oncology, Dana Farber...                 NaN
3  Armstrong lab, pediatric oncology, Dana Farber...                 NaN
4  Armstrong lab, pediatric oncology, Dana Farber...                 NaN

[5 rows x 162 columns]
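
The publication_date value used above follows the dd-mm-yyyy:dd-mm-yyyy format described in the flags table. As a small sketch, the range string can be derived from date objects instead of being typed by hand. The helper date_range is hypothetical and is not a pysradb function.

```python
from datetime import date


def date_range(start, end):
    """Format two dates as dd-mm-yyyy:dd-mm-yyyy for publication_date."""
    if end < start:
        raise ValueError("end date precedes start date")
    fmt = "%d-%m-%Y"
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}"


# Reproduces the September 2024 range used in the example above.
print(date_range(date(2024, 9, 1), date(2024, 9, 30)))
```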

Error Handling

When suppress_validation is not set to True, query fields with invalid entries raise IncorrectFieldException. For fields such as “selection”, the error message lists all acceptable inputs:

[7]:

# 1. Invalid query entered for "selection"
try:
    SraSearch(selection="Mudkip")
except Exception as exc:
    print(type(exc).__name__, exc)

IncorrectFieldException Incorrect selection: Mudkip
--selection must be one of the following:
5-methylcytidine antibody, CAGE, ChIP, ChIP-Seq, DNase, HMPR, Hybrid Selection,
Inverse rRNA, Inverse rRNA selection, MBD2 protein methyl-CpG binding domain,
MDA, MF, MNase, MSLL, Oligo-dT, PCR, PolyA, RACE, RANDOM, RANDOM PCR, RT-PCR,
Reduced Representation, Restriction Digest, cDNA, cDNA_oligo_dT, cDNA_randomPriming
other, padlock probes capture method, repeat fractionation, size fractionation,
unspecified

[8]:

# 2. Ambiguous query entered for "source":
try:
    EnaSearch(source="metagenomic viral rna ")
except Exception as exc:
    print(type(exc).__name__, exc)

IncorrectFieldException Multiple potential matches have been identified for metagenomic viral rna :
['METAGENOMIC', 'VIRAL RNA']
Please check your input.
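
To catch likely typos before constructing a search object, input can be compared against the accepted values. The following sketch uses difflib from the standard library. It is not how pysradb validates input, and ACCEPTED_SELECTIONS is only a subset copied from the error message above; suggest_selection is a hypothetical helper.

```python
import difflib

# Subset of accepted --selection values, copied from the message above.
ACCEPTED_SELECTIONS = [
    "ChIP", "ChIP-Seq", "DNase", "MNase", "Oligo-dT",
    "PCR", "PolyA", "RANDOM", "cDNA", "size fractionation",
]


def suggest_selection(value, cutoff=0.6):
    """Return accepted values that closely match `value`, ignoring case."""
    by_lower = {item.lower(): item for item in ACCEPTED_SELECTIONS}
    close = difflib.get_close_matches(
        value.strip().lower(), list(by_lower), n=3, cutoff=cutoff
    )
    # Map the lowercase matches back to their canonical spelling.
    return [by_lower[key] for key in close]


print(suggest_selection("polya"))
print(suggest_selection("Mudkip"))
```

An empty result, as for "Mudkip", indicates that nothing in the list is similar enough to suggest.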

Usage Examples:

1. Checking the help message on the terminal:

[9]:

!pysradb search -h

usage: pysradb search [-h] [-o SAVETO] [-s] [-g [GRAPHS]] [-d {ena,geo,sra}]
                      [-v {0,1,2,3}] [--run-description] [--detailed] [-m MAX]
                      [-q QUERY [QUERY ...]] [-A ACCESSION]
                      [-O ORGANISM [ORGANISM ...]] [-L {SINGLE,PAIRED}]
                      [-M MBASES] [-D PUBLICATION_DATE]
                      [-P PLATFORM [PLATFORM ...]]
                      [-E SELECTION [SELECTION ...]] [-C SOURCE [SOURCE ...]]
                      [-S STRATEGY [STRATEGY ...]] [-T TITLE [TITLE ...]] [-I]
                      [-G GEO_QUERY [GEO_QUERY ...]]
                      [-Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]]
                      [-Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]]

options:
  -h, --help            show this help message and exit
  -o SAVETO, --saveto SAVETO
                        Save search result dataframe to file
  -s, --stats           Displays some useful statistics for the search
                        results.
  -g [GRAPHS], --graphs [GRAPHS]
                        Generates graphs to illustrate the search result. By
                        default all graphs are generated. Alternatively,
                        select a subset from the options below in a space-
                        separated string: daterange, organism, source,
                        selection, platform, basecount
  -d {ena,geo,sra}, --db {ena,geo,sra}
                        Select the db API (sra, ena, or geo) to query, default
                        = sra. Note: pysradb search works slightly differently
                        when db = geo. Please refer to 'pysradb search --geo-
                        info' for more details.
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Level of search result details (0, 1, 2 or 3), default
                        = 2 0: run accession only 1: run accession and
                        experiment title 2: accession numbers, titles and
                        sequencing information 3: records in 2 and other
                        information such as download url, sample attributes,
                        etc
  --run-description     Displays run accessions and descriptions only.
                        Equivalent to --verbosity 1
  --detailed            Displays detailed search results. Equivalent to
                        --verbosity 3.
  -m MAX, --max MAX     Maximum number of entries to return, default = 20
  -q QUERY [QUERY ...], --query QUERY [QUERY ...]
                        Main query string. Note that if no query is supplied,
                        at least one of the following flags must be present:
  -A ACCESSION, --accession ACCESSION
                        Accession number
  -O ORGANISM [ORGANISM ...], --organism ORGANISM [ORGANISM ...]
                        Scientific name of the sample organism
  -L {SINGLE,PAIRED}, --layout {SINGLE,PAIRED}
                        Library layout. Accepts either SINGLE or PAIRED
  -M MBASES, --mbases MBASES
                        Size of the sample rounded to the nearest megabase
  -D PUBLICATION_DATE, --publication-date PUBLICATION_DATE
                        Publication date of the run in the format dd-mm-yyyy.
                        If a date range is desired, enter the start date,
                        followed by end date, separated by a colon ':'.
                        Example: 01-01-2010:31-12-2010
  -P PLATFORM [PLATFORM ...], --platform PLATFORM [PLATFORM ...]
                        Sequencing platform
  -E SELECTION [SELECTION ...], --selection SELECTION [SELECTION ...]
                        Library selection
  -C SOURCE [SOURCE ...], --source SOURCE [SOURCE ...]
                        Library source
  -S STRATEGY [STRATEGY ...], --strategy STRATEGY [STRATEGY ...]
                        Library preparation strategy
  -T TITLE [TITLE ...], --title TITLE [TITLE ...]
                        Experiment title
  -I, --geo-info        Displays information on how to query GEO DataSets via
                        'pysradb search --db geo ...', including accepted
                        inputs for -G/--geo-query, -Y/--geo-dataset-type and
                        -Z/--geo-entry-type.
  -G GEO_QUERY [GEO_QUERY ...], --geo-query GEO_QUERY [GEO_QUERY ...]
                        Main query string for GEO DataSet. This flag is only
                        used when db is set to be geo.Please refer to 'pysradb
                        search --geo-info' for more details.
  -Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...], --geo-dataset-type GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]
                        GEO DataSet Type. This flag is only used when --db is
                        set to be geo.Please refer to 'pysradb search --geo-
                        info' for more details.
  -Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...], --geo-entry-type GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]
                        GEO Entry Type. This flag is only used when --db is
                        set to be geo.Please refer to 'pysradb search --geo-
                        info' for more details.

2. Searching for 5 Illumina sequencing runs related to the COVID-19 pandemic on ENA, using the terminal:

[10]:

!pysradb search -q covid19 --platform illumina --db ena -m 5

Displaying 15 columns in chunks of 5

Columns 1-5:
         experi                                                               
 study_  ment_a                                                               
 access  ccessi                                                               
 ion     on      experiment_title           description                tax_id 
 PRJEB6  ERX112  Illumina HiSeq 4000        Illumina HiSeq 4000        9606
 5477    85569   sequencing: Blood RNA      sequencing: Blood RNA
                 sequencing was performed   sequencing was performed
                 on a cohort of adults      on a cohort of adults
                 attending the Emergency    attending the Emergency
                 Department with suspected  Department with suspected
                 infection who had          infection who had
                 subsequently-confirmed     subsequently-confirmed
                 viral, bacterial, COVID19  viral, bacterial, COVID19
                 and healthy controls       and healthy controls
 PRJEB6  ERX112  Illumina HiSeq 4000        Illumina HiSeq 4000        9606
 5477    85571   sequencing: Blood RNA      sequencing: Blood RNA
                 sequencing was performed   sequencing was performed
                 on a cohort of adults      on a cohort of adults
                 attending the Emergency    attending the Emergency
                 Department with suspected  Department with suspected
                 infection who had          infection who had
                 subsequently-confirmed     subsequently-confirmed
                 viral, bacterial, COVID19  viral, bacterial, COVID19
                 and healthy controls       and healthy controls
 PRJEB6  ERX112  Illumina HiSeq 4000        Illumina HiSeq 4000        9606
 5477    85572   sequencing: Blood RNA      sequencing: Blood RNA
                 sequencing was performed   sequencing was performed
                 on a cohort of adults      on a cohort of adults
                 attending the Emergency    attending the Emergency
                 Department with suspected  Department with suspected
                 infection who had          infection who had
                 subsequently-confirmed     subsequently-confirmed
                 viral, bacterial, COVID19  viral, bacterial, COVID19
                 and healthy controls       and healthy controls
 PRJEB6  ERX112  Illumina HiSeq 4000        Illumina HiSeq 4000        9606
 5477    85575   sequencing: Blood RNA      sequencing: Blood RNA
                 sequencing was performed   sequencing was performed
                 on a cohort of adults      on a cohort of adults
                 attending the Emergency    attending the Emergency
                 Department with suspected  Department with suspected
                 infection who had          infection who had
                 subsequently-confirmed     subsequently-confirmed
                 viral, bacterial, COVID19  viral, bacterial, COVID19
                 and healthy controls       and healthy controls
 PRJEB6  ERX112  Illumina HiSeq 4000        Illumina HiSeq 4000        9606
 5477    85582   sequencing: Blood RNA      sequencing: Blood RNA
                 sequencing was performed   sequencing was performed
                 on a cohort of adults      on a cohort of adults
                 attending the Emergency    attending the Emergency
                 Department with suspected  Department with suspected
                 infection who had          infection who had
                 subsequently-confirmed     subsequently-confirmed
                 viral, bacterial, COVID19  viral, bacterial, COVID19
                 and healthy controls       and healthy controls

Columns 6-10:
 scientific_nam  library_strate  library_sourc  library_select  sample_access 
 e               gy              e              ion             ion           
 Homo sapiens    RNA-Seq         TRANSCRIPTOMI  Inverse rRNA    SAMEA11430872
                                 C                              7
 Homo sapiens    RNA-Seq         TRANSCRIPTOMI  Inverse rRNA    SAMEA11430872
                                 C                              9
 Homo sapiens    RNA-Seq         TRANSCRIPTOMI  Inverse rRNA    SAMEA11430873
                                 C                              0
 Homo sapiens    RNA-Seq         TRANSCRIPTOMI  Inverse rRNA    SAMEA11430873
                                 C                              3
 Homo sapiens    RNA-Seq         TRANSCRIPTOMI  Inverse rRNA    SAMEA11430874
                                 C                              0

Columns 11-15:
 sample_title   instrument_model    run_accession  read_count    base_count   
 Sample 101     Illumina HiSeq      ERR11901282    129097560     19493731560
                4000
 Sample 103     Illumina HiSeq      ERR11901284    126726390     19135684890
                4000
 Sample 104     Illumina HiSeq      ERR11901285    125446130     18942365630
                4000
 Sample 107     Illumina HiSeq      ERR11901288    129507944     19555699544
                4000
 Sample 113     Illumina HiSeq      ERR11901295    126294530     19070474030
                4000

4. Searching for Illumina sequencing runs related to the COVID-19 pandemic on ENA, within Python (outputs a pandas DataFrame):

[12]:

from pysradb.search import EnaSearch

instance = EnaSearch(2, 20, query="covid19", platform="illumina")
instance.search()
df = instance.get_df()
print(df)

   study_accession experiment_accession  \
0       PRJEB65477          ERX11285569
1       PRJEB65477          ERX11285571
2       PRJEB65477          ERX11285572
3       PRJEB65477          ERX11285575
4       PRJEB65477          ERX11285582
5       PRJEB65477          ERX11285585
6       PRJEB65477          ERX11285596
7       PRJEB65477          ERX11285597
8       PRJEB65477          ERX11285598
9       PRJEB65477          ERX11285606
10      PRJEB65477          ERX11285610
11      PRJEB65477          ERX11285613
12      PRJEB65477          ERX11285616
13      PRJEB65477          ERX11285617
14      PRJEB65477          ERX11285618
15      PRJEB65477          ERX11285619
16      PRJEB65477          ERX11285621
17      PRJEB65477          ERX11285622
18      PRJEB65477          ERX11285624
19      PRJEB65477          ERX11285635

                                     experiment_title  \
0   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
1   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
2   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
3   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
4   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
5   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
6   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
7   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
8   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
9   Illumina HiSeq 4000 sequencing: Blood RNA sequ...
10  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
11  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
12  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
13  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
14  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
15  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
16  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
17  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
18  Illumina HiSeq 4000 sequencing: Blood RNA sequ...
19  Illumina HiSeq 4000 sequencing: Blood RNA sequ...

                                          description tax_id scientific_name  \
0   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
1   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
2   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
3   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
4   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
5   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
6   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
7   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
8   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
9   Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
10  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
11  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
12  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
13  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
14  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
15  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
16  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
17  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
18  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens
19  Illumina HiSeq 4000 sequencing: Blood RNA sequ...   9606    Homo sapiens

   library_strategy  library_source library_selection sample_accession  \
0           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308727
1           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308729
2           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308730
3           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308733
4           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308740
5           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308743
6           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308754
7           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308755
8           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308756
9           RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308764
10          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308768
11          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308771
12          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308774
13          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308775
14          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308776
15          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308777
16          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308779
17          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308780
18          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308782
19          RNA-Seq  TRANSCRIPTOMIC      Inverse rRNA   SAMEA114308793

   sample_title     instrument_model run_accession read_count   base_count
0    Sample 101  Illumina HiSeq 4000   ERR11901282  129097560  19493731560
1    Sample 103  Illumina HiSeq 4000   ERR11901284  126726390  19135684890
2    Sample 104  Illumina HiSeq 4000   ERR11901285  125446130  18942365630
3    Sample 107  Illumina HiSeq 4000   ERR11901288  129507944  19555699544
4    Sample 113  Illumina HiSeq 4000   ERR11901295  126294530  19070474030
5    Sample 116  Illumina HiSeq 4000   ERR11901298  126905710  19162762210
6    Sample 126  Illumina HiSeq 4000   ERR11901309  121282548  18313664748
7    Sample 127  Illumina HiSeq 4000   ERR11901310  125225640  18909071640
8    Sample 128  Illumina HiSeq 4000   ERR11901311  133392106  20142208006
9     Sample 14  Illumina HiSeq 4000   ERR11901319  107446714  16224453814
10    Sample 18  Illumina HiSeq 4000   ERR11901323  103324646  15602021546
11    Sample 20  Illumina HiSeq 4000   ERR11901326  103358972  15607204772
12    Sample 23  Illumina HiSeq 4000   ERR11901329  103959324  15697857924
13    Sample 24  Illumina HiSeq 4000   ERR11901330  106380716  16063488116
14    Sample 25  Illumina HiSeq 4000   ERR11901331  104070702  15714676002
15    Sample 26  Illumina HiSeq 4000   ERR11901332  102391596  15461130996
16    Sample 28  Illumina HiSeq 4000   ERR11901334  108539524  16389468124
17    Sample 29  Illumina HiSeq 4000   ERR11901335  101468438  15321734138
18    Sample 30  Illumina HiSeq 4000   ERR11901337  102349680  15454801680
19    Sample 40  Illumina HiSeq 4000   ERR11901348  112614400  17004774400
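
The read_count and base_count columns can be combined into simple per-run summaries. The sketch below works on plain records (the shape returned by df.to_dict("records")) using three rows copied from the output above. Values are cast with int() on the assumption that they may arrive as strings; bases_per_read is a hypothetical helper.

```python
# Three rows copied from the output above.
records = [
    {"run_accession": "ERR11901282", "read_count": "129097560", "base_count": "19493731560"},
    {"run_accession": "ERR11901284", "read_count": "126726390", "base_count": "19135684890"},
    {"run_accession": "ERR11901285", "read_count": "125446130", "base_count": "18942365630"},
]


def bases_per_read(rows):
    """Map each run accession to base_count divided by read_count."""
    return {
        row["run_accession"]: int(row["base_count"]) / int(row["read_count"])
        for row in rows
    }


print(bases_per_read(records))
```

For these three rows the ratio is 151 in each case.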

5. More complex example:

[13]:

from pysradb.search import EnaSearch

instance = EnaSearch(
    3,
    20,
    organism="Escherichia coli",
    layout="Paired",
    mbases=10,
    publication_date="01-01-2019:31-12-2021",
    platform="Illumina",
    selection="random",
    source="Genomic",
    strategy="WGS",
)
try:
    instance.search()
    df = instance.get_df()
    df
except Exception as exc:
    print(
        f"ENA search example skipped because the live service returned: {type(exc).__name__}"
    )
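
The `publication_date` argument in the cell above is a range written as `DD-MM-YYYY:DD-MM-YYYY`. As a rough illustration of that format, the sketch below splits and validates such a string before it is handed to a search class. The helper name `parse_date_range` is hypothetical and is not part of pysradb; pysradb's own validation may behave differently, for example in how it treats a single date.

```python
from datetime import date, datetime


def parse_date_range(text):
    """Split 'DD-MM-YYYY:DD-MM-YYYY' into a (start, end) pair of dates.

    Hypothetical helper for illustration only; not part of pysradb.
    """
    start_text, sep, end_text = text.partition(":")
    start = datetime.strptime(start_text, "%d-%m-%Y").date()
    # Assumption: a single date (no colon) is treated as a one-day range.
    end = datetime.strptime(end_text, "%d-%m-%Y").date() if sep else start
    if end < start:
        raise ValueError(f"end date {end} is before start date {start}")
    return start, end


start, end = parse_date_range("01-01-2019:31-12-2021")
print(start, end)  # 2019-01-01 2021-12-31
```

Checking the range locally like this can catch a swapped day and month or a reversed range before any request is sent to the live service.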

[14]:

sorted(df.columns)

[14]:

['accession',
 'age',
 'aligned',
 'altitude',
 'assembly_quality',
 'assembly_software',
 'bam_aspera',
 'bam_bytes',
 'bam_file_role',
 'bam_ftp',
 'bam_galaxy',
 'bam_md5',
 'base_count',
 'binning_software',
 'bio_material',
 'bisulfite_protocol',
 'broad_scale_environmental_context',
 'broker_name',
 'cage_protocol',
 'cell_line',
 'cell_type',
 'center_name',
 'checklist',
 'chip_ab_provider',
 'chip_protocol',
 'chip_target',
 'collected_by',
 'collection_date',
 'collection_date_end',
 'collection_date_start',
 'completeness_score',
 'contamination_score',
 'control_experiment',
 'country',
 'cultivar',
 'culture_collection',
 'datahub',
 'depth',
 'description',
 'dev_stage',
 'disease',
 'dnase_protocol',
 'ecotype',
 'elevation',
 'environment_biome',
 'environment_feature',
 'environment_material',
 'environmental_medium',
 'environmental_sample',
 'experiment_accession',
 'experiment_alias',
 'experiment_target',
 'experiment_title',
 'experimental_factor',
 'experimental_protocol',
 'extraction_protocol',
 'faang_library_selection',
 'fastq_aspera',
 'fastq_bytes',
 'fastq_file_role',
 'fastq_ftp',
 'fastq_galaxy',
 'fastq_md5',
 'file_location',
 'first_created',
 'first_public',
 'germline',
 'hi_c_protocol',
 'host',
 'host_body_site',
 'host_genotype',
 'host_gravidity',
 'host_growth_conditions',
 'host_phenotype',
 'host_scientific_name',
 'host_sex',
 'host_status',
 'host_tax_id',
 'identified_by',
 'instrument_model',
 'instrument_platform',
 'investigation_type',
 'isolate',
 'isolation_source',
 'last_updated',
 'lat',
 'library_construction_protocol',
 'library_gen_protocol',
 'library_layout',
 'library_max_fragment_size',
 'library_min_fragment_size',
 'library_name',
 'library_pcr_isolation_protocol',
 'library_prep_date',
 'library_prep_date_format',
 'library_prep_latitude',
 'library_prep_location',
 'library_prep_longitude',
 'library_selection',
 'library_source',
 'library_strategy',
 'local_environmental_context',
 'location',
 'location_end',
 'location_start',
 'lon',
 'marine_region',
 'mating_type',
 'ncbi_reporting_standard',
 'nominal_length',
 'nominal_sdev',
 'pcr_isolation_protocol',
 'ph',
 'project_name',
 'protocol_label',
 'read_count',
 'read_strand',
 'restriction_enzyme',
 'restriction_enzyme_target_sequence',
 'restriction_site',
 'rna_integrity_num',
 'rna_prep_3_protocol',
 'rna_prep_5_protocol',
 'rna_purity_230_ratio',
 'rna_purity_280_ratio',
 'rt_prep_protocol',
 'run_accession',
 'run_alias',
 'run_date',
 'salinity',
 'sample_accession',
 'sample_alias',
 'sample_capture_status',
 'sample_collection',
 'sample_description',
 'sample_material',
 'sample_prep_interval',
 'sample_prep_interval_units',
 'sample_storage',
 'sample_storage_processing',
 'sample_title',
 'sampling_campaign',
 'sampling_platform',
 'sampling_site',
 'scientific_name',
 'secondary_project',
 'secondary_sample_accession',
 'secondary_study_accession',
 'sequencing_date',
 'sequencing_date_format',
 'sequencing_location',
 'sequencing_longitude',
 'sequencing_method',
 'sequencing_primer_catalog',
 'sequencing_primer_lot',
 'sequencing_primer_provider',
 'serotype',
 'serovar',
 'sex',
 'specimen_voucher',
 'sra_aspera',
 'sra_bytes',
 'sra_file_role',
 'sra_ftp',
 'sra_galaxy',
 'sra_md5',
 'status',
 'strain',
 'study_accession',
 'study_alias',
 'study_title',
 'sub_species',
 'sub_strain',
 'submission_accession',
 'submission_tool',
 'submitted_aspera',
 'submitted_bytes',
 'submitted_file_role',
 'submitted_format',
 'submitted_ftp',
 'submitted_galaxy',
 'submitted_host_sex',
 'submitted_md5',
 'submitted_read_type',
 'surveillance_target',
 'tag',
 'target_gene',
 'tax_id',
 'tax_lineage',
 'taxonomic_classification',
 'taxonomic_identity_marker',
 'temperature',
 'tissue_lib',
 'tissue_type',
 'transposase_protocol',
 'variety']
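
With this many columns, it is usually more practical to keep a handful and reshape them. ENA joins the files of a paired-end run with `;` in fields such as `fastq_ftp`, so one common step is to give each file its own row. The sketch below uses a small stand-in frame (`df_demo`, a name chosen here for illustration) rather than the live search result, so it runs without network access.

```python
import pandas as pd

# Stand-in for a search result: a single paired-end run whose two FASTQ
# locations are joined by ';' in one field.
df_demo = pd.DataFrame(
    {
        "run_accession": ["ERR3535516"],
        "instrument_model": ["Illumina MiSeq"],
        "fastq_ftp": [
            "ftp.sra.ebi.ac.uk/vol1/fastq/ERR353/006/ERR3535516/ERR3535516_1.fastq.gz;"
            "ftp.sra.ebi.ac.uk/vol1/fastq/ERR353/006/ERR3535516/ERR3535516_2.fastq.gz"
        ],
    }
)

# Keep only the columns of interest, then give every file its own row.
files = (
    df_demo[["run_accession", "fastq_ftp"]]
    .assign(fastq_ftp=lambda d: d["fastq_ftp"].str.split(";"))
    .explode("fastq_ftp")
    .reset_index(drop=True)
)
print(files)
```

The same pattern should apply to the other `;`-joined fields (for example `fastq_md5` and `fastq_bytes`), provided they are split in the same order.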

[15]:

from pysradb.search import GeoSearch

# https://github.com/saketkc/pysradb/issues/221
instance = GeoSearch(
    publication_date="05-09-2024:06-09-2024", return_max=100, verbosity=3
)
instance.search()
df = instance.get_df()
df

[15]:

	study_accession	experiment_accession	experiment_title	sample_taxon_id	sample_scientific_name	experiment_library_strategy	experiment_library_source	experiment_library_selection	sample_accession	sample_alias	...	study_link_2_type	study_link_2_value_1	study_link_2_value_2	study_study_abstract	study_study_title	study_study_type_existing_study_type	submission_accession	submission_alias	submission_center_name	submission_lab_name
0	SRP531137	SRX25997822	GSM8501051: HEK293, Prm1 negative, Replicate 3...	9606	Homo sapiens	Hi-C	GENOMIC	other	SRS22571584	GSM8501051	...	NaN	NaN	NaN	Although the spatial organization of the genom...	Large-scale manipulation of radial positioning...	Other	SRA1964865	SUB14711595	Technion - Israel Institute of Technology	NaN
1	SRP531137	SRX25997821	GSM8501050: HEK293, Prm1 negative, Replicate 2...	9606	Homo sapiens	Hi-C	GENOMIC	other	SRS22571583	GSM8501050	...	NaN	NaN	NaN	Although the spatial organization of the genom...	Large-scale manipulation of radial positioning...	Other	SRA1964865	SUB14711595	Technion - Israel Institute of Technology	NaN
2	SRP531137	SRX25997820	GSM8501049: HEK293, Prm1 negative, Replicate 1...	9606	Homo sapiens	Hi-C	GENOMIC	other	SRS22571582	GSM8501049	...	NaN	NaN	NaN	Although the spatial organization of the genom...	Large-scale manipulation of radial positioning...	Other	SRA1964865	SUB14711595	Technion - Israel Institute of Technology	NaN
3	SRP531137	SRX25997819	GSM8501048: HEK293, Prm1 positive DAPI low, Hi...	9606	Homo sapiens	Hi-C	GENOMIC	other	SRS22571581	GSM8501048	...	NaN	NaN	NaN	Although the spatial organization of the genom...	Large-scale manipulation of radial positioning...	Other	SRA1964865	SUB14711595	Technion - Israel Institute of Technology	NaN
4	SRP531137	SRX25997818	GSM8501047: HEK293, Prm1 positive DAPI high, H...	9606	Homo sapiens	Hi-C	GENOMIC	other	SRS22571580	GSM8501047	...	NaN	NaN	NaN	Although the spatial organization of the genom...	Large-scale manipulation of radial positioning...	Other	SRA1964865	SUB14711595	Technion - Israel Institute of Technology	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
95	SRP530052	SRX25934604	GSM8493072: HSS-3; Homo sapiens; RNA-Seq	9606	Homo sapiens	RNA-Seq	TRANSCRIPTOMIC	cDNA	SRS22520997	GSM8493072	...	NaN	NaN	NaN	Fluid shear stress (FSS) from blood flow sense...	cSTAR analysis identifies endothelial cell cyc...	Transcriptome Analysis	SRA1960858	SUB14701816	Systems Biology Ireland, University College Du...	NaN
96	SRP530052	SRX25934603	GSM8493071: HSS-2; Homo sapiens; RNA-Seq	9606	Homo sapiens	RNA-Seq	TRANSCRIPTOMIC	cDNA	SRS22520996	GSM8493071	...	NaN	NaN	NaN	Fluid shear stress (FSS) from blood flow sense...	cSTAR analysis identifies endothelial cell cyc...	Transcriptome Analysis	SRA1960858	SUB14701816	Systems Biology Ireland, University College Du...	NaN
97	SRP530052	SRX25934602	GSM8493070: HSS-1; Homo sapiens; RNA-Seq	9606	Homo sapiens	RNA-Seq	TRANSCRIPTOMIC	cDNA	SRS22520995	GSM8493070	...	NaN	NaN	NaN	Fluid shear stress (FSS) from blood flow sense...	cSTAR analysis identifies endothelial cell cyc...	Transcriptome Analysis	SRA1960858	SUB14701816	Systems Biology Ireland, University College Du...	NaN
98	SRP529928	SRX25928352	GSM8492057: NC1 group2; Homo sapiens; RNA-Seq	9606	Homo sapiens	RNA-Seq	TRANSCRIPTOMIC	cDNA	SRS22515729	GSM8492057	...	NaN	NaN	NaN	Adrenoleukodystrophy (ALD) is a rare X-linked ...	Transcriptomic Analysis of Identical Twins wit...	Transcriptome Analysis	SRA1960558	SUB14700372	Southwest Hospital, Army Medical University (T...	NaN
99	SRP529928	SRX25928351	GSM8492056: NC1 group1; Homo sapiens; RNA-Seq	9606	Homo sapiens	RNA-Seq	TRANSCRIPTOMIC	cDNA	SRS22515728	GSM8492056	...	NaN	NaN	NaN	Adrenoleukodystrophy (ALD) is a rare X-linked ...	Transcriptomic Analysis of Identical Twins wit...	Transcriptome Analysis	SRA1960558	SUB14700372	Southwest Hospital, Army Medical University (T...	NaN

100 rows × 647 columns

[16]:

instance = GeoSearch(
    publication_date="04-09-2024:06-09-2024", return_max=1000, verbosity=3
)
try:
    instance.search()
    df = instance.get_df()
    print(df["study_alias"].unique())
except Exception as exc:
    print(
        f"GEO search example skipped because the live service returned: {type(exc).__name__}"
    )

<StringArray>
['GSE276554', 'GSE276553', 'GSE267301', 'GSE276379', 'GSE276473', 'GSE276365',
 'GSE276447', 'GSE276438', 'GSE219269', 'GSE276372', 'GSE276338', 'GSE276337',
 'GSE276206', 'GSE276309', 'GSE276304', 'GSE276245', 'GSE276204', 'GSE276185',
 'GSE276195', 'GSE276192', 'GSE276153', 'GSE276130', 'GSE276122', 'GSE276065',
 'GSE276058', 'GSE276038', 'GSE276037', 'GSE275962', 'GSE275896', 'GSE275863',
 'GSE275777', 'GSE275778', 'GSE275571', 'GSE253407', 'GSE275211', 'GSE274586',
 'GSE274408', 'GSE273907', 'GSE273844', 'GSE273813', 'GSE253145', 'GSE272793',
 'GSE272635', 'GSE272394', 'GSE247707', 'GSE247706', 'GSE272252', 'GSE271996',
 'GSE271746', 'GSE271667', 'GSE271653', 'GSE264193', 'GSE245033', 'GSE268837',
 'GSE267343', 'GSE267342', 'GSE266976', 'GSE266471', 'GSE264212', 'GSE264012',
 'GSE263804', 'GSE263798', 'GSE263549', 'GSE263414', 'GSE263441', 'GSE262699',
 'GSE262282', 'GSE262272', 'GSE262127', 'GSE262125', 'GSE262126']
Length: 71, dtype: str
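
Beyond listing the unique GEO series, it can be useful to see how many experiments each series contributes. A minimal sketch with pandas follows; it uses a stand-in frame whose experiment accessions are placeholders, not real records, so that it runs offline.

```python
import pandas as pd

# Stand-in rows; the experiment accessions are placeholders, not real records.
df_demo = pd.DataFrame(
    {
        "study_alias": ["GSE276554", "GSE276554", "GSE276553"],
        "experiment_accession": ["SRX_A", "SRX_B", "SRX_C"],
    }
)

# Number of distinct experiments per GEO series, largest first.
per_study = (
    df_demo.groupby("study_alias")["experiment_accession"]
    .nunique()
    .sort_values(ascending=False)
)
print(per_study)
```

Applied to the real search result, the same `groupby` would be run on `df` in place of `df_demo`, assuming both columns are present at the chosen verbosity level.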

6. A similar search from the terminal, with max set to 20 (this command uses `--mbases 100` and a 2019-only date range):¶

[17]:

!pysradb search --db ena -m 20 -v 3 --organism Escherichia coli --layout Paired --mbases 100 --publication-date 01-01-2019:31-12-2019 --platform illumina --selection random --source Genomic --strategy wgs

Displaying 196 columns in chunks of 5

Columns 1-5:
 study_a  experim                                                             
 ccessio  ent_acc                                                             
 n        ession   experiment_title         description               tax_id  
 PRJEB34  ERX3552  Illumina MiSeq paired    Illumina MiSeq paired     562
 285      114      end sequencing: Raw      end sequencing: Raw
                   reads: p51_OXA-plasmid   reads: p51_OXA-plasmid
 PRJNA51  SRX5306  Illumina MiSeq           Illumina MiSeq            562
 7654     330      sequencing: Sequencing   sequencing: Sequencing
                   of environmental         of environmental samples
                   samples of E. coli       of E. coli collected
                   collected across         across Nottinghamshire
                   Nottinghamshire during   during 2015
                   2015
 PRJNA51  SRX5308  Illumina HiSeq 2000      Illumina HiSeq 2000       562
 7527     971      sequencing: Chemostat    sequencing: Chemostat 1,
                   1, Heneration 450,       Heneration 450, clone
                   clone H10                H10
 PRJNA54  SRX5801  Illumina MiSeq           Illumina MiSeq            562
 1504     137      sequencing: Adapterama   sequencing: Adapterama I
                   I E. coli                E. coli
 PRJNA54  SRX5883  Illumina MiSeq           Illumina MiSeq            562
 1983     127      sequencing: Whole        sequencing: Whole genome
                   genome Illumina MiSeq    Illumina MiSeq sequence
                   sequence of Escherichia  of Escherichia coli
                   coli
 PRJNA54  SRX5990  NextSeq 500 sequencing:  NextSeq 500 sequencing:   562
 4527     327      WGS of Escherichia coli  WGS of Escherichia coli
                   BIOML-A341               BIOML-A341
 PRJEB33  ERX3417  NextSeq 500 paired end   NextSeq 500 paired end    562
 169      194      sequencing               sequencing
 PRJEB34  ERX3552  Illumina HiSeq 4000      Illumina HiSeq 4000       562
 513      075      paired end sequencing    paired end sequencing
 PRJNA54  SRX5991  NextSeq 500 sequencing:  NextSeq 500 sequencing:   562
 4527     194      WGS of Escherichia coli  WGS of Escherichia coli
                   BIOML-A288               BIOML-A288

Columns 6-10:
 scientific_nam  library_strate  library_sourc  library_select  sample_access 
 e               gy              e              ion             ion           
 Escherichia     WGS             GENOMIC        RANDOM          SAMEA5957593
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN10840046
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN10836213
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN11586387
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN11660032
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN11848938
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMEA5732361
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMEA5789900
 coli
 Escherichia     WGS             GENOMIC        RANDOM          SAMN11848885
 coli

Columns 11-15:
                                     instrume  run_acces  read_cou  base_coun 
 sample_title                        nt_model  sion       nt        t         
 p51_OXA-plasmid                     Illumina  ERR353551  219933    100397176
                                     MiSeq     6
 T1-3                                Illumina  SRR850233  219871    100338769
                                     MiSeq     8
 Microbe sample from Escherichia     Illumina  SRR850515  495205    100031410
 coli                                HiSeq     1
                                     2000
 Microbe sample from Escherichia     Illumina  SRR902326  199252    100024504
 coli                                MiSeq     2
 Pathogen: environmental/food/other  Illumina  SRR910862  202951    100260246
 sample from Escherichia coli        MiSeq     1
 MIGS Cultured Bacterial/Archaeal    NextSeq   SRR922012  352524    99932805
 sample from Escherichia coli        500       8
 -                                   NextSeq   ERR339333  335367    99674857
                                     500       0
 AMRIL_7                             Illumina  ERR353547  530243    99574284
                                     HiSeq     7
                                     4000
 MIGS Cultured Bacterial/Archaeal    NextSeq   SRR922059  365343    100440883
 sample from Escherichia coli        500       7

Columns 16-20:
 accession      age            aligned        altitude       assembly_quality 
 ERR3535516     -              -              -              -
 SRR8502338     -              -              -              -
 SRR8505151     -              -              -              -
 SRR9023262     -              -              -              -
 SRR9108621     -              -              -              -
 SRR9220128     -              -              -              -
 ERR3393330     -              -              -              -
 ERR3535477     -              -              -              -
 SRR9220597     -              -              -              -

Columns 21-25:
 assembly_software   bam_aspera     bam_bytes     bam_file_role  bam_ftp      
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -
 -                   -              -             -              -

Columns 26-30:
                             binning_softwar                bisulfite_protoco 
 bam_galaxy    bam_md5       e                bio_material  l                 
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -
 -             -             -                -             -

Columns 31-35:
 broad_scale_environmental_conte  broker_na  cage_prot                        
 xt                               me         ocol       cell_line   cell_type 
 -                                -          -          -           -
 -                                -          -          -           -
 -                                -          -          -           -
 -                                -          -          -           -
 -                                -          -          -           -
 bodily fluid material biome      -          -          -           -
 [ENVO:02000019]
 -                                DTU-GE     -          -           -
 -                                -          -          -           -
 bodily fluid material biome      -          -          -           -
 [ENVO:02000019]

Columns 36-40:
                                                     chip_ab  chip_p          
                                            checkli  _provid  rotoco  chip_ta 
 center_name                                st       er       l       rget    
 UMC Utrecht                                ERC0000  -        -       -
                                            11
 SUB5096574                                 -        -        -       -
 SUB5048450                                 -        -        -       -
 SUB5588746                                 -        -        -       -
 SUB5658670                                 -        -        -       -
 SUB5744285                                 -        -        -       -
 Centre for Genomic Epidemiology;National   ERC0000  -        -       -
 Food Institute;Technical University of     29
 Denmark (DTU);Denmark;DTU-GE
 LONDON SCHOOL OF HYGIENE AND TROPICAL      ERC0000  -        -       -
 MEDICINE                                   28
 SUB5744285                                 -        -        -       -

Columns 41-45:
                           collection_  collection_  collection_  completenes 
 collected_by              date         date_end     date_start   s_score     
 -                         -            -            -            -
 -                         2015-10-26   2015-10-26   2015-10-26   -
 -                         missing      -            -            -
 -                         not          -            -            -
                           applicable
 Mohamed Ezzat El          2018-05-24   2018-05-24   2018-05-23   -
 Zowalaty Laboratory
 (VMID)
 -                         2015-12-09   2015-12-09   2015-12-09   -
 DTU                       2017         2017-12-31   2017-01-01   -
 -                         2016         2016-12-31   2016-01-01   -
 -                         2016-01-15   2016-01-15   2016-01-15   -

Columns 46-50:
 contamination  control_exper                                    culture_coll 
 _score         iment          country             cultivar      ection       
 -              -              -                   -             -
 -              -              United Kingdom      -             -
 -              -              missing             -             -
 -              -              -                   -             -
 -              -              South Africa:       -             -
                               Eastern Cape
 -              -              USA:Boston          -             -
 -              -              Portugal            -             -
 -              -              United Kingdom      -             -
 -              -              USA:Boston          -             -

Columns 51-55:
 datahub             depth         dev_stage     disease       dnase_protocol 
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -
 dcc_compare;dcc_br  -             -             -             -
 omhead
 dcc_compare         -             -             -             -
 dcc_compare         -             -             -             -

Columns 56-60:
                                               environment_fea  environment_m 
 ecotype    elevation   environment_biome      ture             aterial       
 -          -           -                      -                -
 -          -           -                      -                -
 -          -           -                      -                -
 -          -           -                      -                -
 -          -           -                      -                -
 -          -           bodily fluid material  excreta          fecal
                        biome [ENVO:02000019]  material         material
                                               [ENVO:02000022]  [ENVO:0000200
                                                                3]
 -          -           -                      -                -
 -          -           -                      -                -
 -          -           bodily fluid material  excreta          fecal
                        biome [ENVO:02000019]  material         material
                                               [ENVO:02000022]  [ENVO:0000200
                                                                3]

Columns 61-65:
 environme  environme                                               experimen 
 ntal_medi  ntal_samp                                   experiment  tal_facto 
 um         le         experiment_alias                 _target     r         
 -          -          webin-reads-p51_OXA-plasmid      -           -
 -          -          T1-3                             -           -
 -          -          Clonal_chem1_H10                 -           -
 -          -          iNext_07                         -           -
 -          -          Nextera XT library SEQ000089458  -           -
 fecal      -          bf_0095_0059_a9                  -           -
 material
 [ENVO:000
 02003]
 -          false      Exp_2019_6_24_8_4_22_293         -           -
 -          -          ena-EXPERIMENT-LONDON SCHOOL OF  -           -
                       HYGIENE AND TROPICAL
                       MEDICINE-20-09-2019-14:10:16:49
                       6-7
 fecal      -          an_0080_0058_g10                 -           -
 material
 [ENVO:000
 02003]

Columns 66-70:
                   faang_l                                                    
 experim  extract  ibrary_                                                    
 ental_p  ion_pro  selecti                                            fastq_b 
 rotocol  tocol    on       fastq_aspera                              ytes    
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/ERR353/00  3252929
                            6/ERR3535516/ERR3535516_1.fastq.gz;fasp.  9;36908
                            sra.ebi.ac.uk:/vol1/fastq/ERR353/006/ERR  444
                            3535516/ERR3535516_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR850/00  3090210
                            8/SRR8502338/SRR8502338_1.fastq.gz;fasp.  6;35363
                            sra.ebi.ac.uk:/vol1/fastq/SRR850/008/SRR  087
                            8502338/SRR8502338_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR850/00  4624917
                            1/SRR8505151/SRR8505151_1.fastq.gz;fasp.  7;46229
                            sra.ebi.ac.uk:/vol1/fastq/SRR850/001/SRR  911
                            8505151/SRR8505151_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR902/00  3272464
                            2/SRR9023262/SRR9023262_1.fastq.gz;fasp.  9;37407
                            sra.ebi.ac.uk:/vol1/fastq/SRR902/002/SRR  901
                            9023262/SRR9023262_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR910/00  3435059
                            1/SRR9108621/SRR9108621_1.fastq.gz;fasp.  5;40093
                            sra.ebi.ac.uk:/vol1/fastq/SRR910/001/SRR  732
                            9108621/SRR9108621_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR922/00  2819205
                            8/SRR9220128/SRR9220128_1.fastq.gz;fasp.  3;28009
                            sra.ebi.ac.uk:/vol1/fastq/SRR922/008/SRR  156
                            9220128/SRR9220128_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/ERR339/00  3281153
                            0/ERR3393330/ERR3393330_1.fastq.gz;fasp.  5;33340
                            sra.ebi.ac.uk:/vol1/fastq/ERR339/000/ERR  844
                            3393330/ERR3393330_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/ERR353/00  3507740
                            7/ERR3535477/ERR3535477_1.fastq.gz;fasp.  1;35433
                            sra.ebi.ac.uk:/vol1/fastq/ERR353/007/ERR  727
                            3535477/ERR3535477_2.fastq.gz
 -        -        -        fasp.sra.ebi.ac.uk:/vol1/fastq/SRR922/00  2902200
                            7/SRR9220597/SRR9220597_1.fastq.gz;fasp.  6;29050
                            sra.ebi.ac.uk:/vol1/fastq/SRR922/007/SRR  948
                            9220597/SRR9220597_2.fastq.gz

Columns 71-75:
 fastq_fil                                                fastq_md  file_loca 
 e_role     fastq_ftp              fastq_galaxy           5         tion      
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  ee425499  -
 _FILE;GEN  1/fastq/ERR353/006/ER  1/fastq/ERR353/006/ER  fba579cf
 ERATED_FI  R3535516/ERR3535516_1  R3535516/ERR3535516_1  12fe9028
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  e9a9c28d
            .ac.uk/vol1/fastq/ERR  .ac.uk/vol1/fastq/ERR  ;f523ce7
            353/006/ERR3535516/ER  353/006/ERR3535516/ER  7e4a1e4a
            R3535516_2.fastq.gz    R3535516_2.fastq.gz    27c46402
                                                          913255c5
                                                          0
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  7e50c3f5  -
 _FILE;GEN  1/fastq/SRR850/008/SR  1/fastq/SRR850/008/SR  d03d8b7a
 ERATED_FI  R8502338/SRR8502338_1  R8502338/SRR8502338_1  a03c8316
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  41e1ecc3
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;0e87334
            850/008/SRR8502338/SR  850/008/SRR8502338/SR  ede312a4
            R8502338_2.fastq.gz    R8502338_2.fastq.gz    0552551f
                                                          b6331146
                                                          9
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  a2715241  -
 _FILE;GEN  1/fastq/SRR850/001/SR  1/fastq/SRR850/001/SR  220079bb
 ERATED_FI  R8505151/SRR8505151_1  R8505151/SRR8505151_1  283e2602
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  ad0b2909
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;6bc9650
            850/001/SRR8505151/SR  850/001/SRR8505151/SR  0bf70549
            R8505151_2.fastq.gz    R8505151_2.fastq.gz    5fb6ae4f
                                                          8f8feb85
                                                          2
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  93205309  -
 _FILE;GEN  1/fastq/SRR902/002/SR  1/fastq/SRR902/002/SR  0feb8c08
 ERATED_FI  R9023262/SRR9023262_1  R9023262/SRR9023262_1  e0db953d
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  d463598c
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;e60592d
            902/002/SRR9023262/SR  902/002/SRR9023262/SR  eb6a7a4f
            R9023262_2.fastq.gz    R9023262_2.fastq.gz    33cd5163
                                                          a925291e
                                                          9
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  f74ebb9b  -
 _FILE;GEN  1/fastq/SRR910/001/SR  1/fastq/SRR910/001/SR  7f3f98c1
 ERATED_FI  R9108621/SRR9108621_1  R9108621/SRR9108621_1  0ce97fec
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  379b8a19
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;11afabe
            910/001/SRR9108621/SR  910/001/SRR9108621/SR  ac3bbd09
            R9108621_2.fastq.gz    R9108621_2.fastq.gz    b8932e8b
                                                          0aa9a1ea
                                                          4
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  2bbf6a59  -
 _FILE;GEN  1/fastq/SRR922/008/SR  1/fastq/SRR922/008/SR  84b6a73a
 ERATED_FI  R9220128/SRR9220128_1  R9220128/SRR9220128_1  02975a17
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  fc39cce4
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;60ea299
            922/008/SRR9220128/SR  922/008/SRR9220128/SR  84e452fb
            R9220128_2.fastq.gz    R9220128_2.fastq.gz    8c4dedf1
                                                          9d2878f9
                                                          6
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  a5f88dd1  -
 _FILE;GEN  1/fastq/ERR339/000/ER  1/fastq/ERR339/000/ER  66b70616
 ERATED_FI  R3393330/ERR3393330_1  R3393330/ERR3393330_1  76a3ec98
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  d06b4ee9
            .ac.uk/vol1/fastq/ERR  .ac.uk/vol1/fastq/ERR  ;64d9cd6
            339/000/ERR3393330/ER  339/000/ERR3393330/ER  fcfa9f1f
            R3393330_2.fastq.gz    R3393330_2.fastq.gz    abf589b9
                                                          82d18408
                                                          7
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  e5fb3426  -
 _FILE;GEN  1/fastq/ERR353/007/ER  1/fastq/ERR353/007/ER  86609929
 ERATED_FI  R3535477/ERR3535477_1  R3535477/ERR3535477_1  865025f6
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  5b07a0a6
            .ac.uk/vol1/fastq/ERR  .ac.uk/vol1/fastq/ERR  ;1ab3f17
            353/007/ERR3535477/ER  353/007/ERR3535477/ER  124ecdf8
            R3535477_2.fastq.gz    R3535477_2.fastq.gz    408dcf47
                                                          71d2fcd3
                                                          a
 GENERATED  ftp.sra.ebi.ac.uk/vol  ftp.sra.ebi.ac.uk/vol  7c4a1ec4  -
 _FILE;GEN  1/fastq/SRR922/007/SR  1/fastq/SRR922/007/SR  e1dfcfea
 ERATED_FI  R9220597/SRR9220597_1  R9220597/SRR9220597_1  8bfa411a
 LE         .fastq.gz;ftp.sra.ebi  .fastq.gz;ftp.sra.ebi  db6efec0
            .ac.uk/vol1/fastq/SRR  .ac.uk/vol1/fastq/SRR  ;9d0bc10
            922/007/SRR9220597/SR  922/007/SRR9220597/SR  52f9a19b
            R9220597_2.fastq.gz    R9220597_2.fastq.gz    1d636994
                                                          5d223ec0
                                                          7

Columns 76-80:
 first_created    first_public   germline       hi_c_protocol    host         
 2019-09-20       2019-09-20     -              -                -
 2019-01-30       2019-01-30     -              -                -
 2019-01-30       2019-01-30     -              -                missing
 2019-05-12       2019-05-12     -              -                -
 2019-05-24       2019-05-24     -              -                cow
 2019-08-01       2019-08-01     -              -                Homo sapiens
 2019-06-24       2021-01-19     -              -                Bos taurus
 2019-09-20       2019-09-23     -              -                Homo sapiens
 2019-08-01       2019-08-01     -              -                Homo sapiens

Columns 81-85:
 host_body_sit  host_genotyp  host_gravidi  host_growth_conditi  host_phenoty 
 e              e             ty            ons                  pe           
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -
 -              -             -             -                    -

Columns 86-90:
 host_scientific_nam                                                          
 e                    host_sex      host_status   host_tax_id   identified_by 
 -                    -             -             -             -
 -                    -             -             -             -
 -                    -             -             -             -
 -                    -             -             -             -
 -                    -             -             -             -
 Homo sapiens         -             -             9606          -
 Bos taurus           -             -             9913          -
 Homo sapiens         -             diseased      9606          -
 Homo sapiens         -             -             9606          -

Columns 91-95:
 instrument_plat  investigation_t                 isolation_sou               
 form             ype              isolate        rce            last_updated 
 ILLUMINA         -                -              -              2019-09-25
 ILLUMINA         -                T1-3           Retail         2019-01-30
                                                  chicken
 ILLUMINA         -                -              missing        2019-01-30
 ILLUMINA         -                -              -              2019-05-12
 ILLUMINA         -                -              oral           2019-05-24
 ILLUMINA         -                -              -              2019-08-01
 ILLUMINA         -                PAT-17-35506/  meat           2019-06-24
                                   ECX
 ILLUMINA         -                Isolate 5      Stool          2019-09-25
 ILLUMINA         -                -              -              2019-08-01

Columns 96-100:
              library_constructi  library_gen_p  library_la  library_max_frag 
 lat          on_protocol         rotocol        yout        ment_size        
 -            -                   -              PAIRED      -
 -            -                   -              PAIRED      -
 -            -                   -              PAIRED      -
 -            -                   -              PAIRED      -
 -            -                   -              PAIRED      -
 42.36        -                   -              PAIRED      -
 39.399872    -                   -              PAIRED      -
 -            -                   -              PAIRED      -
 42.36        -                   -              PAIRED      -

Columns 101-105:
                                                                  library_pre 
 library_min_f                    library_pcr_isol  library_prep  p_date_form 
 ragment_size   library_name      ation_protocol    _date         at          
 -              -                 -                 -             -
 -              T1-3              -                 -             -
 -              Clonal_chem1_H10  -                 -             -
 -              iNext_07          -                 -             -
 -              Nextera XT        -                 -             -
                library
                SEQ000089458
 -              bf_0095_0059_a9   -                 -             -
 -              unspecified       -                 -             -
 -              unspecified       -                 -             -
 -              an_0080_0058_g10  -                 -             -

Columns 106-110:
 library_prep_  library_prep_  library_prep  local_environmenta               
 latitude       location       _longitude    l_context           location     
 -              -              -             -                   -
 -              -              -             -                   -
 -              -              -             -                   -
 -              -              -             -                   -
 -              -              -             -                   -
 -              -              -             excreta material    42.36 N
                                             [ENVO:02000022]     71.06 W
 -              -              -             -                   39.399872 N
                                                                 8.224454 W
 -              -              -             -                   -
 -              -              -             excreta material    42.36 N
                                             [ENVO:02000022]     71.06 W

Columns 111-115:
                                                      marine_reg              
 location_end        location_start      lon          ion         mating_type 
 -                   -                   -            -           -
 -                   -                   -            -           -
 -                   -                   -            -           -
 -                   -                   -            -           -
 -                   -                   -            -           -
 42.36 N 71.06 W     42.36 N 71.06 W     -71.06       -           -
 39.399872 N         39.399872 N         -8.224454    -           -
 8.224454 W          8.224454 W
 -                   -                   -            -           -
 42.36 N 71.06 W     42.36 N 71.06 W     -71.06       -           -

Columns 116-120:
                            nominal_le  nominal_s  pcr_isolation_p            
 ncbi_reporting_standard    ngth        dev        rotocol          ph        
 -                          600         -          -                -
 Microbe, viral or          -           -          -                -
 environmental
 Microbe, viral or          -           -          -                -
 environmental
 Microbe, viral or          -           -          -                -
 environmental
 Pathogen.env               -           -          -                -
 MIGS.ba;MIGS/MIMS/MIMARKS  -           -          -                -
 .human-gut
 -                          300         -          -                -
 -                          500         -          -                -
 MIGS.ba;MIGS/MIMS/MIMARKS  -           -          -                -
 .human-gut

Columns 121-125:
                                                                      restric 
                                                                      tion_en 
                                                            restrict  zyme_ta 
                                         protocol  read_st  ion_enzy  rget_se 
 project_name                            _label    rand     me        quence  
 OXA-48 dissemination during an          -         -        -         -
 outbreak
 -                                       -         -        -         -
 -                                       -         -        -         -
 -                                       -         -        -         -
 Escherichia coli                        -         -        -         -
 -                                       -         -        -         -
 -                                       -         -        -         -
 Comparing antimicrobial resistance      -         -        -         -
 prediction pipelines from bacterial
 whole genome sequencing data: An
 inter-laboratory study
 -                                       -         -        -         -

Columns 126-130:
 restriction_s  rna_integrity  rna_prep_3_pro  rna_prep_5_pro  rna_purity_230 
 ite            _num           tocol           tocol           _ratio         
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -
 -              -              -               -               -

Columns 131-135:
 rna_puri                                                                     
 ty_280_r  rt_prep_                                                   salinit 
 atio      protocol  run_alias                              run_date  y       
 -         -         webin-reads-p51_OXA-plasmid            -         -
 -         -         T1-3_S42_L001_R1_001.fastq             -         -
 -         -         140203_MONK_0337_BC3CVWACXX_L7_CGAGGC  -         -
                     TG-CTAAGCCT_H10_1_pf.fastq.gz
 -         -         inext7_S32_L001_R1_001.fastq.gz        -         -
 -         -         MEZEC42_S22_L001_R1_001.fastq.gz       -         -
 -         -         bf_0095_0059_a9R1.concat.trim.fastq.g  -         -
                     z
 -         -         Run_2019_6_24_8_4_22_run293            -         -
 -         -         ena-RUN-LONDON SCHOOL OF HYGIENE AND   -         -
                     TROPICAL
                     MEDICINE-20-09-2019-14:10:16:496-7
 -         -         an_0080_0058_g10R1.concat.trim.fastq.  -         -
                     gz

Columns 136-140:
         sample                                                               
         _captu  sample_                                              sample_ 
 sample  re_sta  collect                                              materia 
 _alias  tus     ion      sample_description                          l       
 p51_Il  -       -        Sequencing data of the OXA-48 plasmid       -
 lumina                   (tranformed into and isolated from E. coli
                          dh10-beta competent cells)
 TR103   -       -        T1-3                                        -
 Clonal  -       -        Chemostat 1 (generation 450) Clone H10      -
 _chem1
 _H10
 iNext_  -       -        Microbe sample from Escherichia coli        -
 07
 Escher  -       -        Whole genome sequencing of cultured E.      -
 ichia                    coli as part of Prof. ME El Zowalaty's One
 coli                     health Zoonosis surveillance project for
 strain                   the rapid detection of outbreaks of
 MEZEC4                   foodborne illnesses and antimicrobial
 2                        resistance.
 bf_009  -       -        Keywords: GSC:MIxS MIGS:5.0                 -
 5_0059
 _a9
 PAT-17  -       -        -                                           -
 -35506
 /ECX
 E       -       -        AMRIL_7                                     -
 an_008  -       -        Keywords: GSC:MIxS MIGS:5.0                 -
 0_0058
 _g10

Columns 141-145:
 sample_prep_i  sample_prep_inter  sample_stor  sample_storage_p  sampling_ca 
 nterval        val_units          age          rocessing         mpaign      
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -
 -              -                  -            -                 -

Columns 146-150:
 sampling_pl  sampling_sit  secondary_p  secondary_sample_a  secondary_study_ 
 atform       e             roject       ccession            accession        
 -            -             -            ERS3746069          ERP117164
 -            -             -            SRS4303892          SRP182798
 -            -             -            SRS4306477          SRP182873
 -            -             -            SRS4731159          SRP195771
 -            -             -            SRS4805681          SRP197605
 -            -             -            SRS4895497          SRP200548
 -            -             -            ERS3535664          ERP115939
 -            -             -            ERS3760037          ERP117428
 -            -             -            SRS4896365          SRP200548

Columns 151-155:
 sequencing_da  sequencing_date_  sequencing_lo  sequencing_lon  sequencing_m 
 te             format            cation         gitude          ethod        
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -
 -              -                 -              -               -

Columns 156-160:
 sequencing_primer  sequencing_pri  sequencing_primer_                        
 _catalog           mer_lot         provider            serotype    serovar   
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -
 -                  -               -                   -           -

Columns 161-165:
 sex            specimen_voucher   sra_aspera     sra_bytes     sra_file_role 
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -
 -              -                  -              -             -

Columns 166-170:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.15/x64/bin/pysradb", line 6, in <module>
    sys.exit(parse_args())
             ^^^^^^^^^^^^
  File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 1905, in parse_args
    search(
  File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 598, in search
    _print_save_df(instance.get_df(), saveto)
  File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 216, in _print_save_df
    pretty_print_df(df, enriched_cols=enriched_cols)
  File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 96, in pretty_print_df
    _create_table(
  File "/home/runner/work/pysradb/pysradb/pysradb/cli.py", line 165, in _create_table
    console.print(table)
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1731, in print
    extend(render(renderable, render_options))
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
    for render_output in iter_render:
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/table.py", line 515, in __rich_console__
    yield from self._render(console, render_options, widths)
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/table.py", line 838, in _render
    lines = console.render_lines(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1379, in render_lines
    lines = list(
            ^^^^^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 333, in split_and_crop_lines
    for segment in segments:
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 208, in <genexpr>
    result_segments = (
                      ^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
    for render_output in iter_render:
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/padding.py", line 97, in __rich_console__
    lines = console.render_lines(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1379, in render_lines
    lines = list(
            ^^^^^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/segment.py", line 333, in split_and_crop_lines
    for segment in segments:
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/console.py", line 1339, in render
    for render_output in iter_render:
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 696, in __rich_console__
    lines = self.wrap(
            ^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 1244, in wrap
    new_lines.justify(
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/containers.py", line 131, in justify
    line.truncate(width, overflow=overflow, pad=True)
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/rich/text.py", line 883, in truncate
    self._text = [f"{self.plain}{' ' * spaces}"]
                                 ~~~~^~~~~~~~
TypeError: can't multiply sequence by non-int of type 'numpy.float64'
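
The traceback above ends in rich's `Text.truncate`, where the expression `' ' * spaces` fails because `spaces` is a `numpy.float64` rather than an `int`. Python only allows a string to be repeated by an integer count. The traceback does not show where the float originates; presumably a column width computed on the pysradb side (in `_create_table`) reaches rich as a NumPy float. The sketch below reproduces the same class of error in isolation and shows one possible remedy, casting the width to `int` before it is used. The helper names `pad_to_width` and `pad_to_width_safe` are hypothetical and exist only for this illustration; they are not part of pysradb or rich.

```python
def pad_to_width(text, width):
    """Right-pad text with spaces, mirroring the expression that fails in the traceback."""
    spaces = width - len(text)
    return f"{text}{' ' * spaces}"


def pad_to_width_safe(text, width):
    """Same padding, but the width is cast to int first (hypothetical fix)."""
    spaces = max(int(width) - len(text), 0)
    return f"{text}{' ' * spaces}"


# Use a NumPy float when NumPy is available; a plain Python float raises
# the same kind of TypeError, so it serves as a fallback for the demo.
try:
    import numpy as np

    float_width = np.float64(12.0)
except ImportError:
    float_width = 12.0

# Reproduce the failure: repeating a string by a float count is not allowed.
try:
    pad_to_width("PAIRED", float_width)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Apply the cast and pad successfully.
padded = pad_to_width_safe("PAIRED", float_width)

print(error_message)
print(repr(padded))
```

If the width really does come from a pandas or NumPy computation, casting it to `int` at the point where the table columns are defined would be the least invasive place for such a fix, since rich then only ever receives integer widths.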
