Metadata Search

As a Python module, pysradb search organises each search query as an instance of the SraSearch, EnaSearch, or GeoSearch class. These classes take the following parameters in their constructors:

SraSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)


EnaSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)


GeoSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, geo_query=None, geo_dataset_type=None, geo_entry_type=None, suppress_validation=False)
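Because the three classes share the same core parameters and differ only in the GEO-specific ones, a small dispatcher can pick the class from a database name, mirroring the CLI's --db option. This is a minimal sketch: check_params, make_search, GEO_ONLY_PARAMS and SEARCH_CLASSES are hypothetical names, not part of pysradb, and actually running a search requires pysradb to be installed plus network access.

```python
# Parameters that, per the constructor signatures above, only GeoSearch accepts.
GEO_ONLY_PARAMS = {"geo_query", "geo_dataset_type", "geo_entry_type"}

# pysradb.search class names, keyed by the value you would pass to the CLI's --db.
SEARCH_CLASSES = {"sra": "SraSearch", "ena": "EnaSearch", "geo": "GeoSearch"}


def check_params(db, params):
    """Hypothetical check: return an error message, or None if the combination looks fine."""
    if db not in SEARCH_CLASSES:
        return f"db must be one of {sorted(SEARCH_CLASSES)}, got {db!r}"
    geo_params = GEO_ONLY_PARAMS.intersection(params)
    if geo_params and db != "geo":
        return f"{sorted(geo_params)} are only accepted by GeoSearch"
    return None


def make_search(db="sra", **params):
    """Hypothetical helper: build the pysradb search object matching a --db value."""
    error = check_params(db, params)
    if error is not None:
        raise ValueError(error)
    # Imported lazily so check_params works even without pysradb installed.
    import pysradb.search

    search_class = getattr(pysradb.search, SEARCH_CLASSES[db])
    return search_class(**params)


# Example (requires pysradb and network access):
#   instance = make_search("ena", query="coronavirus", return_max=5)
#   instance.search()
#   df = instance.get_df()
```

Checking the GEO-only parameters up front simply makes the mismatch explicit before any request is made; pysradb itself may report such mistakes differently.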



Parameters

verbosity : int

This determines how much detail is retrieved and shown in the search result:

0: run_accession only

1: run_accession and experiment_description only

2: (default) study_accession, experiment_accession, experiment_title, description, tax_id, scientific_name, library_strategy, library_source, library_selection, sample_accession, sample_title, instrument_model, run_accession, read_count, base_count, pmid

3: Everything in verbosity level 2, followed by all other retrievable information from the database

return_max : int

Maximum number of entries to return. Default: 20.

Note

If the maximum is set to a large number, querying the SRA and GEO DataSets databases takes significantly longer because of API limits: metadata is retrieved at about 10,000 entries every 5 to 6 minutes (based on limited testing on Colab). By comparison, EnaSearch retrieved about 500,000 entries in roughly 1 minute.

query : str

The main query string. Note: if this parameter is left empty, at least one of the following search parameters must be supplied:

accession : str

A relevant study, experiment, sample, or run accession number.

organism : str

The scientific name of the sample organism.

layout : str

Library layout. Accepted inputs: single, paired

mbases : int

Size of the sample, rounded to the nearest megabase.

publication_date : str

The publication date of the run, in the format dd-mm-yyyy. For a date range, enter the start date followed by the end date, separated by a colon ‘:’, in the format dd-mm-yyyy:dd-mm-yyyy.

Example: 01-01-2010:31-12-2010

platform : str

Sequencing platform used for the run.

Some possible inputs: illumina, ion torrent, oxford nanopore

selection : str

Library selection. Some possible inputs: cdna, chip, dnase, pcr, polya

source : str

Library source. Some possible inputs: genomic, metagenomic, transcriptomic

strategy : str

Library preparation strategy. Some possible inputs: wgs, amplicon, rna seq

title : str

The title (in part or in whole) of the experiment of interest.
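Since query may be left empty only when at least one other search parameter is supplied, a quick client-side check can catch an empty search before anything is sent. The sketch below is illustrative only: has_search_terms and SEARCH_FIELDS are hypothetical names, and pysradb may apply its own, different checks. The example filter values are taken from the accepted inputs listed above.

```python
# Search parameters that can stand in for `query`, as listed above.
SEARCH_FIELDS = (
    "accession", "organism", "layout", "mbases", "publication_date",
    "platform", "selection", "source", "strategy", "title",
)


def has_search_terms(query=None, **filters):
    """Hypothetical check: True if `query` or at least one filter is set."""
    unknown = set(filters) - set(SEARCH_FIELDS)
    if unknown:
        raise TypeError(f"unknown search parameters: {sorted(unknown)}")
    if query:
        return True
    # Treat None and "" as "not supplied"; 0 (e.g. mbases=0) still counts.
    return any(value not in (None, "") for value in filters.values())


# Example: paired-end human RNA-seq runs, with no free-text query.
params = {"organism": "Homo sapiens", "strategy": "rna seq", "layout": "paired"}
# With pysradb installed, SraSearch(**params) would build this search.
```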



GeoSearch-specific parameters

geo_query : str

The main query string to be sent to GEO DataSets.

geo_dataset_type : str

Dataset type. Possible inputs: expression profiling by array, expression profiling by high throughput sequencing, non coding rna profiling by high throughput sequencing

geo_entry_type : str

Entry type. Accepted inputs: gds, gpl, gse, gsm

GeoSearch works somewhat differently from SraSearch and EnaSearch. A query comprising geo_query, geo_dataset_type and geo_entry_type is first sent to GEO DataSets. The UIDs in the response are converted to SRA UIDs via NCBI’s ELink web service. A second set of UIDs is retrieved from SRA using the SRA search parameters, with the added search parameter filter[“sra gds”], which restricts the output to entries that have GEO DataSet accession numbers. The union of the two UID lists is then used to retrieve metadata from SRA.
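The last step of that workflow is a union of two UID lists. The sketch below illustrates only that step, using made-up UIDs; in pysradb the two lists come from NCBI's web services (the GEO DataSets query, ELink, and the filtered SRA search), none of which is reproduced here. merge_sra_uids is a hypothetical name.

```python
def merge_sra_uids(uids_from_geo, uids_from_sra):
    """Union two lists of SRA UIDs without duplicates.

    First-seen order is kept: UIDs linked from GEO DataSets come first,
    then any extra UIDs returned by the filtered SRA search.
    """
    merged = dict.fromkeys(uids_from_geo)
    merged.update(dict.fromkeys(uids_from_sra))
    return list(merged)


# Made-up UIDs, for illustration only.
linked_via_elink = ["101", "102", "103"]
found_in_sra = ["103", "104"]
all_uids = merge_sra_uids(linked_via_elink, found_in_sra)
```

A plain set union would give the same members; the dict-based version only adds a predictable order, which is a choice made for this sketch rather than something the text above specifies.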



Command-line Documentation

$ pysradb search -h
usage: pysradb search [-h] [-o SAVETO] [-s] [-g [GRAPHS]] [-d {ena,geo,sra}]
                      [-v {0,1,2,3}] [--run-description] [--detailed] [-m MAX]
                      [-q QUERY [QUERY ...]] [-A ACCESSION]
                      [-O ORGANISM [ORGANISM ...]] [-L {SINGLE,PAIRED}]
                      [-M MBASES] [-D PUBLICATION_DATE]
                      [-P PLATFORM [PLATFORM ...]]
                      [-E SELECTION [SELECTION ...]] [-C SOURCE [SOURCE ...]]
                      [-S STRATEGY [STRATEGY ...]] [-T TITLE [TITLE ...]]
                      [-G GEO_QUERY [GEO_QUERY ...]]
                      [-Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]]
                      [-Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]]

optional arguments:
  -h, --help            show this help message and exit
  -o SAVETO, --saveto SAVETO
                        Save search result dataframe to file
  -s, --stats           Displays some useful statistics for the search
                        results.
  -g [GRAPHS], --graphs [GRAPHS]
                        Generates graphs to illustrate the search result. By
                        default all graphs are generated. Alternatively,
                        select a subset from the options below in a space-
                        separated string: daterange, organism, source,
                        selection, platform, basecount
  -d {ena,geo,sra}, --db {ena,geo,sra}
                        Select the db API (sra, ena, or geo) to query, default
                        = sra
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Level of search result details (0, 1, 2 or 3), default
                        = 2
  --run-description     Displays run accessions and descriptions only.
                        Equivalent to --verbosity 1
  --detailed            Displays detailed search results. Equivalent to
                        --verbosity 3.
  -m MAX, --max MAX     Maximum number of entries to return, default = 20
  -q QUERY [QUERY ...], --query QUERY [QUERY ...]
                        Main query string. Note that if no query is supplied,
                        at least one of the following flags must be present:
  -A ACCESSION, --accession ACCESSION
                        Accession number
  -O ORGANISM [ORGANISM ...], --organism ORGANISM [ORGANISM ...]
                        Scientific name of the sample organism
  -L {SINGLE,PAIRED}, --layout {SINGLE,PAIRED}
                        Library layout
  -M MBASES, --mbases MBASES
                        Size of the sample rounded to the nearest megabase
  -D PUBLICATION_DATE, --publication-date PUBLICATION_DATE
                        Publication date of the run in the format dd-mm-yyyy.
                        If a date range is desired, enter the start date,
                        followed by end date, separated by a colon ':'.
                        Example: 01-01-2010:31-12-2010
  -P PLATFORM [PLATFORM ...], --platform PLATFORM [PLATFORM ...]
                        Sequencing platform
  -E SELECTION [SELECTION ...], --selection SELECTION [SELECTION ...]
                        Library selection
  -C SOURCE [SOURCE ...], --source SOURCE [SOURCE ...]
                        Library source
  -S STRATEGY [STRATEGY ...], --strategy STRATEGY [STRATEGY ...]
                        Library preparation strategy
  -T TITLE [TITLE ...], --title TITLE [TITLE ...]
                        Experiment title
  -G GEO_QUERY [GEO_QUERY ...], --geo-query GEO_QUERY [GEO_QUERY ...]
                        Main query string for GEO DataSet. This flag is only
                        used when db is set to be geo.
  -Y GEO_DATASET_TYPE [GEO_DATASET_TYPE ...], --geo-dataset-type GEO_DATASET_TYPE [GEO_DATASET_TYPE ...]
                        GEO DataSet Type. This flag is only used when --db is
                        set to be geo.
  -Z GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...], --geo-entry-type GEO_ENTRY_TYPE [GEO_ENTRY_TYPE ...]
                        GEO Entry Type. This flag is only used when --db is
                        set to be geo.
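When driving the CLI from a script, it can help to assemble the argument list programmatically instead of concatenating strings. The sketch below uses only flags that appear in the help text above; build_search_argv and run_search are hypothetical helpers, and run_search needs the pysradb command to be on the PATH.

```python
import subprocess


def build_search_argv(query=None, db="sra", verbosity=None, max_entries=None,
                      organism=None, publication_date=None, saveto=None):
    """Hypothetical helper: assemble a `pysradb search` command line as a list."""
    argv = ["pysradb", "search", "--db", db]
    if verbosity is not None:  # 0 is a valid level, so compare against None
        argv += ["--verbosity", str(verbosity)]
    if max_entries is not None:
        argv += ["--max", str(max_entries)]
    if query:
        argv += ["--query", query]
    if organism:
        argv += ["--organism", organism]
    if publication_date:
        argv += ["--publication-date", publication_date]
    if saveto:
        argv += ["--saveto", saveto]
    return argv


def run_search(argv):
    """Run the command and return its standard output (requires pysradb installed)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

Passing a list to subprocess.run avoids shell quoting problems with multi-word values such as organism names.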


Usage Examples

The pysradb search features shown below are available both from the command-line interface and from Python. Retrieved metadata is printed to the terminal or returned as a pandas DataFrame, respectively.

Each example below shows the command-line invocation first, followed by the equivalent Python code.

Searching the SRA database and retrieving metadata

For example, suppose we are interested in coronavirus sequences published on the Sequence Read Archive (SRA) in the first week of August 2020.

$ pysradb search -q coronavirus --publication-date 01-08-2020:07-08-2020
from pysradb.search import SraSearch

instance = SraSearch(query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

study_accession  experiment_accession    experiment_title        sample_taxon_id sample_scientific_name  experiment_library_strategy     experiment_library_source       experiment_library_selection    sample_accession        sample_alias    experiment_instrument_model     pool_member_spots       run_1_size     run_1_accession  run_1_total_spots       run_1_total_bases       pmid
SRP270658       SRX8679965      GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959042      GSM4658808      NextSeq 500     104223040       9743267247      SRR12164500     104223040       31475358080     11295714
SRP270658       SRX8679964      GSM4658807: SARS-CoV-2-infected 24h 2; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959041      GSM4658807      NextSeq 500     92813819        8703506222      SRR12164499     92813819        28029773338     11295713
SRP253798       SRX8677889      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956975      hCoV-19/Australia/VIC1898/2020  NextSeq 500    456828   51422072        SRR12162149     456828  130280958       11292876
SRP253798       SRX8677888      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956974      hCoV-19/Australia/VIC1886/2020  NextSeq 500    268832   29923966        SRR12162150     268832  75885223        11292875
SRP253798       SRX8677887      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956973      hCoV-19/Australia/VIC1890/2020  NextSeq 500    483526   54629557        SRR12162151     483526  139019404       11292874
SRP253798       SRX8677886      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956971      hCoV-19/Australia/VIC1888/2020  NextSeq 500    473895   53675126        SRR12162152     473895  136058655       11292873
SRP253798       SRX8677885      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956972      hCoV-19/Australia/VIC1891/2020  NextSeq 500    482373   53331905        SRR12162153     482373  135769259       11292872
SRP253798       SRX8677884      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956970      hCoV-19/Australia/VIC1816/2020  NextSeq 550    357052   41111134        SRR12162154     357052  103693201       11292871
SRP253798       SRX8677883      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956969      hCoV-19/Australia/VIC1815/2020  NextSeq 550    307106   35306959        SRR12162155     307106  89866234        11292870
SRP253798       SRX8677882      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956968      hCoV-19/Australia/VIC1814/2020  NextSeq 550    353704   40652239        SRR12162156     353704  103366580       11292869
SRP253798       SRX8677881      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956967      hCoV-19/Australia/VIC1813/2020  NextSeq 550    327705   38035344        SRR12162157     327705  95931939        11292868
SRP253798       SRX8677880      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956966      hCoV-19/Australia/VIC1812/2020  NextSeq 550    321428   36795893        SRR12162158     321428  92821030        11292867
SRP253798       SRX8677879      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956965      hCoV-19/Australia/VIC1865/2020  NextSeq 500    565592   61755215        SRR12162159     565592  156629119       11292866
SRP253798       SRX8677878      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956964      hCoV-19/Australia/VIC1811/2020  NextSeq 550    295014   33818926        SRR12162160     295014  85816216        11292865
SRP253798       SRX8677877      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956963      hCoV-19/Australia/VIC1809/2020  NextSeq 550    367784   43112211        SRR12162161     367784  107949010       11292864
SRP253798       SRX8677876      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956962      hCoV-19/Australia/VIC1807/2020  NextSeq 550    256832   29447818        SRR12162162     256832  74949831        11292863
SRP253798       SRX8677875      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956961      hCoV-19/Australia/VIC1806/2020  NextSeq 550    317415   36523725        SRR12162163     317415  92494821        11292862
SRP253798       SRX8677874      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956960      hCoV-19/Australia/VIC1805/2020  NextSeq 550    362866   41227860        SRR12162164     362866  105727450       11292861
SRP253798       SRX8677873      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956959      hCoV-19/Australia/VIC1804/2020  NextSeq 550    349048   39605824        SRR12162165     349048  101279219       11292860
SRP253798       SRX8677872      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956958      hCoV-19/Australia/VIC1803/2020  NextSeq 550    273575   31019982        SRR12162166     273575  78519046        11292859
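The date-range string in this example can also be built from datetime.date objects, which avoids mixing up the day-first dd-mm-yyyy order. A minimal sketch: pub_date_range, first_week_of and search_sra are hypothetical helpers, the output format is the one described for publication_date, and search_sra requires pysradb plus network access.

```python
from datetime import date, timedelta


def pub_date_range(start, end):
    """Format two dates as the dd-mm-yyyy:dd-mm-yyyy range used by publication_date."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"{start:%d-%m-%Y}:{end:%d-%m-%Y}"


def first_week_of(year, month):
    """Return the range covering days 1 to 7 of the given month."""
    start = date(year, month, 1)
    return pub_date_range(start, start + timedelta(days=6))


def search_sra(query, publication_date):
    """Run an SraSearch and return its DataFrame (requires pysradb and network access)."""
    from pysradb.search import SraSearch

    instance = SraSearch(query=query, publication_date=publication_date)
    instance.search()
    return instance.get_df()
```

For instance, search_sra("coronavirus", first_week_of(2020, 8)) corresponds to the search above.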

Searching the ENA database and retrieving metadata

To query the European Nucleotide Archive (ENA) instead:

$ pysradb search --db ena -q coronavirus --publication-date 01-08-2020:07-08-2020
from pysradb.search import EnaSearch

instance = EnaSearch(query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

study_accession experiment_accession    experiment_title        description     tax_id  scientific_name library_strategy        library_source  library_selection       sample_accession        sample_title    instrument_model        run_accession  read_count       base_count
PRJEB12126      ERX1264364      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708907    Sample 1        Illumina HiSeq 2000     ERR1190989      38883498        1161289538
PRJEB12126      ERX1264365      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708908    Sample 10       Illumina HiSeq 2000     ERR1190990      55544297        1779600908
PRJEB12126      ERX1264366      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708909    Sample 11       Illumina HiSeq 2000     ERR1190991      54474851        1713994365
PRJEB12126      ERX1264367      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708910    Sample 12       Illumina HiSeq 2000     ERR1190992      78497711        2489092061
PRJEB12126      ERX1264368      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708911    Sample 13       Illumina HiSeq 2000     ERR1190993      84955423        2627276298
PRJEB12126      ERX1264369      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708912    Sample 14       Illumina HiSeq 2000     ERR1190994      75097651        2293097872
PRJEB12126      ERX1264370      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708913    Sample 15       Illumina HiSeq 2000     ERR1190995      67177553        2060926619
PRJEB12126      ERX1264371      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708914    Sample 16       Illumina HiSeq 2000     ERR1190996      62940694        2061757111
PRJEB12126      ERX1264372      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708915    Sample 17       Illumina HiSeq 2000     ERR1190997      80591061        2475034240
PRJEB12126      ERX1264373      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  RANDOM  SAMEA3708916    Sample 18       Illumina HiSeq 2000     ERR1190998      68575621        2149386138
PRJEB12126      ERX1264374      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708917    Sample 19       Illumina HiSeq 2000     ERR1190999      59543450        1840946911
PRJEB12126      ERX1264375      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708918    Sample 2        Illumina HiSeq 2000     ERR1191000      48420348        1429402558
PRJEB12126      ERX1264376      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708919    Sample 20       Illumina HiSeq 2000     ERR1191001      39413642        1197490271
PRJEB12126      ERX1264377      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708920    Sample 21       Illumina HiSeq 2000     ERR1191002      43109202        1310217152
PRJEB12126      ERX1264378      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708921    Sample 22       Illumina HiSeq 2000     ERR1191003      48048678        1464094378
PRJEB12126      ERX1264379      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708922    Sample 23       Illumina HiSeq 2000     ERR1191004      55458988        1762359654
PRJEB12126      ERX1264380      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708923    Sample 24       Illumina HiSeq 2000     ERR1191005      47426381        1463185679
PRJEB12126      ERX1264381      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708924    Sample 25       Illumina HiSeq 2000     ERR1191006      53368431        1671809961
PRJEB12126      ERX1264382      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708925    Sample 26       Illumina HiSeq 2000     ERR1191007      63008359        1879252598
PRJEB12126      ERX1264383      Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene expression through RNA sequencing and ribosome profiling    10090   Mus musculus    OTHER   TRANSCRIPTOMIC  other   SAMEA3708926    Sample 27       Illumina HiSeq 2000     ERR1191008      54398154        1665685103
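The SRA and ENA results label several of the same fields differently, as the two output headers above show (for example sample_taxon_id versus tax_id). If you want to stack results from both, one option is to rename the SRA columns first. The mapping below is an assumption inferred only from the column names shown on this page; SRA_TO_ENA_COLUMNS and harmonize_columns are hypothetical names, and the mapping should be checked against your own output before relying on it.

```python
# Assumed correspondence, inferred from the SRA and ENA headers shown above.
SRA_TO_ENA_COLUMNS = {
    "sample_taxon_id": "tax_id",
    "sample_scientific_name": "scientific_name",
    "experiment_library_strategy": "library_strategy",
    "experiment_library_source": "library_source",
    "experiment_library_selection": "library_selection",
    "experiment_instrument_model": "instrument_model",
    "run_1_accession": "run_accession",
}


def harmonize_columns(columns):
    """Return SRA-style column names renamed to their assumed ENA equivalents."""
    return [SRA_TO_ENA_COLUMNS.get(name, name) for name in columns]


# With pandas DataFrames from get_df(), the same mapping can be applied directly:
#   sra_df = sra_df.rename(columns=SRA_TO_ENA_COLUMNS)
#   combined = pandas.concat([sra_df, ena_df], ignore_index=True)
```

Columns without a counterpart are left unchanged, so pandas.concat fills them with missing values for rows from the other source.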

Searching the GEO DataSets database and retrieving metadata

To query GEO DataSets instead:

$ pysradb search --db geo -q coronavirus --publication-date 01-08-2020:07-08-2020
from pysradb.search import GeoSearch

instance = GeoSearch(query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

study_accession    experiment_accession    experiment_title        sample_taxon_id sample_scientific_name  experiment_library_strategy     experiment_library_source       experiment_library_selection    sample_accession        sample_alias    experiment_instrument_model     pool_member_spots       run_1_size     run_1_accession  run_1_total_spots       run_1_total_bases
SRP270658       SRX8679965      GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959042      GSM4658808      NextSeq 500     104223040       9743267247      SRR12164500     104223040       31475358080
SRP270658       SRX8679964      GSM4658807: SARS-CoV-2-infected 24h 2; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959041      GSM4658807      NextSeq 500     92813819        8703506222      SRR12164499     92813819        28029773338
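The example above passes only the shared parameters. To also send a query to GEO DataSets, the GeoSearch-specific parameters described earlier can be added. A minimal sketch: geo_search_params and run_geo_search are hypothetical helpers, the entry-type check uses the accepted inputs listed above (gds, gpl, gse, gsm), and run_geo_search requires pysradb plus network access.

```python
GEO_ENTRY_TYPES = {"gds", "gpl", "gse", "gsm"}


def geo_search_params(geo_query, geo_dataset_type=None, geo_entry_type=None, **sra_params):
    """Hypothetical helper: assemble GeoSearch keyword arguments."""
    if geo_entry_type is not None and geo_entry_type.lower() not in GEO_ENTRY_TYPES:
        raise ValueError(f"geo_entry_type must be one of {sorted(GEO_ENTRY_TYPES)}")
    params = dict(sra_params, geo_query=geo_query)
    if geo_dataset_type is not None:
        params["geo_dataset_type"] = geo_dataset_type
    if geo_entry_type is not None:
        params["geo_entry_type"] = geo_entry_type.lower()
    return params


def run_geo_search(**params):
    """Run GeoSearch with the given parameters (requires pysradb and network access)."""
    from pysradb.search import GeoSearch

    instance = GeoSearch(**params)
    instance.search()
    return instance.get_df()
```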

Controlling the level of detail of the retrieved metadata

We can control the maximum number of result entries to retrieve using the -m / --max flag or the return_max parameter:

$ pysradb search -q coronavirus --publication-date 01-08-2020:07-08-2020 -m 5
from pysradb.search import SraSearch

instance = SraSearch(return_max=5, query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

study_accession    experiment_accession    experiment_title        sample_taxon_id sample_scientific_name  experiment_library_strategy     experiment_library_source       experiment_library_selection    sample_accession        sample_alias    experiment_instrument_model     pool_member_spots       run_1_size     run_1_accession  run_1_total_spots       run_1_total_bases       pmid
SRP270658       SRX8679965      GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959042      GSM4658808      NextSeq 500     104223040       9743267247      SRR12164500     104223040       31475358080     11295714
SRP270658       SRX8679964      GSM4658807: SARS-CoV-2-infected 24h 2; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959041      GSM4658807      NextSeq 500     92813819        8703506222      SRR12164499     92813819        28029773338     11295713
SRP253798       SRX8677889      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956975      hCoV-19/Australia/VIC1898/2020  NextSeq 500    456828   51422072        SRR12162149     456828  130280958       11292876
SRP253798       SRX8677888      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956974      hCoV-19/Australia/VIC1886/2020  NextSeq 500    268832   29923966        SRR12162150     268832  75885223        11292875
SRP253798       SRX8677887      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956973      hCoV-19/Australia/VIC1890/2020  NextSeq 500    483526   54629557        SRR12162151     483526  139019404       11292874
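Raising return_max far above the default can make SRA and GEO searches slow. As a back-of-the-envelope check, the throughput figures quoted in the return_max note (about 10,000 entries per 5 to 6 minutes for SRA and GEO DataSets, and about 500,000 entries per minute for ENA) can be turned into a rough estimate. These rates come from limited testing and will vary with network conditions and API limits; THROUGHPUT and estimate_minutes are hypothetical names.

```python
# Approximate rates from the return_max note: (entries, low_minutes, high_minutes).
THROUGHPUT = {
    "sra": (10_000, 5.0, 6.0),
    "geo": (10_000, 5.0, 6.0),
    "ena": (500_000, 1.0, 1.0),
}


def estimate_minutes(return_max, db="sra"):
    """Return a rough (low, high) estimate, in minutes, for retrieving return_max entries."""
    entries, low, high = THROUGHPUT[db]
    scale = return_max / entries
    return (low * scale, high * scale)
```

For example, estimate_minutes(20_000) gives (10.0, 12.0), i.e. roughly 10 to 12 minutes for an SRA search.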

To control the number of columns in the metadata output, use the -v / --verbosity flag or the verbosity parameter. The default verbosity, shown in the examples above, is 2.

Set verbosity to 1 to see only run_accession and experiment_title. On the command line, the --run-description flag is a more readable equivalent of -v 1, which is used below:

$ pysradb search -v 1 -q coronavirus --publication-date 01-08-2020:07-08-2020
from pysradb.search import SraSearch

instance = SraSearch(verbosity=1, query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

run_accession    experiment_title
SRR12164500     GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq
SRR12164499     GSM4658807: SARS-CoV-2-infected 24h 2; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq
SRR12162149     Severe acute respiratory syndrome coronavirus 2
SRR12162150     Severe acute respiratory syndrome coronavirus 2
SRR12162151     Severe acute respiratory syndrome coronavirus 2
SRR12162152     Severe acute respiratory syndrome coronavirus 2
SRR12162153     Severe acute respiratory syndrome coronavirus 2
SRR12162154     Severe acute respiratory syndrome coronavirus 2
SRR12162155     Severe acute respiratory syndrome coronavirus 2
SRR12162156     Severe acute respiratory syndrome coronavirus 2
SRR12162157     Severe acute respiratory syndrome coronavirus 2
SRR12162158     Severe acute respiratory syndrome coronavirus 2
SRR12162159     Severe acute respiratory syndrome coronavirus 2
SRR12162160     Severe acute respiratory syndrome coronavirus 2
SRR12162161     Severe acute respiratory syndrome coronavirus 2
SRR12162162     Severe acute respiratory syndrome coronavirus 2
SRR12162163     Severe acute respiratory syndrome coronavirus 2
SRR12162164     Severe acute respiratory syndrome coronavirus 2
SRR12162165     Severe acute respiratory syndrome coronavirus 2
SRR12162166     Severe acute respiratory syndrome coronavirus 2
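If the verbosity 1 results have been saved as plain text (for example, by redirecting the CLI output to a file, assuming the table is written to standard output), they can be turned back into a lookup table with a few lines of standard-library Python. This is a minimal sketch, not part of pysradb: the function name parse_run_descriptions is hypothetical, and it assumes the saved text contains only the two-column table shown above. In Python, the DataFrame returned by get_df() already exposes these columns directly.

```python
def parse_run_descriptions(text: str) -> dict[str, str]:
    """Map run_accession to experiment_title from verbosity-1 search output.

    Assumes the first non-blank line is the header and every later line holds an
    accession, a run of whitespace, then a title that may itself contain spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    runs = {}
    for line in lines[1:]:  # skip the header row
        parts = line.split(None, 1)  # split once, on the first whitespace run
        runs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return runs


sample = """run_accession    experiment_title
SRR12164500     GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq
SRR12162149     Severe acute respiratory syndrome coronavirus 2
"""
runs = parse_run_descriptions(sample)
for accession, title in runs.items():
    print(accession, "->", title)
```

Splitting only once keeps titles such as "GSM4658808: SARS-CoV-2-infected 24h 3; ..." intact even though they contain spaces.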

To view more detailed metadata, including download URLs and sample attributes, we can set verbosity to 3. As in the previous example, the more readable --detailed flag can be used in place of -v 3 on the command line; the two commands below are equivalent:

$ pysradb search -v 3 -q coronavirus --publication-date 01-08-2020:07-08-2020
$ pysradb search --detailed -q coronavirus --publication-date 01-08-2020:07-08-2020

from pysradb.search import SraSearch

instance = SraSearch(verbosity=3, query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df()

Output:

study_accession  experiment_accession    experiment_title        sample_taxon_id sample_scientific_name  experiment_library_strategy     experiment_library_source       experiment_library_selection    sample_accession        sample_alias    experiment_instrument_model     pool_member_spots       run_1_size     run_1_accession  run_1_total_spots       run_1_total_bases       experiment_alias        experiment_attributes_1_tag    experiment_attributes_1_value    experiment_design_description   experiment_external_id  experiment_library_construction_protocol        experiment_library_name experiment_link_1_type  experiment_link_1_value_1       experiment_link_1_value_2       experiment_link_1_value_3       experiment_platform     experiment_sample_descriptor_accession  library_layout pool_external_id pool_member_accession   pool_member_bases       pool_member_member_name pool_member_organism    pool_member_sample_name pool_member_sample_title        pool_member_tax_id      run_1_alias     run_1_base_A_count      run_1_base_C_count      run_1_base_G_count      run_1_base_N_count      run_1_base_T_count      run_1_cloudfile_1_filetype     run_1_cloudfile_1_location       run_1_cloudfile_1_provider      run_1_cloudfile_2_filetype      run_1_cloudfile_2_location      run_1_cloudfile_2_provider      run_1_cloudfile_3_filetype      run_1_cloudfile_3_location      run_1_cloudfile_3_provider      run_1_cloudfile_4_filetype      run_1_cloudfile_4_location      run_1_cloudfile_4_provider      run_1_cluster_name      run_1_database_1        run_1_is_public run_1_load_done run_1_published run_1_srafile_1_alternative_1_access_type       run_1_srafile_1_alternative_1_free_egress       run_1_srafile_1_alternative_1_org       run_1_srafile_1_alternative_1_url       run_1_srafile_1_alternative_2_access_type       run_1_srafile_1_alternative_2_free_egress      run_1_srafile_1_alternative_2_org        run_1_srafile_1_alternative_2_url       run_1_srafile_1_alternative_3_access_type      
 run_1_srafile_1_alternative_3_free_egress       run_1_srafile_1_alternative_3_org       run_1_srafile_1_alternative_3_url       run_1_srafile_1_cluster run_1_srafile_1_date    run_1_srafile_1_filename        run_1_srafile_1_md5    run_1_srafile_1_semantic_name    run_1_srafile_1_size    run_1_srafile_1_sratoolkit      run_1_srafile_1_supertype      run_1_srafile_1_url      run_1_srafile_2_alternative_1_access_type       run_1_srafile_2_alternative_1_free_egress      run_1_srafile_2_alternative_1_org        run_1_srafile_2_alternative_1_url       run_1_srafile_2_alternative_2_access_type       run_1_srafile_2_alternative_2_free_egress       run_1_srafile_2_alternative_2_org       run_1_srafile_2_alternative_2_url       run_1_srafile_2_alternative_3_access_type       run_1_srafile_2_alternative_3_free_egress       run_1_srafile_2_alternative_3_org       run_1_srafile_2_alternative_3_url       run_1_srafile_2_cluster run_1_srafile_2_date   run_1_srafile_2_filename run_1_srafile_2_md5     run_1_srafile_2_semantic_name   run_1_srafile_2_size    run_1_srafile_2_sratoolkit      run_1_srafile_2_supertype       run_1_srafile_2_url     run_1_srafile_3_alternative_1_access_type      run_1_srafile_3_alternative_1_free_egress        run_1_srafile_3_alternative_1_org       run_1_srafile_3_alternative_1_url       run_1_srafile_3_alternative_2_access_type       run_1_srafile_3_alternative_2_free_egress       run_1_srafile_3_alternative_2_org       run_1_srafile_3_alternative_2_url       run_1_srafile_3_alternative_3_access_type       run_1_srafile_3_alternative_3_free_egress       run_1_srafile_3_alternative_3_org       run_1_srafile_3_alternative_3_url      run_1_srafile_3_alternative_4_access_type        run_1_srafile_3_alternative_4_free_egress       run_1_srafile_3_alternative_4_org       run_1_srafile_3_alternative_4_url       run_1_srafile_3_cluster run_1_srafile_3_date    run_1_srafile_3_filename        run_1_srafile_3_md5     run_1_srafile_3_semantic_name   
run_1_srafile_3_size    run_1_srafile_3_sratoolkit      run_1_srafile_3_supertype       run_1_srafile_3_url     run_1_static_data_available     run_1_total_base_count run_1_total_base_cs_native       sample_attributes_10_tag        sample_attributes_10_value      sample_attributes_11_tag        sample_attributes_11_value      sample_attributes_12_tag        sample_attributes_12_value      sample_attributes_1_tag        sample_attributes_1_value        sample_attributes_2_tag sample_attributes_2_value       sample_attributes_3_tag sample_attributes_3_value       sample_attributes_4_tag sample_attributes_4_value       sample_attributes_5_tag sample_attributes_5_value       sample_attributes_6_tag sample_attributes_6_value       sample_attributes_7_tag sample_attributes_7_value       sample_attributes_8_tag sample_attributes_8_value       sample_attributes_9_tag sample_attributes_9_value      sample_description       sample_external_id_1    sample_external_id_1_namespace  sample_link_1_type      sample_link_1_value_1   sample_link_1_value_2   sample_link_1_value_3   sample_taxon_id sample_title    study_alias     study_center_name       study_center_project_name       study_external_id_1     study_external_id_1_namespace   study_study_abstract   study_study_title        study_study_type_existing_study_type    submission_accession    submission_alias        submission_broker_name  submission_center_name  submission_lab_name     submission_submission_comment   pmid
SRP270658       SRX8679965      GSM4658808: SARS-CoV-2-infected 24h 3; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959042      GSM4658808      NextSeq 500     104223040       9743267247      SRR12164500     104223040       31475358080     GSM4658808     GEO Accession    GSM4658808      N/A     GSM4658808      Cells were harvested, and total RNA was extracted using the Qiagen RNeasy Plus Mini Kit. The quality of the extracted RNA was assessed with the Agilent 2100 Bioanalyzer. RNA libraries were prepared for sequencing using standard Illumina protocols. N/A     XREF_LINK       DB: gds ID: 304658808   LABEL: GSM4658808       ILLUMINA        SRS6959042      PAIRED  SAMN15464189    SRS6959042      31475358080     N/A     Chlorocebus sabaeus     GSM4658808      SARS-CoV-2-infected 24h 3       60711   GSM4658808_r1   7955582672      7851434515     7958217565       273003  7709850325      fastq   gs.US   gs      fastq   s3.us-east-1    s3      run     gs.US   gs     run      s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="104223040" /><Elements count="31475358080" /></Statistics></Table></Database>   true    true    2020-07-08 18:19:30     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-12/SRR12164500/6_CoV2_24h_3_S5_R1_001.fastq.gz.1       Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12164500/6_CoV2_24h_3_S5_R1_001.fastq.gz.1       N/A     N/A     N/A    N/A      public  2020-07-07 13:36:52     N/A     9ca5526761cf0716bfb6802c0fb31297        fastq   7139762726      0      Original N/A     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-12/SRR12164500/6_CoV2_24h_3_S5_R2_001.fastq.gz.1       Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12164500/6_CoV2_24h_3_S5_R2_001.fastq.gz.1      N/A      N/A     N/A     N/A     public  2020-07-07 13:36:20     N/A     
d2c92af7effd76563a8133011ec2275e        fastq  7448441689       0       Original        N/A     anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra76/SRR/011879/SRR12164500       aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12164500/SRR12164500.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12164500/SRR12164500.1    N/A     N/A     N/A    N/A      public  2020-07-07 13:38:09     SRR12164500     2e349fddeeed6377a84638e8a6f3b055        run     9743268772     1Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra76/SRR/011879/SRR12164500       1       31475358080    false    N/A     N/A     N/A     N/A     N/A     N/A     source_name     SARS-CoV-2-infected Vero E6 cells       cell   Vero E6 cells    treatment       SARS-CoV-2 infection    time    24h     N/A     N/A     N/A     N/A     N/A     N/A    N/A      N/A     N/A     N/A     N/A     SAMN15464189    BioSample       XREF_LINK       DB: bioproject  ID: 644588     LABEL: PRJNA644588       60711   SARS-CoV-2-infected 24h 3       GSE153940       GEO     GSE153940       PRJNA644588    BioProject       We conducted a high-throughput drug repositioning screen using the LOPAC?1280 and the ReFRAME drug libraries to identify existing drugs that harbor antiviral activity against SARS-CoV-2, in a Vero E6 cell-based assay. We additionally performed RNA sequencing on control and SARS-CoV-2 infected Vero E6 cells to study the biological changes after SARS-CoV-2 infection and to elucidate the potential mechanisms underlying the positive hits identified from our high-throughput screen. Vero E6 cells were either mock-infected or infected with SARS-CoV-2 USA-WA1/2020 (MOI = 0.3) with three replicates. Cells were harvested 24 hours after infection, and total RNA was extracted using the Qiagen? RNeasy? Plus Mini Kit. The quality of the extracted RNA was assessed with the Agilent? 2100 Bioanalyzer. 
Libraries were prepared from total RNA following ribosome RNA depletion using standard protocol according to Illumina?. Total RNA sequencing was then performed on the Illumina? NextSeq system; 150bp paired-end runs were performed and 100 million raw reads per sample were generated. Overall design: mRNA profiles of control (mock-infected) and 24h post-SARS-CoV-2-infection Vero E6 cells with three replicates.  Gene expression of SARS-CoV-2-infected Vero E6 cells    Other   SRA1095806      GEO: GSE153940 GEO      GEO     N/A     submission brokered by GEO      11295714
SRP270658       SRX8679964      GSM4658807: SARS-CoV-2-infected 24h 2; Chlorocebus sabaeus; Severe acute respiratory syndrome coronavirus 2; RNA-Seq    60711   Chlorocebus sabaeus     RNA-Seq TRANSCRIPTOMIC  cDNA    SRS6959041      GSM4658807      NextSeq 500     92813819        8703506222      SRR12164499     92813819        28029773338     GSM4658807     GEO Accession    GSM4658807      N/A     GSM4658807      Cells were harvested, and total RNA was extracted using the Qiagen RNeasy Plus Mini Kit. The quality of the extracted RNA was assessed with the Agilent 2100 Bioanalyzer. RNA libraries were prepared for sequencing using standard Illumina protocols. N/A     XREF_LINK       DB: gds ID: 304658807   LABEL: GSM4658807       ILLUMINA        SRS6959041      PAIRED  SAMN15464190    SRS6959041      28029773338     N/A     Chlorocebus sabaeus     GSM4658807      SARS-CoV-2-infected 24h 2       60711   GSM4658807_r1   7064191719      7025296945     7068860505       241911  6871182258      fastq   gs.US   gs      fastq   s3.us-east-1    s3      run     gs.US   gs     run      s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="92813819" /><Elements count="28029773338" /></Statistics></Table></Database>    true    true    2020-07-08 18:19:30     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-9/SRR12164499/5_CoV2_24h_2_S4_R1_001.fastq.gz.1        Use Cloud Data Delivery -       AWS     s3://sra-pub-src-9/SRR12164499/5_CoV2_24h_2_S4_R1_001.fastq.gz.1        N/A     N/A     N/A    N/A      public  2020-07-07 13:34:31     N/A     4666a6d924bb05c5ee967762a6d2fbe5        fastq   6383247475      0      Original N/A     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-9/SRR12164499/5_CoV2_24h_2_S4_R2_001.fastq.gz.1Use Cloud Data Delivery -       AWS     s3://sra-pub-src-9/SRR12164499/5_CoV2_24h_2_S4_R2_001.fastq.gz.1        N/A    N/A      N/A     N/A     public  2020-07-07 13:37:05     N/A     
9f4cb927c184d1dc8c89d47e83c79a4e        fastq   6689694994      0       Original        N/A     anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra60/SRR/011879/SRR12164499       aws identity    s3.us-east-1    AWS     s3://sra-pub-run-9/SRR12164499/SRR12164499.1    gcp identity    gs.US   GCP     gs://sra-pub-run-8/SRR12164499/SRR12164499.1    N/A     N/A     N/A     N/A    public   2020-07-07 13:40:47     SRR12164499     35767b7633482d339f0c96bbb21e58c9        run     8703507747      1      Primary ETL      https://sra-download.ncbi.nlm.nih.gov/traces/sra60/SRR/011879/SRR12164499       1       28029773338    false    N/A     N/A     N/A     N/A     N/A     N/A     source_name     SARS-CoV-2-infected Vero E6 cells       cell   Vero E6 cells    treatment       SARS-CoV-2 infection    time    24h     N/A     N/A     N/A     N/A     N/A     N/A    N/A      N/A     N/A     N/A     N/A     SAMN15464190    BioSample       XREF_LINK       DB: bioproject  ID: 644588     LABEL: PRJNA644588       60711   SARS-CoV-2-infected 24h 2       GSE153940       GEO     GSE153940       PRJNA644588    BioProject       We conducted a high-throughput drug repositioning screen using the LOPAC?1280 and the ReFRAME drug libraries to identify existing drugs that harbor antiviral activity against SARS-CoV-2, in a Vero E6 cell-based assay. We additionally performed RNA sequencing on control and SARS-CoV-2 infected Vero E6 cells to study the biological changes after SARS-CoV-2 infection and to elucidate the potential mechanisms underlying the positive hits identified from our high-throughput screen. Vero E6 cells were either mock-infected or infected with SARS-CoV-2 USA-WA1/2020 (MOI = 0.3) with three replicates. Cells were harvested 24 hours after infection, and total RNA was extracted using the Qiagen? RNeasy? Plus Mini Kit. The quality of the extracted RNA was assessed with the Agilent? 2100 Bioanalyzer. 
Libraries were prepared from total RNA following ribosome RNA depletion using standard protocol according to Illumina?. Total RNA sequencing was then performed on the Illumina? NextSeq system; 150bp paired-end runs were performed and 100 million raw reads per sample were generated. Overall design: mRNA profiles of control (mock-infected) and 24h post-SARS-CoV-2-infection Vero E6 cells with three replicates.  Gene expression of SARS-CoV-2-infected Vero E6 cells    Other   SRA1095806      GEO: GSE153940 GEO      GEO     N/A     submission brokered by GEO      11295713
SRP253798       SRX8677889      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956975      hCoV-19/Australia/VIC1898/2020  NextSeq 500    456828   51422072        SRR12162149     456828  130280958       VIC1898_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459145    N/A     VIC1898_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956975      PAIRED  SAMN15459145    SRS6956975      130280958       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1898/2020  SARS-Cov-2 VIC1898 (GISAID EPI_ISL_480645)     2697049  VIC1898_R1.fq.gz        40296742        24826904        24644946        2414    40509952        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="456828" /><Elements count="130280958" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-12/SRR12162149/VIC1898_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12162149/VIC1898_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162149/VIC1898_R1.fq.gz.1       public  2020-07-07 09:29:51     VIC1898_R1.fq.gz        01a47ca96701c890901dff4568f5dcfd        fastq  36157796 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162149/VIC1898_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-12/SRR12162149/VIC1898_R2.fq.gz.1      Use Cloud Data Delivery-AWS     s3://sra-pub-src-12/SRR12162149/VIC1898_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162149/VIC1898_R2.fq.gz.1       public  2020-07-07 09:29:53     VIC1898_R2.fq.gz1fe742c26d5097d22a5760940f8aa113        fastq   35886034        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162149/VIC1898_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra39/SRR/011877/SRR12162149       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162149/SRR12162149  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162149/SRR12162149.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162149/SRR12162149.1    public  2020-07-07 09:30:03     SRR12162149     a812f270939cf1941b3015f47736d050        run     51423889        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra39/SRR/011877/SRR12162149       1       130280958       false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1898 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-06-01      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       22       EPI_ISL_480645  SAMN15459145    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1898 (GISAID EPI_ISL_480645)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292876
SRP253798       SRX8677888      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956974      hCoV-19/Australia/VIC1886/2020  NextSeq 500    268832   29923966        SRR12162150     268832  75885223        VIC1886_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459144    N/A     VIC1886_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956974      PAIRED  SAMN15459144    SRS6956974      75885223        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1886/2020  SARS-Cov-2 VIC1886 (GISAID EPI_ISL_480644)     2697049  VIC1886_R1.fq.gz        23251534        14479976        14377143        1605    23774965        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="268832" /><Elements count="75885223" /></Statistics></Table></Database> true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-11/SRR12162150/VIC1886_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-11/SRR12162150/VIC1886_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162150/VIC1886_R1.fq.gz.1       public  2020-07-07 09:29:41     VIC1886_R1.fq.gz        0e7cb97ad7b038954a1a280d2082a1a9        fastq   201569690       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162150/VIC1886_R1.fq.gz.1       Use Cloud Data Delivery -       AWS     
s3://sra-pub-src-11/SRR12162150/VIC1886_R2.fq.gz.1      anonymous       worldwide      AWS      https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162150/VIC1886_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra52/SRZ/012162/SRR12162150/VIC1886_R2.fq.gz      public 2020-07-07 09:29:40      VIC1886_R2.fq.gz        d361d616985ebf2966716ec2d0af38a7        fastq   20299385        0      Original https://sra-download.ncbi.nlm.nih.gov/traces/sra52/SRZ/012162/SRR12162150/VIC1886_R2.fq.gz      anonymous      worldwide        NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011877/SRR12162150       anonymous      worldwide        AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162150/SRR12162150  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162150/SRR12162150.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162150/SRR12162150.1    public  2020-07-07 09:29:50     SRR12162150     f911722720480ebd389aaab0761bb8b6        run    29925787 1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011877/SRR12162150       1      75885223 false   host_sex        female  passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1886collected_by     Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-29      geo_loc_name    Australia: Victoria     host    Homo sapiens    host_disease    COVID-19        isolation_source        missinglat_lon  missing host_age        35      EPI_ISL_480644  SAMN15459144    BioSample       XREF_LINK       DB: bioproject ID: 613958       LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1886 (GISAID EPI_ISL_480644)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A     The Peter Doherty Institute for Infection and Immunity  Microbiology and Immunology    N/A      11292875
SRP253798       SRX8677887      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956973      hCoV-19/Australia/VIC1890/2020  NextSeq 500    483526   54629557        SRR12162151     483526  139019404       VIC1890_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459143    N/A     VIC1890_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956973      PAIRED  SAMN15459143    SRS6956973      139019404       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1890/2020  SARS-Cov-2 VIC1890 (GISAID EPI_ISL_480643)     2697049  VIC1890_R1.fq.gz        43067455        26436884        26213342        2531    43299192        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="483526" /><Elements count="139019404" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-10/SRR12162151/VIC1890_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-10/SRR12162151/VIC1890_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162151/VIC1890_R1.fq.gz.1       public  2020-07-07 09:29:51     VIC1890_R1.fq.gz        ea01f8e763119c7ba2a6d1fc2efd7c48        fastq  38106091 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162151/VIC1890_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-10/SRR12162151/VIC1890_R2.fq.gz.1      Use Cloud Data Delivery-AWS     s3://sra-pub-src-10/SRR12162151/VIC1890_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162151/VIC1890_R2.fq.gz.1       public  2020-07-07 09:29:50     VIC1890_R2.fq.gz77a4c917d81b118439c140a65171b100        fastq   38420606        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162151/VIC1890_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011877/SRR12162151       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162151/SRR12162151  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-9/SRR12162151/SRR12162151.1    gcp identity    gs.US   GCP     gs://sra-pub-run-8/SRR12162151/SRR12162151.1    public  2020-07-07 09:30:02     SRR12162151     2a2c0b808b724dbbe2ac866daef597d7        run     54631373        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011877/SRR12162151       1       139019404       false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1890 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-30      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       19       EPI_ISL_480643  SAMN15459143    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1890 (GISAID EPI_ISL_480643)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292874
SRP253798       SRX8677886      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956971      hCoV-19/Australia/VIC1888/2020  NextSeq 500    473895   53675126        SRR12162152     473895  136058655       VIC1888_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459142    N/A     VIC1888_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956971      PAIRED  SAMN15459142    SRS6956971      136058655       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1888/2020  SARS-Cov-2 VIC1888 (GISAID EPI_ISL_480642)     2697049  VIC1888_R1.fq.gz        42091928        25945569        25704913        2584    42313661        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="473895" /><Elements count="136058655" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-10/SRR12162152/VIC1888_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-10/SRR12162152/VIC1888_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162152/VIC1888_R1.fq.gz.1       public  2020-07-07 09:29:52     VIC1888_R1.fq.gz        b9f7f507feb86c2630ccf8daf5d20b58        fastq  37409094 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162152/VIC1888_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-10/SRR12162152/VIC1888_R2.fq.gz.1      Use Cloud Data Delivery-AWS     s3://sra-pub-src-10/SRR12162152/VIC1888_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162152/VIC1888_R2.fq.gz.1       public  2020-07-07 09:29:51     VIC1888_R2.fq.gz4aad09e8a93d75cf468559505fc72662        fastq   37729286        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162152/VIC1888_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/011877/SRR12162152       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162152/SRR12162152  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-9/SRR12162152/SRR12162152.1    gcp identity    gs.US   GCP     gs://sra-pub-run-8/SRR12162152/SRR12162152.1    public  2020-07-07 09:30:01     SRR12162152     793750aea426e65c0e8fc1d9a5ba26d4        run     53676944        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/011877/SRR12162152       1       136058655       false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1888 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-30      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       25       EPI_ISL_480642  SAMN15459142    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1888 (GISAID EPI_ISL_480642)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292873
SRP253798       SRX8677885      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956972      hCoV-19/Australia/VIC1891/2020  NextSeq 500    482373   53331905        SRR12162153     482373  135769259       VIC1891_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459141    N/A     VIC1891_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956972      PAIRED  SAMN15459141    SRS6956972      135769259       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1891/2020  SARS-Cov-2 VIC1891 (GISAID EPI_ISL_480641)     2697049  VIC1891_R1.fq.gz        42029260        25869628        25687184        2666    42180521        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="482373" /><Elements count="135769259" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-9/SRR12162153/VIC1891_R1.fq.gz.1       Use Cloud Data Delivery -       AWS     s3://sra-pub-src-9/SRR12162153/VIC1891_R1.fq.gz.1       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162153/VIC1891_R1.fq.gz.1       public  2020-07-07 09:29:52     VIC1891_R1.fq.gz        df4160fb2bcab5dfb6d9f980063f68df        fastq  37386203 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162153/VIC1891_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-9/SRR12162153/VIC1891_R2.fq.gz.1       Use Cloud Data Delivery -       AWS     s3://sra-pub-src-9/SRR12162153/VIC1891_R2.fq.gz.1       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162153/VIC1891_R2.fq.gz.1       public  2020-07-07 09:29:50     VIC1891_R2.fq.gz        dc5cfad9aa3a9b9f9550e628af1504dd        fastq   37508923        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162153/VIC1891_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011877/SRR12162153       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162153/SRR12162153  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-9/SRR12162153/SRR12162153.1    gcp identity    gs.US   GCP     gs://sra-pub-run-8/SRR12162153/SRR12162153.1    public  2020-07-07 09:30:00     SRR12162153     81444b98bc09c01f8ddc4c2fdb502ebd        run     53333724        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011877/SRR12162153       1       135769259       false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1891 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-30      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       23       EPI_ISL_480641  SAMN15459141    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1891 (GISAID EPI_ISL_480641)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292872
SRP253798       SRX8677884      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956970      hCoV-19/Australia/VIC1816/2020  NextSeq 550    357052   41111134        SRR12162154     357052  103693201       VIC1816_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459140    N/A     VIC1816_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956970      PAIRED  SAMN15459140    SRS6956970      103693201       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1816/2020  SARS-Cov-2 VIC1816 (GISAID EPI_ISL_480640)     2697049  VIC1816_R1.fq.gz        31884733        19921529        19810575        658     32075706        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="357052" /><Elements count="103693201" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-13/SRR12162154/VIC1816_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-14/SRR12162154/VIC1816_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162154/VIC1816_R1.fq.gz.1       public  2020-07-07 09:29:47     VIC1816_R1.fq.gz        e55f6baa6b6e51c6594742671c527064        fastq  28430968 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162154/VIC1816_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-13/SRR12162154/VIC1816_R2.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-14/SRR12162154/VIC1816_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162154/VIC1816_R2.fq.gz.1       public  2020-07-07 09:29:48     VIC1816_R2.fq.gz        16b592babfeabc99ecdc3c88d455f517        fastq   28901378        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162154/VIC1816_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra33/SRR/011877/SRR12162154       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162154/SRR12162154  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-3/SRR12162154/SRR12162154.1    gcp identity    gs.US   GCP     gs://sra-pub-run-5/SRR12162154/SRR12162154.1    public  2020-07-07 09:29:57     SRR12162154     d72cfe9b26d99004ea5ec23b478b919b        run     41112954        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra33/SRR/011877/SRR12162154       1       103693201       false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1816 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-30      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       missing  EPI_ISL_480640  SAMN15459140    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1816 (GISAID EPI_ISL_480640)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292871
SRP253798       SRX8677883      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956969      hCoV-19/Australia/VIC1815/2020  NextSeq 550    307106   35306959        SRR12162155     307106  89866234        VIC1815_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459139    N/A     VIC1815_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956969      PAIRED  SAMN15459139    SRS6956969      89866234        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1815/2020  SARS-Cov-2 VIC1815 (GISAID EPI_ISL_480639)     2697049  VIC1815_R1.fq.gz        27650006        17041648        16984987        685     28188908        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="307106" /><Elements count="89866234" /></Statistics></Table></Database> true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-13/SRR12162155/VIC1815_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-14/SRR12162155/VIC1815_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162155/VIC1815_R1.fq.gz.1       public  2020-07-07 09:29:47     VIC1815_R1.fq.gz        c3eb2d61396671209d729528a38fe991        fastq   23799389        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162155/VIC1815_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     
gs://sra-pub-src-13/SRR12162155/VIC1815_R2.fq.gz.1      Use Cloud Data Delivery -      AWS      s3://sra-pub-src-14/SRR12162155/VIC1815_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162155/VIC1815_R2.fq.gz.1       public  2020-07-07 09:29:46     VIC1815_R2.fq.gz        7e525466b72d7a4ffc76e8263b8b21eb        fastq   24296605        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162155/VIC1815_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra14/SRR/011877/SRR12162155       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162155/SRR12162155  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-6/SRR12162155/SRR12162155.1    gcp identity    gs.US   GCP     gs://sra-pub-run-7/SRR12162155/SRR12162155.1    public  2020-07-07 09:29:56     SRR12162155     6bf8eb2ff09f4df672a5d158fc008342        run     35308778        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra14/SRR/011877/SRR12162155       1       89866234        false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1815 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       56       EPI_ISL_480639  SAMN15459139    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1815 (GISAID EPI_ISL_480639)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292870
SRP253798       SRX8677882      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956968      hCoV-19/Australia/VIC1814/2020  NextSeq 550    353704   40652239        SRR12162156     353704  103366580       VIC1814_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459138    N/A     VIC1814_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956968      PAIRED  SAMN15459138    SRS6956968      103366580       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1814/2020  SARS-Cov-2 VIC1814 (GISAID EPI_ISL_480638)     2697049  VIC1814_R1.fq.gz        31849258        19579180        19504589        756     32432797        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="353704" /><Elements count="103366580" /></Statistics></Table></Database>        true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       AWS     s3://sra-pub-src-13/SRR12162156/VIC1814_R1.fq.gz.1      Use Cloud Data Delivery -       GCP     gs://sra-pub-src-14/SRR12162156/VIC1814_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162156/VIC1814_R1.fq.gz.1       public  2020-07-07 09:29:47     VIC1814_R1.fq.gz        f304a2b1c0f4da708cb7f63f24d6a7b5        fastq  27549676 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162156/VIC1814_R1.fq.gz.1      Use Cloud Data Delivery  -       AWS     
s3://sra-pub-src-13/SRR12162156/VIC1814_R2.fq.gz.1      Use Cloud Data Delivery -       GCP     gs://sra-pub-src-14/SRR12162156/VIC1814_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162156/VIC1814_R2.fq.gz.1       public  2020-07-07 09:29:46     VIC1814_R2.fq.gz        210ef61aee2ce8b82bfc0e757716e398        fastq   28072542        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162156/VIC1814_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162156        anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162156/SRR12162156  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-1/SRR12162156/SRR12162156.1    gcp identity    gs.US   GCP     gs://sra-pub-run-1/SRR12162156/SRR12162156.1    public  2020-07-07 09:29:55     SRR12162156     3fd5aed61d6d4459b809ae7965b08e27        run     40654059        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162156        1       103366580       false   host_sex       missing  passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1814 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       missing  EPI_ISL_480638  SAMN15459138    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1814 (GISAID EPI_ISL_480638)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292869
SRP253798       SRX8677881      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956967      hCoV-19/Australia/VIC1813/2020  NextSeq 550    327705   38035344        SRR12162157     327705  95931939        VIC1813_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459137    N/A     VIC1813_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956967      PAIRED  SAMN15459137    SRS6956967      95931939        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1813/2020  SARS-Cov-2 VIC1813 (GISAID EPI_ISL_480637)     2697049  VIC1813_R1.fq.gz        29590938        18205433        18122632        872     30012064        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="327705" /><Elements count="95931939" /></Statistics></Table></Database> true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       AWS     s3://sra-pub-src-13/SRR12162157/VIC1813_R1.fq.gz.1      Use Cloud Data Delivery -       GCP     gs://sra-pub-src-14/SRR12162157/VIC1813_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162157/VIC1813_R1.fq.gz.1       public  2020-07-07 09:29:47     VIC1813_R1.fq.gz        5c7889fd13fb36f3ddec1437027575ce        fastq   25984205        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162157/VIC1813_R1.fq.gz.1       Use Cloud Data Delivery -       AWS     
s3://sra-pub-src-13/SRR12162157/VIC1813_R2.fq.gz.1      Use Cloud Data Delivery -      GCP      gs://sra-pub-src-14/SRR12162157/VIC1813_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162157/VIC1813_R2.fq.gz.1       public  2020-07-07 09:29:46     VIC1813_R2.fq.gz        d95cebe7a4e0e3e3905fc2c840a9fa72        fastq   26381378        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162157/VIC1813_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011877/SRR12162157        anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162157/SRR12162157  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-1/SRR12162157/SRR12162157.1    gcp identity    gs.US   GCP     gs://sra-pub-run-1/SRR12162157/SRR12162157.1    public  2020-07-07 09:29:55     SRR12162157     a5439295b52c66bdfe6b176f1a2922e9        run     38037163        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011877/SRR12162157        1       95931939        false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1813 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       28       EPI_ISL_480637  SAMN15459137    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1813 (GISAID EPI_ISL_480637)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292868
SRP253798       SRX8677880      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956966      hCoV-19/Australia/VIC1812/2020  NextSeq 550    321428   36795893        SRR12162158     321428  92821030        VIC1812_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459136    N/A     VIC1812_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956966      PAIRED  SAMN15459136    SRS6956966      92821030        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1812/2020  SARS-Cov-2 VIC1812 (GISAID EPI_ISL_480636)     2697049  VIC1812_R1.fq.gz        28461946        17915982        17843283        811     28599008        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="321428" /><Elements count="92821030" /></Statistics></Table></Database> true    true    2020-07-07 09:38:41     Use Cloud Data Delivery -       AWS     s3://sra-pub-src-4/SRR12162158/VIC1812_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     gs://sra-pub-src-7/SRR12162158/VIC1812_R1.fq.gz.1      anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162158/VIC1812_R1.fq.gz.1       public  2020-07-07 09:29:45     VIC1812_R1.fq.gz        c6888d7dcaf52a73689fe4ffd1705ae0        fastq   25373640        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162158/VIC1812_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     
gs://sra-pub-src-8/SRR12162158/VIC1812_R2.fq.gz.1       Use Cloud Data Delivery -      AWS      s3://sra-pub-src-8/SRR12162158/VIC1812_R2.fq.gz.1       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162158/VIC1812_R2.fq.gz.1       public  2020-07-07 09:29:44     VIC1812_R2.fq.gz        436b22634415958cf7511599cb282476        fastq   25831519        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162158/VIC1812_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162158        anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162158/SRR12162158  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-4/SRR12162158/SRR12162158.1    gcp identity    gs.US   GCP     gs://sra-pub-run-2/SRR12162158/SRR12162158.1    public  2020-07-07 09:29:58     SRR12162158     24497d78bc8566ba5e1bf2732bd7b43a        run     36797712        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162158        1       92821030        false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1812 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-27      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       27       EPI_ISL_480636  SAMN15459136    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1812 (GISAID EPI_ISL_480636)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292867
SRP253798       SRX8677879      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956965      hCoV-19/Australia/VIC1865/2020  NextSeq 500    565592   61755215        SRR12162159     565592  156629119       VIC1865_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459055    N/A     VIC1865_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956965      PAIRED  SAMN15459055    SRS6956965      156629119       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1865/2020  SARS-Cov-2 VIC1865 (GISAID EPI_ISL_480566)     2697049  VIC1865_R1.fq.gz        48227950        29876506        29617873        2751    48904039        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="565592" /><Elements count="156629119" /></Statistics></Table></Database>        true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-12/SRR12162159/VIC1865_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12162159/VIC1865_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162159/VIC1865_R1.fq.gz.1       public  2020-07-07 09:29:53     VIC1865_R1.fq.gz        255b9fa29db2a9facaa9bda1a252e900        fastq  43027710 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162159/VIC1865_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-12/SRR12162159/VIC1865_R2.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12162159/VIC1865_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162159/VIC1865_R2.fq.gz.1       public  2020-07-07 09:29:52     VIC1865_R2.fq.gz        469125ebcd05e7ee9f7dbafded17c579        fastq   43069774        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162159/VIC1865_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra38/SRR/011877/SRR12162159       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162159/SRR12162159  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162159/SRR12162159.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162159/SRR12162159.1    public  2020-07-07 09:30:02     SRR12162159     76fd132f010716faa537cc1c62d4498a        run     61757038        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra38/SRR/011877/SRR12162159       1       156629119       false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1865 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-03-27      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       24       EPI_ISL_480566  SAMN15459055    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1865 (GISAID EPI_ISL_480566)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292866
SRP253798       SRX8677878      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956964      hCoV-19/Australia/VIC1811/2020  NextSeq 550    295014   33818926        SRR12162160     295014  85816216        VIC1811_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459135    N/A     VIC1811_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956964      PAIRED  SAMN15459135    SRS6956964      85816216        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1811/2020  SARS-Cov-2 VIC1811 (GISAID EPI_ISL_480635)     2697049  VIC1811_R1.fq.gz        26165918        16347010        16330674        722     26971892        fastq   gs.US  gs       fastq   s3.us-east-1    s3      N/A     N/A     N/A     N/A     N/A     N/A     public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="295014" /><Elements count="85816216" /></Statistics></Table></Database>        true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-5/SRR12162160/VIC1811_R1.fq.gz.1       Use Cloud Data Delivery -       AWS     s3://sra-pub-src-6/SRR12162160/VIC1811_R1.fq.gz.1       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162160/VIC1811_R1.fq.gz.1      public   2020-07-07 09:29:47     VIC1811_R1.fq.gz        e5ab6fe75617ba9d9b703a8de52ac257        fastq   22378431        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162160/VIC1811_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     gs://sra-pub-src-6/SRR12162160/VIC1811_R2.fq.gz.1       
Use Cloud Data Delivery -       AWS    s3://sra-pub-src-7/SRR12162160/VIC1811_R2.fq.gz.1        anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162160/VIC1811_R2.fq.gz.1       public  2020-07-07 09:29:48     VIC1811_R2.fq.gz       fc15284cd3ee8eef3b1563205fc16d46 fastq   22783583        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162160/VIC1811_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra45/SRR/011877/SRR12162160       N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A    N/A      N/A     N/A     public  2020-07-07 09:29:58     SRR12162160     3c0d1ce45172852e19e1cff0dbee2f0f        run    33820747 1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra45/SRR/011877/SRR12162160       1      85816216 false   host_sex        male    passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1811 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria     host    Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age        49      EPI_ISL_480635  SAMN15459135    BioSample       XREF_LINK       DB: bioproject ID: 613958       LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1811 (GISAID EPI_ISL_480635)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A     The Peter Doherty Institute for Infection and Immunity  Microbiology and Immunology    N/A      11292865
SRP253798       SRX8677877      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956963      hCoV-19/Australia/VIC1809/2020  NextSeq 550    367784   43112211        SRR12162161     367784  107949010       VIC1809_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459134    N/A     VIC1809_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956963      PAIRED  SAMN15459134    SRS6956963      107949010       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1809/2020  SARS-Cov-2 VIC1809 (GISAID EPI_ISL_480634)     2697049  VIC1809_R1.fq.gz        33295569        20476632        20372214        861     33803734        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="367784" /><Elements count="107949010" /></Statistics></Table></Database>        true    true    2020-07-07 09:38:41     Use Cloud Data Delivery -       AWS     s3://sra-pub-src-13/SRR12162161/VIC1809_R1.fq.gz.1      Use Cloud Data Delivery -       GCP     gs://sra-pub-src-14/SRR12162161/VIC1809_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162161/VIC1809_R1.fq.gz.1       public  2020-07-07 09:29:48     VIC1809_R1.fq.gz        64cfe3abfe86fd6de4a6d9c92c6762a1        fastq  29252690 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162161/VIC1809_R1.fq.gz.1      Use Cloud Data Delivery  -       AWS     
s3://sra-pub-src-13/SRR12162161/VIC1809_R2.fq.gz.1      Use Cloud Data Delivery-GCP     gs://sra-pub-src-14/SRR12162161/VIC1809_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162161/VIC1809_R2.fq.gz.1       public  2020-07-07 09:29:49     VIC1809_R2.fq.gzf067ca4b1e5dc1e5c7ef7b9493f64283        fastq   30030812        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162161/VIC1809_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra62/SRR/011877/SRR12162161       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162161/SRR12162161  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-1/SRR12162161/SRR12162161.1    gcp identity    gs.US   GCP     gs://sra-pub-run-1/SRR12162161/SRR12162161.1    public  2020-07-07 09:30:08     SRR12162161     70041f189fa3e0b91101bd278edf10a1        run     43114033        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra62/SRR/011877/SRR12162161       1       107949010       false   host_sex       missing  passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1809 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-29      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       missing  EPI_ISL_480634  SAMN15459134    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1809 (GISAID EPI_ISL_480634)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292864
SRP253798       SRX8677876      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956962      hCoV-19/Australia/VIC1807/2020  NextSeq 550    256832   29447818        SRR12162162     256832  74949831        VIC1807_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459133    N/A     VIC1807_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956962      PAIRED  SAMN15459133    SRS6956962      74949831        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1807/2020  SARS-Cov-2 VIC1807 (GISAID EPI_ISL_480633)     2697049  VIC1807_R1.fq.gz        23110238        14250328        14197083        635     23391547        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="256832" /><Elements count="74949831" /></Statistics></Table></Database> true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-13/SRR12162162/VIC1807_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-14/SRR12162162/VIC1807_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162162/VIC1807_R1.fq.gz.1       public  2020-07-07 09:29:47     VIC1807_R1.fq.gz        cc3ef2144bdc64e05583e2d4bd827b43        fastq   201893990       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162162/VIC1807_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     
gs://sra-pub-src-13/SRR12162162/VIC1807_R2.fq.gz.1      Use Cloud Data Delivery -      AWS      s3://sra-pub-src-14/SRR12162162/VIC1807_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162162/VIC1807_R2.fq.gz.1       public  2020-07-07 09:29:48     VIC1807_R2.fq.gz80a8d955d712f241c04aa8c4df23ac5e        fastq   20579052        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162162/VIC1807_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra76/SRR/011877/SRR12162162       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162162/SRR12162162  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-3/SRR12162162/SRR12162162.1    gcp identity    gs.US   GCP     gs://sra-pub-run-5/SRR12162162/SRR12162162.1    public  2020-07-07 09:29:57     SRR12162162     2b9aca651726a3d2bfe6953654969966        run     29449640        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra76/SRR/011877/SRR12162162       1       74949831        false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1807 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-27      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       20       EPI_ISL_480633  SAMN15459133    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1807 (GISAID EPI_ISL_480633)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292863
SRP253798       SRX8677875      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956961      hCoV-19/Australia/VIC1806/2020  NextSeq 550    317415   36523725        SRR12162163     317415  92494821        VIC1806_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459132    N/A     VIC1806_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956961      PAIRED  SAMN15459132    SRS6956961      92494821        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1806/2020  SARS-Cov-2 VIC1806 (GISAID EPI_ISL_480632)     2697049  VIC1806_R1.fq.gz        28550455        17637130        17557318        871     28749047        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="317415" /><Elements count="92494821" /></Statistics></Table></Database> true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-12/SRR12162163/VIC1806_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-12/SRR12162163/VIC1806_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162163/VIC1806_R1.fq.gz.1       public  2020-07-07 09:29:48     VIC1806_R1.fq.gz        9bf5c34a8fe597743a4544c67ecc37a3        fastq   251884900       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162163/VIC1806_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     
gs://sra-pub-src-12/SRR12162163/VIC1806_R2.fq.gz.1      Use Cloud Data Delivery -      AWS      s3://sra-pub-src-12/SRR12162163/VIC1806_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162163/VIC1806_R2.fq.gz.1       public  2020-07-07 09:29:49     VIC1806_R2.fq.gza130ed3ccca73e183a84ba4c3381d8d5        fastq   25756376        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162163/VIC1806_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra58/SRR/011877/SRR12162163       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162163/SRR12162163  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162163/SRR12162163.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162163/SRR12162163.1    public  2020-07-07 09:30:02     SRR12162163     c6edcb94437d2a9161d7ffd142bee2bb        run     36525549        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra58/SRR/011877/SRR12162163       1       92494821        false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1806 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       19       EPI_ISL_480632  SAMN15459132    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1806 (GISAID EPI_ISL_480632)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292862
SRP253798       SRX8677874      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956960      hCoV-19/Australia/VIC1805/2020  NextSeq 550    362866   41227860        SRR12162164     362866  105727450       VIC1805_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459131    N/A     VIC1805_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956960      PAIRED  SAMN15459131    SRS6956960      105727450       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1805/2020  SARS-Cov-2 VIC1805 (GISAID EPI_ISL_480631)     2697049  VIC1805_R1.fq.gz        32624297        20177470        20088293        887     32836503        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="362866" /><Elements count="105727450" /></Statistics></Table></Database>        true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-13/SRR12162164/VIC1805_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-14/SRR12162164/VIC1805_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162164/VIC1805_R1.fq.gz.1       public  2020-07-07 09:29:45     VIC1805_R1.fq.gz        b8b9808b08754caabbc69215a3022d34        fastq  28518688 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162164/VIC1805_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-13/SRR12162164/VIC1805_R2.fq.gz.1      Use Cloud Data Delivery-AWS     s3://sra-pub-src-14/SRR12162164/VIC1805_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162164/VIC1805_R2.fq.gz.1       public  2020-07-07 09:29:45     VIC1805_R2.fq.gz68602ffab6c09a35016781e9374d41ad        fastq   29007781        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162164/VIC1805_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162164        anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162164/SRR12162164  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-7/SRR12162164/SRR12162164.1    gcp identity    gs.US   GCP     gs://sra-pub-run-3/SRR12162164/SRR12162164.1    public  2020-07-07 09:29:57     SRR12162164     d77c10aa36baa3b68d5e31b775dc5ce0        run     41229681        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra1/SRR/011877/SRR12162164        1       105727450       false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1805 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-29      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       48       EPI_ISL_480631  SAMN15459131    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1805 (GISAID EPI_ISL_480631)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292861
SRP253798       SRX8677873      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956959      hCoV-19/Australia/VIC1804/2020  NextSeq 550    349048   39605824        SRR12162165     349048  101279219       VIC1804_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459130    N/A     VIC1804_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956959      PAIRED  SAMN15459130    SRS6956959      101279219       N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1804/2020  SARS-Cov-2 VIC1804 (GISAID EPI_ISL_480630)     2697049  VIC1804_R1.fq.gz        31280053        19268066        19170574        815     31559711        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="349048" /><Elements count="101279219" /></Statistics></Table></Database>        true    true    2020-07-07 09:35:31     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-11/SRR12162165/VIC1804_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-11/SRR12162165/VIC1804_R1.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162165/VIC1804_R1.fq.gz.1       public  2020-07-07 09:29:45     VIC1804_R1.fq.gz        661d6f378dbb13fa39a621c1bb6e1c73        fastq  27327463 0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162165/VIC1804_R1.fq.gz.1      Use Cloud Data Delivery  -       GCP     
gs://sra-pub-src-11/SRR12162165/VIC1804_R2.fq.gz.1      Use Cloud Data Delivery-AWS     s3://sra-pub-src-11/SRR12162165/VIC1804_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162165/VIC1804_R2.fq.gz.1       public  2020-07-07 09:29:44     VIC1804_R2.fq.gzb98035bb6e0546e85de019c669cfc065        fastq   27789519        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162165/VIC1804_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra38/SRR/011877/SRR12162165       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162165/SRR12162165  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162165/SRR12162165.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162165/SRR12162165.1    public  2020-07-07 09:29:53     SRR12162165     1dfd799cbb71bc43433546693b0ab050        run     39607641        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra38/SRR/011877/SRR12162165       1       101279219       false   host_sex       female   passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1804 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-28      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       37       EPI_ISL_480630  SAMN15459130    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1804 (GISAID EPI_ISL_480630)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292860
SRP253798       SRX8677872      Severe acute respiratory syndrome coronavirus 2 2697049 Severe acute respiratory syndrome coronavirus 2 AMPLICON        VIRAL RNA       PCR     SRS6956958      hCoV-19/Australia/VIC1803/2020  NextSeq 550    273575   31019982        SRR12162166     273575  78519046        VIC1803_illumina        N/A     N/A     ARTIC v3, minimap2 v2.17, ivar v1.2.2, samtools v1.10. Using minimap2, short reads mapped to SARS-CoV-2 NCBI accession MN908947.3. Using samtools, proper_pairs (samflag 2) mapping to MN908947.3 retained, unmapped reads (samflag 4) discarded (to filter out non-SARS-CoV-2 cDNA). Filtered reads submitted to NCBI  SAMN15459129    N/A     VIC1803_illumina        N/A     N/A    N/A      N/A     ILLUMINA        SRS6956958      PAIRED  SAMN15459129    SRS6956958      78519046        N/A     Severe acute respiratory syndrome coronavirus 2 hCoV-19/Australia/VIC1803/2020  SARS-Cov-2 VIC1803 (GISAID EPI_ISL_480629)     2697049  VIC1803_R1.fq.gz        24164366        14920735        14904609        596     24528740        fastq   gs.US  gs       fastq   s3.us-east-1    s3      run     gs.US   gs      run     s3.us-east-1    s3      public  <Database><Table name="SEQUENCE"><Statistics source="meta"><Rows count="273575" /><Elements count="78519046" /></Statistics></Table></Database> true    true    2020-07-07 09:36:21     Use Cloud Data Delivery -       GCP     gs://sra-pub-src-11/SRR12162166/VIC1803_R1.fq.gz.1      Use Cloud Data Delivery -       AWS     s3://sra-pub-src-11/SRR12162166/VIC1803_R1.fq.gz.1     anonymous        worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162166/VIC1803_R1.fq.gz.1       public  2020-07-07 09:29:45     VIC1803_R1.fq.gz        00b581c1d39d8b0031f5530efaedabe3        fastq   211061190       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162166/VIC1803_R1.fq.gz.1       Use Cloud Data Delivery -       GCP     
gs://sra-pub-src-11/SRR12162166/VIC1803_R2.fq.gz.1      Use Cloud Data Delivery -      AWS      s3://sra-pub-src-11/SRR12162166/VIC1803_R2.fq.gz.1      anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162166/VIC1803_R2.fq.gz.1       public  2020-07-07 09:29:44     VIC1803_R2.fq.gzb03492e8051acd22ee08d4a006f88e20        fastq   21503972        0       Original        https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR12162166/VIC1803_R2.fq.gz.1       anonymous       worldwide       NCBI    https://sra-download.ncbi.nlm.nih.gov/traces/sra74/SRR/011877/SRR12162166       anonymous       worldwide       AWS     https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR12162166/SRR12162166  aws identity    s3.us-east-1    AWS     s3://sra-pub-run-8/SRR12162166/SRR12162166.1    gcp identity    gs.US   GCP     gs://sra-pub-run-9/SRR12162166/SRR12162166.1    public  2020-07-07 09:29:53     SRR12162166     232d6a49fe5c639023a6b2b49caf8855        run     31021804        1       Primary ETL     https://sra-download.ncbi.nlm.nih.gov/traces/sra74/SRR/011877/SRR12162166       1       78519046        false   host_sex       male     passage_history Original        BioSampleModel  Pathogen.cl     isolate VIC1803 collected_by    Victorian Infectious Diseases Reference Laboratory (VIDRL)      collection_date 2020-05-29      geo_loc_name    Australia: Victoria    host     Homo sapiens    host_disease    COVID-19        isolation_source        missing lat_lon missing host_age       37       EPI_ISL_480629  SAMN15459129    BioSample       XREF_LINK       DB: bioproject  ID: 613958      LABEL: PRJNA613958      2697049 SARS-Cov-2 VIC1803 (GISAID EPI_ISL_480629)      PRJNA613958     BioProject      Severe acute respiratory syndrome coronavirus 2 PRJNA613958     BioProject      Genomic sequence data of clinical SARS-CoV-2 samples.   
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequencing  Other   SRA1095659      SUB7730753      N/A    The Peter Doherty Institute for Infection and Immunity   Microbiology and Immunology     N/A     11292859

Saving Metadata to File

Retrieved metadata can be saved in either comma-separated (CSV) or tab-separated (TSV) format.

$ pysradb search --db ena -q coronavirus --publication-date 01-08-2020:07-08-2020 --saveto coronavirus.csv

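from pysradb.search import EnaSearch

# Search ENA and write the results to a comma-separated file
instance = EnaSearch(verbosity=3, query="coronavirus", publication_date="01-08-2020:07-08-2020")
instance.search()
instance.get_df().to_csv("coronavirus.csv", index=False)
# For tab-separated output instead:
# instance.get_df().to_csv("coronavirus.tsv", sep="\t", index=False)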
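The Python example writes comma-separated output. pandas' `DataFrame.to_csv` also accepts a `sep` argument, so the same DataFrame can be written as TSV. The minimal sketch below chooses the separator from the file extension. It assumes `get_df()` returns a pandas DataFrame (as the `.to_csv` call above implies); `save_metadata` and its extension rule are hypothetical conveniences, not part of pysradb.

```python
import pandas as pd


def save_metadata(df: pd.DataFrame, path: str) -> None:
    """Write search results to `path` (hypothetical helper, not part of pysradb).

    Uses a tab separator for .tsv/.txt files and a comma for everything else.
    """
    sep = "\t" if path.lower().endswith((".tsv", ".txt")) else ","
    # index=False keeps pandas' row index out of the file
    df.to_csv(path, sep=sep, index=False)
```

Usage, after running a search: `save_metadata(instance.get_df(), "coronavirus.tsv")`.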


Generating Statistics and Graphs

When a search returns a large number of entries, sifting through the metadata for information of interest can be tedious. As a starting point, the search feature can generate summary statistics and graphs for the search result:


Statistics:

$ pysradb search --db ena --organism "Severe acute respiratory syndrome coronavirus 2" --max 10000 -s

from pysradb.search import EnaSearch

instance = EnaSearch(return_max=10000, organism="Severe acute respiratory syndrome coronavirus 2")
instance.search()
instance.show_result_statistics()

Output:

Statistics for the search query:
=================================
Number of unique studies: 7
Number of unique experiments: 10000
Number of unique runs: 10000
Number of unique samples: 9797
Mean base count of samples: 238380171.626
Median base count of samples: 164470138.000
Sample base count standard deviation: 261654776.053
Date range:
    2020-04:  1299
    2020-05:  2518
    2020-06:  6181
    2020-07:  2

Organisms:
    Severe acute respiratory syndrome coronavirus 2:  10000

Platform:
    ILLUMINA:  5175
    OXFORD_NANOPORE:  4825

Library strategy:
    AMPLICON:  9789
    RNA-Seq:  1
    Targeted-Capture:  202
    WGS:  8

Library source:
    GENOMIC:  8
    METATRANSCRIPTOMIC:  1
    TRANSCRIPTOMIC:  1
    VIRAL RNA:  9990

Library selection:
    PCR:  9789
    RANDOM:  9
    other:  202

Library layout:
    PAIRED:  5059
    SINGLE:  4941
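To dig further than the built-in summary, comparable figures can be computed directly from `instance.get_df()`. The following is a minimal pandas sketch, not pysradb's own implementation: `summarise` is a hypothetical helper, and it assumes the columns `study_accession`, `run_accession`, `sample_accession`, `base_count` and `library_strategy` are present (they are listed among the verbosity 2 fields). The date column passed as `date_col` is also an assumption; check the column names your query actually returns.

```python
from typing import Optional

import pandas as pd


def summarise(df: pd.DataFrame, date_col: Optional[str] = None) -> dict:
    """Compute summary figures comparable to show_result_statistics().

    Hypothetical helper: assumes study_accession, run_accession,
    sample_accession, base_count and library_strategy columns exist in `df`.
    """
    # base_count may arrive as text; coerce it, turning unparsable values into NaN
    base = pd.to_numeric(df["base_count"], errors="coerce")
    stats = {
        "unique_studies": df["study_accession"].nunique(),
        "unique_runs": df["run_accession"].nunique(),
        "unique_samples": df["sample_accession"].nunique(),
        "mean_base_count": base.mean(),
        "median_base_count": base.median(),
        # pandas std() is the sample standard deviation (ddof=1)
        "base_count_std": base.std(),
        "library_strategy": df["library_strategy"].value_counts().to_dict(),
    }
    if date_col is not None:
        # Count entries per calendar month, keyed as "YYYY-MM"; unparsable dates are dropped
        months = pd.to_datetime(df[date_col], errors="coerce").dt.strftime("%Y-%m")
        stats["date_range"] = months.value_counts().sort_index().to_dict()
    return stats
```

Usage, after running a search: `stats = summarise(instance.get_df())`. Returning a plain dictionary, rather than printing, makes it easy to compare several queries programmatically.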

Graphs:

$ pysradb search --db ena -q e --max 500000 -g

from pysradb.search import EnaSearch

instance = EnaSearch(return_max=500000, query="e")
instance.search()
instance.visualise_results()

Output: The generated graphs are saved automatically under ./search_plots/. To display them in Python, pass show=True to visualise_results(). See this Colab Notebook for the full set of graphs.

Here are some of the available graphs that will be generated:

[Example graphs: e1.png, e2.png, e3.png]
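If only a single chart is needed, a bar chart of the value counts of one metadata column can also be drawn directly from `instance.get_df()` with matplotlib. This is a minimal sketch, not how pysradb draws its own graphs: `plot_category_counts` is a hypothetical helper, and the column name passed to it must match a column in your search result.

```python
import os

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: write image files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_category_counts(df: pd.DataFrame, column: str, outdir: str = "./search_plots/") -> str:
    """Save a bar chart of the value counts of `column`; return the image path.

    Hypothetical helper, not part of pysradb. The default `outdir` simply
    reuses the directory name mentioned above.
    """
    os.makedirs(outdir, exist_ok=True)
    counts = df[column].value_counts()
    fig, ax = plt.subplots(figsize=(6, 4))
    # String categories on the x axis, one bar per distinct value
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of entries")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    path = os.path.join(outdir, f"{column}.png")
    fig.savefig(path)
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return path
```

Usage, after running a search: `plot_category_counts(instance.get_df(), "library_strategy")`.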

© Copyright 2023, Saket Choudhary.