Converting SRA to FASTQ using Conda

This notebook demonstrates how to download SRA files with pysradb and convert them to FASTQ format using parallel-fastq-dump, installed through conda.

[ ]:
# Install pysradb if not already installed
try:
    import pysradb

    print(f"pysradb {pysradb.__version__} is already installed")
except ImportError:
    print("Installing pysradb from GitHub...")
    import sys

    !{sys.executable} -m pip install -q git+https://github.com/saketkc/pysradb
    print("pysradb installed successfully!")

Install Conda

[ ]:
!wget -c https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
!chmod +x Anaconda3-5.1.0-Linux-x86_64.sh
!bash ./Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p /usr/local

import sys

sys.path.append("/usr/local/lib/python3.6/site-packages/")

!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge
--2020-02-13 07:19:36--  https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c94f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 577996269 (551M) [application/x-sh]
Saving to: ‘Anaconda3-5.1.0-Linux-x86_64.sh’

Anaconda3-5.1.0-Lin 100%[===================>] 551.22M   131MB/s    in 4.3s

2020-02-13 07:19:46 (127 MB/s) - ‘Anaconda3-5.1.0-Linux-x86_64.sh’ saved [577996269/577996269]

PREFIX=/usr/local
installing: python-3.6.4-hc3d631a_1 ...
Python 3.6.4 :: Anaconda, Inc.
installing: ca-certificates-2017.08.26-h1d4fec5_0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: intel-openmp-2018.0.0-hc7b2577_8 ...
installing: libgcc-ng-7.2.0-h7cc24e2_2 ...
installing: libgfortran-ng-7.2.0-h9f7466a_2 ...
installing: libstdcxx-ng-7.2.0-h7a57d05_2 ...
installing: bzip2-1.0.6-h9a117a8_4 ...
installing: expat-2.2.5-he0dffb1_0 ...
installing: gmp-6.1.2-h6c8ec71_1 ...
installing: graphite2-1.3.10-hf63cedd_1 ...
installing: icu-58.2-h9c2bf20_1 ...
installing: jbig-2.1-hdba287a_0 ...
installing: jpeg-9b-h024ee3a_2 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: libsodium-1.0.15-hf101ebd_0 ...
installing: libtool-2.4.6-h544aabb_3 ...
installing: libxcb-1.12-hcd93eb1_4 ...
installing: lzo-2.10-h49e0be7_2 ...
installing: mkl-2018.0.1-h19d6760_4 ...
installing: ncurses-6.0-h9df7e31_2 ...
installing: openssl-1.0.2n-hb7f436b_0 ...
installing: patchelf-0.9-hf79760b_2 ...
installing: pcre-8.41-hc27e229_1 ...
installing: pixman-0.34.0-hceecf20_3 ...
installing: tk-8.6.7-hc745277_3 ...
installing: unixodbc-2.3.4-hc36303a_1 ...
installing: xz-5.2.3-h55aa19d_2 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: glib-2.53.6-h5d9569c_2 ...
installing: hdf5-1.10.1-h9caa474_1 ...
installing: libedit-3.1-heed3624_0 ...
installing: libpng-1.6.34-hb9fc6fc_0 ...
installing: libssh2-1.8.0-h9cfc8f7_4 ...
installing: libtiff-4.0.9-h28f6b97_0 ...
installing: libxml2-2.9.7-h26e45fe_0 ...
installing: mpfr-3.1.5-h11a74b3_2 ...
installing: pandoc-1.19.2.1-hea2e7c5_1 ...
installing: readline-7.0-ha6073c6_4 ...
installing: zeromq-4.2.2-hbedb6e5_2 ...
installing: dbus-1.12.2-hc3f9b76_1 ...
installing: freetype-2.8-hab7d2ae_1 ...
installing: gstreamer-1.12.4-hb53b477_0 ...
installing: libcurl-7.58.0-h1ad7b7a_0 ...
installing: libxslt-1.1.32-h1312cb7_0 ...
installing: mpc-1.0.3-hec55b23_5 ...
installing: sqlite-3.22.0-h1bed415_0 ...
installing: curl-7.58.0-h84994c4_0 ...
installing: fontconfig-2.12.4-h88586e7_1 ...
installing: gst-plugins-base-1.12.4-h33fb286_0 ...
installing: alabaster-0.7.10-py36h306e16b_0 ...
installing: asn1crypto-0.24.0-py36_0 ...
installing: attrs-17.4.0-py36_0 ...
installing: backports-1.0-py36hfa02d7e_1 ...
installing: beautifulsoup4-4.6.0-py36h49b8c8c_1 ...
installing: bitarray-0.8.1-py36h14c3975_1 ...
installing: boto-2.48.0-py36h6e4cd66_1 ...
installing: cairo-1.14.12-h77bcde2_0 ...
installing: certifi-2018.1.18-py36_0 ...
installing: chardet-3.0.4-py36h0f667ec_1 ...
installing: click-6.7-py36h5253387_0 ...
installing: cloudpickle-0.5.2-py36_1 ...
installing: colorama-0.3.9-py36h489cec4_0 ...
installing: contextlib2-0.5.5-py36h6c84a62_0 ...
installing: dask-core-0.16.1-py36_0 ...
installing: decorator-4.2.1-py36_0 ...
installing: docutils-0.14-py36hb0f60f5_0 ...
installing: entrypoints-0.2.3-py36h1aec115_2 ...
installing: et_xmlfile-1.0.1-py36hd6bccc3_0 ...
installing: fastcache-1.0.2-py36h14c3975_2 ...
installing: filelock-2.0.13-py36h646ffb5_0 ...
installing: glob2-0.6-py36he249c77_0 ...
installing: gmpy2-2.0.8-py36hc8893dd_2 ...
installing: greenlet-0.4.12-py36h2d503a6_0 ...
installing: heapdict-1.0.0-py36_2 ...
installing: idna-2.6-py36h82fb2a8_1 ...
installing: imagesize-0.7.1-py36h52d8127_0 ...
installing: ipython_genutils-0.2.0-py36hb52b0d5_0 ...
installing: itsdangerous-0.24-py36h93cc618_1 ...
installing: jdcal-1.3-py36h4c697fb_0 ...
installing: lazy-object-proxy-1.3.1-py36h10fcdad_0 ...
installing: llvmlite-0.21.0-py36ha241eea_0 ...
installing: locket-0.2.0-py36h787c0ad_1 ...
installing: lxml-4.1.1-py36hf71bdeb_1 ...
installing: markupsafe-1.0-py36hd9260cd_1 ...
installing: mccabe-0.6.1-py36h5ad9710_1 ...
installing: mistune-0.8.3-py36_0 ...
installing: mkl-service-1.1.2-py36h17a0993_4 ...
installing: mpmath-1.0.0-py36hfeacd6b_2 ...
installing: msgpack-python-0.5.1-py36h6bb024c_0 ...
installing: multipledispatch-0.4.9-py36h41da3fb_0 ...
installing: numpy-1.14.0-py36h3dfced4_1 ...
installing: olefile-0.45.1-py36_0 ...
installing: pandocfilters-1.4.2-py36ha6701b7_1 ...
installing: parso-0.1.1-py36h35f843b_0 ...
installing: path.py-10.5-py36h55ceabb_0 ...
installing: pep8-1.7.1-py36_0 ...
installing: pickleshare-0.7.4-py36h63277f8_0 ...
installing: pkginfo-1.4.1-py36h215d178_1 ...
installing: pluggy-0.6.0-py36hb689045_0 ...
installing: ply-3.10-py36hed35086_0 ...
installing: psutil-5.4.3-py36h14c3975_0 ...
installing: ptyprocess-0.5.2-py36h69acd42_0 ...
installing: py-1.5.2-py36h29bf505_0 ...
installing: pycodestyle-2.3.1-py36hf609f19_0 ...
installing: pycosat-0.6.3-py36h0a5515d_0 ...
installing: pycparser-2.18-py36hf9f622e_1 ...
installing: pycrypto-2.6.1-py36h14c3975_7 ...
installing: pycurl-7.43.0.1-py36hb7f436b_0 ...
installing: pyodbc-4.0.22-py36hf484d3e_0 ...
installing: pyparsing-2.2.0-py36hee85983_1 ...
installing: pysocks-1.6.7-py36hd97a5b1_1 ...
installing: pytz-2017.3-py36h63b9c63_0 ...
installing: pyyaml-3.12-py36hafb9ca4_1 ...
installing: pyzmq-16.0.3-py36he2533c7_0 ...
installing: qt-5.6.2-h974d657_12 ...
installing: qtpy-1.3.1-py36h3691cc8_0 ...
installing: rope-0.10.7-py36h147e2ec_0 ...
installing: ruamel_yaml-0.15.35-py36h14c3975_1 ...
installing: send2trash-1.4.2-py36_0 ...
installing: simplegeneric-0.8.1-py36_2 ...
installing: sip-4.18.1-py36h51ed4ed_2 ...
installing: six-1.11.0-py36h372c433_1 ...
installing: snowballstemmer-1.2.1-py36h6febd40_0 ...
installing: sortedcontainers-1.5.9-py36_0 ...
installing: sphinxcontrib-1.0-py36h6d0f590_1 ...
installing: sqlalchemy-1.2.1-py36h14c3975_0 ...
installing: tblib-1.3.2-py36h34cf8b6_0 ...
installing: testpath-0.3.1-py36h8cadb63_0 ...
installing: toolz-0.9.0-py36_0 ...
installing: tornado-4.5.3-py36_0 ...
installing: typing-3.6.2-py36h7da032a_0 ...
installing: unicodecsv-0.14.1-py36ha668878_0 ...
installing: wcwidth-0.1.7-py36hdf4376a_0 ...
installing: webencodings-0.5.1-py36h800622e_1 ...
installing: werkzeug-0.14.1-py36_0 ...
installing: wrapt-1.10.11-py36h28b7045_0 ...
installing: xlrd-1.1.0-py36h1db9f0c_1 ...
installing: xlsxwriter-1.0.2-py36h3de1aca_0 ...
installing: xlwt-1.3.0-py36h7b00a1f_0 ...
installing: babel-2.5.3-py36_0 ...
installing: backports.shutil_get_terminal_size-1.0.0-py36hfea85ff_2 ...
installing: bottleneck-1.2.1-py36haac1ea0_0 ...
installing: cffi-1.11.4-py36h9745a5d_0 ...
installing: conda-verify-2.0.0-py36h98955d8_0 ...
installing: cycler-0.10.0-py36h93f1223_0 ...
installing: cytoolz-0.9.0-py36h14c3975_0 ...
installing: h5py-2.7.1-py36h3585f63_0 ...
installing: harfbuzz-1.7.4-hc5b324e_0 ...
installing: html5lib-1.0.1-py36h2f9c1c0_0 ...
installing: jedi-0.11.1-py36_0 ...
installing: networkx-2.1-py36_0 ...
installing: nltk-3.2.5-py36h7532b22_0 ...
installing: numba-0.36.2-np114py36hc6662d5_0 ...
installing: numexpr-2.6.4-py36hc4a3f9a_0 ...
installing: openpyxl-2.4.10-py36_0 ...
installing: packaging-16.8-py36ha668100_1 ...
installing: partd-0.3.8-py36h36fd896_0 ...
installing: pathlib2-2.3.0-py36h49efa8e_0 ...
installing: pexpect-4.3.1-py36_0 ...
installing: pillow-5.0.0-py36h3deb7b8_0 ...
installing: pyqt-5.6.0-py36h0386399_5 ...
installing: python-dateutil-2.6.1-py36h88d3b88_1 ...
installing: pywavelets-0.5.2-py36he602eb0_0 ...
installing: qtawesome-0.4.4-py36h609ed8c_0 ...
installing: scipy-1.0.0-py36hbf646e7_0 ...
installing: setuptools-38.4.0-py36_0 ...
installing: singledispatch-3.4.0.3-py36h7a266c3_0 ...
installing: sortedcollections-0.5.3-py36h3c761f9_0 ...
installing: sphinxcontrib-websupport-1.0.1-py36hb5cb234_1 ...
installing: sympy-1.1.1-py36hc6d1c1c_0 ...
installing: terminado-0.8.1-py36_1 ...
installing: traitlets-4.3.2-py36h674d592_0 ...
installing: zict-0.1.3-py36h3a3bf81_0 ...
installing: astroid-1.6.1-py36_0 ...
installing: bleach-2.1.2-py36_0 ...
installing: clyent-1.2.2-py36h7e57e65_1 ...
installing: cryptography-2.1.4-py36hd09be54_0 ...
installing: cython-0.27.3-py36h1860423_0 ...
installing: datashape-0.5.4-py36h3ad6b5c_0 ...
installing: distributed-1.20.2-py36_0 ...
installing: get_terminal_size-1.0.0-haa9412d_0 ...
installing: gevent-1.2.2-py36h2fe25dc_0 ...
installing: imageio-2.2.0-py36he555465_0 ...
installing: isort-4.2.15-py36had401c0_0 ...
installing: jinja2-2.10-py36ha16c418_0 ...
installing: jsonschema-2.6.0-py36h006f8b5_0 ...
installing: jupyter_core-4.4.0-py36h7c827e3_0 ...
installing: matplotlib-2.1.2-py36h0e671d2_0 ...
installing: navigator-updater-0.1.0-py36h14770f7_0 ...
installing: nose-1.3.7-py36hcdf7029_2 ...
installing: pandas-0.22.0-py36hf484d3e_0 ...
installing: pango-1.41.0-hd475d92_0 ...
installing: patsy-0.5.0-py36_0 ...
installing: pyflakes-1.6.0-py36h7bd6a15_0 ...
installing: pygments-2.2.0-py36h0d3125c_0 ...
installing: pytables-3.4.2-py36h3b5282a_2 ...
installing: pytest-3.3.2-py36_0 ...
installing: scikit-learn-0.19.1-py36h7aa7ec6_0 ...
installing: wheel-0.30.0-py36hfd4bba0_1 ...
installing: astropy-2.0.3-py36h14c3975_0 ...
installing: bkcharts-0.2-py36h735825a_0 ...
installing: bokeh-0.12.13-py36h2f9c1c0_0 ...
installing: flask-0.12.2-py36hb24657c_0 ...
installing: jupyter_client-5.2.2-py36_0 ...
installing: nbformat-4.4.0-py36h31c9010_0 ...
installing: pip-9.0.1-py36h6c6f9ce_4 ...
installing: prompt_toolkit-1.0.15-py36h17d85b1_0 ...
installing: pylint-1.8.2-py36_0 ...
installing: pyopenssl-17.5.0-py36h20ba746_0 ...
installing: statsmodels-0.8.0-py36h8533d0b_0 ...
installing: dask-0.16.1-py36_0 ...
installing: flask-cors-3.0.3-py36h2d857d3_0 ...
installing: ipython-6.2.1-py36h88c514a_1 ...
installing: nbconvert-5.3.1-py36hb41ffb7_0 ...
installing: seaborn-0.8.1-py36hfad7ec4_0 ...
installing: urllib3-1.22-py36hbe7ace6_0 ...
installing: ipykernel-4.8.0-py36_0 ...
installing: odo-0.5.1-py36h90ed295_0 ...
installing: requests-2.18.4-py36he2e5f8d_1 ...
installing: scikit-image-0.13.1-py36h14c3975_1 ...
installing: anaconda-client-1.6.9-py36_0 ...
installing: blaze-0.11.3-py36h4e06776_0 ...
installing: jupyter_console-5.2.0-py36he59e554_1 ...
installing: notebook-5.4.0-py36_0 ...
installing: qtconsole-4.3.1-py36h8f73b5b_0 ...
installing: sphinx-1.6.6-py36_0 ...
installing: anaconda-project-0.8.2-py36h44fb852_0 ...
installing: jupyterlab_launcher-0.10.2-py36_0 ...
installing: numpydoc-0.7.0-py36h18f165f_0 ...
installing: widgetsnbextension-3.1.0-py36_0 ...
installing: anaconda-navigator-1.7.0-py36_0 ...
installing: ipywidgets-7.1.1-py36_0 ...
installing: jupyterlab-0.31.5-py36_0 ...
installing: spyder-3.2.6-py36_0 ...
installing: _ipyw_jlab_nb_ext_conf-0.1.0-py36he11e457_0 ...
installing: jupyter-1.0.0-py36_4 ...
installing: anaconda-5.1.0-py36_2 ...
installing: conda-4.4.10-py36_0 ...
installing: conda-build-3.4.1-py36_0 ...
installation finished.
WARNING:
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Anaconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Anaconda3: /usr/local
Warning: 'defaults' already in 'channels' list, moving to the top
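
The cell above appends a hard-coded `python3.6` site-packages path, while the `conda install` step further down upgrades the interpreter in `/usr/local` to Python 3.7.1. A minimal sketch of looking the directory up instead of hard-coding the version is shown here. The helper name `site_packages_dirs` is hypothetical, and whether packages from that directory are importable in the running kernel still depends on the kernel's own Python version.

```python
from pathlib import Path


def site_packages_dirs(prefix):
    """Return every lib/python3.*/site-packages directory under prefix, sorted."""
    lib = Path(prefix) / "lib"
    if not lib.is_dir():
        return []
    return sorted(p for p in lib.glob("python3.*/site-packages") if p.is_dir())


# Hypothetical usage with the install prefix used in this notebook:
# import sys
# for path in site_packages_dirs("/usr/local"):
#     if str(path) not in sys.path:
#         sys.path.append(str(path))
```

Running the lookup again after each install step would pick up a new `python3.x` directory if conda changes the interpreter version.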

Install parallel-fastq-dump

[ ]:
!conda install -y parallel-fastq-dump
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.4.10
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - parallel-fastq-dump


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bzip2-1.0.8                |       h516909a_2         396 KB  conda-forge
    parallel-fastq-dump-0.6.6  |             py_0           8 KB  bioconda
    python-3.7.1               |       h5001a0f_0        26.8 MB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    libgcc-ng-9.2.0            |       h24d8f2e_2         8.2 MB  conda-forge
    sqlite-3.28.0              |       h8b20d00_0         1.9 MB  conda-forge
    sra-tools-2.9.1_1          |       h470a237_0        38.0 MB  bioconda
    libgomp-9.2.0              |       h24d8f2e_2         816 KB  conda-forge
    certifi-2019.11.28         |           py37_0         148 KB  conda-forge
    ncurses-6.1                |       hfc679d8_2         1.3 MB  conda-forge
    pip-20.0.2                 |             py_2         1.0 MB  conda-forge
    readline-7.0               |    hf8c457e_1001         391 KB  conda-forge
    xz-5.2.4                   |    h14c3975_1001         366 KB  conda-forge
    libffi-3.2.1               |       hfc679d8_5          51 KB  conda-forge
    zlib-1.2.11                |    h516909a_1006         105 KB  conda-forge
    openssl-1.0.2u             |       h516909a_0         3.2 MB  conda-forge
    tk-8.6.10                  |       hed695b0_0         3.2 MB  conda-forge
    setuptools-45.2.0          |           py37_0         654 KB  conda-forge
    _openmp_mutex-4.5          |            0_gnu         435 KB  conda-forge
    wheel-0.34.2               |             py_1          24 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        87.1 MB

The following NEW packages will be INSTALLED:

    _libgcc_mutex:       0.1-conda_forge       conda-forge
    _openmp_mutex:       4.5-0_gnu             conda-forge
    libgomp:             9.2.0-h24d8f2e_2      conda-forge
    parallel-fastq-dump: 0.6.6-py_0            bioconda
    sra-tools:           2.9.1_1-h470a237_0    bioconda

The following packages will be UPDATED:

    bzip2:               1.0.6-h9a117a8_4                  --> 1.0.8-h516909a_2      conda-forge
    ca-certificates:     2017.08.26-h1d4fec5_0             --> 2019.11.28-hecc5488_0 conda-forge
    certifi:             2018.1.18-py36_0                  --> 2019.11.28-py37_0     conda-forge
    libffi:              3.2.1-hd88cf55_4                  --> 3.2.1-hfc679d8_5      conda-forge
    libgcc-ng:           7.2.0-h7cc24e2_2                  --> 9.2.0-h24d8f2e_2      conda-forge
    ncurses:             6.0-h9df7e31_2                    --> 6.1-hfc679d8_2        conda-forge
    openssl:             1.0.2n-hb7f436b_0                 --> 1.0.2u-h516909a_0     conda-forge
    pip:                 9.0.1-py36h6c6f9ce_4              --> 20.0.2-py_2           conda-forge
    python:              3.6.4-hc3d631a_1                  --> 3.7.1-h5001a0f_0      conda-forge
    readline:            7.0-ha6073c6_4                    --> 7.0-hf8c457e_1001     conda-forge
    setuptools:          38.4.0-py36_0                     --> 45.2.0-py37_0         conda-forge
    sqlite:              3.22.0-h1bed415_0                 --> 3.28.0-h8b20d00_0     conda-forge
    tk:                  8.6.7-hc745277_3                  --> 8.6.10-hed695b0_0     conda-forge
    wheel:               0.30.0-py36hfd4bba0_1             --> 0.34.2-py_1           conda-forge
    xz:                  5.2.3-h55aa19d_2                  --> 5.2.4-h14c3975_1001   conda-forge
    zlib:                1.2.11-ha838bed_2                 --> 1.2.11-h516909a_1006  conda-forge


Downloading and Extracting Packages
bzip2 1.0.8: 100% 1.0/1 [00:00<00:00,  4.95it/s]
parallel-fastq-dump 0.6.6: 100% 1.0/1 [00:00<00:00, 25.61it/s]
python 3.7.1: 100% 1.0/1 [00:08<00:00,  8.74s/it]
ca-certificates 2019.11.28: 100% 1.0/1 [00:00<00:00, 12.07it/s]
_libgcc_mutex 0.1: 100% 1.0/1 [00:00<00:00, 22.60it/s]
libgcc-ng 9.2.0: 100% 1.0/1 [00:02<00:00,  2.84s/it]
sqlite 3.28.0: 100% 1.0/1 [00:00<00:00,  1.35it/s]
sra-tools 2.9.1_1: 100% 1.0/1 [00:13<00:00, 30.18s/it]
libgomp 9.2.0: 100% 1.0/1 [00:00<00:00,  3.97it/s]
certifi 2019.11.28: 100% 1.0/1 [00:00<00:00, 11.58it/s]
ncurses 6.1: 100% 1.0/1 [00:01<00:00,  1.22s/it]
pip 20.0.2: 100% 1.0/1 [00:00<00:00,  1.83it/s]
readline 7.0: 100% 1.0/1 [00:00<00:00,  4.98it/s]
xz 5.2.4: 100% 1.0/1 [00:00<00:00,  5.48it/s]
libffi 3.2.1: 100% 1.0/1 [00:00<00:00, 16.06it/s]
zlib 1.2.11: 100% 1.0/1 [00:00<00:00, 14.34it/s]
openssl 1.0.2u: 100% 1.0/1 [00:01<00:00,  2.86s/it]
tk 8.6.10: 100% 1.0/1 [00:01<00:00,  1.28s/it]
setuptools 45.2.0: 100% 1.0/1 [00:00<00:00,  2.78it/s]
_openmp_mutex 4.5: 100% 1.0/1 [00:00<00:00,  7.34it/s]
wheel 0.34.2: 100% 1.0/1 [00:00<00:00, 21.98it/s]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
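
Before moving on, it can help to confirm that the executables ended up on `PATH`. The following is a small sketch using only the standard library. The helper name `find_tools` is an assumption made for illustration, and `fastq-dump` is checked because it ships with the `sra-tools` package that conda installed as a dependency above.

```python
import shutil


def find_tools(names):
    """Map each executable name to its absolute path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report where each tool was found, if anywhere.
for tool, path in find_tools(["parallel-fastq-dump", "fastq-dump"]).items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```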

Install latest pysradb

[ ]:
pip install git+https://github.com/saketkc/pysradb
Collecting git+https://github.com/saketkc/pysradb
  Cloning https://github.com/saketkc/pysradb to /tmp/pip-req-build-bd1zhhoz
  Running command git clone -q https://github.com/saketkc/pysradb /tmp/pip-req-build-bd1zhhoz
Collecting pandas==0.25.3
  Using cached pandas-0.25.3-cp37-cp37m-manylinux1_x86_64.whl (10.4 MB)
Collecting tqdm==4.41.1
  Using cached tqdm-4.41.1-py2.py3-none-any.whl (56 kB)
Collecting requests==2.22.0
  Using cached requests-2.22.0-py2.py3-none-any.whl (57 kB)
Collecting xmltodict==0.12.0
  Using cached xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Collecting python-dateutil>=2.6.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting numpy>=1.13.3
  Using cached numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
Collecting pytz>=2017.2
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Collecting idna<2.9,>=2.5
  Using cached idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting chardet<3.1.0,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests==2.22.0->pysradb==0.10.3.dev0) (2019.11.28)
Collecting six>=1.5
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Building wheels for collected packages: pysradb
  Building wheel for pysradb (setup.py) ... done
  Created wheel for pysradb: filename=pysradb-0.10.3.dev0-py3-none-any.whl size=147407 sha256=b498f377cda436cca6ee34c470c8aabcbe9a75f5fe8af7a5e6c56796c1be9041
  Stored in directory: /tmp/pip-ephem-wheel-cache-a6fjccpo/wheels/3f/06/98/98805e85e0909f2d0920ce73557c06d3802e4baaa2616920e8
Successfully built pysradb
Installing collected packages: six, python-dateutil, numpy, pytz, pandas, tqdm, idna, urllib3, chardet, requests, xmltodict, pysradb
Successfully installed chardet-3.0.4 idna-2.8 numpy-1.18.1 pandas-0.25.3 pysradb-0.10.3.dev0 python-dateutil-2.8.1 pytz-2019.3 requests-2.22.0 six-1.14.0 tqdm-4.41.1 urllib3-1.25.8 xmltodict-0.12.0

Get metadata

[ ]:
!pysradb metadata --detailed SRP063852
study_accession experiment_accession experiment_title                                         experiment_desc                                          organism_taxid  organism_name library_strategy library_source  library_selection   sample_accession sample_title instrument           total_spots total_size run_accession run_total_spots run_total_bases run_alias      sra_url                                                                                 experiment_alias source_name cell line
SRP063852       SRX1254413           GSM1887643: ribosome profiling; Homo sapiens; miRNA-Seq  GSM1887643: ribosome profiling; Homo sapiens; miRNA-Seq  9606            Homo sapiens  miRNA-Seq        TRANSCRIPTOMIC  size fractionation  SRS1072728       N/A          Illumina HiSeq 2000  31967082    626381849  SRR2433794    31967082        916773615       GSM1887643_r1  https://sra-download.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR2433794/SRR2433794.1  GSM1887643       HEK293      HEK293
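
The metadata row reports both the number of spots and the number of bases for the run, so the mean read length can be worked out directly. This is a small arithmetic sketch with the two values copied by hand from the row above. It assumes one read per spot, which is not confirmed by the metadata shown.

```python
# Values copied from the metadata row above (run SRR2433794).
run_total_spots = 31_967_082
run_total_bases = 916_773_615

# Assumes a single read per spot.
mean_read_length = run_total_bases / run_total_spots
print(f"mean read length: {mean_read_length:.1f} bases")
```

A mean below 30 bases is in the range one would expect for a size-fractionated ribosome profiling library.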

Download data

[ ]:
!pysradb download -y -p SRP063852
The following files will be downloaded:

study_accession experiment_accession experiment_title                                         experiment_desc                                          organism_taxid  organism_name library_strategy library_source  library_selection   sample_accession sample_title instrument           total_spots total_size run_accession run_total_spots run_total_bases run_alias      srapath_url                                                                             experiment_alias source_name cell line download_url
 SRP063852       SRX1254413           GSM1887643: ribosome profiling; Homo sapiens; miRNA-Seq  GSM1887643: ribosome profiling; Homo sapiens; miRNA-Seq  9606            Homo sapiens  miRNA-Seq        TRANSCRIPTOMIC  size fractionation  SRS1072728       N/A          Illumina HiSeq 2000  31967082    626381849  SRR2433794    31967082        916773615       GSM1887643_r1  https://sra-download.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR2433794/SRR2433794.1  GSM1887643       HEK293      HEK293


Total size: 626.4 MB


Downloading SRR2433794.1: 627MB [00:06, 94.5MB/s]
SRP063852/SRX1254413/SRR2433794: 100% 1/1 [00:06<00:00,  6.84s/it]

Run parallel-fastq-dump

[ ]:
!ls -ltrh pysradb_downloads
total 4.0K
drwxr-xr-x 3 root root 4.0K Feb 13 07:28 SRP063852
[ ]:
!ls -ltrh pysradb_downloads/SRP063852/SRX1254413
total 598M
-rw-r--r-- 1 root root 598M Feb 13 07:28 SRR2433794.sra
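
The download step reported a total of 626.4 MB, while `ls -h` shows 598M for the same file. The two figures agree once the units are taken into account. The sketch below assumes that the file on disk has exactly the `total_size` from the metadata, and that `ls -h` counts in powers of 1024 and rounds up, which is the usual GNU coreutils behaviour.

```python
import math

total_size_bytes = 626_381_849  # total_size column from the metadata above

size_mb = total_size_bytes / 10**6   # decimal megabytes, as in "Total size"
size_mib = total_size_bytes / 2**20  # binary mebibytes, as in `ls -h`

print(f"{size_mb:.1f} MB")
print(f"{size_mib:.1f} MiB (rounded up: {math.ceil(size_mib)}M)")
```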

SRA to FASTQ

[ ]:
!mkdir -p sratofastq && mkdir -p tmpdir && parallel-fastq-dump --threads 4 --outdir sratofastq/ --split-files --tmpdir tmpdir --gzip -s pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
SRR ids: ['pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra']
extra args: ['--split-files', '--gzip']
tempdir: tmpdir/pfd_wgclvuwy
pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra spots: 31967082
blocks: [[1, 7991770], [7991771, 15983540], [15983541, 23975310], [23975311, 31967082]]
Rejected 7991770 READS because READLEN < 1
Read 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Written 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Rejected 7991772 READS because READLEN < 1
Read 7991772 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Written 7991772 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Rejected 7991770 READS because READLEN < 1
Read 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Written 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Rejected 7991770 READS because READLEN < 1
Read 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
Written 7991770 spots for pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra
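
The `blocks:` line in the output above shows how the 31,967,082 spots were divided among the 4 threads. The sketch below is an illustrative reimplementation that reproduces those boundaries, not the tool's own source: each block gets the integer share of spots and the final block absorbs the remainder. The function name `split_blocks` is chosen here for illustration.

```python
def split_blocks(start, end, n_pieces):
    """Split the inclusive range [start, end] into n_pieces contiguous blocks.

    Every block holds total // n_pieces items; the last block also takes
    whatever remainder is left over.
    """
    total = end - start + 1
    size = total // n_pieces
    blocks = []
    first = start
    for i in range(n_pieces):
        last = end if i == n_pieces - 1 else first + size - 1
        blocks.append([first, last])
        first = last + 1
    return blocks


# Spot count and thread count taken from the run above.
blocks = split_blocks(1, 31_967_082, 4)
print(blocks)
```

With these inputs the last block is two spots larger than the others, which matches the one `Read 7991772 spots` line in the log.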
[ ]:
!ls -ltrh sratofastq
ls: cannot access 'sratofastq': No such file or directory
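
The conversion cell above hard-codes the path to a single `.sra` file. For a study with several runs, one option is to walk the download directory and assemble one command per file. The sketch below only builds the argument lists, using the same flags as the cell above. The helper name `build_commands` and its default directory names are assumptions for illustration.

```python
from pathlib import Path


def build_commands(download_dir, outdir="sratofastq", tmpdir="tmpdir", threads=4):
    """Build one parallel-fastq-dump argument list per .sra file found."""
    commands = []
    for sra in sorted(Path(download_dir).rglob("*.sra")):
        commands.append([
            "parallel-fastq-dump",
            "--threads", str(threads),
            "--outdir", outdir,
            "--split-files",
            "--tmpdir", tmpdir,
            "--gzip",
            "-s", str(sra),
        ])
    return commands


# Hypothetical usage (not executed here):
# import subprocess
# Path("sratofastq").mkdir(exist_ok=True)
# Path("tmpdir").mkdir(exist_ok=True)
# for cmd in build_commands("pysradb_downloads"):
#     subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed conversion raise an error instead of passing silently.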