pysradb package¶
Submodules¶
pysradb.cli module¶
Command line interface for pysradb
- class pysradb.cli.ArgParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)[source]¶
Bases:
ArgumentParser
- class pysradb.cli.CustomFormatterArgP(prog, indent_increment=2, max_help_position=24, width=None)[source]¶
Bases:
ArgumentDefaultsHelpFormatter,RawDescriptionHelpFormatter
- pysradb.cli.download(out_dir, srx, srp, geo, skip_confirmation, col='public_url', use_ascp=False, threads=1)[source]¶
- pysradb.cli.metadata(srp_id, assay, desc, detailed, expand, saveto, enrich=False, enrich_backend=None, embed_model=None)[source]¶
pysradb.download module¶
Utility function to download data
- pysradb.download.download_file(url, file_path, md5_hash=None, timeout=10, block_size=1048576, show_progress=False)[source]¶
Resumable download. Expects the server to support byte ranges.
- Parameters:
- url: string
URL
- file_path: string
Local file path to store the downloaded file
- md5_hash: string
Expected MD5 string of downloaded file
- timeout: int
Seconds to wait before terminating request
- block_size: int
Number of bytes to read per chunk (default: 1024 * 1024 = 1 MB)
- show_progress: bool
Show progress bar
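The two ideas behind a resumable, checksum-verified download can be sketched with the standard library alone. This is a minimal illustration, not pysradb's implementation: the helper names `range_header_for_resume` and `md5_matches` are hypothetical, and the sketch assumes the partial file on disk is a valid prefix of the remote file.

```python
import hashlib
import os
import tempfile


def range_header_for_resume(file_path):
    """Return an HTTP Range header asking only for the bytes not yet on disk."""
    offset = os.path.getsize(file_path) if os.path.exists(file_path) else 0
    # "bytes=N-" requests everything from byte offset N to the end of the resource.
    return {"Range": f"bytes={offset}-"} if offset else {}


def md5_matches(file_path, expected_md5, block_size=1024 * 1024):
    """Hash the file in block_size chunks and compare with the expected hex digest."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Simulate a download that stopped after 5 bytes.
with tempfile.TemporaryDirectory() as tmp_dir:
    partial = os.path.join(tmp_dir, "partial.bin")
    with open(partial, "wb") as handle:
        handle.write(b"hello")
    resume_header = range_header_for_resume(partial)
    checksum_ok = md5_matches(partial, hashlib.md5(b"hello").hexdigest())
```

A server that honours the header replies with status 206 and only the missing bytes, which are then appended to the existing file. A server that ignores it replies with 200 and the full body, so a careful client checks the status code before appending.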
- pysradb.download.get_file_size(row, url_col)[source]¶
Get size of file to be downloaded.
- Parameters:
- row: pd.DataFrame row
- url_col: str
url_column
- Returns:
- content_length: int
pysradb.enrichment module¶
pysradb.exceptions module¶
This file contains custom Exceptions for pysradb
pysradb.filter_attrs module¶
- pysradb.filter_attrs.expand_sample_attribute_columns(metadata_df)[source]¶
Expand sample attribute columns to individual columns.
Since the sample_attribute column content can differ between rows even when they come from the same project (SRP), we explicitly iterate through the rows to first determine what additional columns need to be created.
- Parameters:
- metadata_df: DataFrame
Dataframe as obtained from sra_metadata or equivalent
- Returns:
- expanded_df: DataFrame
Dataframe with additional columns pertaining to sample_attribute appended
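The two-pass idea described above (first collect every attribute key, then fill each row) can be sketched on plain dictionaries. This is an illustration under stated assumptions, not the library's code: the `||` item separator, the `:` key separator, and the helper names `parse_sample_attribute` and `expand_rows` are all assumptions made for the example.

```python
def parse_sample_attribute(sample_attribute, item_sep="||", key_sep=":"):
    """Split 'key: value || key: value' into a dict (separators are assumptions)."""
    parsed = {}
    for item in sample_attribute.split(item_sep):
        if key_sep not in item:
            continue  # skip fragments that do not look like key/value pairs
        key, value = item.split(key_sep, 1)
        parsed[key.strip().lower().replace(" ", "_")] = value.strip()
    return parsed


def expand_rows(rows):
    """Pass 1 collects every key seen in any row; pass 2 fills gaps with None."""
    parsed_rows = [parse_sample_attribute(row.get("sample_attribute", "")) for row in rows]
    all_keys = []
    for parsed in parsed_rows:
        for key in parsed:
            if key not in all_keys:
                all_keys.append(key)
    expanded = []
    for row, parsed in zip(rows, parsed_rows):
        new_row = dict(row)  # keep the original columns untouched
        for key in all_keys:
            new_row[key] = parsed.get(key)
        expanded.append(new_row)
    return expanded


# Toy rows with made-up run names; the second row lacks the keys of the first.
rows = [
    {"run_accession": "RUN1", "sample_attribute": "cell line: HeLa || treatment: none"},
    {"run_accession": "RUN2", "sample_attribute": "tissue: liver"},
]
expanded = expand_rows(rows)
```

Collecting the keys before filling is what guarantees that every row ends up with the same set of columns, even when the attributes differ from row to row.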
- pysradb.filter_attrs.guess_cell_type(sample_attribute)[source]¶
Guess possible cell line from sample_attribute data.
- Parameters:
- sample_attribute: string
sample_attribute string as in the metadata column
- Returns:
- cell_type: string
Possible cell type of sample. Returns None if no match found.
pysradb.geoweb module¶
Utilities to interact with GEO online
- class pysradb.geoweb.GEOweb(verbose=True)[source]¶
Bases:
object
- download(links, root_url, gse, verbose=False, out_dir=None)[source]¶
Download GEO files.
- Parameters:
- links: list
List of all valid downloadable links present for a GEO ID
- root_url: string
url for root directory for a GEO ID
- gse: string
GEO ID
- verbose: bool
Print file list
- out_dir: string
Directory location for download
- pysradb.geoweb.download_geo_matrix(accession, output_dir='.')[source]¶
Download a GEO Matrix file for a given GEO accession ID.
- Args:
accession (str): GEO accession ID (e.g., ‘GSE234190’).
output_dir (str): Directory to save the downloaded file (default: current directory).
- Returns:
str: Path to the downloaded file.
- Raises:
Exception: If the download fails.
- pysradb.geoweb.parse_geo_matrix_to_tsv(input_file, output_file)[source]¶
Parse a GEO Matrix file to a TSV file, extracting the expression data.
- Args:
input_file (str): Path to the input GEO Matrix file (gzipped).
output_file (str): Path to save the output TSV file.
- Returns:
pandas.DataFrame: The parsed expression data.
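As a rough sketch of what extracting the expression data from a Series Matrix file involves: the data table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` marker lines, and everything else is metadata. The helpers `read_series_matrix_table` and `write_tsv` below are hypothetical, use only the standard library, and return a list of rows rather than the pandas DataFrame that the documented function returns.

```python
import csv
import gzip
import os
import tempfile


def read_series_matrix_table(input_file):
    """Return the rows found between the table begin/end markers of a gzipped file."""
    rows, in_table = [], False
    with gzip.open(input_file, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("!series_matrix_table_begin"):
                in_table = True
            elif line.startswith("!series_matrix_table_end"):
                break
            elif in_table and line:
                # csv.reader also strips the double quotes around quoted fields.
                rows.append(next(csv.reader([line], delimiter="\t")))
    return rows


def write_tsv(rows, output_file):
    """Write rows as tab-separated text with Unix line endings."""
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


# Build a tiny made-up matrix file so the sketch runs without network access.
sample = (
    '!Series_title\t"toy series"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM1"\t"GSM2"\n'
    '"probe_a"\t1.5\t2.5\n'
    "!series_matrix_table_end\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    matrix_path = os.path.join(tmp_dir, "toy_series_matrix.txt.gz")
    with gzip.open(matrix_path, "wt", encoding="utf-8") as handle:
        handle.write(sample)
    table = read_series_matrix_table(matrix_path)
    tsv_path = os.path.join(tmp_dir, "toy.tsv")
    write_tsv(table, tsv_path)
    with open(tsv_path, encoding="utf-8") as handle:
        tsv_text = handle.read()
```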
pysradb.mcp_server module¶
Model Context Protocol server for pysradb.
The MCP dependency is optional. Install with pysradb[mcp] before running
the pysradb-mcp console script.
- pysradb.mcp_server.convert_accession(accession: str, target: str, detailed: bool = False, include_sample_attributes: bool = False, expand_sample_attributes: bool = False, limit: int = 20) dict[str, Any][source]¶
Convert between common GEO and SRA accessions.
- pysradb.mcp_server.convert_bioproject_to_srp(bioproject: str) dict[str, Any][source]¶
Convert a PRJNA BioProject accession to matching SRP accessions.
- pysradb.mcp_server.convert_doi_to_pmid(doi: str) dict[str, Any][source]¶
Convert DOI identifiers to PMIDs.
- pysradb.mcp_server.convert_pmid_to_pmc(pmid: str) dict[str, Any][source]¶
Convert PMID identifiers to PMC identifiers.
- pysradb.mcp_server.extract_identifiers_from_text(text: str) dict[str, Any][source]¶
Extract dataset identifiers from supplied text without network access.
- pysradb.mcp_server.get_ena_fastq_urls(srp: str, limit: int = 20) dict[str, Any][source]¶
Fetch ENA FASTQ URLs for an SRA project accession without downloading files.
- pysradb.mcp_server.get_gds_results(gse: str, limit: int = 20) dict[str, Any][source]¶
Fetch NCBI GEO DataSets summary results for a GSE accession.
- pysradb.mcp_server.get_geo_matrix_url(accession: str) dict[str, str][source]¶
Return the GEO Series Matrix URL for a GSE accession without downloading.
- pysradb.mcp_server.get_geo_metadata(gse: str, detailed: bool = False, include_sample_attributes: bool = False, expand_sample_attributes: bool = False, include_pmids: bool = False, limit: int = 20) dict[str, Any][source]¶
Fetch GEO metadata for a GSE accession.
- pysradb.mcp_server.get_geo_supplementary_links(gse: str) dict[str, Any][source]¶
List GEO supplementary file links for a GSE accession without downloading.
- pysradb.mcp_server.get_gse_from_doi(doi: str, limit: int = 20) dict[str, Any][source]¶
Get GSE accessions from DOIs.
- pysradb.mcp_server.get_gse_from_pmid(pmid: str, limit: int = 20) dict[str, Any][source]¶
Get GSE accessions from PMIDs.
- pysradb.mcp_server.get_gsm_soft_metadata(gsm_ids: str) dict[str, Any][source]¶
Fetch parsed GEO SOFT metadata for one or more GSM accessions.
- pysradb.mcp_server.get_identifiers_from_doi(doi: str, limit: int = 20) dict[str, Any][source]¶
Extract dataset identifiers from DOI-linked articles.
- pysradb.mcp_server.get_identifiers_from_pmc(pmc_id: str, convert_missing: bool = True, limit: int = 20) dict[str, Any][source]¶
Extract GSE, PRJNA, SRP, SRR, SRX, and SRS identifiers from PMC articles.
- pysradb.mcp_server.get_identifiers_from_pmid(pmid: str, limit: int = 20) dict[str, Any][source]¶
Extract dataset identifiers from PubMed articles via PMC links.
- pysradb.mcp_server.get_metadata(accession: str, detailed: bool = False, include_sample_attributes: bool = False, expand_sample_attributes: bool = False, limit: int = 20) dict[str, Any][source]¶
Fetch SRA or GEO metadata for an SRP or GSE accession.
- pysradb.mcp_server.get_pmc_fulltext_excerpt(pmc_id: str, char_limit: int = 4000) dict[str, Any][source]¶
Fetch a bounded PMC full-text XML excerpt for inspection.
- pysradb.mcp_server.get_pmids_for_arrayexpress(accession: str, limit: int = 20) dict[str, Any][source]¶
Get PMIDs for ArrayExpress accessions.
- pysradb.mcp_server.get_pmids_for_bioproject(bioproject: str) dict[str, Any][source]¶
Fetch PMIDs associated with BioProject accessions.
- pysradb.mcp_server.get_pmids_for_ena_or_bioproject(accession: str, limit: int = 20) dict[str, Any][source]¶
Get PMIDs for ENA or BioProject accessions such as PRJEB or PRJNA.
- pysradb.mcp_server.get_pmids_for_gse(gse: str, detailed: bool = False, limit: int = 20) dict[str, Any][source]¶
Get PMIDs for GSE accessions.
- pysradb.mcp_server.get_pmids_for_sra_accession(accession: str, detailed: bool = False, limit: int = 20) dict[str, Any][source]¶
Get PMIDs for SRP, SRR, SRX, SRS, or other SRA accessions.
- pysradb.mcp_server.get_publication_info(ids: str, detailed: bool = False, skip_journal_metrics: bool = False, limit: int = 20) dict[str, Any][source]¶
Get publication metadata and journal metrics for PMIDs, PMCIDs, or DOIs.
- pysradb.mcp_server.get_publication_metadata(pmids: str, limit: int = 20) dict[str, Any][source]¶
Fetch title, journal, DOI, date, authors, ISSN, and citation counts for PMIDs.
- pysradb.mcp_server.get_sra_metadata(srp: str, detailed: bool = False, include_sample_attributes: bool = False, expand_sample_attributes: bool = False, include_pmids: bool = False, limit: int = 20) dict[str, Any][source]¶
Fetch SRA metadata for SRP, SRR, SRX, SRS, GSM, or related accessions.
- pysradb.mcp_server.get_srp_from_doi(doi: str, limit: int = 20) dict[str, Any][source]¶
Get SRP accessions from DOIs.
- pysradb.mcp_server.get_srp_from_pmid(pmid: str, limit: int = 20) dict[str, Any][source]¶
Get SRP accessions from PMIDs.
- pysradb.mcp_server.list_capabilities() dict[str, Any][source]¶
List the MCP tools and intentionally omitted side-effecting workflows.
- pysradb.mcp_server.map_publication_identifiers(identifier: str, target: str = 'identifiers', detailed: bool = False, limit: int = 20) dict[str, Any][source]¶
Map SRP, GSE, PMID, PMC, or DOI identifiers to publications or datasets.
pysradb.search module¶
This file contains the search classes for the search feature.
- class pysradb.search.EnaSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)[source]¶
Bases:
QuerySearch
Subclass of QuerySearch that implements search via querying ENA API
Methods
search()
sends the user query via requests to ENA API and stores search result as an instance attribute in the form of a pandas dataframe
show_result_statistics()
Shows summary information about search results.
visualise_results()
Generate graphs that visualise the search results.
get_plot_objects():
Get the plot objects for plots generated.
_format_query_string()
formats the input user query into a string
_format_request()
formats the request payload
_format_result(content)
formats the search query output and converts it into a pandas dataframe
See also
QuerySearch
Superclass of EnaSearch
- class pysradb.search.GeoSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, geo_query=None, geo_dataset_type=None, geo_entry_type=None, suppress_validation=False)[source]¶
Bases:
SraSearch
Subclass of SraSearch that can query both GEO DataSets and SRA API.
Methods
search()
sends the user query via requests to SRA, GEO DataSets, or both depending on the search query. If the query is sent to both APIs, the intersection of the two sets of query results is returned.
show_result_statistics()
Shows summary information about search results.
visualise_results()
Generate graphs that visualise the search results.
get_plot_objects():
Get the plot objects for plots generated.
_format_geo_query_string()
formats the GEO DataSets portion of the input user query into a string.
_format_geo_request()
formats the GEO DataSets request payload
_format_result(content)
formats the search query output and converts it into a pandas dataframe
See also
GeoSearch.info
GeoSearch usage details
SraSearch
Superclass of GeoSearch
QuerySearch
Superclass of SraSearch
- class pysradb.search.QuerySearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False)[source]¶
Bases:
object
This is the base class for the search feature.
This class takes as input the user’s search query, which has been tokenized by the ArgParser. The query will be sent to either SRA or ENA depending on the user’s input, and the results will be returned as a pandas dataframe.
- Parameters:
- verbosity: integer
The level of details of the search result.
- return_max: int
The maximum number of entries to be returned.
- query: str
The main query string.
- accession: str
A relevant study / experiment / sample / run accession number.
- organism: str
Scientific name of the sample organism
- layout: str
Library layout. Possible inputs: single, paired
- mbases: int
Size of the sample of interest rounded to the nearest megabase.
- publication_date: str
The publication date of the run in the format dd-mm-yyyy. If a date range is desired, input should be in the format of dd-mm-yyyy:dd-mm-yyyy
- platform: str
Sequencing platform used for the run. Some possible inputs include: illumina, ion torrent, oxford nanopore
- selection: str
Library selection. Some possible inputs: cdna, chip, dnase, pcr
- source: str
Library source. Some possible inputs: genomic, metagenomic, transcriptomic
- strategy: str
Library Preparation strategy. Some possible inputs: wgs, amplicon, rna seq
- title: str
Title of the experiment associated with the run
- suppress_validation: bool
Defaults to False. If this is set to True, the user input format checks will be skipped. Setting this to True may cause the program to behave in unexpected ways, but allows the user to search queries that do not pass the format check.
- Attributes:
- self.df: Pandas DataFrame
The search result belonging to this search instance
Methods
get_df()
Returns the dataframe storing this search result.
search()
Executes the search.
show_result_statistics()
Shows summary information about search results.
visualise_results()
Generate graphs that visualise the search results.
get_plot_objects():
Get the plot objects for plots generated.
- visualise_results(graph_types=('all',), show=False, saveto='./search_plots/')[source]¶
Generate graphs that visualise the search results.
This method will only work if the optional dependency, matplotlib, is installed in the system.
- Parameters:
- graph_types: tuple
tuple containing strings representing types of graphs to generate. Possible strings: all, daterange, organism, source, selection, platform, basecount
- saveto: str
directory name where the generated graphs are saved.
- show: bool
Whether plotted graphs are immediately shown.
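The publication_date parameter of QuerySearch accepts either a single dd-mm-yyyy date or a dd-mm-yyyy:dd-mm-yyyy range. A minimal sketch of a format check for that convention follows. The function name `parse_publication_date` is hypothetical, and the library's own validation (the checks that suppress_validation skips) may differ in detail.

```python
from datetime import datetime


def parse_publication_date(value):
    """Parse 'dd-mm-yyyy' or 'dd-mm-yyyy:dd-mm-yyyy' into a (start, end) date pair."""
    parts = value.split(":")
    if len(parts) not in (1, 2):
        raise ValueError(f"expected one date or a start:end range, got {value!r}")
    # strptime raises ValueError for anything that is not a valid dd-mm-yyyy date.
    dates = [datetime.strptime(part.strip(), "%d-%m-%Y").date() for part in parts]
    start, end = dates[0], dates[-1]
    if start > end:
        raise ValueError("range start must not be after range end")
    return start, end


single = parse_publication_date("01-02-2020")
date_range = parse_publication_date("01-02-2020:15-03-2021")

try:
    parse_publication_date("2020-02-01")  # wrong field order for this format
    rejected = False
except ValueError:
    rejected = True
```

A single date is returned as a range whose start and end coincide, so downstream code only has to handle one shape.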
- class pysradb.search.SraSearch(verbosity=2, return_max=20, query=None, accession=None, organism=None, layout=None, mbases=None, publication_date=None, platform=None, selection=None, source=None, strategy=None, title=None, suppress_validation=False, progress_disabled=False)[source]¶
Bases:
QuerySearch
Subclass of QuerySearch that implements search by querying NCBI Entrez API
Methods
search()
sends the user query via requests to NCBI Entrez API and returns search results as a pandas dataframe.
show_result_statistics()
Shows summary information about search results.
visualise_results()
Generate graphs that visualise the search results.
get_plot_objects():
Get the plot objects for plots generated.
get_uids():
Get NCBI uids retrieved during this search query.
_format_query_string()
formats the input user query into a string
_format_request()
formats the request payload
_format_result(content)
formats the search query output.
See also
QuerySearchSuperclass of SraSearch
pysradb.sraweb module¶
Utilities to interact with SRA online
- exception pysradb.sraweb.OpenAlexError[source]¶
Bases:
RuntimeError
Raised when OpenAlex API returns a non-transient error (rate limit, auth, etc.).
- class pysradb.sraweb.SRAweb(api_key=None, openalex_api_key=None, openalex_email=None, verbose=True)[source]¶
Bases:
object
- ae_to_pmid(ae_accessions)[source]¶
Get PMIDs for ArrayExpress accessions by searching Europe PMC
- Parameters:
- ae_accessions: list or str
ArrayExpress accession(s) (e.g. E-MTAB-, E-GEOD-)
- Returns:
- ae_pmid_df: pandas.DataFrame
DataFrame with columns [ae_accession, pmid]
- bioproject_to_srp(bioproject)[source]¶
Convert PRJNA BioProject ID to SRP accession
- Parameters:
- bioproject: str
BioProject ID (e.g., ‘PRJNA810439’)
- Returns:
- srp_accessions: list
List of SRP accessions found
- doi_to_gse(dois)[source]¶
Get GSE identifiers from DOI(s)
- Parameters:
- dois: list or str
DOI(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with DOIs and GSE identifiers
- doi_to_identifiers(dois)[source]¶
Extract database identifiers from articles via DOI
- Parameters:
- dois: list or str
DOI(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with DOIs, PMIDs, PMC IDs, and extracted identifiers
- doi_to_pmid(dois)[source]¶
Convert DOI(s) to PMID(s)
- Parameters:
- dois: list or str
DOI(s)
- Returns:
- doi_pmid_mapping: dict
Mapping of DOI to PMID
- doi_to_srp(dois)[source]¶
Get SRP identifiers from DOI(s)
- Parameters:
- dois: list or str
DOI(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with DOIs and SRP identifiers
- download(df, out_dir=None, filter_by_srx=None, skip_confirmation=False, use_ascp=False, url_col='public_url', threads=1)[source]¶
Download files described by a detailed SRA metadata table.
- Parameters:
- df: pandas.DataFrame
Detailed metadata with run accessions and download URL columns.
- out_dir: str, optional
Root output directory. Defaults to pysradb_downloads in the current working directory.
- filter_by_srx: list, optional
Experiment accessions to keep before downloading.
- skip_confirmation: bool
If True, start downloads without prompting.
- use_ascp: bool
Reserved for compatibility. Aspera downloads are no longer handled by this method.
- url_col: str
Preferred URL column. Falls back to common detailed metadata URL columns when the requested column is unavailable.
- threads: int
Number of concurrent download workers.
- ena_to_pmid(ena_accessions)[source]¶
Get PMIDs for ENA/BioProject accessions via NCBI elink
Uses BioProject → PubMed linkage in NCBI for PRJNA/PRJEB/PRJD accessions, with Europe PMC as fallback.
- Parameters:
- ena_accessions: list or str
ENA study accession(s) (e.g. PRJEB*, PRJNA*, PRJD*)
- Returns:
- ena_pmid_df: pandas.DataFrame
DataFrame with columns [ena_accession, pmid]
- extract_external_sources(metadata_df)[source]¶
Extract external source identifiers from SRA metadata
- Parameters:
- metadata_df: pandas.DataFrame
DataFrame containing SRA metadata
- Returns:
- external_sources: list
List of external source identifiers found
- extract_identifiers_from_text(text)[source]¶
Extract GSE, PRJNA, SRP, and other identifiers from text
- Parameters:
- text: str
Text to search for identifiers
- Returns:
- identifiers: dict
Dictionary with lists of found identifiers by type
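Extraction of this kind is typically regular-expression based. The sketch below illustrates the idea only: the patterns (a fixed letter prefix followed by digits), the dictionary keys, and the function name `find_identifiers` are assumptions, and the real method may recognise more identifier types or use stricter rules.

```python
import re

# Assumed patterns: a fixed letter prefix followed by one or more digits.
PATTERNS = {
    "gse": r"\bGSE\d+\b",
    "prjna": r"\bPRJNA\d+\b",
    "srp": r"\bSRP\d+\b",
    "srx": r"\bSRX\d+\b",
    "srr": r"\bSRR\d+\b",
    "srs": r"\bSRS\d+\b",
}


def find_identifiers(text):
    """Return a dict mapping identifier type to the unique matches, in order seen."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = re.findall(pattern, text)
        found[name] = list(dict.fromkeys(matches))  # de-duplicate, keep order
    return found


# Toy sentence; the accessions are not claimed to belong to the same study.
text = "GSE147507, SRP253951 and PRJNA810439 are mentioned here; GSE147507 appears twice."
identifiers = find_identifiers(text)
```

Returning an empty list for types with no match keeps the result shape stable, so callers can index any key without checking for its presence first.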
- fetch_bioproject_pmids(bioprojects)[source]¶
Fetch PMIDs for given BioProject accessions
- Parameters:
- bioprojects: list or str
BioProject accession(s)
- Returns:
- bioproject_pmids: dict
Mapping of BioProject to list of PMIDs
- fetch_ena_fastq(srp)[source]¶
Fetch FASTQ records from ENA (EXPERIMENTAL)
- Parameters:
- srp: string
Study accession
- Returns:
- srr_url: list
List of SRR fastq urls
- fetch_gsm_soft(gsm_ids)[source]¶
Fetch detailed GSM metadata in SOFT format.
- Args:
gsm_ids: List of GSM accessions
- Returns:
Dictionary mapping GSM accession to parsed SOFT metadata
- fetch_journal_metrics(journal_df)[source]¶
Add OpenAlex journal quality metrics to a DataFrame containing a journal column.
- Parameters:
- journal_df: DataFrame with ``journal`` (required) and ``issn`` (optional) columns.
- Returns:
- Same DataFrame with added columns: journal_h_index, journal_i10_index, journal_2yr_mean_citedness, journal_cited_by_count, journal_works_count.
- fetch_pmc_fulltext(pmc_id)[source]¶
Fetch full text from PMC article
- Parameters:
- pmc_id: str
PMC ID (can be with or without ‘PMC’ prefix)
- Returns:
- fulltext: str
Full text of the article, or None if unavailable
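Accepting a PMC ID with or without the PMC prefix usually comes down to normalising the input once, up front. A small sketch of that step, assuming the canonical form is the upper-case prefix followed by digits. The helper name `normalize_pmc_id` is hypothetical and is not part of pysradb.

```python
def normalize_pmc_id(pmc_id):
    """Return the ID in 'PMC<digits>' form, whether or not the prefix was supplied."""
    text = str(pmc_id).strip().upper()
    digits = text[3:] if text.startswith("PMC") else text
    if not digits.isdigit():
        raise ValueError(f"not a PMC ID: {pmc_id!r}")
    return f"PMC{digits}"


with_prefix = normalize_pmc_id("PMC4589343")
without_prefix = normalize_pmc_id("4589343")
from_integer = normalize_pmc_id(4589343)
```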
- fetch_pmid_metadata(pmids)[source]¶
Fetch publication metadata for PMIDs.
- Parameters:
- pmids: list or str
PMID(s)
- Returns:
- DataFrame with columns: pmid, title, journal, doi, pub_date, authors, issn, citation_count
- static format_xml(string)[source]¶
Create a fake root to make ‘string’ a valid xml
- Parameters:
- string: str
- Returns:
- str
- geo_metadata(gse, sample_attribute=False, detailed=False, expand_sample_attributes=False, include_pmids=False, enrich=False, enrich_backend='ollama/granite4:3b', embedding_model='abhinand/MedEmbed-large-v0.1', **kwargs)[source]¶
- gse_to_pmid(gse_accessions, detailed=False)[source]¶
Get PMIDs for GSE accessions by searching PubMed Central
- Parameters:
- gse_accessions: list or str
GSE accession(s)
- detailed: bool
If True, include publication metadata (title, journal, doi, pub_date, authors)
- Returns:
- gse_pmid_df: pandas.DataFrame
DataFrame with GSE accessions and associated PMIDs
- metadata(accession, **kwargs)[source]¶
Unified method to fetch metadata for SRA or GEO accessions.
Automatically detects accession type and calls appropriate method.
- Parameters:
- accession: str or list
SRP/GSE accession(s).
- **kwargs
Additional parameters passed to sra_metadata() or geo_metadata(). Examples include detailed, enrich, enrich_backend, and sample_attribute.
- Returns:
- pandas.DataFrame
Metadata table, enriched if enrich=True.
Examples
>>> client = SRAweb()
>>> df = client.metadata("GSE286254", detailed=True, enrich=True)
>>> df = client.metadata("SRP253951", detailed=True, enrich=True)
>>> df = client.metadata(["GSE286254", "GSE147507"], enrich=True)
- pmc_to_identifiers(pmc_ids, convert_missing=True)[source]¶
Extract database identifiers from PMC articles
- Parameters:
- pmc_ids: list or str
PMC ID(s) (can be with or without ‘PMC’ prefix)
- convert_missing: bool
If True, automatically convert GSE↔SRP when one is found but not the other. Default: True
- Returns:
- results_df: pandas.DataFrame
DataFrame with PMC IDs and extracted identifiers
- pmid_info(ids, detailed=False, skip_journal_metrics=False)[source]¶
Get publication metadata and journal metrics for PMIDs, PMCIDs, or DOIs.
- Parameters:
- ids: list or str
PMID(s), PMCID(s) (e.g. PMC4589343), or DOI(s)
- detailed: bool
If True, also look up associated GEO/SRA datasets.
- skip_journal_metrics: bool
If True, skip the OpenAlex journal metrics lookup (h-index, i10-index, etc.). Citation counts are still fetched per-PMID. Useful for large batch runs since citation lookups are free singleton requests while journal metrics use paid list/search requests on the OpenAlex API.
- Returns:
- DataFrame with publication metadata, journal metrics, citation count, and (when detailed) associated GEO/SRA accessions.
- pmid_to_gse(pmids)[source]¶
Get GSE identifiers from PMID(s)
- Parameters:
- pmids: list or str
PMID(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with PMIDs and GSE identifiers
- pmid_to_identifiers(pmids)[source]¶
Extract database identifiers from PubMed articles via PMC
- Parameters:
- pmids: list or str
PMID(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with PMIDs, PMC IDs, and extracted identifiers
- pmid_to_pmc(pmids)[source]¶
Convert PMID(s) to PMC ID(s)
- Parameters:
- pmids: list or str
PMID(s)
- Returns:
- pmid_pmc_mapping: dict
Mapping of PMID to PMC ID
- pmid_to_srp(pmids)[source]¶
Get SRP identifiers from PMID(s)
- Parameters:
- pmids: list or str
PMID(s)
- Returns:
- results_df: pandas.DataFrame
DataFrame with PMIDs and SRP identifiers
- search_pmc_for_external_sources(external_sources)[source]¶
Search PubMed Central for PMIDs using external source identifiers
- Parameters:
- external_sources: list
List of external source identifiers
- Returns:
- pmids: list
List of PMIDs found
- sra_metadata(srp, sample_attribute=False, detailed=False, expand_sample_attributes=False, output_read_lengths=False, include_pmids=False, enrich=False, enrich_backend='ollama/granite4:3b', embedding_model='abhinand/MedEmbed-large-v0.1', **kwargs)[source]¶
- sra_to_pmid(sra_accessions)[source]¶
Get PMIDs for SRA accessions (backward compatibility wrapper)
- Parameters:
- sra_accessions: list or str
SRA accession(s) - can be SRP, SRR, SRX, or SRS
- Returns:
- sra_pmid_df: pandas.DataFrame
DataFrame with SRA accessions and associated PMIDs
- srp_to_pmid(srp_accessions, detailed=False)[source]¶
Get PMIDs associated with SRP accessions
- Parameters:
- srp_accessions: list or str
SRP accession(s)
- detailed: bool
If True, include publication metadata (title, journal, doi, pub_date, authors)
- Returns:
- srp_pmid_df: pandas.DataFrame
DataFrame with SRP accessions and associated PMIDs
pysradb.taxid2name module¶
pysradb.utils module¶
- class pysradb.utils.TqdmUpTo(*_, **__)[source]¶
Bases:
tqdm
Alternative Class-based version of the above. Provides update_to(n) which uses tqdm.update(delta_n). Inspired by [twine#242](https://github.com/pypa/twine/pull/242), [here](https://github.com/pypa/twine/commit/42e55e06).
Credits: https://github.com/tqdm/tqdm/blob/69326b718905816bb827e0e66c5508c9c04bc06c/examples/tqdm_wget.py
- pysradb.utils.confirm(preceeding_text)[source]¶
Confirm user input.
- Parameters:
- preceeding_text: str
Text to print
- Returns:
- response: bool
- pysradb.utils.copyfileobj(fsrc, fdst, bufsize=16384, filesize=None, desc='')[source]¶
Copy file object with a progress bar.
- Parameters:
- fsrc: filehandle
Input file handle
- fdst: filehandle
Output file handle
- bufsize: int
Length of output buffer
- filesize: int
Input file size
- desc: string
Description for tqdm status
- pysradb.utils.get_gzip_uncompressed_size(filepath)[source]¶
Get uncompressed size of a .gz file
- Parameters:
- filepath: string
Path to input file
- Returns:
- filesize: int
Uncompressed file size
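One common way to obtain this number without decompressing the file is to read the gzip trailer: the last four bytes of a gzip member hold ISIZE, the uncompressed length modulo 2^32, as a little-endian integer. Whether pysradb uses this approach is an assumption, and `gzip_isize` is a hypothetical name.

```python
import gzip
import os
import struct
import tempfile


def gzip_isize(filepath):
    """Read the 4-byte little-endian ISIZE field at the end of a gzip file."""
    with open(filepath, "rb") as handle:
        handle.seek(-4, os.SEEK_END)  # the trailer ends with ISIZE
        return struct.unpack("<I", handle.read(4))[0]


# Write 10,000 bytes through gzip and read the size back from the trailer.
with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "payload.gz")
    with gzip.open(gz_path, "wb") as handle:
        handle.write(b"A" * 10_000)
    reported_size = gzip_isize(gz_path)
```

The value is only reliable for single-member files smaller than 4 GiB. Larger inputs wrap around, and for concatenated members the trailer describes the last member only.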
- pysradb.utils.mkdir_p(path)[source]¶
Python version mkdir -p
- Parameters:
- path: string
Path to directory to create
- pysradb.utils.order_dataframe(df, columns)[source]¶
Order a dataframe
Order a dataframe by moving the given columns to the front
- Parameters:
- df: Dataframe
Dataframe
- columns: list
List of columns that need to be put in front
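The reordering itself reduces to computing a new column order. A sketch on plain lists follows, with the hypothetical helper `front_first` and made-up column names; skipping requested columns that are absent is a choice made for this example, not documented behaviour.

```python
def front_first(all_columns, front):
    """Return all_columns with the requested columns moved to the front."""
    front_present = [col for col in front if col in all_columns]
    rest = [col for col in all_columns if col not in front_present]
    return front_present + rest


columns = ["library_layout", "run_accession", "organism", "study_accession"]
ordered = front_first(columns, ["study_accession", "run_accession", "missing_column"])
```

With pandas, the resulting list can be used to select the columns in the new order, for example `df[ordered]`.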
- pysradb.utils.path_leaf(path)[source]¶
Get path’s tail from a filepath.
- Parameters:
- path: string
Filepath
- Returns:
- tail: string
Filename
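A widely used recipe for this relies on `ntpath`, which treats both `/` and `\` as separators and therefore behaves the same on any operating system. The sketch below is an illustration of that recipe; `leaf` is a hypothetical name and the library's own implementation may differ.

```python
import ntpath


def leaf(path):
    """Return the last path component, tolerating a trailing separator."""
    head, tail = ntpath.split(path)
    # If the path ends with a separator, tail is empty; fall back to head's basename.
    return tail or ntpath.basename(head)


posix_leaf = leaf("downloads/SRP253951/reads.fastq.gz")
windows_leaf = leaf(r"C:\downloads\reads.fastq.gz")
trailing_leaf = leaf("downloads/SRP253951/")
```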
- pysradb.utils.requests_3_retries()[source]¶
Generates a requests session object that allows 3 retries.
- Returns:
- session: requests.Session
requests session object that allows 3 retries for server-side errors.
Module contents¶
Top-level package for pysradb.