Command line interface for riboraptor
riboraptor.coherence.coherence(original_values, frames=[0])
Calculate coherence and an ideal ribo-seq signal.
riboraptor.coherence.coherence_ft(values, nperseg=30, noverlap=15, window='flattop')
Calculate coherence and an ideal ribo-seq signal based on the Fourier transform.
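A minimal sketch of a Fourier-based coherence check. It assumes that coherence_ft compares the input against an ideal 1-0-0 signal using scipy.signal.coherence with the defaults shown above; the docs do not confirm this. The helper name frame_coherence is hypothetical. With nperseg=30 the frequency bins are multiples of 1/30, so the codon frequency 1/3 falls exactly on bin 10.

```python
import numpy as np
from scipy import signal


def frame_coherence(values, nperseg=30, noverlap=15, window="flattop"):
    """Hypothetical helper: coherence of `values` with an ideal 1-0-0 signal.

    Returns (frequency, coherence) at the bin closest to the codon frequency 1/3.
    """
    values = np.asarray(values, dtype=float)
    # Ideal ribo-seq signal (assumed): every read in frame 1, none in frames 2 and 3.
    ideal = np.tile([1.0, 0.0, 0.0], len(values) // 3 + 1)[: len(values)]
    freqs, cxy = signal.coherence(
        values, ideal, window=window, nperseg=nperseg, noverlap=noverlap
    )
    codon_bin = int(np.argmin(np.abs(freqs - 1 / 3)))
    return freqs[codon_bin], cxy[codon_bin]


# A scaled, perfectly periodic profile is fully coherent with the ideal signal at f = 1/3.
profile = 5 * np.tile([1.0, 0.0, 0.0], 100)
freq, coh = frame_coherence(profile)
print(freq, coh)
```

Coherence is insensitive to scale, so a profile of 5-0-0 repeats scores the same at 1/3 as the ideal 1-0-0 signal itself.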
riboraptor.coherence.get_periodicity(values, input_is_stream=False)
Calculate periodicity with respect to a 1-0-0 signal.
riboraptor.coherence.naive_periodicity(values, identify_peak=False)
Calculate periodicity in a naive manner.
Take the ratio of frame 1 counts to the average of frame 2 and frame 3 counts. By default, the first value is treated as frame 1.
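The ratio described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not riboraptor's code. It assumes `values` is a per-nucleotide count profile whose first element is frame 1, ignores `identify_peak`, and handles a zero denominator in an arbitrary way.

```python
from typing import Sequence


def naive_periodicity_sketch(values: Sequence[float]) -> float:
    """Hypothetical helper: frame-1 counts divided by the mean of frame-2 and frame-3 counts."""
    frame1 = sum(values[0::3])  # positions 0, 3, 6, ...
    frame2 = sum(values[1::3])  # positions 1, 4, 7, ...
    frame3 = sum(values[2::3])  # positions 2, 5, 8, ...
    denominator = (frame2 + frame3) / 2
    if denominator == 0:
        # Assumption: report infinity when only frame 1 has reads.
        return float("inf") if frame1 > 0 else float("nan")
    return frame1 / denominator


print(naive_periodicity_sketch([10, 2, 2, 8, 1, 3]))  # 18 / ((3 + 5) / 2) = 4.5
```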
Utilities for read counting operations.
riboraptor.count.OrderedCounter(**kwds)
Bases: collections.Counter, collections.OrderedDict
Counter that remembers the order in which elements are first encountered.
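A class with these bases can be built with the recipe that appeared in the Python `collections` documentation. This sketch may differ in detail from riboraptor's definition. On Python 3.7+, a plain `Counter` already preserves insertion order, so the subclass mostly documents intent.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which elements are first encountered."""

    def __repr__(self):
        return f"{self.__class__.__name__}({OrderedDict(self)!r})"

    def __reduce__(self):
        # Support pickling by rebuilding from an ordered mapping.
        return self.__class__, (OrderedDict(self),)


counts = OrderedCounter("abracadabra")
print(list(counts.items()))  # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
```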
riboraptor.count.bam_to_bedgraph(bam, strand='both', end_type='5prime', saveto=None)
Create a bedgraph from a bam file.
riboraptor.count.bedgraph_to_bigwig(bedgraph, sizes, saveto, input_is_stream=False)
Convert bedgraph to bigwig.
riboraptor.count.count_uniq_mapping_reads(bam)
Count the number of uniquely mapped reads.
riboraptor.count.export_gene_coverages(bed, bw, saveto, offset_5p=0, offset_3p=0)
Export all gene coverages.
riboraptor.count.export_metagene_coverage(bed, bw, max_positions=None, saveto=None, offset_5p=0, offset_3p=0, orientation='5prime', n_jobs=16)
Export metagene coverage.
riboraptor.count.export_read_counts(gene_coverages, saveto, keep_offsets=True)
Export read counts from a gene coverages file.
riboraptor.count.export_read_length(bam, saveto=None)
Count read lengths.
riboraptor.count.extract_uniq_mapping_reads(inbam, outbam)
Extract only uniquely mapping reads from a bam.
riboraptor.count.gene_coverage(gene_group, bw, offset_5p=0, offset_3p=0)
Get gene coverage.
riboraptor.count.get_bam_coverage(bam, bed, outprefix=None)
Get coverage from bam.
riboraptor.count.get_bam_coverage_on_bed(bam, bed, protocol='forward', orientation='5prime', max_positions=1000, offset=60, saveto=None)
Get bam coverage over start_codon/stop_codon coordinates.
riboraptor.count.get_region_sizes(bed)
Get the collapsed lengths of genes in a bed file.
riboraptor.count.mapping_reads_summary(bam, saveto=None)
Count the number of mapped reads.
riboraptor.count.merge_gene_coverages(gene_coverages, max_positions=None, saveto=None)
Merge gene coverages to generate metagene coverage.
riboraptor.count.merge_read_counts(read_counts, saveto)
Merge read count tsv files into one count table.
riboraptor.count.read_enrichment(read_lengths, min_length=28, max_length=32)
Calculate read enrichment for a certain range of lengths.
Utilities to download data from NCBI SRA
riboraptor.download.run_download_sra_script(download_root_location=None, ascp_key_path=None, srp_id_file=None, srp_id_list=None)
Download data from SRA.
riboraptor.dtw.dtw(X, Y, metric='euclidean', ddtw=False, ddtw_order=1)
riboraptor.dtw.get_path(D)
Trace back the path of minimum cost.
Given the accumulated cost matrix D, trace back the minimum cost path.
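A minimal sketch of classic dynamic time warping: build the accumulated cost matrix, then trace the minimum-cost path back from the bottom-right corner. This is not riboraptor's implementation. It does not reproduce the `metric` or derivative DTW (`ddtw`) options, and it uses the absolute difference as the local cost for 1-D inputs. The names accumulated_cost and traceback_path are hypothetical.

```python
import numpy as np


def accumulated_cost(x, y):
    """Hypothetical helper: DTW accumulated cost matrix for two 1-D sequences.

    The local cost is the absolute difference |x[i] - y[j]|.
    """
    n, m = len(x), len(y)
    # Pad with an extra row/column of infinities so the recursion needs no special cases.
    padded = np.full((n + 1, m + 1), np.inf)
    padded[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            padded[i, j] = cost + min(
                padded[i - 1, j - 1],  # advance in both sequences
                padded[i - 1, j],      # advance in x only
                padded[i, j - 1],      # advance in y only
            )
    return padded[1:, 1:]


def traceback_path(D):
    """Hypothetical helper: trace the minimum-cost path from D[-1, -1] back to D[0, 0]."""
    i, j = D.shape[0] - 1, D.shape[1] - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            # On cost ties, tuple comparison prefers the diagonal step (smallest indices).
            _, i, j = min(
                (D[i - 1, j - 1], i - 1, j - 1),
                (D[i - 1, j], i - 1, j),
                (D[i, j - 1], i, j - 1),
            )
        path.append((i, j))
    path.reverse()
    return path


D = accumulated_cost([0, 1, 2], [0, 1, 1, 2])
print(D[-1, -1], traceback_path(D))  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```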
riboraptor.fasta.FastaReader(fasta_location)
Bases: object
Class for reading and querying a fasta file.
chromosomes
Return a list of chromosomes and their sizes as in the fasta file.
query(intervals)
Query regions for sequence.
riboraptor.hdf_parser.HDFParser(filepath)
Bases: object
riboraptor.hdf_parser.create_metagene_from_multi_bigwig(bed, bigwigs, max_positions=1000, offset_5p=0, offset_3p=0, n_jobs=16, saveto=None)
Collapse multiple bigwigs to get metagene coverage.
Test case: if everything above works correctly, this should produce the same coverage file as export_metagene_coverage.
riboraptor.hdf_parser.hdf_to_bigwig(hdf, prefixdir, read_lengths_to_use='all', output_normalized=True)
Create fragment- and strand-specific bigwigs from hdf.
riboraptor.hdf_parser.merge_bigwigs(bigwigs, chrom_sizes, saveto, scale=False)
Merge multiple bigwigs into one.
Note: this does not appear to be possible through pyBigWig, hence the dependency on ucsc-bigWigMerge.
riboraptor.hdf_parser.normalize_bw_hdf(bw, hdf, read_length, outbw)
Normalize a fragment-specific bigwig to RPM for that fragment length.
riboraptor.hdf_parser.tsv_to_bigwig(df, chrom_lengths, prefix)
Convert a tsv (created by bam-coverage) to bigwigs.
A separate bigwig is created for each fragment length, read end, and strand. The output files are the following, where N is the fragment length:
N.5prime.pos.bw N.3prime.pos.bw
N.5prime.neg.bw N.3prime.neg.bw
Miscellaneous helper functions: not central to the package, but still useful.
riboraptor.helpers.bwshift(bw, shift_by, out_bw, chunk_size=20000)
Given a bigwig, shift all the values by shift_by positions.
Example, shifting by 10 (variableStep chrom=chr span=1, shown as position:value pairs):
Input: 1:1 2:2 3:5 4:6 5:5 6:3 7:3 8:5 9:5 10:5 11:6 12:6 13:0 14:2 15:3 16:3 17:10 18:4 19:4 20:2 21:2 22:2 23:1
Shifted by 10: 1:6 2:6 3:0 4:2 5:3 6:3 7:10 8:4 9:4 10:2 11:2 12:2 13:1
The value at position p + 10 moves to position p, and the first 10 positions are dropped.
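The position arithmetic in the bwshift example can be reproduced on an in-memory mapping. This is a hypothetical sketch of the shift only. It leaves out reading and writing the actual bigwig files, and the chunking controlled by chunk_size. The helper name shift_positions is an assumption.

```python
def shift_positions(values_by_position, shift_by):
    """Hypothetical helper: move the value at position p + shift_by to position p.

    `values_by_position` maps 1-based positions to values; positions that
    would fall below 1 after shifting are dropped.
    """
    return {
        position - shift_by: value
        for position, value in sorted(values_by_position.items())
        if position - shift_by >= 1
    }


# Input values for positions 1..23 from the example.
example = dict(enumerate(
    [1, 2, 5, 6, 5, 3, 3, 5, 5, 5, 6, 6, 0, 2, 3, 3, 10, 4, 4, 2, 2, 2, 1], start=1
))
shifted = shift_positions(example, 10)
print(list(shifted.items()))
```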
riboraptor.helpers.check_file_exists(filepath)
Check if file exists.
riboraptor.helpers.codon_to_anticodon(codon)
Codon to anticodon.
riboraptor.helpers.complementary_strand(strand)
Get complementary strand.
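Both codon_to_anticodon and complementary_strand reduce to simple lookups. This minimal sketch assumes DNA codons and reports the anticodon 5' to 3', that is, as the reverse complement of the codon. riboraptor may use a different convention. The function names are hypothetical.

```python
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def codon_to_anticodon_sketch(codon: str) -> str:
    """Hypothetical helper: reverse complement of a DNA codon (anticodon read 5'->3')."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(codon.upper()))


def complementary_strand_sketch(strand: str) -> str:
    """Hypothetical helper: swap '+' and '-' strands."""
    mapping = {"+": "-", "-": "+"}
    if strand not in mapping:
        raise ValueError(f"unknown strand: {strand!r}")
    return mapping[strand]


print(codon_to_anticodon_sketch("ATG"), complementary_strand_sketch("+"))  # CAT -
```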
riboraptor.helpers.counts_to_tpm(counts, sizes)
Counts to TPM.
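This sketch uses the standard TPM definition. Each count is divided by its feature length to get a per-base rate, and the rates are then scaled to sum to one million. It assumes `sizes` are feature lengths in a common unit, which cancels out. The function name is hypothetical.

```python
import numpy as np


def counts_to_tpm_sketch(counts, sizes):
    """Hypothetical helper: transcripts per million from raw counts and feature lengths."""
    counts = np.asarray(counts, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    rate = counts / sizes  # reads per unit of feature length
    return rate / rate.sum() * 1e6


tpm = counts_to_tpm_sketch([10, 20, 30], [1000, 2000, 1000])
print(tpm)  # rates 0.01, 0.01, 0.03 -> 200000, 200000, 600000
```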
riboraptor.helpers.create_bam_index(bam)
Create bam index.
riboraptor.helpers.create_ideal_periodic_signal(signal_length)
Create an ideal ribo-seq signal.
riboraptor.helpers.featurecounts_to_tpm(fc_f, outfile)
Convert a featureCounts file to TPM.
riboraptor.helpers.find_first_non_none(positions)
Given a list of positions, find the index and value of the first non-None element.
This method is specifically designed for pysam, which returns None for some reference positions: when they are fetched using get_reference_positions (with full_length=True), soft-clipped or otherwise unaligned read positions are returned as None.
query_alignment_start and query_alignment_end give the indices of the read positions that are not soft-clipped; a position within this range can still be None if it does not align to the reference.
riboraptor.helpers.find_last_non_none(positions)
Given a list of positions, find the index and value of the last non-None element.
This function is similar to find_first_non_none, but operates on the reversed list. It is specifically useful for reverse-strand cases.
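A minimal sketch of the two lookups on a pysam-style position list, where soft-clipped bases appear as None. The names are hypothetical. Returning an index into the original (not reversed) list, and returning None when every element is None, are assumptions.

```python
def first_non_none(positions):
    """Hypothetical helper: (index, value) of the first non-None element, or None."""
    for index, value in enumerate(positions):
        if value is not None:
            return index, value
    return None


def last_non_none(positions):
    """Hypothetical helper: (index, value) of the last non-None element, or None."""
    for index in range(len(positions) - 1, -1, -1):
        if positions[index] is not None:
            return index, positions[index]
    return None


# Illustrative read: two soft-clipped bases at the start, one at the end.
positions = [None, None, 1000, 1001, 1002, None]
print(first_non_none(positions), last_non_none(positions))  # (2, 1000) (4, 1002)
```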
riboraptor.helpers.get_region_sizes(region_bed)
Get the summed-up size of a CDS/UTR region from a bed file.
riboraptor.helpers.get_strandedness(filepath)
Parse output of infer_experiment.py from RSeqC to get strandedness.
riboraptor.helpers.htseq_to_tpm(htseq_f, outfile, cds_bed_f)
Convert htseq-counts file to tpm.
riboraptor.helpers.identify_peaks(coverage)
Given a coverage array, find the site of maximum density.
riboraptor.helpers.is_read_uniq_mapping(read)
Check if read is uniquely mappable.
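One common way to implement such a check on pysam reads assumes an aligner (for example STAR) that writes the NH:i tag, the number of reported alignments. `is_unmapped`, `is_secondary`, `is_supplementary`, `has_tag` and `get_tag` are pysam AlignedSegment attributes. The function name and the conservative fallback for reads without NH are assumptions. Because the check is duck-typed, it can be exercised with a stand-in object.

```python
from types import SimpleNamespace


def is_uniquely_mapped(read) -> bool:
    """Hypothetical helper: True if a primary, mapped read has NH == 1."""
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        return False
    if read.has_tag("NH"):
        return read.get_tag("NH") == 1
    return False  # assumption: treat reads without an NH tag as not unique


def make_stub_read(nh=None, is_unmapped=False, is_secondary=False, is_supplementary=False):
    """Stand-in for pysam.AlignedSegment, for illustration only."""
    tags = {} if nh is None else {"NH": nh}
    return SimpleNamespace(
        is_unmapped=is_unmapped,
        is_secondary=is_secondary,
        is_supplementary=is_supplementary,
        has_tag=lambda tag: tag in tags,
        get_tag=lambda tag: tags[tag],
    )


print(is_uniquely_mapped(make_stub_read(nh=1)))  # True
print(is_uniquely_mapped(make_stub_read(nh=3)))  # False
```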
riboraptor.helpers.list_to_ranges(list_of_int)
Convert a list of integers to a list of range objects.
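A minimal sketch that groups runs of consecutive integers into half-open `range` objects. Deduplicating and sorting the input first is an assumption, and the function name is hypothetical.

```python
def list_to_ranges_sketch(list_of_int):
    """Hypothetical helper: collapse consecutive integers into range objects."""
    ranges = []
    for value in sorted(set(list_of_int)):
        if ranges and ranges[-1].stop == value:
            # Extend the current run by one.
            ranges[-1] = range(ranges[-1].start, value + 1)
        else:
            ranges.append(range(value, value + 1))
    return ranges


print(list_to_ranges_sketch([3, 1, 2, 7, 8, 10]))  # [range(1, 4), range(7, 9), range(10, 11)]
```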
riboraptor.helpers.merge_intervals(intervals, chromosome_lengths=None, offset_5p=0, offset_3p=0, zero_based=True)
Collapse intervals into non-overlapping intervals.
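The core of interval collapsing is a sort-and-sweep. This sketch assumes half-open (start, end) tuples and merges touching intervals as well as overlapping ones. It does not model the chromosome_lengths, offset, or zero_based options, and the function name is hypothetical.

```python
def merge_intervals_sketch(intervals):
    """Hypothetical helper: merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals_sketch([(5, 10), (0, 3), (8, 12), (20, 25), (3, 4)]))
# [(0, 4), (5, 12), (20, 25)]
```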
riboraptor.helpers.millify(n)
Convert integer to human readable format.
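A sketch of one way to produce such labels. It repeatedly divides by 1000 and attaches a suffix. The suffix set and the one-decimal precision are assumptions, not necessarily riboraptor's output format.

```python
def millify_sketch(n):
    """Hypothetical helper: format a number as e.g. '1.2K' or '1.5M'."""
    suffixes = ["", "K", "M", "B", "T"]
    value = float(n)
    index = 0
    while abs(value) >= 1000 and index < len(suffixes) - 1:
        value /= 1000
        index += 1
    if index == 0:
        return str(int(n))
    return f"{value:.1f}{suffixes[index]}"


print(millify_sketch(999), millify_sketch(1234), millify_sketch(1_500_000))  # 999 1.2K 1.5M
```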
riboraptor.helpers.order_dataframe(df, columns)
Order a dataframe.
Order a dataframe by moving the given columns to the front.
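With pandas, moving columns to the front amounts to selecting the columns in a new order. A minimal sketch; the function name is hypothetical.

```python
import pandas as pd


def move_columns_to_front(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: return df with `columns` first, other columns in original order."""
    columns = list(columns)
    remaining = [col for col in df.columns if col not in columns]
    return df[columns + remaining]


df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
print(list(move_columns_to_front(df, ["c"]).columns))  # ['c', 'a', 'b']
```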
riboraptor.helpers.pad_five_prime_or_truncate(some_list, offset_5p, target_len)
Pad the 5prime end first and then the 3prime end, or truncate.
riboraptor.helpers.pad_or_truncate(some_list, target_len)
Pad or truncate a list to the given target length.
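A minimal sketch of padding or truncating to a fixed length. The pad value of 0 and the `pad_value` parameter are assumptions added for illustration.

```python
def pad_or_truncate_sketch(some_list, target_len, pad_value=0):
    """Hypothetical helper: cut to target_len, or pad the 3' end with pad_value."""
    truncated = list(some_list[:target_len])
    return truncated + [pad_value] * (target_len - len(truncated))


print(pad_or_truncate_sketch([1, 2, 3], 5), pad_or_truncate_sketch([1, 2, 3, 4], 2))
# [1, 2, 3, 0, 0] [1, 2]
```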
riboraptor.helpers.parse_star_logs(infile, outfile=None)
Parse STAR logs into a dict.
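STAR's Log.final.out lists one statistic per line in the form "description | value", interleaved with section headers that have no separator. This sketch splits each line on the first "|" and keeps values as strings, since some are percentages. The sample lines are illustrative, not real output, and the function name is hypothetical.

```python
def parse_star_log_lines(lines):
    """Hypothetical helper: map each STAR log statistic to its (string) value."""
    stats = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no separator
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


sample = [
    "                          Number of input reads |\t20000",
    "                                UNIQUE READS:",
    "                   Uniquely mapped reads number |\t18000",
    "                        Uniquely mapped reads % |\t90.00%",
]
stats = parse_star_log_lines(sample)
print(stats)
```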
riboraptor.helpers.r2(x, y)
Calculate the Pearson correlation between two vectors.
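With NumPy, the Pearson correlation is the off-diagonal entry of `np.corrcoef`. A minimal sketch; the function name is hypothetical. Square the result if the coefficient of determination is wanted instead.

```python
import numpy as np


def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation coefficient of two 1-D vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```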
riboraptor.helpers.read_bed_as_intervaltree(filepath)
Read bed as interval tree.
Useful for reading start/stop codon beds.
riboraptor.helpers.read_chrom_sizes(filepath)
Read chr.sizes file sorted by chromosome name.
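A chrom.sizes file has two whitespace-separated columns: chromosome name and length. This sketch parses lines into (name, length) pairs sorted by name. Sorting is lexicographic, so "chr10" sorts before "chr2". The function name and the toy sizes are illustrative.

```python
def parse_chrom_sizes(lines):
    """Hypothetical helper: (chrom, size) pairs from chrom.sizes lines, sorted by name."""
    sizes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, size = line.split()[:2]
        sizes.append((chrom, int(size)))
    return sorted(sizes, key=lambda item: item[0])


print(parse_chrom_sizes(["chr2\t200", "chr1\t100", "", "chr10\t50"]))
# [('chr1', 100), ('chr10', 50), ('chr2', 200)]
```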
riboraptor.helpers.read_enrichment(read_lengths, enrichment_range=range(28, 33), input_is_stream=False, input_is_file=True)
Calculate read enrichment for a certain range of lengths.
riboraptor.helpers.read_htseq(htseq_f)
Read HTSeq file.
riboraptor.helpers.read_refseq_bed(filepath)
Read refseq bed12 from UCSC.
riboraptor.helpers.round_to_nearest(x, base=5)
Round to nearest base.
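Rounding to the nearest multiple of `base` is usually written as `base * round(x / base)`. This sketch uses that approach, which is not confirmed to be riboraptor's code. Python's `round` sends exact ties to the even neighbour, so 12.5 rounds down to 10.

```python
def round_to_nearest_sketch(x, base=5):
    """Hypothetical helper: nearest multiple of `base` (ties go to the even multiple)."""
    return int(base * round(float(x) / base))


print(round_to_nearest_sketch(12), round_to_nearest_sketch(13), round_to_nearest_sketch(12.5))
# 10 15 10
```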
riboraptor.helpers.scale_bigwig(inbigwig, chrom_sizes, outbigwig, scale_factor=1)
Scale a bigwig by a certain factor.
riboraptor.helpers.set_xrotation(ax, degrees)
Rotate labels on x-axis.
riboraptor.helpers.summarize_counters(samplewise_dict)
Summarize gene counts for a collection of samples.
riboraptor.helpers.summary_stats_two_arrays_welch(old_mean_array, new_array, old_var_array=None, old_n_counter=None, carried_forward_observations=None)
Average two arrays using Welch’s method.
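The update rule this function applies is not spelled out above. A common way to fold a new array into element-wise running mean and variance is Welford's online algorithm, sketched here as one plausible approach; it is not confirmed to match riboraptor's method. The name welford_update and its (mean, m2, n) state are assumptions that only loosely mirror old_mean_array, old_var_array and old_n_counter.

```python
import numpy as np


def welford_update(mean, m2, n, new):
    """Hypothetical helper: fold one array into element-wise running statistics.

    mean: running mean; m2: running sum of squared deviations; n: arrays seen so far.
    """
    new = np.asarray(new, dtype=float)
    n_new = n + 1
    delta = new - mean
    mean_new = mean + delta / n_new
    m2_new = m2 + delta * (new - mean_new)
    return mean_new, m2_new, n_new


arrays = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
mean, m2, n = arrays[0].copy(), np.zeros(2), 1
for row in arrays[1:]:
    mean, m2, n = welford_update(mean, m2, n, row)
variance = m2 / (n - 1)  # sample variance (ddof=1)
print(mean, variance)  # [3. 6.] [ 4. 16.]
```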
riboraptor.infer_protocol.infer_protocol(bam, bed, n_reads=500000, drop_probability=0.2)
Infer strandedness protocol given a bam file.
riboraptor.kmer.fastq_kmer_histogram(fastq_file, kmer_length=range(5, 31), five_prime=False, max_seq=1000000, offset=0, drop_probability=0)
Get a histogram of kmers from a fastq file.
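A minimal sketch of k-mer counting over FASTQ sequences. It assumes unwrapped four-line FASTQ records and that `five_prime=True` means counting only the k-mer at the start of each read; neither is confirmed by the docs. The max_seq, offset and drop_probability options are not modelled, and all names are hypothetical.

```python
from collections import Counter


def fastq_sequences(fastq_lines):
    """Hypothetical helper: yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:
            yield line.strip()


def kmer_histogram(sequences, k, five_prime=False):
    """Hypothetical helper: count k-mers (or only the 5' k-mer) across sequences."""
    counts = Counter()
    for seq in sequences:
        if five_prime:
            if len(seq) >= k:
                counts[seq[:k]] += 1
        else:
            for start in range(len(seq) - k + 1):
                counts[seq[start:start + k]] += 1
    return counts


fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGA", "+", "IIII"]
histogram = kmer_histogram(fastq_sequences(fastq), 2)
print(histogram)
```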
Plotting methods.
riboraptor.plotting.plot_featurewise_barplot(utr5_counts, cds_counts, utr3_counts, ax=None, saveto=None, **kwargs)
Plot barplots for 5’UTR/CDS/3’UTR counts.
riboraptor.plotting.plot_framewise_counts(counts, frames_to_plot='all', ax=None, title=None, millify_labels=False, position_range=None, saveto=None, ascii=False, input_is_stream=False, **kwargs)
Plot framewise distribution of reads.
riboraptor.plotting.plot_periodicity_df(df, saveto, cbar=False, figsize=(8, 8))
Plot periodicity values across fragment lengths as a matrix.
riboraptor.plotting.plot_read_counts(counts, ax=None, marker=None, color='royalblue', title=None, label=None, millify_labels=False, identify_peak=True, saveto=None, position_range=None, ascii=False, input_is_stream=False, ylabel='Normalized RPF density', **kwargs)
Plot RPF density around start/stop codons.
riboraptor.plotting.plot_read_length_dist(read_lengths, ax=None, millify_labels=True, input_is_stream=False, title=None, saveto=None, ascii=False, **kwargs)
Plot read length distribution.
riboraptor.plotting.setup_axis(ax, axis='x', majorticks=5, minorticks=1, xrotation=45, yrotation=0)
Set up axis defaults.
Utilities for extracting sequence from fasta.
riboraptor.sequence.export_gene_sequences(bed, fasta, saveto=None, offset_5p=0, offset_3p=0)
Export all gene sequences.
riboraptor.sequence.gene_sequence(gene_group, fasta, offset_5p=0, offset_3p=0)
Extract sequence gene-wise given coordinates in a bed file.
Helper functions for parsing SRAmetadb.sqlite file
riboraptor.statistics.KDE(values)
Perform univariate kernel density estimation.
Wrapper around statsmodels for quick KDE. TODO: scikit-learn may have a faster implementation.
riboraptor.statistics.KS_test(a, b)
Perform KS test between a and b values.
riboraptor.statistics.calculate_cdf(data)
Calculate CDF given data points.
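An empirical CDF assigns i/n to the i-th smallest of n points, and the two-sample KS statistic is the largest vertical gap between two such CDFs. This sketch shows both from first principles; the function names are hypothetical. For a p-value, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import numpy as np


def ecdf(data):
    """Hypothetical helper: sorted values and their empirical CDF heights."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ks_statistic(a, b):
    """Hypothetical helper: max |F_a - F_b| evaluated at every observed point."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / len(a)
    cdf_b = np.searchsorted(b, points, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


x, y = ecdf([3, 1, 2])
print(x, y, ks_statistic([1, 2, 3], [4, 5, 6]))
```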
riboraptor.tracks.get_bigwigtrack_text(track_name, parent, big_data_url, negate_values)
Create bigwig track text.
riboraptor.tracks.get_multiwigtrack_text(track_name, parent)
Create a multiWig track.
Example:
track myMultiWig
container multiWig
aggregate transparentOverlay
showSubtrackColorOnUi on
type bigWig 0 1000
viewLimits 0:10
maxHeightPixels 100:32:8

track myFirstOverlaySig
parent myMultiWig
color 255,128,128
type bigWig 0 1139

track myFirstBigWig
parent myMultiWig
color 120,235,204
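Track stanzas like the multiWig example are plain "setting value" lines, so they can be assembled as strings. This is a hypothetical sketch, not riboraptor's output format. It assumes that negate_values maps to the UCSC `negateValues on` setting, and it does not model the `parent` argument of get_multiwigtrack_text. The container settings are copied from the example.

```python
def multiwig_container_stanza(track_name):
    """Hypothetical helper: multiWig container stanza using the example's settings."""
    lines = [
        f"track {track_name}",
        "container multiWig",
        "aggregate transparentOverlay",
        "showSubtrackColorOnUi on",
        "type bigWig 0 1000",
        "viewLimits 0:10",
        "maxHeightPixels 100:32:8",
    ]
    return "\n".join(lines) + "\n"


def bigwig_subtrack_stanza(track_name, parent, big_data_url, negate_values=False, color=None):
    """Hypothetical helper: bigWig subtrack stanza attached to a multiWig parent."""
    lines = [
        f"track {track_name}",
        f"parent {parent}",
        f"bigDataUrl {big_data_url}",
        "type bigWig",
    ]
    if color is not None:
        lines.append(f"color {color}")
    if negate_values:
        lines.append("negateValues on")  # assumption: plot values below the axis
    return "\n".join(lines) + "\n"


container = multiwig_container_stanza("myMultiWig")
subtrack = bigwig_subtrack_stanza(
    "myFirstBigWig", "myMultiWig", "https://example.org/sample.bw",
    negate_values=True, color="120,235,204",
)
print(container + "\n" + subtrack)
```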
riboraptor.utils.copy_sra_data(df, sra_source_dir='/staging/as/skchoudh/SRA_datasets/', sra_dest_dir='/staging/as/skchoudh/re-ribo-datasets/')
Copy SRA data to a new location, retaining only single-end samples.
riboraptor.wig.WigReader(wig_location)
Bases: object
Class for reading and querying wig files.
chromosomes
Return a list of chromosomes and their sizes as in the wig file.
query(intervals)
Query regions for scores.