Command line interface for riboraptor
riboraptor.coherence.
coherence
(original_values, frames=[0])[source]¶Calculate coherence with an ideal ribo-seq signal
Parameters: |
|
---|---|
Returns: |
|
riboraptor.coherence.
coherence_ft
(values, nperseg=30, noverlap=15, window='flattop')[source]¶Calculate coherence with an ideal ribo-seq signal based on the Fourier transform
Parameters: |
|
---|---|
Returns: |
|
riboraptor.coherence.
get_periodicity
(values, input_is_stream=False)[source]¶Calculate periodicity with respect to a 1-0-0 signal.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.coherence.
naive_periodicity
(values, identify_peak=False)[source]¶Calculate periodicity in a naive manner
Take the ratio of frame 1 counts over the average of frame 2 and frame 3 counts. By default, the first value is treated as belonging to the first frame.
Parameters: |
|
---|---|
Returns: |
|
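The ratio described for naive_periodicity can be sketched as follows. This is a minimal illustration, not riboraptor's implementation: the name `frame_ratio` is hypothetical, `values` is assumed to be a flat sequence of per-position counts whose first element lies in frame 1, "avg(frame2+frame3)" is read as the mean of the two frame totals, and the `identify_peak` option is ignored.

```python
def frame_ratio(values):
    """Ratio of frame-1 counts to the mean of frame-2 and frame-3 counts.

    Illustrative only: assumes values[0] is in frame 1.
    """
    # Trim to a whole number of codons so every frame has the same length.
    usable = len(values) - len(values) % 3
    frame1 = sum(values[0:usable:3])
    frame2 = sum(values[1:usable:3])
    frame3 = sum(values[2:usable:3])
    background = (frame2 + frame3) / 2.0
    if background == 0:
        # No off-frame reads: the ratio is undefined or unbounded.
        return float("inf") if frame1 > 0 else float("nan")
    return frame1 / background


ratio = frame_ratio([10, 1, 1, 10, 1, 1])
```

A strongly periodic profile gives a ratio well above 1, while a flat profile gives a ratio close to 1.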
Utilities for read counting operations.
riboraptor.count.
OrderedCounter
(**kwds)[source]¶Bases: collections.Counter, collections.OrderedDict
Counter that remembers the order elements are first encountered
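A class with these two bases matches the well-known recipe from the Python `collections` documentation. The sketch below reproduces that recipe for illustration; whether riboraptor's class defines the same `__repr__` and `__reduce__` methods is an assumption.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order elements are first encountered."""

    def __repr__(self):
        return "%s(%r)" % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        # Keep insertion order when the object is pickled.
        return self.__class__, (OrderedDict(self),)


letters = OrderedCounter("abracadabra")
first_seen_order = list(letters.keys())
```

On Python 3.7 and later a plain `Counter` already preserves insertion order, so the subclass mainly matters for older interpreters.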
riboraptor.count.
bam_to_bedgraph
(bam, strand='both', end_type='5prime', saveto=None)[source]¶Create bedgraph from bam.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
bedgraph_to_bigwig
(bedgraph, sizes, saveto, input_is_stream=False)[source]¶Convert bedgraph to bigwig.
Parameters: |
|
---|
riboraptor.count.
count_uniq_mapping_reads
(bam)[source]¶Count the number of uniquely mapped reads.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
export_gene_coverages
(bed, bw, saveto, offset_5p=0, offset_3p=0)[source]¶Export all gene coverages.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
export_metagene_coverage
(bed, bw, max_positions=None, saveto=None, offset_5p=0, offset_3p=0, orientation='5prime', n_jobs=16)[source]¶Export metagene coverage.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
export_read_counts
(gene_coverages, saveto, keep_offsets=True)[source]¶Export read counts from a gene coverages file.
Parameters: |
|
---|
riboraptor.count.
export_read_length
(bam, saveto=None)[source]¶Count read lengths.
Parameters: |
|
---|
riboraptor.count.
extract_uniq_mapping_reads
(inbam, outbam)[source]¶Extract only uniquely mapping reads from a bam.
Parameters: |
|
---|
riboraptor.count.
gene_coverage
(gene_group, bw, offset_5p=0, offset_3p=0)[source]¶Get gene coverage.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
get_bam_coverage
(bam, bed, outprefix=None)[source]¶Get coverage from bam
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
get_bam_coverage_on_bed
(bam, bed, protocol='forward', orientation='5prime', max_positions=1000, offset=60, saveto=None)[source]¶Get bam coverage over start_codon/stop_codon coordinates
riboraptor.count.
get_region_sizes
(bed)[source]¶Get collapsed lengths of genes in a bed file.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
mapping_reads_summary
(bam, saveto=None)[source]¶Count number of mapped reads.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
merge_gene_coverages
(gene_coverages, max_positions=None, saveto=None)[source]¶Merge gene coverages to generate metagene coverage.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
merge_read_counts
(read_counts, saveto)[source]¶Merge read count tsv files into one count table.
Parameters: |
|
---|
riboraptor.count.
read_enrichment
(read_lengths, min_length=28, max_length=32)[source]¶Calculate read enrichment for a certain range of lengths
Parameters: |
|
---|---|
Returns: |
|
Utilities to download data from NCBI SRA
riboraptor.download.
run_download_sra_script
(download_root_location=None, ascp_key_path=None, srp_id_file=None, srp_id_list=None)[source]¶Download data from SRA.
Parameters: |
|
---|
riboraptor.dtw.
dtw
(X, Y, metric='euclidean', ddtw=False, ddtw_order=1)[source]¶Parameters: |
|
---|
riboraptor.dtw.
get_path
(D)[source]¶Traceback path of minimum cost
Given accumulated cost matrix D, trace back the minimum cost path
Parameters: |
|
---|---|
Returns: |
|
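The traceback described for get_path can be sketched as below. This is an illustration under stated assumptions: `traceback_path` is a hypothetical name, `D` is taken to be a list of equal-length rows, the allowed steps are diagonal, up, and left, and the result is a list of `(i, j)` pairs. riboraptor's own return format may differ.

```python
def traceback_path(D):
    """Trace the minimum-cost warping path through accumulated cost matrix D.

    Returns a list of (row, column) pairs from (0, 0) to the last cell.
    """
    i, j = len(D) - 1, len(D[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            # Top row: the only way back is left.
            j -= 1
        elif j == 0:
            # First column: the only way back is up.
            i -= 1
        else:
            # Candidate predecessors: diagonal, up, left.
            candidates = [
                (D[i - 1][j - 1], i - 1, j - 1),
                (D[i - 1][j], i - 1, j),
                (D[i][j - 1], i, j - 1),
            ]
            _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


path = traceback_path([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
```

Ties between predecessors are broken by the smaller row index and then the smaller column index, which is an arbitrary choice made for this sketch.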
riboraptor.fasta.
FastaReader
(fasta_location)[source]¶Bases: object
Class for reading and querying fasta file.
chromosomes
¶Return list of chromosomes and their sizes as in the fasta file.
Returns: |
|
---|
query
(intervals)[source]¶Query regions for sequence.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.hdf_parser.
HDFParser
(filepath)[source]¶Bases: object
riboraptor.hdf_parser.
create_metagene_from_multi_bigwig
(bed, bigwigs, max_positions=1000, offset_5p=0, offset_3p=0, n_jobs=16, saveto=None)[source]¶Collapse multiple bigwigs to get bigwig.
Test case: this should produce the same coverage file as export_metagene_coverage if everything above works correctly.
riboraptor.hdf_parser.
hdf_to_bigwig
(hdf, prefixdir, read_lengths_to_use='all', output_normalized=True)[source]¶Create fragment and strand specific bigwigs from hdf
Parameters: |
|
---|
riboraptor.hdf_parser.
merge_bigwigs
(bigwigs, chrom_sizes, saveto, scale=False)[source]¶Merge multiple bigwigs into one.
Note: Merging does not seem to be possible through pybigWig, hence the dependency on ucsc-bigWigMerge.
Parameters: |
|
---|
riboraptor.hdf_parser.
normalize_bw_hdf
(bw, hdf, read_length, outbw)[source]¶Normalize a fragment specific bigwig to RPM for that fragment length
Parameters: |
|
---|
riboraptor.hdf_parser.
tsv_to_bigwig
(df, chrom_lengths, prefix)[source]¶Convert tsv (created by bam-coverage) to bigwig
One bigwig is created for each combination of fragment length, read end, and strand.
The following files are written, where N is the fragment length:
N.5prime.pos.bw N.3prime.pos.bw
N.5prime.neg.bw N.3prime.neg.bw
Parameters: |
|
---|
All functions that are not so useful, but still useful.
riboraptor.helpers.
bwshift
(bw, shift_by, out_bw, chunk_size=20000)[source]¶Given a bigwig, shift all the values by the given number of positions.
Example of shifting by 10. Original:
variableStep chrom=chr span=1
1 1
2 2
3 5
4 6
5 5
6 3
7 3
8 5
9 5
10 5
11 6
12 6
13 0
14 2
15 3
16 3
17 10
18 4
19 4
20 2
21 2
22 2
23 1
Shifted by 10:
variableStep chrom=chr span=1
1 6
2 6
3 0
4 2
5 3
6 3
7 10
8 4
9 4
10 2
11 2
12 2
13 1
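The bwshift example above can be reproduced on a plain list. This is a sketch only: `shift_values` and `to_variable_step` are hypothetical helpers that work on in-memory values, whereas the real function reads and writes bigwig files in chunks.

```python
def shift_values(values, shift_by):
    """Drop the first shift_by values so position p takes the value from p + shift_by."""
    return list(values[shift_by:])


def to_variable_step(values, chrom="chr"):
    """Render values as variableStep wiggle text with 1-based positions."""
    lines = ["variableStep chrom={} span=1".format(chrom)]
    lines.extend("{} {}".format(pos, val) for pos, val in enumerate(values, start=1))
    return "\n".join(lines)


original = [1, 2, 5, 6, 5, 3, 3, 5, 5, 5, 6, 6, 0, 2, 3, 3, 10, 4, 4, 2, 2, 2, 1]
shifted = shift_values(original, 10)
wig_text = to_variable_step(shifted)
```

The 23 original values become 13 values after the shift, matching the second listing in the docstring example.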
riboraptor.helpers.
check_file_exists
(filepath)[source]¶Check if file exists.
Parameters: |
|
---|
riboraptor.helpers.
codon_to_anticodon
(codon)[source]¶Codon to anticodon.
Parameters: |
|
---|
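An anticodon pairs with its codon in antiparallel orientation, so read 5' to 3' it is the reverse complement. The sketch below assumes a DNA alphabet and uses a hypothetical name, `anticodon_of`; whether riboraptor returns DNA or RNA letters is not stated in the source.

```python
# Assumption for this sketch: DNA alphabet only (A, C, G, T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anticodon_of(codon):
    """Return the reverse complement of a DNA codon, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


start_anticodon = anticodon_of("ATG")
```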
riboraptor.helpers.
complementary_strand
(strand)[source]¶Get complementary strand
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
counts_to_tpm
(counts, sizes)[source]¶Counts to TPM
Parameters: |
|
---|
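The standard TPM calculation divides each count by the feature length and then rescales the resulting rates to sum to one million. A minimal sketch, assuming `counts` and `sizes` are dictionaries keyed by gene (the real function may well accept pandas objects instead); `tpm_from_counts` is a hypothetical name.

```python
def tpm_from_counts(counts, sizes):
    """TPM for each feature given read counts and feature lengths in bases."""
    # Reads per base for every feature.
    rates = {gene: counts[gene] / float(sizes[gene]) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    # Rescale so the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


tpm = tpm_from_counts({"geneA": 10, "geneB": 10}, {"geneA": 100, "geneB": 300})
```

Both genes have the same count here, but the shorter gene receives the larger TPM because its reads are spread over fewer bases.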
riboraptor.helpers.
create_bam_index
(bam)[source]¶Create bam index.
Parameters: |
|
---|
riboraptor.helpers.
create_ideal_periodic_signal
(signal_length)[source]¶Create ideal ribo-seq signal.
Parameters: |
|
---|---|
Returns: |
|
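Assuming the ideal signal is the 1-0-0 pattern that get_periodicity refers to, it can be sketched as below. The name `ideal_periodic_signal` and the choice to truncate when the length is not a multiple of three are assumptions of this example.

```python
def ideal_periodic_signal(signal_length):
    """Return [1, 0, 0, 1, 0, 0, ...] truncated to signal_length."""
    return [1 if i % 3 == 0 else 0 for i in range(signal_length)]


signal = ideal_periodic_signal(7)
```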
riboraptor.helpers.
featurecounts_to_tpm
(fc_f, outfile)[source]¶Convert featureCounts file to tpm
Parameters: |
|
---|
riboraptor.helpers.
find_first_non_none
(positions)[source]¶Given a list of positions, find the index and value of first non-none element.
This method is specifically designed for pysam, which returns None for reference positions that are mismatched or soft-masked when fetched using get_reference_positions.
query_alignment_start and query_alignment_end give the indexes in the read that bound the aligned, non-soft-masked portion, but a position inside that span is still set to None if it does not align.
Parameters: |
|
---|
riboraptor.helpers.
find_last_non_none
(positions)[source]¶Given a list of positions, find the index and value of last non-none element.
This function is similar to the find_first_non_none function, but does it for the reversed list. It is specifically useful for reverse strand cases
Parameters: |
|
---|
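Both lookups can be sketched on a plain list of positions. The names `first_non_none` and `last_non_none` are hypothetical, and the index returned by the second helper is relative to the original (unreversed) list, which is an assumption about the intended behaviour.

```python
def first_non_none(positions):
    """Return (index, value) of the first element that is not None, else None."""
    for index, value in enumerate(positions):
        if value is not None:
            return index, value
    return None


def last_non_none(positions):
    """Return (index, value) of the last element that is not None, else None."""
    for index in range(len(positions) - 1, -1, -1):
        if positions[index] is not None:
            return index, positions[index]
    return None


# Reference positions as pysam might report them for a read with clipped ends.
reference_positions = [None, 1003, None, 1005, None]
first = first_non_none(reference_positions)
last = last_non_none(reference_positions)
```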
riboraptor.helpers.
get_region_sizes
(region_bed)[source]¶Get summed up size of a CDS/UTR region from bed file
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
get_strandedness
(filepath)[source]¶Parse output of infer_experiment.py from RSeQC to get strandedness.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
htseq_to_tpm
(htseq_f, outfile, cds_bed_f)[source]¶Convert htseq-counts file to tpm
Parameters: |
|
---|
riboraptor.helpers.
identify_peaks
(coverage)[source]¶Given coverage array, find the site of maximum density
riboraptor.helpers.
is_read_uniq_mapping
(read)[source]¶Check if read is uniquely mappable.
Parameters: |
|
---|
riboraptor.helpers.
list_to_ranges
(list_of_int)[source]¶Convert a list to a list of range objects
Parameters: |
|
---|---|
Returns: |
|
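One plausible reading of this conversion is that runs of consecutive integers are grouped into `range` objects. That grouping rule is an assumption, as is the name `ints_to_ranges`; the source only states the input and output types.

```python
def ints_to_ranges(list_of_int):
    """Group consecutive integers into range objects (duplicates are ignored)."""
    ranges = []
    start = prev = None
    for value in sorted(set(list_of_int)):
        if start is None:
            start = prev = value
        elif value == prev + 1:
            # Still inside the current run.
            prev = value
        else:
            # Gap found: close the current run and start a new one.
            ranges.append(range(start, prev + 1))
            start = prev = value
    if start is not None:
        ranges.append(range(start, prev + 1))
    return ranges


grouped = ints_to_ranges([1, 2, 3, 7, 8, 10])
```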
riboraptor.helpers.
merge_intervals
(intervals, chromosome_lengths=None, offset_5p=0, offset_3p=0, zero_based=True)[source]¶Collapse intervals into a non-overlapping set
Parameters: |
|
---|---|
Returns: |
|
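The core of collapsing intervals is a sort followed by a single sweep. The sketch below assumes half-open `(start, end)` tuples on one chromosome and merges intervals that overlap or touch; it deliberately leaves out the `offset_5p`, `offset_3p`, `chromosome_lengths`, and `zero_based` handling, and `collapse_intervals` is a hypothetical name.

```python
def collapse_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


collapsed = collapse_intervals([(3, 8), (1, 5), (10, 12)])
```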
riboraptor.helpers.
millify
(n)[source]¶Convert integer to human readable format.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
order_dataframe
(df, columns)[source]¶Order a dataframe
Order a dataframe by moving the given columns to the front
Parameters: |
|
---|
riboraptor.helpers.
pad_five_prime_or_truncate
(some_list, offset_5p, target_len)[source]¶Pad first the 5prime end and then the 3prime end or truncate
Parameters: |
|
---|
riboraptor.helpers.
pad_or_truncate
(some_list, target_len)[source]¶Pad or truncate a list up to a given target length
Parameters: |
|
---|
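Padding or truncating to a fixed length can be sketched in one expression. The fill value of 0 and the `fill_value` parameter are assumptions of this example (the real function may pad with a different value), and `pad_or_truncate_sketch` is a hypothetical name.

```python
def pad_or_truncate_sketch(some_list, target_len, fill_value=0):
    """Return a copy of some_list cut or right-padded to exactly target_len items."""
    # A negative repeat count yields an empty list, so truncation needs no branch.
    padding = [fill_value] * (target_len - len(some_list))
    return list(some_list[:target_len]) + padding


padded = pad_or_truncate_sketch([1, 2, 3], 5)
truncated = pad_or_truncate_sketch([1, 2, 3], 2)
```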
riboraptor.helpers.
parse_star_logs
(infile, outfile=None)[source]¶Parse star logs into a dict
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
r2
(x, y)[source]¶Calculate Pearson correlation between two vectors.
Parameters: |
|
---|
riboraptor.helpers.
read_bed_as_intervaltree
(filepath)[source]¶Read bed as interval tree
Useful for reading start/stop codon beds
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
read_chrom_sizes
(filepath)[source]¶Read chr.sizes file sorted by chromosome name
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
read_enrichment
(read_lengths, enrichment_range=range(28, 33), input_is_stream=False, input_is_file=True)[source]¶Calculate read enrichment for a certain range of lengths
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
read_htseq
(htseq_f)[source]¶Read HTSeq file.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
read_refseq_bed
(filepath)[source]¶Read refseq bed12 from UCSC.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
round_to_nearest
(x, base=5)[source]¶Round to nearest base.
Parameters: |
|
---|---|
Returns: |
|
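Rounding to the nearest multiple of a base is usually done by dividing, rounding, and multiplying back. A minimal sketch with a hypothetical name, `round_to_base`; returning an integer is an assumption.

```python
def round_to_base(x, base=5):
    """Round x to the nearest multiple of base."""
    # Python's round() sends exact halves to the even neighbour,
    # so values exactly halfway between two multiples may round down.
    return int(base * round(float(x) / base))


rounded_down = round_to_base(12)
rounded_up = round_to_base(13)
```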
riboraptor.helpers.
scale_bigwig
(inbigwig, chrom_sizes, outbigwig, scale_factor=1)[source]¶Scale a bigwig by certain factor.
Parameters: |
|
---|
riboraptor.helpers.
set_xrotation
(ax, degrees)[source]¶Rotate labels on x-axis.
Parameters: |
|
---|
riboraptor.helpers.
summarize_counters
(samplewise_dict)[source]¶Summarize gene counts for a collection of samples.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
summary_stats_two_arrays_welch
(old_mean_array, new_array, old_var_array=None, old_n_counter=None, carried_forward_observations=None)[source]¶Average two arrays using Welch’s method
Parameters: |
|
---|---|
Returns: |
|
riboraptor.infer_protocol.
infer_protocol
(bam, bed, n_reads=500000, drop_probability=0.2)[source]¶Infer strandedness protocol given a bam file
Parameters: |
|
---|---|
Returns: |
|
riboraptor.kmer.
fastq_kmer_histogram
(fastq_file, kmer_length=range(5, 31), five_prime=False, max_seq=1000000, offset=0, drop_probability=0)[source]¶Get a histogram of kmers from a fastq file
Parameters: |
|
---|
Plotting methods.
riboraptor.plotting.
plot_featurewise_barplot
(utr5_counts, cds_counts, utr3_counts, ax=None, saveto=None, **kwargs)[source]¶Plot barplots for 5’UTR/CDS/3’UTR counts.
Parameters: |
|
---|
riboraptor.plotting.
plot_framewise_counts
(counts, frames_to_plot='all', ax=None, title=None, millify_labels=False, position_range=None, saveto=None, ascii=False, input_is_stream=False, **kwargs)[source]¶Plot framewise distribution of reads.
Parameters: |
|
---|
riboraptor.plotting.
plot_periodicity_df
(df, saveto, cbar=False, figsize=(8, 8))[source]¶Plot periodicity values across fragment lengths as a matrix.
Parameters: |
|
---|
riboraptor.plotting.
plot_read_counts
(counts, ax=None, marker=None, color='royalblue', title=None, label=None, millify_labels=False, identify_peak=True, saveto=None, position_range=None, ascii=False, input_is_stream=False, ylabel='Normalized RPF density', **kwargs)[source]¶Plot RPF density around start/stop codons.
Parameters: |
|
---|
riboraptor.plotting.
plot_read_length_dist
(read_lengths, ax=None, millify_labels=True, input_is_stream=False, title=None, saveto=None, ascii=False, **kwargs)[source]¶Plot read length distribution.
Parameters: |
|
---|
riboraptor.plotting.
setup_axis
(ax, axis='x', majorticks=5, minorticks=1, xrotation=45, yrotation=0)[source]¶Set up axis defaults
Parameters: |
|
---|
Utilities for extracting sequence from fasta.
riboraptor.sequence.
export_gene_sequences
(bed, fasta, saveto=None, offset_5p=0, offset_3p=0)[source]¶Export all gene sequences.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.sequence.
gene_sequence
(gene_group, fasta, offset_5p=0, offset_3p=0)[source]¶Extract gene-wise sequence given coordinates in a bed file
Parameters: |
|
---|---|
Returns: |
|
Helper functions for parsing SRAmetadb.sqlite file
riboraptor.statistics.
KDE
(values)[source]¶Perform Univariate Kernel Density Estimation.
Wrapper utility around statsmodels for quick KDE.
TODO: scikit-learn has a faster implementation (?)
Parameters: |
|
---|---|
Returns: |
|
riboraptor.statistics.
KS_test
(a, b)[source]¶Perform KS test between a and b values
Parameters: |
|
---|---|
Returns: |
|
riboraptor.statistics.
calculate_cdf
(data)[source]¶Calculate CDF given data points
Parameters: |
|
---|---|
Returns: |
|
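An empirical CDF assigns each sorted data point the fraction of observations less than or equal to it. The sketch below uses a hypothetical name, `empirical_cdf`, and returns the sorted values alongside their cumulative fractions; the actual return format of calculate_cdf is not given in the source.

```python
def empirical_cdf(data):
    """Return (sorted values, cumulative fractions) for a sequence of numbers."""
    ordered = sorted(data)
    n = len(ordered)
    # The i-th smallest value (1-based) has i/n of the data at or below it.
    fractions = [(i + 1) / float(n) for i in range(n)]
    return ordered, fractions


xs, cdf = empirical_cdf([3, 1, 2, 4])
```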
riboraptor.tracks.
get_bigwigtrack_text
(track_name, parent, big_data_url, negate_values)[source]¶Create bigwig track text
riboraptor.tracks.
get_multiwigtrack_text
(track_name, parent)[source]¶Create a multiWig track.
Example:
track myMultiWig
container multiWig
aggregate transparentOverlay
showSubtrackColorOnUi on
type bigWig 0 1000
viewLimits 0:10
maxHeightPixels 100:32:8

track myFirstOverlaySig
parent myMultiWig
color 255,128,128
type bigWig 0 1139

track myFirstBigWig
parent myMultiWig
color 120,235,204
riboraptor.utils.
copy_sra_data
(df, sra_source_dir='/staging/as/skchoudh/SRA_datasets/', sra_dest_dir='/staging/as/skchoudh/re-ribo-datasets/')[source]¶Copy SRA data to a new location retaining only single ended samples.
riboraptor.wig.
WigReader
(wig_location)[source]¶Bases: object
Class for reading and querying wigfiles.
chromosomes
¶Return list of chromosomes and their sizes as in the wig file.
Returns: |
|
---|
query
(intervals)[source]¶Query regions for scores.
Parameters: |
|
---|---|
Returns: |
|