API reference

The package consists of just one module.

dbotu module

class dbotu.DBCaller(seq_table, records, max_dist, min_fold, threshold_pval, log=None, debug=None)

Bases: object

Object for processing the sequence table and distance matrix into an OTU table.

ga_matches(candidate)

OTUs that meet the genetic and abundance criteria

candidate: OTU
sequence to evaluate

returns: nothing

otu_table()

Generate OTU table.

returns: pandas.DataFrame

run()

Process all the input sequences in order of their abundance.

returns: nothing
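
As a rough illustration of processing sequences in order of abundance, the following sketch uses plain dictionaries rather than the DBCaller and OTU classes. The function name, the `distance` and `pval` callables, the merge rule (merge when the p-value exceeds the threshold), and the order in which matches are tried are all assumptions made for this example; the actual implementation may differ.

```python
def greedy_otu_sketch(counts, distance, pval, max_dist, min_fold, threshold_pval):
    """Greedy OTU calling over sequences sorted by total abundance.

    counts: dict mapping sequence name -> list of per-sample counts
    distance: hypothetical callable taking two sequence names
    pval: hypothetical callable taking two lists of per-sample counts
    """
    otus = {}        # OTU name -> pooled per-sample counts
    membership = {}  # OTU name -> member sequence names

    # The most abundant sequences are processed first.
    order = sorted(counts, key=lambda name: sum(counts[name]), reverse=True)

    for name in order:
        cand = counts[name]
        # Existing OTUs that are close enough and sufficiently more abundant.
        matches = [
            otu for otu in otus
            if distance(name, otu) <= max_dist
            and sum(otus[otu]) >= min_fold * sum(cand)
        ]
        # Merge into the first match whose distribution is not
        # significantly different; otherwise start a new OTU.
        for otu in matches:
            if pval(otus[otu], cand) > threshold_pval:
                otus[otu] = [a + b for a, b in zip(otus[otu], cand)]
                membership[otu].append(name)
                break
        else:
            otus[name] = list(cand)
            membership[name] = [name]

    return otus, membership


# Toy input: "b" is close to "a" and ten times less abundant; "c" is distant.
toy_counts = {"a": [100, 0], "b": [10, 0], "c": [0, 50]}
toy_distance = lambda x, y: 0.01 if {x, y} == {"a", "b"} else 0.5
toy_pval = lambda x, y: 1.0  # placeholder: never reports a significant difference

otus, membership = greedy_otu_sketch(
    toy_counts, toy_distance, toy_pval,
    max_dist=0.1, min_fold=10.0, threshold_pval=0.0005,
)
```

In this toy run, "b" is absorbed into "a" and "c" becomes its own OTU.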

write_membership(output)

Write the QIIME-style OTU mapping information to a file.

output: filehandle

returns: nothing
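
For orientation, a QIIME-style OTU map is a tab-separated text file with one line per OTU: the OTU identifier followed by the identifiers of its member sequences. The sketch below writes that layout from a plain dictionary. The function name and the choice to list the representative sequence among the members are assumptions for illustration, not a description of this method's exact output.

```python
import io


def write_membership_sketch(membership, output):
    """Write one line per OTU: the OTU name, then its member sequence names."""
    for otu, members in membership.items():
        output.write("\t".join([otu] + list(members)) + "\n")


# Write to an in-memory buffer instead of a real file.
buf = io.StringIO()
write_membership_sketch({"seq1": ["seq1", "seq4"], "seq2": ["seq2"]}, buf)
text = buf.getvalue()
```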

write_otu_table(output)

Write the QIIME-style OTU table to a file.

output: filehandle

returns: nothing

class dbotu.OTU(name, sequence, counts)

Bases: object

Object for keeping track of an OTU’s distribution and computing genetic distances

absorb(other)

Add another OTU’s counts to this one

other: OTU

returns: nothing

distance_to(other)

Length-adjusted Levenshtein “distance” to other OTU

other: OTU
distance to this OTU

returns: float
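
A minimal sketch of one way to compute a length-adjusted Levenshtein distance, working on plain strings rather than OTU objects. Dividing the edit distance by the mean of the two sequence lengths is an assumption about what "length-adjusted" means here, and both function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def length_adjusted_distance(a, b):
    """Edit distance divided by the mean sequence length (assumed normalization)."""
    return levenshtein(a, b) / (0.5 * (len(a) + len(b)))
```

With this normalization, two 4-base sequences that differ at one position are at distance 0.25.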

distribution_pval(other)

P-value from the likelihood ratio test comparing the distribution of the abundances of two OTU objects. See docs for explanation of the test.

other: OTU

returns: float
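
To give a feel for a likelihood ratio test on two abundance distributions, the sketch below computes a test statistic from two lists of per-sample counts. It assumes a multinomial model in which the null hypothesis pools both sequences into one distribution across samples and the alternative gives each its own. The model, the function names, and the requirement that each list has at least one nonzero count are assumptions for this example; the package documentation remains the reference for the test actually used.

```python
import math


def _xlogx_sum(counts):
    """sum(x * ln x) - N * ln N over the nonzero counts, with N = sum(counts)."""
    nonzero = [c for c in counts if c > 0]
    total = sum(nonzero)
    return sum(c * math.log(c) for c in nonzero) - total * math.log(total)


def lrt_statistic(x, y):
    """-2 * (max log-likelihood of pooled model - that of separate models)."""
    pooled = [a + b for a, b in zip(x, y)]
    return -2.0 * (_xlogx_sum(pooled) - _xlogx_sum(x) - _xlogx_sum(y))


same_shape = lrt_statistic([10, 20], [20, 40])  # proportional counts
disjoint = lrt_statistic([50, 0], [0, 50])      # no shared samples
```

Proportional counts give a statistic near zero, while counts concentrated in different samples give a large one. Under this assumed model, a p-value could be obtained by comparing the statistic to a chi-square distribution with one fewer degrees of freedom than the number of samples, for example with `scipy.stats.chi2.sf`.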

dbotu.call_otus(seq_table_fh, fasta_fn, output_fh, gen_crit, abund_crit, pval_crit, log=None, membership=None, debug=None)

Read in input files, call OTUs, and write the output.

seq_table_fh: filehandle
sequence count table, tab-separated
fasta_fn: str
sequences fasta filename
output_fh: filehandle
place to write main output OTU table
gen_crit, abund_crit, pval_crit: float
threshold values for genetic criterion, abundance criterion, and distribution criterion (pvalue)
log, membership, debug: filehandles
places to write supplementary output
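
A minimal usage sketch based on the signature above. The wrapper name `run_dbotu`, the file paths passed to it, and the three threshold values are illustrative assumptions, not recommendations; the example also assumes the dbotu package and its dependencies are installed.

```python
def run_dbotu(table_path, fasta_path, otu_path):
    """Call OTUs from a count table and a FASTA file, writing the OTU table."""
    # Imported inside the function so that defining it does not require dbotu.
    import dbotu

    with open(table_path) as seq_table_fh, open(otu_path, "w") as output_fh:
        dbotu.call_otus(
            seq_table_fh,
            fasta_path,        # a filename, not a filehandle
            output_fh,
            gen_crit=0.1,      # illustrative genetic criterion threshold
            abund_crit=10.0,   # illustrative abundance criterion threshold
            pval_crit=0.0005,  # illustrative distribution criterion p-value
        )
```

The optional log, membership, and debug filehandles are left at their default of None here.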

dbotu.read_sequence_table(fn)

Read in a table of sequences. The table must be tab-separated with exactly one header line. The first field of the header labels the sequence-name column (e.g., “OTU”, “OTU_ID”, “seq”) and is followed by the tab-separated sample names. In each following row, the first field is the sequence name and the remaining cells are the counts of that sequence in each sample.

fn: filename (or handle)

returns: pandas.DataFrame
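
The expected table layout can be illustrated with a small in-memory example. The parser below uses only the standard library and returns plain Python structures rather than a pandas.DataFrame; it is a sketch of the file format, not of this function, and the name `parse_sequence_table` and the sample data are made up for the example.

```python
import csv
import io

# One header line, then one row per sequence; all fields are tab-separated.
example = "OTU_ID\tsample1\tsample2\nseq1\t10\t0\nseq2\t3\t7\n"


def parse_sequence_table(handle):
    """Return (sample_names, {sequence_name: [counts per sample]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header field only labels the name column
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [int(cell) for cell in row[1:]]
    return samples, counts


samples, counts = parse_sequence_table(io.StringIO(example))
```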