File Formats (Input/Output)

Fasta

Reading fasta

Reading a fasta file is as easy as:

r = FastaReader(file)
for header, seq in r.get_entries():
    print(header)
    print(seq)

file can either be a path or a file object.

class formats.fasta.FastaReader(file)[source]

Read fasta files into tuples (header, seq).

close()[source]

Close file handle

get_entries()[source]

Iterate over the entries of the fasta file.

Returns: Generator, which yields (header, sequence) tuples
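To illustrate the kind of iteration get_entries() performs, here is a minimal, self-contained sketch of a generator that yields (header, sequence) tuples. It is not the FastaReader implementation; the function name read_fasta and the handling of blank lines are assumptions made for this example.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open text handle.

    Hypothetical helper for illustration; not part of formats.fasta.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines of one record are joined without separators.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1\nACTG\nACAT\n>seq2\nTTGA\n")
records = list(read_fasta(example))
```

Because the function is a generator, records are produced one at a time, so large files do not have to fit in memory.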

Writing fasta

w = FastaWriter(file, split=60)
header = "some_random_nucleotides"
seq = "ACTGACATT"
w.write_entry(header, seq)
w.close()

Again, file can be a path or a file object. split specifies after how many characters a sequence is wrapped onto a new line. The default is 80.

class formats.fasta.FastaWriter(file, split=80)[source]

Write fasta files from tuples (header, seq)

close()[source]

Close file handle

write_entry(header, sequence)[source]

Write Entry to File

Parameters:
  • header – the sequence header, without the leading >
  • sequence – the sequence itself, e.g. ACTGATT...
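The effect of the split parameter can be sketched as follows. This is only an illustration of line wrapping under the assumption that the header is written on a single line prefixed with >; the function write_fasta_entry is hypothetical and is not the FastaWriter implementation.

```python
import io


def write_fasta_entry(handle, header, sequence, split=80):
    """Write one record, wrapping the sequence every `split` characters.

    Hypothetical helper for illustration; not part of formats.fasta.
    """
    handle.write(">" + header + "\n")
    # Slice the sequence into consecutive chunks of at most `split` characters.
    for start in range(0, len(sequence), split):
        handle.write(sequence[start:start + split] + "\n")


buffer = io.StringIO()
write_fasta_entry(buffer, "some_random_nucleotides", "ACTGACATT", split=4)
wrapped = buffer.getvalue()
```

With split=4, the nine-character sequence is written as three lines of four, four, and one characters.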

Fast5

Fast5 is a sequence format generated by [Oxford Nanopore](http://nanoporetech.com) devices. The Fast5File class is designed to read formats up to the sequencing kit SQK-MAP006. Compatibility with more recent formats is not guaranteed.

Because it serves rather exotic use cases, this file will most likely be dropped from future versions of this framework.

Example

f5 = Fast5File('lambda_burnin_ch101_file2_strand.fast5')
seq = f5.get_seq('template')

class formats.fast5.Fast5File(path)[source]

static events2seq(events)[source]

Turn events into a nucleotide (nt) sequence based on the ‘move’ column.

Parameters: events – list of events (dictionaries)
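As a rough sketch of how a sequence can be assembled from a ‘move’ column: each event carries a kmer, and ‘move’ says by how many bases the kmer advanced relative to the previous event, so only the last ‘move’ bases of each kmer are new. The dictionary keys "kmer" and "move", and the function name events_to_sequence, are assumptions for this example; the actual events2seq may differ in its details.

```python
def events_to_sequence(events):
    """Assemble a nucleotide sequence from a list of event dictionaries.

    Hypothetical helper for illustration; the keys "kmer" and "move"
    are assumed, not taken from formats.fast5.
    """
    parts = []
    for index, event in enumerate(events):
        kmer = event["kmer"]
        move = event["move"]
        if index == 0:
            # The first event contributes its complete kmer.
            parts.append(kmer)
        elif move > 0:
            # Later events contribute only the bases that are new.
            parts.append(kmer[-move:])
        # move == 0 means the kmer did not advance, so nothing is added.
    return "".join(parts)


events = [
    {"kmer": "ACTGA", "move": 0},
    {"kmer": "CTGAC", "move": 1},
    {"kmer": "CTGAC", "move": 0},
    {"kmer": "GACAT", "move": 2},
]
sequence = events_to_sequence(events)
```
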
get_attrs(strand)[source]

Get Attributes for template/complement strand

Parameters: strand – either “template” or “complement”
Returns: {shift: foo, drift: foo, scale: foo} or None if strand not in File
Return type: dict

get_corrected_events(strand)[source]

Get events for template/complement strand and apply the shift/scale/drift corrections. Unfortunately, these corrections are not precisely documented anywhere. Some information is available at https://wiki.nanoporetech.com/display/BP/1D+Basecalling+overview.

Parameters: strand – either “template” or “complement”
Returns: generator yielding one event-dict at a time or None if strand not in File

get_events(strand)[source]

Get events for template/complement strand

Parameters: strand – either “template” or “complement”
Returns: generator yielding one event-dict at a time or None if strand not in File

get_id()[source]

Returns: unique identifier for f5 file.

get_seq(strand)[source]

Get the nucleotide sequence, based on the kmers and ‘move’. The sequence is evaluated lazily: after the first call, it can be accessed in constant time.

Parameters: strand (str) – either “template” or “complement”
Returns: (str) nucleotide sequence
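The lazy evaluation described for get_seq can be pictured as a simple per-strand cache: the sequence is computed on the first request and returned from memory afterwards. The sketch below illustrates that pattern only; the class LazySequence and the stand-in function expensive_decode are hypothetical and do not reflect how Fast5File is implemented.

```python
class LazySequence:
    """Compute a value per strand on first access, then reuse it.

    Hypothetical class for illustration; not part of formats.fast5.
    """

    def __init__(self, compute):
        self._compute = compute  # function mapping a strand name to a sequence
        self._cache = {}

    def get_seq(self, strand):
        # Only run the computation if this strand has not been requested yet.
        if strand not in self._cache:
            self._cache[strand] = self._compute(strand)
        return self._cache[strand]


computed = []


def expensive_decode(strand):
    """Stand-in for the real decoding; records each time it actually runs."""
    computed.append(strand)
    return "ACTGACATT"


lazy = LazySequence(expensive_decode)
first = lazy.get_seq("template")
second = lazy.get_seq("template")
```

After both calls, computed contains "template" only once, since the second call is served from the cache.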