File formats supported by this tool
abi: Reads ABI "Sanger" capillary sequence trace files, including the PHRED quality scores for the base calls. This allows ABI to FASTQ conversion. Note that each ABI file contains one and only one sequence, so there is no point in indexing the file.
abi-trim: Same as "abi" but with quality trimming with Mott's algorithm.
ace: Reads the contig sequences from an ACE assembly file. Uses Bio.Sequencing.Ace internally.
cif-atom: Uses Bio.PDB.MMCIFParser to determine the (partial) protein sequence as it appears in the structure based on the atomic coordinates.
cif-seqres: Reads a macromolecular Crystallographic Information File (mmCIF) file to determine the complete protein sequence as defined by the _pdbx_poly_seq_scheme records.
clustal: The alignment format of Clustal X and Clustal W.
embl: The EMBL flat file format. Uses Bio.GenBank internally.
fasta: This refers to the input FASTA file format introduced for Bill Pearson's FASTA tool, where each record starts with a '>' line. Resulting sequences have a generic alphabet by default.
fasta-2line: FASTA format variant with no line wrapping and exactly two lines per record.
fastq: FASTQ files are a bit like FASTA files but also include sequencing qualities. In Biopython, 'fastq' refers to Sanger style FASTQ files which encode PHRED qualities using an ASCII offset of 33. See also the incompatible 'fastq-solexa' and 'fastq-illumina' variants.
fastq-sanger: FASTQ files are a bit like FASTA files but also include sequencing qualities. In Biopython, 'fastq-sanger' refers to Sanger style FASTQ files which encode PHRED qualities using an ASCII offset of 33. See also the incompatible 'fastq-solexa' and 'fastq-illumina' variants.
fastq-solexa: FASTQ files are a bit like FASTA files but also include sequencing qualities. In Biopython, 'fastq-solexa' refers to the original Solexa/Illumina style FASTQ files, which encode Solexa qualities (not PHRED qualities) using an ASCII offset of 64. See also the 'fastq' and 'fastq-illumina' variants.
fastq-illumina: FASTQ files are a bit like FASTA files but also include sequencing qualities. In Biopython, 'fastq-illumina' refers to the Solexa/Illumina 1.3 to 1.7 style FASTQ files, which encode PHRED qualities using an ASCII offset of 64. For good quality reads, PHRED and Solexa scores are approximately equal, so the 'fastq-solexa' and 'fastq-illumina' variants are almost equivalent. See also the incompatible 'fastq' variant.
gck: The native format used by Gene Construction Kit.
genbank: The GenBank or GenPept flat file format.
gb: The GenBank or GenPept flat file format (alias for genbank).
ig: This refers to the IntelliGenetics file format, apparently the same as the MASE alignment format.
imgt: This refers to the IMGT variant of the EMBL plain text file format.
nexus: The NEXUS multiple alignment format, also known as PAUP format.
pdb-seqres: Reads a Protein Data Bank (PDB) file to determine the complete protein sequence as it appears in the header (no dependency on Bio.PDB and NumPy).
pdb-atom: Uses Bio.PDB to determine the (partial) protein sequence as it appears in the structure based on the atom coordinate section of the file (requires NumPy).
phd: PHD files are output from PHRED, used by PHRAP and CONSED for input.
phylip: The PHYLIP alignment format. Truncates names at 10 characters.
pir: A 'FASTA like' format introduced by the National Biomedical Research Foundation (NBRF) for the Protein Information Resource (PIR) database, now part of UniProt.
seqxml: Simple sequence XML file format.
sff: Standard Flowgram Format (SFF) files produced by 454 sequencing.
sff-trim: Same as "sff" but applying the trimming listed in the file.
snapgene: The native format used by SnapGene.
stockholm: The Stockholm alignment format, also known as PFAM format.
swiss: Swiss-Prot aka UniProt format.
tab: Simple two-column tab-separated sequence files, where each line holds a record's identifier and sequence. For example, this is used by Agilent's eArray software when saving microarray probes in a minimal tab-delimited text file.
qual: QUAL files are a bit like FASTA files, but instead of the sequence they record space-separated integer PHRED quality scores. A matched pair of FASTA and QUAL files is often used as an alternative to a single FASTQ file.
uniprot-xml: UniProt XML format, successor to the plain text Swiss-Prot format.
xdna: The native format used by Christian Marck's DNA Strider and Serial Cloner.
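The "abi-trim" entry mentions quality trimming with Mott's algorithm. As a rough illustration (not Biopython's own code), a modified Mott trim can be sketched in plain Python. Each PHRED score q becomes an error probability 10^(-q/10). That probability is subtracted from a cutoff, and the highest-scoring contiguous run of bases is kept. The function name `mott_trim` and the default cutoff of 0.05 are assumptions made for this sketch.

```python
def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the best-scoring stretch of bases (end exclusive).

    Hypothetical helper: a modified-Mott style trim, i.e. a maximum-sum
    subarray over per-base scores of (cutoff - error probability).
    """
    best_start, best_end = 0, 0
    start = 0
    running = 0.0
    best = 0.0
    for i, q in enumerate(quals):
        # PHRED score -> error probability; good bases score positive.
        running += cutoff - 10 ** (q / -10)
        if running < 0:
            # A poor stretch: restart the candidate segment after this base.
            running = 0.0
            start = i + 1
        elif running > best:
            best = running
            best_start, best_end = start, i + 1
    return best_start, best_end


# Low-quality ends are dropped, and the three Q40 bases in the middle are kept.
print(mott_trim([0, 0, 40, 40, 40, 0, 0]))  # (2, 5)
```

If no base beats the cutoff, the sketch returns (0, 0), meaning the whole read would be trimmed away.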
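The "fasta" and "fasta-2line" entries differ only in line wrapping. The following minimal sketch reads FASTA text whose sequence lines may be wrapped and writes the same records in the two-lines-per-record layout. The helper names are hypothetical, and this is not how Bio.SeqIO is implemented. As a simplification, any text before the first '>' line is ignored.

```python
def read_fasta(text):
    """Yield (title, sequence) pairs; wrapped sequence lines are joined."""
    title, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def to_fasta_2line(records):
    """Write each record as exactly two lines: the '>' title and the unwrapped sequence."""
    return "".join(f">{title}\n{seq}\n" for title, seq in records)


wrapped = ">a some description\nACGT\nAC\n>b\nGG\n"
print(to_fasta_2line(read_fasta(wrapped)), end="")
```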
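The FASTQ entries differ mainly in the ASCII offset (33 or 64) and in whether the scores are PHRED or Solexa scores. In Biopython, these names are the format argument passed to Bio.SeqIO.parse. The following stdlib-only sketch shows why the variants are incompatible. The helper names are hypothetical. The Solexa-to-PHRED mapping uses the standard definitions Q_solexa = -10 log10(p / (1 - p)) and Q_phred = -10 log10(p).

```python
import math

# Offsets as described for each variant above.
OFFSETS = {"fastq": 33, "fastq-sanger": 33, "fastq-solexa": 64, "fastq-illumina": 64}


def decode_qualities(quality_line, variant="fastq"):
    """Turn a FASTQ quality line into integer scores for the given variant."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in quality_line]


def encode_qualities(scores, variant="fastq"):
    """Turn integer scores back into a FASTQ quality line."""
    offset = OFFSETS[variant]
    return "".join(chr(q + offset) for q in scores)


def phred_from_solexa(solexa_score):
    """Real-valued PHRED score with the same error probability as a Solexa score."""
    return 10 * math.log10(10 ** (solexa_score / 10) + 1)


print(decode_qualities("II"))                    # [40, 40] (Sanger, offset 33)
print(decode_qualities("hh", "fastq-illumina"))  # [40, 40] (offset 64)
print(decode_qualities("!", "fastq-solexa"))     # [-31]: a Sanger line read with the wrong offset
```

Solexa and PHRED scores only diverge noticeably at low quality. For example, a Solexa score of -5 corresponds to a PHRED score of about 1.2, while a Solexa score of 40 maps to roughly 40.0004.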
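The "phylip" entry notes that names are truncated at 10 characters. One practical consequence is that distinct identifiers sharing a 10-character prefix become indistinguishable. The following hypothetical helper truncates names the same way and reports such collisions. It is an illustration, not part of Biopython.

```python
def phylip_names(ids, width=10):
    """Truncate identifiers to `width` characters and list any pairs that collide."""
    truncated = [name[:width] for name in ids]
    first_seen, clashes = {}, []
    for original, short in zip(ids, truncated):
        if short in first_seen and first_seen[short] != original:
            clashes.append((first_seen[short], original))
        first_seen.setdefault(short, original)
    return truncated, clashes


names, clashes = phylip_names(["Homo_sapiens_1", "Homo_sapiens_2", "Mus"])
print(names)    # ['Homo_sapie', 'Homo_sapie', 'Mus']
print(clashes)  # [('Homo_sapiens_1', 'Homo_sapiens_2')]
```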
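The "tab" format is simple enough to parse by hand. The following minimal sketch assumes exactly one tab per non-blank line, with the identifier in the first column and the sequence in the second. The function name is hypothetical, and the sketch is not Biopython's parser.

```python
def read_tab(text):
    """Parse two-column tab-separated text into (identifier, sequence) pairs."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Unpacking raises ValueError if a line does not have exactly two columns.
        identifier, sequence = line.split("\t")
        records.append((identifier, sequence.strip()))
    return records


print(read_tab("probe1\tACGTACGT\nprobe2\tGGCCTTAA\n"))
```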
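The "qual" entry describes a matched FASTA and QUAL pair as an alternative to a single FASTQ file. The following sketch parses QUAL text and merges it with already-parsed FASTA records into Sanger-style FASTQ (offset 33). The helper names, and the choice to raise an error when record titles or lengths disagree, are assumptions made for illustration.

```python
def read_qual(text):
    """Parse QUAL text: '>' title lines followed by space-separated integer scores."""
    records, title, scores = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                records.append((title, scores))
            title, scores = line[1:].strip(), []
        elif title is not None:
            scores.extend(int(token) for token in line.split())
    if title is not None:
        records.append((title, scores))
    return records


def pair_to_fastq(fasta_records, qual_records, offset=33):
    """Combine (title, sequence) and (title, scores) lists into FASTQ text."""
    if len(fasta_records) != len(qual_records):
        raise ValueError("FASTA and QUAL files hold different numbers of records")
    out = []
    for (title, seq), (qual_title, scores) in zip(fasta_records, qual_records):
        if title != qual_title or len(seq) != len(scores):
            raise ValueError(f"FASTA/QUAL mismatch for record {title!r}")
        quality_line = "".join(chr(q + offset) for q in scores)
        out.append(f"@{title}\n{seq}\n+\n{quality_line}\n")
    return "".join(out)


quals = read_qual(">r1\n40 40\n30\n")
print(pair_to_fastq([("r1", "ACG")], quals), end="")
```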