GenBank, GenBank Raw (sequence only from a GenBank flat file), EMBL, Table, Fasta, Mase (= IG, Intelligenetics), NEXUS interleaved, NEXUS sequential, MEGA interleaved, MEGA sequential, Stockholm, Clustal, BLAST, RSF, Phylip interleaved, Phylip sequential, MSF, GCG, GDE, GDEFlat, Raw, SLX and MacVector.
For descriptions of some common sequence formats, see Common Sequence Formats.
In addition to the formats listed above, relaxed Phylip interleaved, relaxed Phylip sequential, comma-separated (CSV), and Pretty-print are supported as output formats. GenBank, EMBL, MacVector, and BLAST are not supported as output formats.
Relaxed Phylip (sequential and interleaved) produces the same output as standard Phylip, with the one exception that sequence names are not truncated to 10 characters. Instead, sequence names are left as they are and padded with spaces to the length of the longest sequence name in the submitted data set. This ensures proper display of the aligned sequences in the interleaved format and consistent sequence name lengths in both the interleaved and sequential formats.
|Sequence Format||File Extension|
|Phylip standard interleaved||.phylipi|
|Phylip standard sequential||.phylips|
|Phylip relaxed interleaved||.rphylipi|
|Phylip relaxed sequential||.rphylips|
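The difference between standard and relaxed Phylip names described above can be illustrated with a minimal Python sketch. The function name `phylip_name_fields` is hypothetical, and the sketch covers only the name column; how the converter itself separates names from sequence data is not specified here.

```python
def phylip_name_fields(names, relaxed=False):
    """Return the name column for Phylip output (illustrative only).

    Standard Phylip: names are cut to 10 characters (and padded to 10).
    Relaxed Phylip: names are kept whole and padded with spaces to the
    length of the longest name in the data set.
    """
    if relaxed:
        width = max(len(name) for name in names)
        return [name.ljust(width) for name in names]
    return [name[:10].ljust(10) for name in names]


names = ["short", "a_very_long_name"]
print(phylip_name_fields(names))                # standard: truncated to 10
print(phylip_name_fields(names, relaxed=True))  # relaxed: padded to longest
```

In the relaxed case every name field has the same width, which is what keeps the aligned columns lined up in interleaved output.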
The "Raw" format consists of pure sequence, either nucleotides or one-letter amino acids.

ACATGTGCGCGCGATTATCTATCGATGCTACGTA

When this sequence is converted to a non-raw format it will be given the name "seq1". If Raw input consists of multiple lines, each line is interpreted as a separate sequence. Thus, the input

ACATGTGCGCGCGATTATCTATCGATGCTACGTA
GCATGTGCACGCGATTATCTACCGATGCTACTTA

would produce the following fasta output:

>seq1
ACATGTGCGCGCGATTATCTATCGATGCTACGTA
>seq2
GCATGTGCACGCGATTATCTACCGATGCTACTTA

Therefore, if you are submitting a single raw sequence, be sure it is on a single line.
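The line-per-sequence rule for Raw input can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the converter's actual code; the function name `raw_to_fasta` is hypothetical, and skipping blank lines is an assumption.

```python
def raw_to_fasta(raw_text):
    """Turn Raw input into fasta text, one record per input line.

    Records are named seq1, seq2, ... in input order.
    Assumption: blank lines are ignored rather than treated as
    empty sequences.
    """
    sequences = [line.strip() for line in raw_text.splitlines() if line.strip()]
    records = [f">seq{i}\n{seq}\n" for i, seq in enumerate(sequences, start=1)]
    return "".join(records)


raw = "ACATGTGCGCGCGATTATCTATCGATGCTACGTA\nGCATGTGCACGCGATTATCTACCGATGCTACTTA\n"
print(raw_to_fasta(raw), end="")
```

A single sequence that is wrapped over several lines would come out as several records under this rule, which is why a single raw sequence must be kept on one line.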
Phylip files must begin with a line that looks like

3 78 i

that shows the number of sequences in the file (3), the number of characters in each sequence (78), and then the letter "i" or "s", which indicates whether the file is "interleaved" or "sequential" respectively. The format converter requires the i or s letter.

The format converter deals with only two essential data items: the sequence and the sequence name. Thus, a complicated file format such as Nexus, when converted to a simpler format such as Table, will lose all the associated information except the sequence name and the sequence. Converting a Nexus file like:
#NEXUS
Begin data;
Dimensions ntax=3 nchar=79;
Format datatype=dna gap=-;
Matrix
4axED43xco GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
2bxMD2b2x1 CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
2bxMD2b9x1 CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG
;
End;

to fasta format would produce the following file:

>4axED43xco
GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
>2bxMD2b2x1
CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
>2bxMD2b9x1
CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG

The datatype (dna), number of taxa, etc. are not represented in the fasta file, only the names and sequences.
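The information loss in this conversion can be illustrated with a minimal Python sketch that keeps only names and sequences from a Nexus data block. It is not the converter's implementation: the function name `nexus_matrix_to_fasta` is hypothetical, and the sketch assumes a simple, non-interleaved matrix in which each row holds a name followed by its sequence on one line.

```python
def nexus_matrix_to_fasta(nexus_text):
    """Extract name/sequence pairs from a simple Nexus matrix as fasta.

    Assumptions: the matrix starts after a line reading "Matrix",
    ends at a line starting with ";", and is not interleaved.
    Everything outside the matrix (Dimensions, Format, ...) is dropped.
    """
    records = []
    in_matrix = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            if stripped.lower() == "matrix":
                in_matrix = True
            continue
        if stripped.startswith(";"):
            break  # end of the matrix
        if not stripped:
            continue
        name, seq = stripped.split(None, 1)
        records.append(f">{name}\n{seq.replace(' ', '')}\n")
    return "".join(records)


example = """#NEXUS
Begin data;
Dimensions ntax=2 nchar=4;
Format datatype=dna gap=-;
Matrix
taxA AC-T
taxB GGCC
;
End;
"""
print(nexus_matrix_to_fasta(example), end="")
```

The Dimensions and Format lines are read past without being stored, which mirrors how only the sequence name and the sequence survive the conversion.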