HIV sequence database

Format Converter Explanation


Both nucleotide and protein sequences are accepted. By default, the molecule type is determined automatically. (For short sequences, this detection can fail when characters such as 'A', which occur in both the nucleotide and amino-acid alphabets, make up a large share of the sequence.)
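
The detection rule used by the tool is not described here. As a rough illustration of why short sequences can be misclassified, the sketch below uses a simple composition heuristic. The function name `guess_molecule_type` and the 0.9 threshold are assumptions made for this example, not the tool's actual method.

```python
def guess_molecule_type(seq: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from character composition (hypothetical heuristic)."""
    # Letters counted as evidence for a nucleotide sequence; most of them
    # are also one-letter amino-acid codes, which is the source of ambiguity.
    nucleotide_letters = set("ACGTUN")
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in nucleotide_letters for c in residues) / len(residues)
    return "nucleotide" if fraction >= threshold else "protein"


print(guess_molecule_type("GATTACA"))     # nucleotide
print(guess_molecule_type("MKVLAAGIVG"))  # protein
# A short peptide made only of Ala, Gly, Cys, Asn, and Thr is
# indistinguishable from DNA under this heuristic.
print(guess_molecule_type("AGCAAT"))      # nucleotide
```

When a sequence like the last one is really a peptide, specifying the molecule type explicitly avoids the wrong guess.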

The format of your input file is detected automatically in most cases. If you get an error message stating that your format cannot be recognized, try specifying the input format explicitly instead of choosing "Automatic". If you still receive this error, double-check the details of your format, or try removing all blank spaces from your sequence names.

Input formats accepted are:

For descriptions of some common sequence formats, see Common Sequence Formats.


Available output formats are listed below. GenBank, EMBL, MacVector, and BLAST are not supported.

File extensions assigned by this tool reflect the generated output.
Some of these file extensions are non-standard. If you are using the file as input in other software, you may need to change the file extension to its standard form (e.g., '.nxs' for Nexus).

Sequence output format         File extension
Output aligned                 .outali
GDE Flat                       .gdeflat
MEGA interleaved               .megai
MEGA sequential                .megas
Nexus interleaved              .nexusi
Nexus sequential               .nexuss
Phylip standard interleaved    .phylipi
Phylip standard sequential     .phylips
Phylip relaxed interleaved     .rphylipi
Phylip relaxed sequential      .rphylips
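
If downstream software insists on a standard extension, the renaming can be scripted. The following is a minimal sketch. Only the '.nxs' mapping for Nexus comes from the text above; the '.meg' and '.phy' targets are assumed common conventions, and the function name `standard_name` is hypothetical.

```python
from pathlib import Path

# Only '.nxs' for Nexus is stated in the text; the other targets are
# assumptions and should be checked against the software you use.
STANDARD_EXTENSIONS = {
    ".nexusi": ".nxs",
    ".nexuss": ".nxs",
    ".megai": ".meg",    # assumption
    ".megas": ".meg",    # assumption
    ".phylipi": ".phy",  # assumption
    ".phylips": ".phy",  # assumption
}


def standard_name(filename: str) -> str:
    """Return filename with a standard extension, or unchanged if no mapping is known."""
    path = Path(filename)
    new_suffix = STANDARD_EXTENSIONS.get(path.suffix.lower())
    return str(path.with_suffix(new_suffix)) if new_suffix else filename


print(standard_name("env.nexusi"))
print(standard_name("env.fasta"))
```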

Enforce sequence name uniqueness

There are two situations in which you may want to select this option. First, duplicated sequence names in your input may cause problems in other software. Second, some sequence formats limit the number of characters in names, so names that are unique in your input may be truncated into non-unique names unless you check this option. In particular, standard Phylip and SLX limit the number of characters in names.

When selected, this option will change the sequence names, as in these examples:
Input names    Output names
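
The exact renaming scheme is not specified here. As a rough illustration of the idea, the sketch below truncates each name to a length limit and, when a collision occurs, replaces the tail with a numeric suffix. The function name `unique_names`, the `_1`, `_2` suffix style, and the sample names are all hypothetical.

```python
def unique_names(names, max_len=10):
    """Truncate names to max_len and make them unique with numeric suffixes."""
    seen = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        counter = 1
        # On a collision, shorten the name further to make room for a suffix.
        while candidate in seen:
            suffix = f"_{counter}"
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        result.append(candidate)
    return result


names = ["sample_alpha_01", "sample_alpha_02", "seqA", "seqA"]
print(unique_names(names))
# ['sample_alp', 'sample_a_1', 'seqA', 'seqA_1']
```

With a limit of 10 characters, the first two names would otherwise both become "sample_alp", which is the truncation problem described above.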

Convert GenBank to GFF3

This option converts the information in GenBank format files into GFF3 format. Unlike other format translations in this tool, this conversion retains the annotated data from the GenBank file, not just the name and sequence. If this option is selected, other options are ignored.

For testing purposes, click here to download a sample GenBank format file.

Remove IUPAC characters

Some tools cannot handle IUPAC ambiguity codes in nucleotide sequences. This option replaces any character other than ACGTU with an "N". This option is relevant only for nucleotide sequences.
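
A minimal sketch of this kind of replacement is shown below. The function name `remove_iupac` is hypothetical, and two behaviors are assumptions rather than documented features of the tool: lowercase bases are kept as they are, and gap characters listed in `preserve` are left untouched.

```python
import re


def remove_iupac(seq: str, preserve: str = "-") -> str:
    """Replace every character except A, C, G, T, U (and `preserve`) with 'N'."""
    # Assumption: lowercase bases and gap characters are not replaced.
    allowed = "ACGTUacgtu" + preserve
    pattern = re.compile("[^" + re.escape(allowed) + "]")
    return pattern.sub("N", seq)


print(remove_iupac("ACGTRYKM-acgu"))       # ACGTNNNN-acgu
print(remove_iupac("AC-GT", preserve=""))  # ACNGT
```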

Notes about specific formats


The "Raw" format consists of pure sequence, either nucleotides or one-letter amino acids.

When a raw sequence is converted to a non-raw format, it is given the name "seq1". If Raw input consists of multiple lines, each line is interpreted as a separate sequence, so multi-line input produces fasta output with one record per line. Therefore, if you are submitting a single raw sequence, be sure it is on a single line.
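
The line-per-sequence behavior can be sketched as follows. The text above only mentions the name "seq1"; numbering later lines "seq2", "seq3", and so on, and skipping blank lines, are assumptions made for this example, as is the function name `raw_to_fasta`.

```python
def raw_to_fasta(raw_text: str) -> str:
    """Treat each non-blank line of raw input as a separate sequence."""
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    # Assumption: sequences are named seq1, seq2, ... in input order.
    records = [f">seq{i}\n{seq}" for i, seq in enumerate(lines, start=1)]
    return "\n".join(records) + "\n" if records else ""


print(raw_to_fasta("ACGTACGT\nGGCCTTAA\n"), end="")
# >seq1
# ACGTACGT
# >seq2
# GGCCTTAA
```

A single sequence that was wrapped across two lines would come out as two records here, which is why it should be submitted on one line.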


Phylip files must begin with a line that looks like:

3  78  i
which shows the number of sequences in the file (3), the number of characters in each sequence (78), and then the letter "i" or "s", which indicates "interleaved" or "sequential". The trailing "i" or "s" is optional.
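
A header line of this form can be read with a few lines of code. The sketch below is illustrative only; the function name `parse_phylip_header` is hypothetical, and returning `None` when the optional letter is absent is a design choice made for this example.

```python
def parse_phylip_header(line: str):
    """Parse a Phylip header such as '3  78  i' into (n_seqs, n_chars, layout)."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected Phylip header: {line!r}")
    n_seqs, n_chars = int(fields[0]), int(fields[1])
    layout = None  # the i/s letter is optional
    if len(fields) == 3:
        flag = fields[2].lower()
        if flag not in ("i", "s"):
            raise ValueError(f"unexpected layout flag: {fields[2]!r}")
        layout = "interleaved" if flag == "i" else "sequential"
    return n_seqs, n_chars, layout


print(parse_phylip_header("3  78  i"))  # (3, 78, 'interleaved')
print(parse_phylip_header("3 78"))      # (3, 78, None)
```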

Standard phylip files have a limitation of 10 characters in the sequence names. For this reason, we also provide relaxed phylip options that will preserve the full length of your sequence names.

Phylip relaxed

The relaxed Phylip format is unique to the Format Converter tool. It is called 'relaxed' because it generates a Phylip formatted file in which sequence names can be longer than 10 characters. Relaxed Phylip (sequential and interleaved) produces the same output as standard Phylip, except that sequence names are not truncated to 10 characters. Instead, sequence names are left as they are and padded with spaces to the length of the longest sequence name in the submitted data set. This ensures proper display of the aligned sequences in the interleaved format and consistent sequence name lengths in both the interleaved and sequential formats.
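
The padding rule can be sketched for the sequential layout as follows. This is an approximation under stated assumptions, not the tool's exact output: the function name `write_relaxed_phylip`, the single space between the padded name and the sequence, and the header spacing are all choices made for this example.

```python
def write_relaxed_phylip(records):
    """Write (name, sequence) pairs as sequential relaxed Phylip text."""
    if not records:
        raise ValueError("no sequences given")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Pad every name to the length of the longest name instead of truncating.
    width = max(len(name) for name, _ in records)
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records:
        lines.append(f"{name.ljust(width)} {seq}")
    return "\n".join(lines) + "\n"


records = [("short", "ACGT"), ("a_much_longer_name", "AC-T")]
print(write_relaxed_phylip(records), end="")
```

Because every name occupies the same width, the sequences start in the same column on every line.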


The format converter program deals with only two essential data items, the sequence, and the sequence name. Thus, a complicated file format such as Nexus, when converted to a simpler format such as table, will lose all the associated information except the sequence name and the sequence. For example, this Nexus file:

Begin data;
	Dimensions ntax=3 nchar=79;
	Format datatype=dna gap=-;
would produce the following Fasta file:

The datatype (dna), number of taxa, etc., are not represented in the Fasta file, only the names and sequences.
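
The loss of everything except names and sequences can be seen in a small sketch of such a conversion. The code below is not the converter's implementation. It assumes a non-interleaved Matrix block, unquoted names without spaces, and no bracketed comments; the function name `nexus_to_fasta` and the sample data are hypothetical.

```python
import re


def nexus_to_fasta(nexus_text: str) -> str:
    """Extract name/sequence pairs from a simple Nexus Matrix block as Fasta."""
    match = re.search(r"\bmatrix\b(.*?);", nexus_text,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no Matrix block found")
    records = []
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # First field is the name; remaining fields are sequence chunks.
            name, seq = parts[0], "".join(parts[1:])
            records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


sample = """#NEXUS
Begin data;
\tDimensions ntax=2 nchar=8;
\tFormat datatype=dna gap=-;
\tMatrix
taxon_A ACGT-ACG
taxon_B ACGTTACG
\t;
End;
"""
print(nexus_to_fasta(sample), end="")
```

The Dimensions and Format lines are read past and never written out, so the Fasta result carries only the two names and their sequences.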


Sequence names in SLX are limited to 32 characters. Any sequence names longer than that will be truncated in the format conversion process, which can result in non-unique sequence names in the generated output. If you need to preserve the uniqueness of your sequence names please use the check box labeled 'Enforce sequence name uniqueness' in the 'Options' panel.

Alternative tools


last modified: Thu Jun 16 11:14 2016


Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2012 LANS LLC All rights reserved | Disclaimer/Privacy
