SUDI (SUbtyping DIstance Tool) determines tree-based genetic distances for a new cluster relative to known subtypes. It is designed to help you decide whether a newly identified cluster of sequences is best considered a new subtype, a new sub-subtype, or part of a previously defined subtype, by comparing it to the level of similarity found among previously defined subtypes. Because absolute levels of similarity depend on the gene region under consideration, the sampling dates of the background sequences in an ever-diverging epidemic, and the specific alignment, we do not set absolute criteria for intra- and inter-subtype distances.

SUDI should be used in conjunction with other tools that will identify recombination and phylogenetic relationships. It is important that you are very familiar with the phylogenetic relationships among your novel sequences and the background set of sequences, and that potential regions of inter-subtype recombination have been defined, prior to using this tool.

The examples given here show the use of this tool for HIV-1 M group subtypes and sub-subtypes. However, the tool can be used for any organism that has a defined subtype nomenclature.


SUDI accepts two types of input: either an alignment, or the "outfile" of a PHYLIP tree building program.

If your input is a sequence alignment, the requirements are:

The default tree for the program is a PHYLIP Neighbor Joining tree with an F84 (DNAML) model. If you would prefer to use a more complex model or another kind of tree building strategy, or would like to use a tool other than PHYLIP, then first create your own tree. If it is a PHYLIP tree, it can be used directly as input (see example outfile). If it is not, then use the tree you have created as a basis to create a user-defined tree with PHYLIP, and then use the PHYLIP outfile as the input for SUDI.

Base Node

If you are submitting an alignment, you must include an outgroup as the first sequence in the set. The outgroup will not be included in the final subtype distance analysis. For either input type (alignment or tree), SUDI will automatically determine which node of the tree is closest to the outgroup. In our example outfile, the base node is node 20.
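
As a rough illustration of the base-node idea (not SUDI's actual code), assume the tree is available as a list of edges, each joining two nodes with a branch length. The base node is then the internal node to which the outgroup tip is attached. The edge list, the sequence names, and the function `find_base_node` below are all hypothetical.

```python
# Hypothetical edge list: (node, neighbor, branch_length).
# Internal nodes are numbered; tips carry sequence names.
edges = [
    ("20", "OUTGROUP", 0.150),
    ("20", "21", 0.020),
    ("21", "A_seq1", 0.040),
    ("21", "A_seq2", 0.050),
    ("20", "B_seq1", 0.060),
]


def find_base_node(edges, outgroup):
    """Return the node directly attached to the outgroup tip."""
    for a, b, _length in edges:
        if a == outgroup:
            return b
        if b == outgroup:
            return a
    raise ValueError(f"outgroup {outgroup!r} not found in tree")


base_node = find_base_node(edges, "OUTGROUP")
```

In this toy tree the outgroup hangs off node 20, so that node is taken as the base of the remaining sequences.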

Naming Sequences

It is critical that sequences used in this analysis be named appropriately, or the analysis will not work!

Options and Defaults

Because of the general design of this tool, it can be used for non-HIV sequences, such as HCV. However, the default settings for the groups to be compared are just an example based on HIV-1 subtype nomenclature.


Based on the tree, histograms will be generated showing the range of intra-subtype distances, inter-subtype distances, and sub-subtype distances. The category to which a given pairwise distance is assigned (intra-subtype, inter-subtype, or sub-subtype) depends on how the two sequences were labeled (A_, B_...) and how the clusters were defined.
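
The assignment of pairwise distances to categories can be sketched as follows. This is a minimal illustration, not SUDI's implementation. It assumes that the text before the first underscore in a sequence name is its group label, and that a trailing digit marks a sub-subtype (for example, `A1_` and `A2_` within subtype A). The function names and the sample distances are hypothetical.

```python
import re
from collections import defaultdict


def parse_label(name):
    """Split a name like 'A1_seq7' into (subtype, sub_subtype).

    Assumed convention: leading letters give the subtype, and the full
    prefix (letters plus optional digits) gives the sub-subtype.
    """
    prefix = name.split("_", 1)[0]
    match = re.fullmatch(r"([A-Za-z]+)(\d*)", prefix)
    if match is None:
        raise ValueError(f"unrecognized label in {name!r}")
    return match.group(1), prefix


def categorize(name1, name2):
    """Classify one pair of sequences by comparing their labels."""
    sub1, subsub1 = parse_label(name1)
    sub2, subsub2 = parse_label(name2)
    if sub1 != sub2:
        return "inter-subtype"
    if subsub1 != subsub2:
        return "sub-subtype"
    return "intra-subtype"


def bin_distances(pairwise):
    """Group pairwise distances by category, ready for histogramming."""
    bins = defaultdict(list)
    for (name1, name2), distance in pairwise.items():
        bins[categorize(name1, name2)].append(distance)
    return dict(bins)


# Made-up distances, for illustration only.
pairwise = {
    ("A1_x", "A1_y"): 0.05,
    ("A1_x", "A2_z"): 0.09,
    ("A1_x", "B_w"): 0.15,
}
binned = bin_distances(pairwise)
```

Each list in `binned` would then feed one histogram.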

The cluster of sequences that the user is interested in (the sequences labeled "U") will be highlighted relative to the background set. Two sets of distances are shown for U: its intra-subtype distances, and its inter-subtype distances relative to the subtype closest to U. This way the user can determine whether the novel cluster should be broken into sub-subtypes, or be considered part of a previously defined subtype.
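
The comparison for the U cluster can be sketched in the same spirit. The example below collects the distances within U and the distances from U to each background subtype, then picks the closest subtype. Choosing "closest" by smallest mean distance is an assumption made for illustration; the source does not say how SUDI selects it. The function names and distances are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subtype_of(name):
    """Return the label before the first underscore (assumed convention)."""
    return name.split("_", 1)[0]


def summarize_query(pairwise, query="U"):
    """Return (intra-U distances, closest subtype, U-to-closest distances)."""
    intra = []
    to_other = defaultdict(list)
    for (name1, name2), distance in pairwise.items():
        s1, s2 = subtype_of(name1), subtype_of(name2)
        if s1 == query and s2 == query:
            intra.append(distance)
        elif s1 == query:
            to_other[s2].append(distance)
        elif s2 == query:
            to_other[s1].append(distance)
    if not to_other:
        raise ValueError("no distances between query and background")
    # Assumption: the closest subtype has the smallest mean distance to U.
    closest = min(to_other, key=lambda s: mean(to_other[s]))
    return intra, closest, to_other[closest]


# Made-up distances, for illustration only.
pairwise = {
    ("U_1", "U_2"): 0.04,
    ("U_1", "A_1"): 0.08,
    ("U_2", "A_1"): 0.10,
    ("U_1", "B_1"): 0.14,
    ("U_2", "B_1"): 0.16,
    ("A_1", "B_1"): 0.15,
}
intra_u, closest, inter_u = summarize_query(pairwise)
```

Comparing `intra_u` and `inter_u` against the background intra- and inter-subtype ranges mirrors the judgment the histograms are meant to support.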

Sample Plots

The sample plots below illustrate subtype and sub-subtype behavior and show the output of the SUDI analysis.

SUDI was written by B. Korber and R. Funkhouser; P. Rose assisted with the interactive Web interface.

