These sequences are modeled ancestral sequences derived from a maximum likelihood tree. The tree was used to estimate the most likely character for every position in the sequence alignment at each internal node, or branch point. We estimated the ancestral sequence for each of the HIV-1 subtypes and for the M group. The Maximum Likelihood program used in these analyses has been described in Korber et al., Science 2000, and the associated Supplementary Material.
The maximum likelihood ancestral sequences were based on full length genome alignments. Sequences that are described in the literature as being recombinant, or part of a set of sequences assigned to a circulating recombinant form, were excluded from the alignment.
Regions that were riddled with insertions and deletions were excluded from the alignment used to construct the tree. Single and double frameshifting insertions were also deleted. Regions that were deleted in only a few sequences had the gaps replaced with an "N" so the alignment could be maintained and the positions could be included in the likelihood reconstruction of the ancestor. An "N" is neutral and doesn't influence the ancestral reconstruction.
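The gap-to-"N" step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for these sequences: the function name `mask_sparse_gaps`, the dictionary input format, and the `max_gap_fraction` cutoff that stands in for "only a few sequences" are all assumptions introduced here.

```python
def mask_sparse_gaps(alignment, max_gap_fraction=0.1):
    """Replace '-' with 'N' in columns where only a few sequences have a gap.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    max_gap_fraction: hypothetical cutoff for "only a few sequences".
    Columns with more gaps than the cutoff are left untouched; in a real
    workflow those would be candidates for exclusion from the alignment.
    """
    names = list(alignment)
    rows = [list(alignment[name]) for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sequences must have the same aligned length")

    n_seqs = len(rows)
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        gap_rows = [row for row in rows if row[col] == "-"]
        # Mask only columns where gaps are present but rare.
        if gap_rows and len(gap_rows) / n_seqs <= max_gap_fraction:
            for row in gap_rows:
                row[col] = "N"
    return {name: "".join(row) for name, row in zip(names, rows)}
```

With four sequences and `max_gap_fraction=0.25`, a column with a single gap is masked with "N", while a column that is gapped in three of the four sequences is left unchanged.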
Regions with multiple insertions and deletions that were not included in the original alignment were filled in by using the consensus sequence for each clade. Bases that were reconstructed using maximum likelihood methods are uppercase; bases that were inserted from the consensus sequence to rebuild the ancestor across indel regions are lowercase.
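The case convention can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the maximum likelihood ancestor and the clade consensus are given as two strings of equal length in the same alignment coordinates, and that a "-" in the ancestor marks a position that was excluded from the likelihood reconstruction. The function name `rebuild_ancestor` is hypothetical.

```python
def rebuild_ancestor(ml_ancestor, clade_consensus, gap="-"):
    """Fill excluded positions of an ML ancestor from a clade consensus.

    Bases from the ML reconstruction are emitted in uppercase; bases taken
    from the consensus are emitted in lowercase. Positions that are gaps in
    both inputs are dropped from the output.
    """
    if len(ml_ancestor) != len(clade_consensus):
        raise ValueError("inputs must be aligned to the same length")

    rebuilt = []
    for ml_base, cons_base in zip(ml_ancestor, clade_consensus):
        if ml_base != gap:
            rebuilt.append(ml_base.upper())    # reconstructed by ML
        elif cons_base != gap:
            rebuilt.append(cons_base.lower())  # filled in from consensus
    return "".join(rebuilt)


print(rebuild_ancestor("ACG---TTA", "ACGGA-TTA"))  # ACGgaTTA
```

Keeping the two sources distinguishable by case lets a reader of the final sequence see at a glance which positions carry likelihood-based support and which were copied from the consensus.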
The tree used to construct these ancestral sequences is shown below and can be downloaded in PDF or PostScript format. The evolutionary model and the sequence alignment, with the rate variation at each site, are also available.
Contact us at the e-mail address at the bottom of the page if you have questions, or if you would like an ancestral sequence reconstructed for other internal nodes in this phylogenetic tree.