The CONSENSUS_M_GROUP sequence is built from the consensus sequence of each subtype, with each subtype weighted equally. CRFs are not included in this construct, and rare subtypes are weighted as heavily as common ones.
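A minimal sketch of the "consensus of consensuses" idea follows. It assumes already-aligned, equal-length sequences grouped by pure subtype (CRFs left out), uses plain per-column majority rule, and ignores gaps and ambiguity codes. The function names are hypothetical and this is not the database's actual build procedure. The toy data shows why equal weighting matters: a subtype with many sequences no longer outvotes rare subtypes.

```python
from collections import Counter

def column_consensus(seqs):
    """Per-column majority-rule consensus of equal-length aligned sequences.
    Ties go to the residue seen first (Counter.most_common ordering);
    real consensus tools may resolve ties differently."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
    )

def m_group_consensus(subtype_alignments):
    """Hypothetical helper: each subtype contributes exactly one sequence
    (its own consensus), so every subtype carries equal weight."""
    subtype_consensuses = [column_consensus(seqs) for seqs in subtype_alignments.values()]
    return column_consensus(subtype_consensuses)

# Toy data (not real HIV-1 sequences): subtype B is heavily over-sampled.
subtypes = {
    "A": ["ACGT"],
    "B": ["TCGA"] * 10,
    "C": ["ACGT"],
}
pooled = [s for seqs in subtypes.values() for s in seqs]

print(m_group_consensus(subtypes))  # equal weight per subtype
print(column_consensus(pooled))     # every sequence weighted equally instead
```

Pooling all sequences lets the over-sampled subtype dominate, while the equal-weight construction treats each subtype as a single vote.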

When this M-group consensus sequence is used as an outgroup in a maximum likelihood tree, it places the root of the HIV-1 M group near the center of the tree, and the ancestral and consensus sequences are very nearly the same (Korber et al., Science, 288:1789 (2000)). These artificial sequences are roughly "central": they are about equally distant from sequences of any subtype or CRF, at distances comparable to those found within modern subtypes.
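The "central" property can be checked by comparing two quantities: the mean distance from the consensus to the sequences of each subtype, and the mean pairwise distance within each subtype. The sketch below uses the uncorrected p-distance (fraction of differing aligned sites) as a simplifying assumption; published analyses typically use model-corrected distances or tree branch lengths. Function names and toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean

def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_distance_to(reference, seqs):
    """Mean p-distance from a reference (e.g. a consensus) to each sequence."""
    return mean(p_distance(reference, s) for s in seqs)

def mean_intra_distance(seqs):
    """Mean pairwise p-distance among sequences of one subtype (needs >= 2)."""
    return mean(p_distance(a, b) for a, b in combinations(seqs, 2))

# Toy data only; these are not real HIV-1 sequences or results.
consensus = "ACGTACGT"
groups = {"A": ["ACGTACGA", "ACGTACCT"], "B": ["TCGTACGT", "ACGAACGT"]}
for name, seqs in groups.items():
    print(name, mean_distance_to(consensus, seqs), mean_intra_distance(seqs))
```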

A whole protein constructed from a hypothetical ancestral or consensus sequence is artificial and may not fold properly. For short, contiguous T cell epitopes, this may not be an issue.

