The sequences in the HIV databases are, in general, classified as they were by the original authors. This means that the methods by which they have been classified vary considerably, both among authors and over time. In some cases, staff at the HIV sequence database have reclassified sequences, sometimes after discussions with the original authors and sometimes not.
The classification and naming of recombinants is a complex issue. We discuss three problems here: how to decide what a recombinant should be called, how to assign a subtype to fragments that appear to be part of a circulating recombinant form (CRF), and how to assign sub-subtypes.
How recombinants are named
We name recombinants by listing their parent subtypes in alphabetical order. If a recombinant has A, C, and H fragments, it will be labeled ACH_name. If it is a recombinant of a CRF and a subtype, the CRF number precedes the subtype letter. For example, a recombinant between CRF01 and subtype B will be labeled 01B. Recombinants of multiple CRFs are named in the same way: a recombinant between CRF01 and CRF02 is labeled 0102, and so on. If a recombinant contains unclassified regions, a 'U' is included in the name.
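The naming rule above can be sketched in a few lines of Python. This is only an illustration, not the database's own code: the function name recombinant_label and its input format are hypothetical. It also makes two assumptions that the text does not state: CRF numbers are listed in ascending order, and 'U' is sorted alphabetically along with the subtype letters. The sketch returns only the composition part of the label and does not add any suffix.

```python
def recombinant_label(components):
    """Build the composition part of a recombinant label.

    `components` is a list of parent lineages such as 'A', 'CRF01',
    'CRF01_AE', or 'U' (unclassified). Hypothetical helper for illustration.
    """
    crf_numbers = set()
    letters = set()
    for component in components:
        name = component.strip().upper()
        if name.startswith("CRF"):
            # Keep only the number: 'CRF01' and 'CRF01_AE' both give '01'.
            crf_numbers.add(name[3:].split("_")[0])
        else:
            letters.add(name)
    # CRF numbers come first (ascending order is an assumption),
    # followed by subtype letters in alphabetical order.
    # Placing 'U' alphabetically among the letters is also an assumption.
    return "".join(sorted(crf_numbers, key=int)) + "".join(sorted(letters))


if __name__ == "__main__":
    print(recombinant_label(["H", "A", "C"]))     # ACH
    print(recombinant_label(["B", "CRF01"]))      # 01B
    print(recombinant_label(["CRF02", "CRF01"]))  # 0102
```

The first three calls reproduce the examples given in the text. Duplicate parents are collapsed, so a mosaic with several A fragments still contributes a single 'A' to the label.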
How fragments of CRFs are assigned to a subtype
A number of recombinant fragments have been reclassified as follows. The non-E fragments of CRF01_AE are quite distinctive and cluster separately from other subtype A fragments. We carried out a phylogenetic analysis of all sequences originally classified as subtype A, and those that clearly clustered with the CRF01_AE sequences were reclassified as CRF01_AE, even when no subtype E segment was present in the sequence. The advantage is a clearer idea of how many CRF01 sequences are present in the database. The obvious disadvantage is that some of these sequences may not belong to the CRF01 lineage at all, but may instead be more closely related to the ancestor of the subtype A fragments in that CRF; the two cases cannot be distinguished.
Initially, A-like fragments of CRF02 were also reclassified as CRF02, but Vidal et al. (J Virol. 2000 Nov;74(22):10498-507) have shown that some fragments that cluster with the CRF02 subcluster of subtype A are part of non-recombinant subtype A genomes. The assumption that such fragments are part of CRF02_AG genomes is therefore no longer warranted. These sequences are now again classified as subtype A, even though a large number of them are probably still derived from real CRF02_AG genomes. The set of subtype A fragments that cluster with CRF02_AG is available here.
Assignments of sub-subtypes
As far as possible, subtype A sequences have been reclassified as A1 or A2. Unfortunately, many new subtype A sequences submitted to GenBank are still classified as A rather than A1 or A2, so the number of subtype A sequences that are not assigned to a sub-subtype will likely continue to grow. Periodically, new subtype A and F sequences will be assigned to the appropriate sub-subtypes, but this work will likely always lag behind new submissions.
Updated January 2007 [TKL]