Currently defined groups are:
M Group Subtypes
Subtypes and sub-subtypes of the HIV-1 M group are thought to have diverged in humans following a single chimpanzee-to-human transmission. The HIV-1 M group subtypes are phylogenetically associated groups or clades of HIV-1 sequences, and are labeled A1, A2, B, C, D, F1, F2, G, H, J and K. The sequences within any one subtype or sub-subtype are more similar to each other than to sequences from other subtypes throughout their genomes. These subtypes represent different lineages of HIV, and have some geographical associations.
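The clustering criterion above (sequences within a subtype are more similar to each other than to sequences from other subtypes) can be illustrated with a simple pairwise distance. The following is a minimal sketch only: the function name `p_distance` and the toy fragments are invented for illustration and are not real HIV-1 sequences, and real subtyping relies on phylogenetic analysis of aligned genomes rather than raw mismatch counts.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Toy aligned fragments, invented for illustration (assumption: not real data).
toy = {
    "X1": "ACGTACGTAC",  # hypothetical clade X
    "X2": "ACGTACGTAT",  # hypothetical clade X
    "Y1": "ACCTAGGTTC",  # hypothetical clade Y
}

within = p_distance(toy["X1"], toy["X2"])   # distance inside one clade
between = p_distance(toy["X1"], toy["Y1"])  # distance across clades
print(f"within-clade: {within:.2f}, between-clade: {between:.2f}")
```

In this toy case the within-clade distance is smaller than the between-clade distance, which is the pattern the subtype definition describes.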
Although there are many ambiguities in the subtyping system, it describes genetic clustering patterns and provides a useful system for organizing viruses by genetic similarity. Since the subtypes were originally defined based only on fragments of the HIV-1 genome, in some cases no intact prototype sequence was available (the former subtypes E and I, for example, which are both now defined as circulating recombinant forms). In these situations there was scant information to differentiate between parental and recombined forms of the virus, and the nomenclature was controversial. Additionally, viruses within one subtype may be evolving at different rates, and there may also be differences in rates between different subtypes, especially in some limited regions of the genome. As new sequences and better analysis tools become available, some of these ambiguities may be resolved, and the subtyping nomenclature system is evolving as we learn more.
We attempt to keep our database up to date with the HIV research community consensus as we perceive it; therefore the nomenclature in the database is not static. The nomenclature for HIV-1 subtypes and CRFs was revised in the fall of 1999; the results of that revision were published in the HIV-1 Nomenclature Proposal 1999. Each year we gather a set of subtype reference sequences that are considered to be representative of complete (or near complete) genomes of all the subtypes and circulating recombinant forms of the HIV-1 M group, and isolates from the HIV-1 N, O, and P groups.
We have created a web tool called SUDI for testing whether a newly sequenced, non-recombinant genome fits the criteria of a "new" subtype. This tool is designed to be used after phylogenetic analyses of subgenomic regions and other methods, such as bootscanning or RIP, have been used to determine that the genome is equidistant from currently defined subtypes over its entire length, i.e. that it is not a recombinant of existing subtypes or CRFs.
M Group Recombinants and Circulating Recombinant Forms
All retroviruses have a propensity to recombine with other relatively closely related retroviruses, and HIVs and SIVs are no exception. Each virion packages two copies of the single-stranded RNA genome (ss-RNA, not to be confused with ds-RNA). If a cell is infected with two different viral genomes (from the same or different strains), the odds are good that some virions will package one copy from each. After the co-packaged genomes enter a new cell, the viral reverse transcriptase engages in "template switching", hopping from one packaged genome to the other during reverse transcription. If the two strains belong to different subtypes of the HIV-1 M group, the result can be a mosaic genome composed of regions from each of the two subtypes.
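The template-switching process described above can be pictured as copying from one packaged genome and jumping to the other at each breakpoint. The sketch below is an illustration under simplifying assumptions: the function `mosaic_genome`, the placeholder genomes, and the fixed breakpoint positions are all hypothetical, and real template switching is stochastic and operates on aligned RNA templates.

```python
def mosaic_genome(genome_a: str, genome_b: str, breakpoints: list[int]) -> str:
    """Copy from genome_a, switching to the other template at each breakpoint."""
    if len(genome_a) != len(genome_b):
        raise ValueError("co-packaged genomes must be aligned to the same length")
    if any(bp < 0 or bp > len(genome_a) for bp in breakpoints):
        raise ValueError("breakpoints must lie within the genome")

    templates = (genome_a, genome_b)
    current = 0        # index of the template currently being copied
    start = 0          # start of the segment being copied
    segments = []
    for bp in sorted(breakpoints) + [len(genome_a)]:
        segments.append(templates[current][start:bp])
        current = 1 - current  # "hop" to the other packaged genome
        start = bp
    return "".join(segments)


# Placeholder genomes: each letter marks which parent a position came from.
parent_a = "A" * 10
parent_b = "B" * 10
recombinant = mosaic_genome(parent_a, parent_b, [3, 7])
print(recombinant)
```

With breakpoints at positions 3 and 7, the product carries parent A at both ends and parent B in the middle, which is the kind of mosaic structure that bootscanning-style analyses look for.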
Inter-subtype recombinant genomes are common, but many of them are found only in the one dually-infected (or multiply-infected) individual patient in which they arose. Such a recombinant is called a unique recombinant form (URF). If an inter-subtype recombinant virus is transmitted to many people, it becomes one of the circulating strains in the HIV epidemic, and it can be classified as a "circulating recombinant form (CRF)". CRFs represent recombinant HIV-1 genomes that have infected three or more persons who are not epidemiologically related, so they can be assumed to have an epidemiologically relevant contribution to the HIV-1 M group epidemic. The circulating recombinant forms are labeled with numbers rather than letters, and numbered in the order in which they were first adequately described.
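The URF/CRF distinction and the CRF labeling scheme can be summarized in a few lines of code. This is a hedged sketch, not the database's actual classification logic: the function names are hypothetical, the threshold of three epidemiologically unrelated persons comes from the text above, treating any recombinant below that threshold as a URF is a simplifying assumption, and the zero-padded name format is inferred from the single example CRF01_AE.

```python
def classify_recombinant(unlinked_individuals: int, threshold: int = 3) -> str:
    """Label an inter-subtype recombinant as 'CRF' or 'URF'.

    unlinked_individuals: number of epidemiologically unrelated persons
    in whom the recombinant has been found (hypothetical input).
    """
    if unlinked_individuals < 1:
        raise ValueError("a recombinant must have been found in at least one person")
    # Assumption: anything below the CRF threshold is reported as a URF.
    return "CRF" if unlinked_individuals >= threshold else "URF"


def crf_name(order_described: int, parent_subtypes: str) -> str:
    """Build a CRF label from its description order and parental subtypes."""
    # Assumption: two-digit zero padding, as in the example CRF01_AE.
    return f"CRF{order_described:02d}_{parent_subtypes}"


print(classify_recombinant(1))   # found in a single dually-infected patient
print(classify_recombinant(3))   # found in three unrelated persons
print(crf_name(1, "AE"))
```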
The origins of what we designate as CRF lineages are sometimes debated in the literature. For example, CRF01 was originally considered to be a non-recombinant strain and so it was called subtype E. At the time the nomenclature strategy was defined (Robertson et al., Science. 2000; 288:55), this perspective was supported by some analyses (Anderson et al., J. Virol. 2000; 74:10752), although other analyses indicated it was more likely to be a recombinant, with subtype A in most of the genome and subtype E in Env, with the caveat that a full subtype E genome was either extinct or had not been sampled (Gao et al., J. Virol. 1996; 70:7013, and Carr et al., J. Virol. 1996; 70:5935). (Although global sampling of HIV has been vastly enriched since that time, a full-length "subtype E" genome has not been identified to date.) The HIV nomenclature committee decided to call this lineage CRF01_AE through a vote, although this decision was not universally agreed upon. Whatever its origins, the CRF01_AE naming convention provides a consistent way to refer to a distinctive and epidemiologically important lineage that is highly prevalent in parts of Asia. This lineage is still often referred to as subtype E.
Another complexity in naming is that it is not always clear which are parental and which are recombinant forms. So the names shouldn't be considered a biological truth, rather a way to refer to related lineages that attempts to reflect the biology. Sometimes this system is imperfect, and sometimes, as in the case of CRF01, the interpretation of the biology has been subject to well-reasoned debate in the scientific literature.
Classification of recombinants in the HIV database
The classification and naming of sequences, CRFs, and recombinants is fairly complicated. We have a separate page to explain How the HIV Database Classifies Subtypes.
HIV-1 groups M, N, O and P, as well as chimpanzee and gorilla SIVs, are all part of the SIVcpz radiation within the primate lentiviruses. Group M is the "main" group of viruses in the HIV-1 global pandemic, and it contains multiple subtypes and recombinant forms described above.
Group N is a very distinctive form of the virus that has only been identified in a few individuals in Cameroon. N is sometimes referred to as "Not-M, Not-O" or as the "new" group, and is also thought to have originated from a chimpanzee transmission. Subtypes within the HIV-1 N group are not yet clearly defined. Very few isolates have been identified and sequenced.
HIV-1 Group O, sometimes referred to as the "outlier" group, contains very diverse viruses, as group M does, but is still relatively rarely found. It is thought to have originated in a transmission to humans from wild gorillas (Van Heuverswyn et al. 2006). Intra-group diversification begins once the transmitted virus starts to spread in the human population after each interspecies transfer event. Subtypes within the HIV-1 O group are not yet defined, although the diversity of sequences within the HIV-1 O group is nearly as great as the diversity of sequences in the HIV-1 M group. Phylogenetic analyses of the gag and env genes do not reveal clades of virus as clearly as the clades detected in the M group.
A human isolate closely related to SIVgor was isolated and named Group P (Plantier et al. 2009).
HIV-2 is very distinct from HIV-1. While HIV-1 is most closely related to SIVs from chimpanzees, HIV-2 is closely related to SIVs isolated from sooty mangabeys. No sooty mangabey virus with a sequence falling within the HIV-2 A, B, C, F or G clades (formerly referred to as "subtypes", now referred to as "groups") has yet been found, but within the D and E clades sooty mangabey viruses have been sequenced which are very similar to HIV-2 virus sequences. It thus appears that each group of HIV-2 represents at least one separate transmission event from sooty mangabey to human.
As of September 2001, what were formerly known as the subtypes of the HIV-2 viruses are now known as groups. This was decided upon by the HIV Nomenclature Committee because sequences from these viral clades are nearly as distant from one another as are sequences from the M, N and O groups of the HIV-1 virus, and also because both the sequence diversity and the epidemiology of HIV-2 viruses suggest that each clade of virus was the result of a separate sooty mangabey-to-human transmission event. For groups D and E at least, the strains of HIV-2 found within a geographic region are documented to be more similar to SIV-SMMs from that region than they are to HIV-2 from other regions or groups (Santiago et al. 2005).
HIV-2 recombinants are rare. In 2010, the first HIV-2 CRF was described (Ibe et al. 2010).
Simian immunodeficiency viruses are very diverse. Their genomic sequences are far more diverse than the genomes of the hosts that carry them. Lentiviruses have now been isolated from many different non-human primate species, all with natural ranges on the African continent. New world primates and Asian primates have not been found to be naturally infected with lentiviruses. Because only a few viral isolates and sequences have been obtained for each non-human primate, the "species" of lentivirus is currently stored in the HIV database in the "subtype" field. In general, each simian species that is known to carry an immunodeficiency virus carries its own "subtype" of virus, and exceptions to this general rule are believed to provide evidence for cross-species transmission events, both in the wild, and in captivity. We maintain a list of non-human primate species from which lentiviruses have been isolated: Subtypes of Primate Immunodeficiency Viruses.
Developing a biologically-relevant and human-friendly nomenclature system for the HIVs and SIVs is an ongoing process as we learn more about the viruses, and have time to inform the research community of proposed changes in nomenclature, so that old names can be replaced with new ones without too much confusion. For the time being, the combination of nomenclature and the fields used to store that nomenclature in the HIV Sequence Database are not always biologically relevant. For example, the use of the subtype field in our database to organize clades of HIV-1, HIV-2 and various SIVs does not imply that the subtypes of HIV-1 are equivalent to the groups of HIV-2, nor that the common ancestor lived in the same species in each case.
The subtype listed for the SIVs is the species in which the virus was first isolated, or in which passaging began. SIV viruses resulting from cross-species transfers of virus in the wild (or unintentionally in captivity) are named for the recipient host. For example, SIVbab is the result of baboons infected with SIVver. In contrast, viruses from cross-species transfers made in the laboratory are named for the original species. For example, SIVsmm can be used to infect rhesus macaques, and it is still considered to be SIVsmm, not SIVmac.
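The naming rule for cross-species transfers can be written as a single decision. The sketch below is illustrative only: the function `siv_subtype_name` and its parameters are hypothetical and do not reflect how the database assigns the subtype field internally; the two examples are the ones given in the text.

```python
def siv_subtype_name(original_host: str, new_host: str, laboratory_transfer: bool) -> str:
    """Return the SIV name after a cross-species transfer.

    Transfers in the wild (or unintentional ones in captivity) take the
    name of the new host; deliberate laboratory transfers keep the name
    of the original host species.
    """
    host = original_host if laboratory_transfer else new_host
    return f"SIV{host}"


# Baboons infected in the wild with a vervet virus.
print(siv_subtype_name("ver", "bab", laboratory_transfer=False))
# Rhesus macaques infected in the laboratory with a sooty mangabey virus.
print(siv_subtype_name("smm", "mac", laboratory_transfer=True))
```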
The chimpanzee sequences are currently grouped in one subtype (CPZ), but they come from at least two different subspecies of chimpanzees. The subspecies of the common chimpanzee are Pan troglodytes troglodytes, Pan troglodytes schweinfurthii, Pan troglodytes verus and Pan troglodytes vellerosus; Pan paniscus (the pygmy chimpanzee, or bonobo) is a separate species. The SIV-CPZ-US, SIV-CPZ-CAM3, SIV-CPZ-CAM5 and SIV-CPZ-GAB genomes are all derived from Pan troglodytes troglodytes. The SIV-CPZ-ANT genome is from Pan troglodytes schweinfurthii. The paper that described the sequencing of the SIV-CPZ-US genome (Haesevelde et al. 1996) has a good discussion of the subspecies of chimps, their viruses and their geographic ranges. In addition, papers published in 2006 describing SIV from gorillas (Van Heuverswyn et al. 2006) and SIV from Cameroonian chimpanzees (Keele et al. 2006) provided yet more evidence of the relationships of the HIV-1 M, N and O groups to primate lentiviruses.
The SIV-AGMs are subdivided into subtypes based on the species of African green monkey: GRIVET (Chlorocebus aethiops), VERVET (Chlorocebus pygerythrus), TANTALUS (Chlorocebus tantalus) and SABAEUS (Chlorocebus sabaeus). The BABOON subtype viruses were isolated from wild-caught Chacma Baboons (Papio ursinus) and wild-caught Yellow Baboons (Papio hamadryas cynocephalus), infected in the wild (in South Africa and Tanzania respectively) with SIVs which cluster with the vervet African green monkey SIVs.
The SIVs isolated from sooty mangabeys and macaque species are all known to be derived from sooty mangabey viruses, because wild macaques have been extensively studied and found to be seronegative. The macaques have become infected via contact with sooty mangabeys in captivity, mostly in the primate centers in the USA. There are currently three major lineages of these viruses with sequences in the database: 1) SIV-MAC-251 and viruses known to be derived from SIV-MAC-251, SIV-MAC142 and SIV-MNE-MNE. 2) SIV-SMM-SMM9 and viruses derived from SIV-SMM-SMM9 (notably the PBj series), SIV-SMM-F236 and SIV-SMM-PGM. 3) SIV-STM-STM. It is clear from the intermingling of sequences from sooty mangabeys and macaques, that several cross-species transmissions took place relatively recently, at least in the latter half of the 20th century.