


Updated Proposal of Reference Sequences of HIV-1 Genetic Subtypes, 1997


Thomas Leitner¹, Bette Korber¹, David Robertson², Feng Gao³, Beatrice Hahn³

¹ Theoretical Biology and Biophysics, Group T-10, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545;
² Laboratory of Structural & Genetic Information, CNRS-EP 91, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France;
³ Department of Medicine and Microbiology, University of Alabama at Birmingham, 701 S. 19th Street, LHRB 613, Birmingham, AL 35294.

HIV-1 subtypes are clusters of sequences within the M (major) group of HIV-1 viruses that are defined by phylogenetic analysis [12]. At a higher level, HIV-1 is divided into group M (major) and group O (outlier) [9]. In the 1996 compendium, a reference set of HIV-1 sequences representing the different subtypes was proposed. Since that issue of the compendium, however, many more full length sequences have been produced from the various HIV-1 M group subtypes. Partly as a consequence of the availability of these new sequences, and partly due to further analysis of pre-existing data, new information has been accumulating concerning subtyping, HIV-1 alignments, and hybrid genomes [3, 6, 8, 14]. We are therefore updating the proposed list of reference sequences that appeared in section III, Table 1 on page 30 of the Human Retroviruses and AIDS 1996 compendium. In the future, we anticipate that this table, and the accompanying alignments, will be updated by the database on a regular basis as a service to the scientific community, particularly since new submissions are likely to include a considerable number of complex mosaic genomes.

Table 1 lists reference sequences for 9 group M subtypes in four major coding regions, i.e., gag, pol, env and nef. The criteria for inclusion of a reference sequence have changed since 1996, with a shift in emphasis to using full length HIV genomic sequences as representatives for the subtypes when such sequences are available. These are supplemented with sequences spanning intact coding regions, or else the longest gene fragments currently available. Table 1A summarizes a basic list of reference sequences for alignments in international variation and subtyping efforts, while Table 1B provides additional information, including GenBank accession numbers, citations describing the respective sequence, sampling year, and the country where the virus was collected. Protein and nucleotide sequence alignments are available from the HIV database for each of the four coding regions on the website subtype alignments page. These alignments also contain the HIV-1 O group sequences ANT70 and MVP5180 and the chimpanzee viral sequences CPZ-GAB and CPZ-ANT. Full length nucleotide genome alignments will also be available with the release of the next HIV compendium. The alignments were first generated using a hidden Markov model [4] at the nucleotide level, then manually corrected at the amino acid level to keep open reading frames in frame, and finally back-translated into nucleotides. We are including maximum likelihood trees [5, 15] showing the phylogenetic relationships of the representative sequences selected for the gag, pol, env and nef coding regions. Trees for V3 and p17 sequences are also included, because: 1) these genomic regions are commonly sequenced; 2) sequences are available for all of the selected reference strains, including the shorter fragments; and 3) the organization of the V3 region tree is somewhat altered relative to the full length env tree.
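The final back-translation step described above can be illustrated with a minimal Python sketch. It assumes that the amino acid alignment has already been corrected, that gaps are written as "-", and that each unaligned coding sequence has exactly three nucleotides per residue. The function name `back_translate` and the toy sequences are hypothetical and are not taken from the tools actually used by the database.

```python
def back_translate(aligned_protein: str, nucleotides: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Each residue in the protein alignment is replaced by its original codon,
    and each protein gap by three nucleotide gaps, so that the nucleotide
    alignment stays in frame.
    """
    residue_count = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * residue_count:
        raise ValueError("nucleotide length must be 3x the number of residues")

    # Split the coding sequence into consecutive codons.
    codons = (nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3))

    aligned = []
    for aa in aligned_protein:
        aligned.append(gap * 3 if aa == gap else next(codons))
    return "".join(aligned)


# Toy example (hypothetical sequences): one gap in the protein alignment
# becomes one gapped codon in the nucleotide alignment.
example = back_translate("MG-AR", "ATGGGTGCGAGA")
```

Working from the protein alignment in this way keeps insertions and deletions in multiples of three, which is the property the manual correction step is meant to enforce.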

Some of the HIV-1 subtypes are more clearly defined than others. Subtypes A, B, C, D, F, and H each have at least one full length, apparently non-recombinant genome available as a reference sequence (non-recombinant in the sense that no conflicting subtype associations in different regions of the sequence have been identified at the present time), as well as multiple additional full length env and gag sequences. Our current database and methods, however, have limitations, and additional sequences and analyses in the future may change our current understanding of subtype relationships. Also, when gene fragments are used, sequences occasionally may not give clear-cut answers over short stretches. For example, the H full-length representative is essentially subtype H throughout its genome (see trees). However, phylogenetic analyses of V3 sequences on occasion produce trees that show a close association of subtype H and A sequences. Since this discordance is not always observed but depends on the particular alignments and programs used, it is not sufficient evidence for recombination. Nevertheless, the fact that it can occur underscores the possibility of ambiguities that can arise with subtyping efforts, particularly those directed at short regions of the genome.
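The working definition of "non-recombinant" used here (no conflicting subtype associations across regions) can be written down as a small Python sketch. It assumes that each genomic region has already been assigned a subtype by phylogenetic analysis; the function name `has_conflicting_subtypes` is hypothetical. The two example genomes follow the descriptions in this article: the subtype H representative is H throughout, while the subtype E genomes are E in env and A in gag and pol.

```python
def has_conflicting_subtypes(region_subtypes: dict) -> bool:
    """Return True if different regions of one genome are assigned to different subtypes."""
    return len(set(region_subtypes.values())) > 1


# Region assignments as described in the text (regulatory regions omitted).
subtype_h_genome = {"gag": "H", "pol": "H", "env": "H"}
subtype_e_genome = {"gag": "A", "pol": "A", "env": "E"}

h_is_mosaic = has_conflicting_subtypes(subtype_h_genome)
e_is_mosaic = has_conflicting_subtypes(subtype_e_genome)
```

This only formalizes the bookkeeping. As the V3 example shows, the per-region assignments themselves can be ambiguous over short stretches, so a conflict found this way is a prompt for further analysis rather than proof of recombination.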

All full length representatives of subtypes E and G that have been sequenced to date represent mosaic genomes, with parts of the viral genome clustering with the A subtype in phylogenetic analysis, and parts of the genome forming the two clearly distinguishable clades designated either E or G. The E subtype clearly forms a distinct "E" subtype in envelope, appears to be an A/E mosaic in the HIV regulatory regions, and is essentially A-associated in gag and pol [3, 8]. All of the longer subtype G sequences available to date have stretches of subtype A-associated sequence interspersed. In contrast to the A/E mosaics, however, these A/G mosaic sequences have many different patterns of A-like sequence, suggesting that they resulted from independent recombination events [6, 13, 14]. Subtype I sequences [10] have also been reanalyzed [7, 16], and there is now evidence from analysis of a complete genome that they are multiply mosaic and composed of 3 (or possibly 4) subtypes [7]. Further analysis is necessary to determine whether they represent a distinct subtype. Only fragments of subtype J env and gag genes are available at this time [11], so further work is needed in this case as well to fully characterize these strains. However, these sequences are included as reference sequences to assist in identification of possible J subtype related strains in the future. In addition to the two subtype J sequences listed in the table, three other sequences (GM4, GM5, GM7) have been found to cluster close to this subtype over the env V3 region [1]. However, the sequence of GM4 was suggested to be a G/?/C recombinant [2], where the question mark covered a roughly 600 bp section including the V3 region.


Table 1A.
Updated Proposal of Reference Sequences of HIV-1 Genetic Subtypes.

Subtype		gag		pol		env		nef

A		U455		U455		U455		U455
		92UG037.1	92UG037.1	92UG037.1	92UG037.1
		K89				K89(KENYA)
		VI32				SF170

B		HXB2		HXB2		HXB2		HXB2
		JRFL		JRFL		JRFL		JRFL
		OYI		OYI		OYI		OYI
		RF		RF		RF		RF

C		ETH2220		ETH2220		ETH2220		ETH2220
		92BR025.8	92BR025.8	92BR025.8	92BR025.8
		UG268				UG268
		ZAM18				ZAM18

D		NDK		NDK		NDK		NDK
		Z2Z6		Z2Z6		Z2Z6		Z2Z6
		ELI		ELI		ELI		ELI
		94UG114.1¹	94UG114.1¹	94UG114.1¹	94UG114.1¹

E²						CM240
						TN235
						90CR402.1
						93TH253.3

F		93BR020.1	93BR020.1	93BR020.1	93BR020.1
		BZ162				BZ163
		VI69				BZ126
		VI174				RJI03

G³		92NG003.2⁴			92NG003.2⁴	92NG003.2⁴
		92NG083.1	92NG083.1	92NG083.1	92NG083.1
		SE6165⁵		SE6165⁵		92UG975.10
						92RU131.9

H		90CR056.1	90CR056.1	90CR056.1	90CR056.1
		VI557⁵				VI557⁵
						CA13⁵

J		SE7022⁵				SE7022⁵
		SE7887⁵				SE7887⁵

¹ The sequence 94UG114.1 is the most distant D subtype sequence (see trees), tending to branch off closest to the B/D root in most analyses. In some subgenomic regions, it may even move outside the B/D cluster.
² E in most of env, A in gag and pol, mixture of A and E in regulatory genes [3, 8, 14].
³ The G reference sequences may show resemblance to subtype A in regions of pol and vif; see also footnote 4.
⁴ 92NG003.2 is a full length sequence, but is not included in the pol alignment because of A-like regions in this gene [6].
⁵ Full length gene sequences of gag, pol or env are not yet available; see Table 1B.


Table 1B. Sequence descriptions
Subtype			Sequence		Acc. No.			Source								Region			Sampling year	Sampling country (origin)
									
HIV-1 M Group sequences in alignments
A			U455			M62320			Oram, J.D. et al, ARHR 6:1073-1078 (1990)				complete			NA		Uganda
A			92UG037.1		U51190			Gao, F. et al, J.virol 70:7013-7029 (1996)				complete			1992		Uganda
A			K89			L11774			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Kenya
A			VI32			L11788			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Burundi
A			SF170			M66533			Evans, L. et al, PNAS 85:2815 (1988)					env				NA		Rwanda

B			HXB2			K03455, M38432		Wong-Staal, F. et al, Nature 313:277-284 (1985)				complete			NA		France
B			JRFL			U63632			O'Brien, W.A. et al, Nature 348:69 (1990)				complete			NA		US
B			OYI			M26727			Wain-Hobson, S. et al, AIDS 3:707 (1989)				complete			NA		Gabon
B			RF			M17451, M12508		Starcich, B.R. et al, Cell 45:637-648 (1986)				complete			1983		US (Haiti)
	
C			ETH2220			U46016			Salminen, M.O. et al, ARHR 12:1329-1339 (1996)				complete			1986		Ethiopia
C			92BR025.8		U52953			Gao, F. et al, in preparation (1997)					complete			1992		Brazil
C			UG268			L11799			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				1993		Uganda
C			UG268			L22948			Louwagie, J. et al, J.virol 69:263-271 (1995)				env-nef				1993		Uganda
C			ZAM18			L03705			McCutchan, F. et al, JAIDS 5:441-449 (1992)				gag				1989		Zambia
C			ZAM18			L22954			Louwagie, J. et al, J.virol 69:263-271 (1995)				env				1989		Zambia

D			NDK			M27323			Spire, B. et al, Gene 81:275-284 (1989)					complete			NA		Zaire
D			Z2Z6			M22639			Theodore, T. et al, unpublished (1988)					complete			NA		Zaire
D			ELI			K03454, X04414		Alizon, M. et al, Cell 46:63-74 (1986)					complete			NA		Zaire
D			94UG114.1		U88824			Gao, F. et al, in preparation (1997)					complete			1994		Uganda
	
E			CM240			U54771			Carr, J.K. et al, J.virol 70:5935-5943 (1996)				complete			1990		Thailand
E			TN235			L03698			McCutchan, F.E. et al, ARHR 8:1887-1895 (1992)				env				NA		Thailand
E			90CR402.1		U51188			Gao, F. et al, J.virol 70: 7013-7029 (1996)				complete			1990		Central African Republic 
E			93TH253.3		U51189			Gao, F. et al, J.virol 70: 7013-7029 (1996)				complete			1993 		Thailand
		
F			BZ162			L11751			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Brazil
F			VI69			L11796			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Belgium (Rwanda)
F			VI174			L11782			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Zaire
F			BZ163			L22085			Louwagie, J. et al, ARHR 10:561-567 (1994)				env-nef				NA		Brazil
F			BZ126			L22082 			Louwagie, J. et al, ARHR 10:561-567 (1994)				env-nef				NA		Brazil
F			93BR020.1		AF005494			Gao, F. et al, in preparation (1997)				complete			1993		Brazil	
F			RJI03			U08974			Sabino, E.C., et al., J.virol 68:6340-6346 (1994)			partial env			NA		Brazil

G			92NG003.2 		U88825			Gao, F. et al, in preparation (1997)					complete			1992		Nigeria
G			SE6165			L40752, L40761		Leitner, T. et al, Virology 209:136-146 (1995)				p17, RT				1993		Sweden (Central Africa)
G			92NG083.1		U88826			Gao, F. et al, in preparation (1997)					complete			1992		Nigeria
G			92UG975.10		U27426			Gao, F. et al, J.virol 70:1651-1657 (1996)				env				1992		Uganda
G			92RU131.9		U30312			Gao, F. et al, J.virol 70:1651-1657 (1996)				env-nef				1992		Russia

H			90CR056.1		AF005496		Gao, F. et al, in preparation (1997)					complete			1990		Central African Republic
H			VI557			U09666			Janssens, W. et al, ARHR 10:877-879 (1994)				V3-V5				NA		Zaire
H			VI557			L11793			Louwagie, J. et al, AIDS 7:769-780 (1993)				gag				NA		Zaire
H			CA13			U09667			Janssens, W. et al, ARHR 10:877-879 (1994)				V3-V5				NA		Cameroon

J			SE7022			L41177, L41179		Leitner, T. et al, ARHR 11:995-997 (1995)				V3, p17				1993		Sweden (Zaire)
J			SE7887			L41176, L41178		Leitner, T. et al, ARHR 11:995-997 (1995)				V3, p17				1994		Sweden 
								
Additional sequences available in alignments

O Group			ANT70			L20587			Vanden Haesevelde, M. et al, JVI 68:1586-1596 (1994)			complete			NA		Cameroon
O Group			MVP5180			L20571			Gurtler, L. et al, JVI 68:1581-1585 (1994)				complete			1991		Cameroon
Chimpanzee		SIV-CPZANT		U42720			Vanden Haesevelde, M. et al, Virology 221:346-350 (1996)			complete			1986		Zaire
Chimpanzee		SIV-CPZGAB		X52154			Huet, T. et al, Nature 345:356-359 (1990)				complete			NA		Gabon

Figure 3:
Unrooted phylogenetic trees calculated with maximum likelihood methods [5, 15]. The alignments for the various subgenomic regions are derived from the complete genomic alignment, as presented in the HIV database 1997. Each of the alignments is available from the HIV database. All alignments were globally stripped for the generation of trees to 1398 sites for the gag gene; 2994 sites for the pol gene; 2250 sites for the env gene; 282 sites for the nef gene; 426 sites for the p17 gag fragment; and 213 sites for the env V3 fragment. All trees, except the nef tree, were constructed using the program DNAML under the F84 substitution model [5], where nucleotide frequencies were derived from the datasets and the transition/transversion parameter was set to 3.0 for the gag gene, 2.0 for the pol gene, 1.5 for the env gene, 3.0 for the p17 fragment and 1.42 for the V3 fragment. The nef gene tree was calculated with the programs fastDNAml and DNArates [15], to allow for different substitution rates across sites. This proved to be important for the topology of this gene tree in resolving subtypes B and D. Although G subtype sequences in the trees shown cluster separately from subtype A, they cluster with subtype A in some subgene regions [6]. Subtypes B and D are generally close to each other in all analyses, but with the nef gene it may be difficult to completely resolve them from each other. All trees are drawn to the same scale, thereby indicating the relative information density in the different regions.
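The stripping step mentioned in the figure legend can be sketched in Python as follows. The sketch assumes that "globally stripped" means removing every alignment column in which at least one sequence carries a gap, which is the usual meaning of global gap stripping; the legend does not spell this out. The function name `strip_gapped_columns` and the toy sequences are hypothetical.

```python
def strip_gapped_columns(alignment: dict, gap_chars: str = "-.") -> dict:
    """Remove every column that contains a gap character in any sequence."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    # Keep only the column indices that are gap-free in every sequence.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical names and sequences): columns 3 and 4 each
# contain a gap in one sequence, so only four sites remain.
toy = {"ref1": "ATG-CA", "ref2": "AT-GCA", "ref3": "ATGGCA"}
stripped = strip_gapped_columns(toy)
```

Under this assumption, the site counts quoted in the legend (for example 1398 for gag) would be the number of columns that survive the filter, which is why they are shorter than the full gene lengths.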

REFERENCES

1. Blouin, J. C., E. A. Guzman, and B. T. Foley. 1996. Global variation in the HIV-1 V3 region, p. III77-III201. In G. Myers and B. Korber and B. Foley and K.-T. Jeang and J. W. Mellors and S. Wain-Hobson (ed.), Human Retroviruses and AIDS: a compilation and analysis of nucleic and amino acid sequences. Los Alamos National Laboratory, Los Alamos, NM.

2. Bobkov, A., R. Cheingsong-Popov, M. Salminen, F. McCutchan, J. Louwagie, K. Ariyoshi, H. Whittle, and J. Weber. 1996. Complex mosaic structure of the partial envelope sequence from a Gambian HIV type 1 isolate. AIDS Res. Hum. Retrovirus. 12:169-171.

3. Carr, J. K., M. O. Salminen, C. Koch, D. Gotte, A. W. Artenstein, P. A. Hegerich, D. St. Louis, D. S. Burke, and F. E. McCutchan. 1996. Full-length sequence and mosaic structure of a human immunodeficiency virus type 1 isolate from Thailand. J. Virol. 70:5935-5943.

4. Eddy, S. 1995. HMMER Hidden Markov Models of Protein and DNA Sequence, 1.8 ed. Washington University School of Medicine, St. Louis, MO.

5. Felsenstein, J. 1993. PHYLIP: Phylogeny Inference Package, 3.52c ed. University of Washington, Seattle, WA.

6. Gao, F., D. L. Robertson, C. D. Carruthers, S. G. Morrison, B. Jian, Y. Chen, F. Barre-Sinoussi, M. Girard, A. Srinivasan, A. G. Abimiku, G. M. Shaw, P. M. Sharp, and B. H. Hahn. 1997. Non-recombinant reference clones and sequences for human immunodeficiency virus type 1 subtypes A, C, D, F, and H. Submitted.

7. Gao, F., D. L. Robertson, and B. H. Hahn. 1997. Unpublished data.

8. Gao, F., D. L. Robertson, S. G. Morrison, H. Hui, S. Craig, J. Decker, P. N. Fultz, M. Girard, G. M. Shaw, B. H. Hahn, and P. M. Sharp. 1996. The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. J. Virol. 70:7013-7029.

9. Korber, B., I. Loussert-Ajakai, J. Blouin, and S. Saragosti. 1997. A comparison of HIV-1 group M and group O functional and immunogenic domains in the gag p24 protein and the C2V3 region of the envelope protein, p. (III)41-(III)56. In G. Myers and B. Korber and B. Foley and K.-T. Jeang and J. W. Mellors and S. Wain-Hobson (ed.), Human retroviruses and AIDS 1996: a compilation and analysis of nucleic acid and amino acid sequences. Los Alamos National Laboratory, Los Alamos, NM.

10. Kostrikis, L. G., E. Bagdades, Y. Cao, L. Zhang, D. Dimitriou, and D. D. Ho. 1995. Genetic analysis of human immunodeficiency virus type 1 strains from patients in Cyprus: identification of a new subtype designated subtype I. J. Virol. 69:6122-6130.

11. Leitner, T., A. Alaeus, S. Marquina, E. Lilja, K. Lidman, and J. Albert. 1995. Yet another subtype of HIV type 1? AIDS Res. Hum. Retrovirus. 11:995-997.

12. Louwagie, J., F. E. McCutchan, M. Peeters, T. P. Brennan, B. E. Sanders, G. A. Eddy, G. van der Groen, K. Fransen, G.-M. Gershy-Damet, R. Deleys, and D. S. Burke. 1993. Phylogenetic analysis of gag genes from 70 international HIV-1 isolates provides evidence for multiple genotypes. AIDS 7:769-780.

13. McCutchan, F. 1997. Personal communication.

14. McCutchan, F. E., M. O. Salminen, J. K. Carr, and D. S. Burke. 1996. HIV-1 genetic diversity. AIDS 10(suppl 3):S13-S20.

15. Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10:41-48.

16. Salminen, M. 1996. Personal communication.

last modified: Tue Mar 16 10:03 2010

