HIV Databases
HIV sequence database



Consensus and Ancestral Sequence Alignments
Current (Aug. 2004)

Purpose: To provide consensus and ancestral sequences of genetically associated subsets of HIV-1 sequences. The files include a consensus of each subtype, an M-group consensus-of-consensuses, and some ancestral sequences.

Details: Files are available in four formats, either as alignments or as gapless (unaligned) files. If you select more than one file, all of the files will be concatenated into a single large file that you will have to split yourself, so it is usually better to download one file at a time.

Pretty print files are not available for unaligned sequences. In the pretty print version, the alignments are broken into lines of 50 characters each, and the sequences are presented in an aligned style. Examples of the download formats can be seen on the Consensus Maker Explanation page.
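The line-breaking described above can be sketched as follows. This is only an illustration of splitting an alignment into blocks of 50 columns; the function name `wrap_alignment`, the two-space separator, and the sequence names are assumptions, and the site's real pretty print layout may differ (see the Consensus Maker Explanation page for the real format).

```python
def wrap_alignment(records, width=50):
    """Return a list of text blocks, each showing `width` alignment columns.

    `records` is a list of (name, aligned_sequence) pairs. The layout
    (padded name, two spaces, sequence slice) is an assumed one.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")
    name_width = max(len(name) for name, _ in records)
    blocks = []
    for start in range(0, length, width):
        # One line per sequence, all showing the same column range.
        lines = [
            f"{name.ljust(name_width)}  {seq[start:start + width]}"
            for name, seq in records
        ]
        blocks.append("\n".join(lines))
    return blocks


if __name__ == "__main__":
    # Hypothetical names and toy sequences, for illustration only.
    demo = [("SEQ_1", "ACGT" * 30), ("SEQ_2", "ACGA" * 30)]
    print("\n\n".join(wrap_alignment(demo)))
```

This only makes sense for aligned input, because every sequence must have the same length for the columns to line up. That is consistent with pretty print not being offered for unaligned sequences.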

 

Download format: FastA, Mase, Table, Pretty print

Computer type: Unix, Mac, PC

 

Files are offered for each region below, as nucleotides or as proteins, and in each case either aligned or with no gaps (unaligned).

Region
GAG
POL
VIF
VPR
TAT
REV
VPU
ENV
NEF


What sequences are included

We provide consensuses for the M-group subtypes A (including A, A1, and A2), B, C, D, F (including F1 and F2), and G; for the circulating recombinant forms CRF01 and CRF02; and for group O. We also provide an M-group consensus, which is a consensus of the consensus sequences for subtypes A, B, C, D, F, G, and H. Ancestral sequences are also provided; they are based on the Complete Genome M-group Ancestral sequence and its phylogenetic tree. For more details, see the M-group Consensus Construction explanation file.
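To illustrate what a consensus of consensuses means, here is a minimal sketch that takes one aligned consensus per subtype and picks the most common letter in each column, so that every subtype counts once regardless of how many sequences it contains. The function name, the simple plurality rule, and the toy input are assumptions; the actual procedure is the one described in the M-group Consensus Construction explanation file.

```python
from collections import Counter


def consensus_of_consensuses(subtype_consensuses):
    """Column-wise plurality over one aligned consensus per subtype.

    `subtype_consensuses` maps a subtype label to its aligned consensus.
    Ties go to the letter seen first in the column (an arbitrary choice
    made for this sketch).
    """
    seqs = [s.upper() for s in subtype_consensuses.values()]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("consensus sequences must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


if __name__ == "__main__":
    # Toy data only; not real subtype consensuses.
    toy = {"A": "ACGT", "B": "ACGA", "C": "ATGA"}
    print(consensus_of_consensuses(toy))
```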

How the consensus alignments are made

The input alignments are the HIV Sequence Database Web Alignments. These sequences have undergone additional annotation after retrieval. Specifically, question marks in consensus sequences have been resolved, and glycosylation sites have been aligned. From the input, consensus sequences were built using our consensus website.

The consensus sequences were calculated with the default settings of the consensus website, except that they were computed for all subtype groups with 3 or more (rather than 4 or more) sequences in the alignment. If a column in a subtype group contained equal numbers of two different letters, we resolved the tie by looking at the same column throughout the M group and using the most common letter as the consensus. An upper case letter in a DNA consensus sequence indicates that the nucleotide is conserved at that position in all sequences used to make the consensus. Where the sequences are not unanimous, the most common nucleotide is shown in lower case. Regions spanned by multiple insertions and deletions are difficult to align; we attempt to anchor alignments in such regions on glycosylation sites, and to preserve the minimal elements which span such regions. Protein consensus sequences are always in upper case, with each letter indicating the most common amino acid at that position.
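The rules in this paragraph can be sketched for a DNA alignment as follows. This is not the consensus website's code. The function names are hypothetical, gaps are treated like any other character, and a tie is resolved by choosing, among the tied letters only, the one most common in the same column of the whole M-group alignment (one possible reading of the tie rule).

```python
from collections import Counter


def column_counts(seqs, i):
    """Count the characters in column i, ignoring case."""
    return Counter(s[i].upper() for s in seqs)


def dna_consensus(subtype_seqs, group_seqs, min_seqs=3):
    """Consensus of `subtype_seqs`; `group_seqs` is the full M-group alignment.

    Returns None when the subtype has fewer than `min_seqs` sequences.
    Upper case marks a unanimous column, lower case the most common letter.
    """
    if len(subtype_seqs) < min_seqs:
        return None
    out = []
    for i in range(len(subtype_seqs[0])):
        counts = column_counts(subtype_seqs, i)
        top = max(counts.values())
        tied = sorted(c for c, n in counts.items() if n == top)
        if len(counts) == 1:
            out.append(tied[0])  # unanimous column: upper case
            continue
        if len(tied) > 1:
            # Tie inside the subtype: consult the same column of the M group.
            group = column_counts(group_seqs, i)
            winner = max(tied, key=lambda c: group[c])
        else:
            winner = tied[0]
        out.append(winner.lower())  # variable column: lower case
    return "".join(out)


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    print(dna_consensus(["ACGT", "ACGA", "ATGA"], ["ACGT", "ACGA", "ATGA"]))
```

If the tie persists in the M-group column as well, this sketch falls back to alphabetical order, which is an arbitrary choice and not something the text specifies.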

How the ancestral sequences are derived

The ancestral tree and sequences were built as described in the Ancestral Tree Construction explanation file.

Interpreting the format of consensus sequences

An upper case letter in a DNA consensus sequence indicates that the nucleotide is conserved at that position in all sequences used to make the consensus. A lower case letter is the most common nucleotide at a variable position. Ties are broken by taking the most common letter at that position in the alignment of all sequences, including those outside the subtype of interest. If most of the characters at a position in the alignment are dashes inserted to maintain the alignment, no amino acid is placed at that position in the consensus sequence.
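The handling of gap-dominated positions can be sketched for a single protein column as follows. The function name is hypothetical, and two details are assumptions made for the example: "most" is read as a strict majority, and when dashes are not in the majority the most common residue among the non-gap characters is used.

```python
from collections import Counter


def protein_consensus_column(column, gap="-"):
    """Return the consensus character for one alignment column.

    `column` is a string with one character per sequence. A column that is
    mostly dashes yields a dash, i.e. no amino acid at that position.
    """
    if not column:
        raise ValueError("empty column")
    if column.count(gap) > len(column) / 2:
        return gap
    residues = Counter(c.upper() for c in column if c != gap)
    # Ties go to the residue seen first in the column (arbitrary here).
    return residues.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy columns, for illustration only.
    for col in ("--K-", "KKR-", "K-"):
        print(col, "->", protein_consensus_column(col))
```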

Protein sequences are always in upper case letters.

The number of sequences used to make the consensus is indicated in parentheses following the subtype designation.
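If you need that count programmatically, a sketch like the following separates the label from a trailing number in parentheses. The example names are hypothetical; check the names in your downloaded file before relying on this pattern.

```python
import re

# Label, optional whitespace, then a count in parentheses at the end.
_NAME_RE = re.compile(r"^(?P<label>.+?)\s*\((?P<count>\d+)\)\s*$")


def split_name(name):
    """Return (label, count); count is None if no trailing "(n)" is found."""
    match = _NAME_RE.match(name)
    if match is None:
        return name.strip(), None
    return match.group("label"), int(match.group("count"))


if __name__ == "__main__":
    # Hypothetical sequence names, for illustration only.
    for example in ("CONSENSUS_B(104)", "CONSENSUS_A1 (12)", "ANCESTOR"):
        print(split_name(example))
```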

Reagent development

Sequences on this web page are suitable for reagent development because they do not contain question marks or ambiguous characters.
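Before ordering reagents, you may still want to confirm this property on the file you downloaded. The sketch below lists any character that is neither a gap nor one of the four unambiguous nucleotides or twenty standard amino acids. The function name and the choice of allowed alphabets are assumptions; any other symbol a file may use, such as a stop-codon marker, would be reported as well.

```python
DNA_LETTERS = set("ACGT")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def ambiguous_positions(seq, alphabet, gap="-"):
    """Return (index, character) pairs for characters outside `alphabet`.

    Gaps are skipped and case is ignored, so lower case consensus letters
    are accepted.
    """
    return [
        (i, c)
        for i, c in enumerate(seq)
        if c != gap and c.upper() not in alphabet
    ]


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    print(ambiguous_positions("ACgt-N?R", DNA_LETTERS))
    print(ambiguous_positions("MKV-X", PROTEIN_LETTERS))
```

An empty list means the sequence contains only gaps and unambiguous letters.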

Archives of old consensus and ancestral alignments

Relevant links

HIV/SIV web alignments
Subtype reference alignments
Consensus Maker Tools allow you to build a consensus from your own alignment according to your preferences.
Consensus Maker Explanation page shows the output format options.
Ancestral Tree Construction explanation file.
M-group Consensus Construction explanation file.

last modified: Thu Oct 11 15:33 2007


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2012 LANS LLC. All rights reserved.