Purpose: To provide consensus and ancestral sequences of genetically associated subsets of HIV-1 sequences. The files include a consensus of each subtype, an M-group consensus-of-consensuses, and some ancestral sequences.
Details: Files are available in 4 formats, either as alignments or as gapless files. If you select more than one file, all of them are concatenated into a single large file that you will have to split yourself, so it is usually better to download one file at a time.
Pretty print files are not available for unaligned sequences. In the pretty print version, the alignments are broken into lines of 50 characters each, and the sequences are presented in an aligned style. Examples of the download formats can be seen on the Consensus Maker Explanation page.
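The block-wise layout described above can be sketched in a few lines of Python. This is only an illustration of breaking an alignment into 50-column blocks with the sequences kept in register; the function name `pretty_print`, the label padding, and the blank line between blocks are assumptions, not the exact layout produced by the download page.

```python
def pretty_print(alignment, width=50):
    """Render {name: aligned sequence} in blocks of `width` columns.

    Hypothetical helper: label padding and block separators are
    illustrative choices, not the site's actual output format.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        # Unaligned (gapless) sequences differ in length, so there is
        # no column-by-column layout to print.
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    label_width = max(len(name) for name in alignment) + 2

    blocks = []
    for start in range(0, total, width):
        rows = [
            name.ljust(label_width) + seq[start:start + width]
            for name, seq in alignment.items()
        ]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = {"CON_A": "ACGT" * 30, "CON_B": "A-GT" * 30}
    print(pretty_print(demo))
```

The length check also shows why a pretty print version makes no sense for unaligned sequences: without a shared set of columns, the rows cannot be kept in register.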
Additional information about consensus and ancestral alignments
We provide consensuses for the M group subtypes A (including A, A1, and A2), B, C, D, F (including F1 and F2), and G; the circulating recombinant forms CRF01 and CRF02; and group O. We also provide a Consensus M-group, which is a consensus of the consensus sequences for subtypes A, B, C, D, F, G, and H. Ancestral sequences are also provided. Ancestral sequences are based on the Complete Genome M-group Ancestral sequence and its phylogenetic tree. For more details, see the M-group Consensus Construction explanation file.
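To illustrate what a consensus of consensuses means, the sketch below gives each subtype consensus exactly one vote per column, so a heavily sampled subtype does not dominate the result. The function name `consensus_of_consensuses` and the alphabetical tie rule are placeholders chosen for this example; the actual procedure is the one documented in the M-group Consensus Construction explanation file.

```python
from collections import Counter


def consensus_of_consensuses(subtype_consensuses):
    """Column-wise plurality over {subtype: aligned consensus sequence}.

    Each subtype contributes one vote per column, regardless of how
    many sequences went into its own consensus.
    """
    seqs = [seq.upper() for seq in subtype_consensuses.values()]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("consensus sequences must be aligned to equal length")

    result = []
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        tied = sorted(ch for ch, n in counts.items() if n == top)
        # Placeholder tie rule (assumption): take the first character
        # in sorted order so the output is deterministic.
        result.append(tied[0])
    return "".join(result)


if __name__ == "__main__":
    demo = {"A": "ACGT", "B": "ACGA", "C": "ATGA"}
    print(consensus_of_consensuses(demo))
```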
The input alignments are the HIV Sequence Database Web Alignments. These sequences have undergone additional annotation after retrieval. Specifically, question marks in consensus sequences have been resolved, and glycosylation sites have been aligned. From the input, consensus sequences were built using our consensus website.
The consensus sequences were calculated using the default values on the consensus website, except that they were computed for all subtype groups having 3 or more (rather than 4 or more) sequences in the alignment. If a column in a subtype group contained equal numbers of two different letters, we resolved the tie by looking at the same column throughout the M group and using the most common letter there as the consensus. An upper case letter in a DNA consensus sequence indicates that the nucleotide is conserved at that position in all sequences used to make the consensus. Where the sequences are not unanimous, the most common nucleotide is shown in lower case. Regions spanned by multiple insertions and deletions are difficult to align; we attempt to anchor alignments in such regions on glycosylation sites, and to preserve the minimal elements that span such regions. Protein consensus sequences are always in upper case letters, each indicating the most common amino acid at that position.
The ancestral tree and sequences were built as described in the Ancestral Tree Construction explanation file.
In summary: an upper case letter in a DNA consensus sequence indicates that the nucleotide is conserved at that position in all sequences used to make the consensus. A lower case letter is the most common nucleotide at a variable position. Ties are broken by taking the most common letter in that column across the alignment of all sequences, including those outside the subtype of interest. If most of the entries in an alignment column are dashes inserted to maintain the alignment, no residue is placed at that position in the consensus sequence.
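The DNA consensus rules above can be sketched as follows. This is a minimal illustration, not the code behind the consensus website. The names `consensus_column` and `dna_consensus` are hypothetical, and several details are assumptions made for the example: "most" is read as more than half of the column, a gap-majority column is written as a dash, gaps are ignored when counting letters, a column containing any gap is not treated as unanimous, and the tie-break considers only the tied letters.

```python
from collections import Counter

GAP = "-"


def consensus_column(column, background):
    """Consensus character for one column of a subtype alignment.

    `column` holds the subtype's characters at this position and
    `background` holds the same column for the whole alignment.
    """
    column = [ch.upper() for ch in column]
    # Assumption: "most" means more than half of the entries are gaps.
    if column.count(GAP) * 2 > len(column):
        return GAP

    counts = Counter(ch for ch in column if ch != GAP)
    top = max(counts.values())
    tied = sorted(ch for ch, n in counts.items() if n == top)
    if len(tied) == 1:
        winner = tied[0]
    else:
        # Tie: prefer the tied letter that is most common in the same
        # column of the full alignment (alphabetical if still tied).
        bg = Counter(ch.upper() for ch in background)
        winner = max(tied, key=lambda ch: bg[ch])

    # Upper case only when every sequence in the subtype agrees.
    unanimous = counts[winner] == len(column)
    return winner if unanimous else winner.lower()


def dna_consensus(subtype_seqs, all_seqs, min_sequences=3):
    """Consensus of aligned `subtype_seqs`; `all_seqs` breaks ties."""
    if len(subtype_seqs) < min_sequences:
        raise ValueError(f"need at least {min_sequences} sequences")
    length = len(subtype_seqs[0])
    return "".join(
        consensus_column([s[i] for s in subtype_seqs],
                         [s[i] for s in all_seqs])
        for i in range(length)
    )


if __name__ == "__main__":
    subtype = ["ACGT-", "ACGA-", "ACCA-", "ATCTA"]
    everything = subtype + ["ACCTA", "ACCTA"]
    print(dna_consensus(subtype, everything))
```

In the demo alignment, the first column is unanimous (upper case), the second is variable (lower case), the third and fourth are ties settled by the full alignment, and the last is mostly gaps.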
Protein consensus sequences are always in upper case letters.
The number of sequences used to make the consensus is indicated in parentheses following the subtype designation.
Sequences on this web page are suitable for reagent development because they do not contain question marks or ambiguous characters.
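If you want to confirm this property for a downloaded DNA sequence before using it, a small check along the following lines may help. The function name `ambiguous_positions` and the choice of A, C, G, T as the unambiguous alphabet are assumptions for this sketch; pass a different `alphabet` for protein sequences.

```python
UNAMBIGUOUS_DNA = frozenset("ACGT")


def ambiguous_positions(sequence, alphabet=UNAMBIGUOUS_DNA, gap="-"):
    """Return (index, character) pairs outside the unambiguous alphabet.

    Alignment gaps are skipped, and case is ignored because DNA
    consensus sequences mix upper and lower case letters.
    """
    return [
        (i, ch)
        for i, ch in enumerate(sequence)
        if ch != gap and ch.upper() not in alphabet
    ]


if __name__ == "__main__":
    print(ambiguous_positions("ACgt-A"))
    print(ambiguous_positions("AC?RT"))
```

An empty list means the sequence contains no question marks or ambiguity codes.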
HIV/SIV web alignments
Subtype reference alignments
Consensus Maker Tools allow you to build a consensus from your own alignment according to your preferences.
Consensus Maker Explanation page shows the output format options.
Ancestral Tree Construction explanation file.
M-group Consensus Construction explanation file.