HIV sequence database

HIV Sequence Alignments

Alignment type
      Start:    End:    (Coordinates: HIV1-HXB2, HIV2-Mac239)
Year *

* Alignments are named for the year in which the sequences were published. For example, “2020” alignments contain sequences published through the end of 2020. The Compendia are published the year after the sequences were published, so the 2008 alignment appears in the 2009 compendium.

** User-defined ranges are clipped from a genome alignment. This option is offered only for alignment sets that include a genome alignment.
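
As an illustration of what clipping a user-defined range involves, the sketch below maps reference coordinates (positions counted on the ungapped reference, such as HXB2) to alignment columns and slices every sequence at those columns. It is not the database's own implementation: the function names, the toy alignment, and the choice of 1-based inclusive coordinates with '-' as the gap character are assumptions made for this example.

```python
def ref_to_columns(aligned_ref, start, end, gap="-"):
    """Map 1-based inclusive reference coordinates to a 0-based, half-open column slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    pos = 0                      # position on the ungapped reference
    first = last = None
    for col, ch in enumerate(aligned_ref):
        if ch == gap:
            continue             # gap columns do not advance the reference coordinate
        pos += 1
        if pos == start:
            first = col
        if pos == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("coordinates fall outside the reference sequence")
    return first, last + 1


def clip_alignment(alignment, ref_name, start, end):
    """Cut the same column range out of every sequence in the alignment."""
    first, stop = ref_to_columns(alignment[ref_name], start, end)
    return {name: seq[first:stop] for name, seq in alignment.items()}


# Toy alignment (hypothetical data, not real HIV sequences).
toy = {
    "REF":  "AC--GTAC",
    "seq1": "ACTTGTAC",
    "seq2": "A---GTTC",
}
clipped = clip_alignment(toy, "REF", 2, 4)
```

Because insertions relative to the reference occupy columns that the reference fills with gaps, the clipped region can be longer than `end - start + 1` columns, as it is in the toy example.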

Web Alignments

What sequences are included

These alignments are complete, meaning that they contain all sequences we have in the database, with a few exceptions:

How the sequences are aligned

These alignments were generated by an iterative process between automated alignment using HMMER and manual editing using MASE, BioEdit, Se-Al or AliView. Most gaps have been introduced in multiples of 3 bases to maintain open reading frames. Any alignment is a compromise between optimal alignment, readability, and an attempt to keep codons intact. In particular the 'Other SIV' sequences are difficult to align, so please consider these as a rough starting point for your analyses.
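
Since most, but not all, gaps are multiples of 3 bases, it can be useful to locate the exceptions before a codon-based analysis. The following is a minimal sketch of such a check, not a tool provided by the database. The function name is hypothetical, and treating leading and trailing gaps as padding around partial sequences (and therefore ignoring them) is an assumption of this example.

```python
import re


def off_frame_gap_runs(aligned_seq, gap="-"):
    """Return (start_column, length) for internal gap runs whose length is not a multiple of 3."""
    runs = []
    for match in re.finditer(re.escape(gap) + "+", aligned_seq):
        # Assumption: gaps at either end are padding, not real indels.
        if match.start() == 0 or match.end() == len(aligned_seq):
            continue
        length = match.end() - match.start()
        if length % 3 != 0:
            runs.append((match.start(), length))
    return runs


# Toy sequence (hypothetical): one in-frame gap of 3 and one single-base gap.
example = "---ATG---AAA-CCGG--"
flagged = off_frame_gap_runs(example)
```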

Further details

Codons containing IUPAC multistate characters involved in silent substitutions are translated to the correct amino acids; when this is not possible, they are translated to 'X'.
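
The rule above can be sketched as follows: expand every IUPAC ambiguity code in a codon into the bases it stands for, translate each resulting codon, and report the amino acid only if all of them agree. This is an illustrative sketch using the standard genetic code, not the database's translation code. The function name and the handling of gaps ('---' becomes '-', any other gapped codon becomes 'X') are assumptions of this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; ambiguity codes resolve only if every expansion is synonymous."""
    codon = codon.upper()
    if codon == "---":
        return "-"                      # assumption: a fully gapped codon stays a gap
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        return "X"                      # assumption: partial gaps or unknown symbols give 'X'
    expansions = product(*(IUPAC[base] for base in codon))
    residues = {CODON_TABLE["".join(c)] for c in expansions}
    return residues.pop() if len(residues) == 1 else "X"
```

For example, `translate_codon("CTR")` covers CTA and CTG, both leucine, so it returns `"L"`, whereas `translate_codon("ATR")` covers ATA (isoleucine) and ATG (methionine) and returns `"X"`.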

The protein alignments provided for each gene were constructed using both nucleotide and translated amino acid sequences. Because the translations are based on alignments, they may differ from a straight, non-aligned, translation. For instance, an aligned translation will include frameshift compensation.

For all genome and single-gene DNA alignments, we have tried to keep the reading frame intact. However, this doesn't always work out. Be cautious when translating the aligned nucleotide sequences.
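
One way to act on this caution is to find codons that a gap has split before translating. The sketch below walks an aligned nucleotide sequence in steps of three columns and reports codons that mix bases and gaps. It is a hypothetical helper, and it assumes that column 0 of the alignment is the first position of a codon, which must be verified for any clipped region.

```python
def broken_codons(aligned_seq, gap="-"):
    """Return 0-based codon indices whose three columns mix bases and gaps."""
    flagged = []
    usable = len(aligned_seq) - len(aligned_seq) % 3   # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = aligned_seq[i:i + 3]
        n_gaps = codon.count(gap)
        if 0 < n_gaps < 3:              # 1 or 2 gaps: the reading frame is broken here
            flagged.append(i // 3)
    return flagged


# Toy sequence (hypothetical): the third codon "AA-" is split by a gap.
flagged_codons = broken_codons("ATG---AA-CCC")
```

A gap whose length is a multiple of 3 can still break codons if it does not start on a codon boundary, so this check complements a simple gap-length check.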

Sequences that are known to be recombinants are usually labeled as such, even if they are not recombinant in the region under consideration.

Relevant links

Codes and Symbols in Sequence Alignments
How the HIV Database Classifies Sequence Subtypes

Filtered & Super Filtered Web Alignments

What sequences are included

The filtered alignments contain a subset of sequences from the web alignments. Typically 80-95% of the sequences in the corresponding Web Alignment are retained. The Filtered Alignments are cleaner, but contain less information. They are only available for HIV-1.

Excluded from the Filtered alignment:

Excluded from the Super Filtered alignments:

Subtype Reference Alignments

What sequences are included

For each subtype, 4 genomes were selected as being broadly representative of that subtype. A paper describing the criteria used in selecting the 2005 reference sequences is available online as an HTML or PDF file.

Circulating recombinant forms (CRFs) are included, with up to 4 genomes provided for each. CRFs can be excluded from the download using the subtype options provided.

The subtype, country, and year of isolation are given if they are defined in our database.

More information

Leitner, et al. 2005, a compendium review article describing the 2005 subtype reference set
Information about HIV and SIV subtype nomenclature
How the HIV Database Classifies Sequence Subtypes
Information about CRFs
Codes and Symbols in Sequence Alignments

Compendium alignments

The compendium alignments are a carefully chosen subset of sequences from the web alignment that are re-aligned and printed in the HIV Sequence Compendium. Because they need to fit in a limited space, this set is limited to ~200 sequences for HIV-1 and ~100 sequences for HIV-2 and SIV. We try to include newer sequences in this set, in addition to subtype reference sequences.

Consensus/Ancestral Sequences

2021 Consensus sequences

The details of the 2021 consensus sequence update were described by Linchangco et al., Front. Microbiol., 31 January 2022.

The 2021 consensus sequences were built using the 2019 Filtered Web alignment, which contains all available HIV-1 sequences through the end of 2019.

2004 consensus alignments

The input alignments are the HIV Sequence Web Alignments. These sequences have undergone additional annotation after retrieval. Specifically, question marks in consensus sequences have been resolved, and glycosylation sites have been aligned. From the input, consensus sequences were built using our Consensus Maker site.

The consensus sequences were calculated according to the default values on the Consensus Maker tools, except that they were computed for all subtypes having 3 or more (rather than 4 or more) sequences in the alignment. If a column in a subtype group contained equal numbers of two different letters, we resolved the tie by looking at the same column throughout the M group and using the most common letter as the consensus. Regions spanned by multiple insertions and deletions are difficult to align; we attempt to anchor alignments in such regions on glycosylation sites, and to preserve the minimal elements that span such regions.

An upper case letter in a DNA consensus sequence indicates that the nucleotide is preserved unanimously in that position in all sequences used to make the consensus. A lower case letter is the most common nucleotide in a variable position. Protein sequences are always upper case letters.
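
The case convention and the tie rule can be illustrated with a short sketch. It is not the Consensus Maker implementation and does not reproduce that tool's default settings. The function names are hypothetical, gaps are counted like any other character, the M-group lookup is restricted to the tied letters, and a tie with no group alignment supplied falls back to alphabetical order; all of these are assumptions of this example.

```python
from collections import Counter


def consensus_column(column, group_column=None):
    """Consensus letter for one DNA column: upper case if unanimous, else lower case."""
    counts = Counter(ch.upper() for ch in column)
    if not counts:
        raise ValueError("empty column")
    top = max(counts.values())
    tied = sorted(ch for ch, n in counts.items() if n == top)
    if len(counts) == 1:
        return tied[0]                              # unanimous position
    if len(tied) > 1 and group_column:
        # Tie: prefer the tied letter that is most common in the wider group column.
        group_counts = Counter(ch.upper() for ch in group_column)
        winner = max(tied, key=lambda ch: group_counts[ch])
    else:
        winner = tied[0]                            # assumption: alphabetical fallback
    return winner.lower()


def consensus(subtype_seqs, group_seqs=None):
    """Build a consensus for equal-length aligned sequences, column by column."""
    letters = []
    for i, column in enumerate(zip(*subtype_seqs)):
        group_column = [seq[i] for seq in group_seqs] if group_seqs else None
        letters.append(consensus_column(column, group_column))
    return "".join(letters)


# Toy data (hypothetical): columns 3 and 4 are tied within the subtype.
subtype = ["ACGT", "ACGA", "ACTA", "ATTT"]
m_group = subtype + ["ACTT", "ACTT"]
with_group = consensus(subtype, m_group)
without_group = consensus(subtype)
```

In the toy data the first column is unanimous and stays upper case, the second is variable with a clear majority, and the last two are ties that the group alignment resolves toward T.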

The number of sequences used to make the consensus is indicated in parentheses following the subtype designation.

Ancestral sequences were included in the 2004 (and prior) consensus alignments. The ancestral tree and sequences were built as described in the Ancestral Tree Construction explanation file.

More information

Consensus Maker Tools allow you to build a consensus from your own alignment according to your preferences.
Consensus Maker Explanation page shows the output format options.
Ancestral Tree Construction explanation file.
M-group Consensus Construction explanation file (for consensus alignments up through 2004)
Codes and Symbols in Sequence Alignments

RIP Alignment

This genome-length alignment serves as the Custom Background for the RIP web tool. This alignment has been assembled less systematically than the other alignments offered here, meaning that the sequences have been added in various years, as they have become available. We offer it because it has proven useful for a variety of purposes.

This alignment contains:

More information

Recombinant Identification Program (RIP)
Circulating Recombinant Forms (CRFs)

Neutralization Panel Sequence Alignments

The CATNAP database serves as a source for:

Downloads are available in two ways:


last modified: Fri Feb 11 11:15 2022

Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy