HIV Databases
HIV sequence database

AnalyzeAlign Explanation


AnalyzeAlign will perform several analyses on any alignment.

User alignment

You may choose to provide your own alignment. Requirements:

LANL database alignment

You may choose to use premade alignments. See the Alignments page for details on the premade alignment types offered.

Position/range to analyze

You must indicate what part of the alignment to analyze.

Range numbers refer to...

If you are specifying only part of your alignment to be analyzed, indicate what your coordinate values refer to. These values can be based on:

Group results

The default is to analyze all sequences together as a single group. However, your sequences may contain groups that you want to analyze separately by species, clade, origin, etc.

Group sequences by: there are 3 ways to specify the groupings.

Combine logos into a page: allows you to download all your logo images in a single PDF file. Specify EITHER the length OR width of the display matrix, and its orientation. The other matrix number (length or width) will be determined by the script, based on the actual number of groups in the output. If you aren't sure how to use these options, just keep the default concatenation (1 x __) and orientation; this will always work.
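As a rough illustration of how the second matrix number can follow from the first, here is a minimal sketch. It assumes the script simply rounds up so that every group gets a cell; the function name `grid_shape` and the rows/columns terminology are hypothetical and not taken from the tool.

```python
import math

def grid_shape(n_groups, fixed, fixed_is="rows"):
    """Return (rows, cols) for a page that must hold n_groups logos.

    `fixed` is the one dimension the user specifies; the other is
    derived by rounding up (an assumption about the tool's behavior).
    """
    if n_groups < 1 or fixed < 1:
        raise ValueError("n_groups and fixed must be positive")
    other = math.ceil(n_groups / fixed)
    return (fixed, other) if fixed_is == "rows" else (other, fixed)

print(grid_shape(7, 1))          # (1, 7): a simple concatenation
print(grid_shape(7, 3, "cols"))  # (3, 3): the last two cells stay empty
```

Fixing one dimension at 1 always yields a valid layout, which is consistent with the advice to keep the default when in doubt.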

Logo options

The WebLogos in this tool are based on a slightly modified WebLogo 3. For details about options, see the WebLogo 3 User's Manual.

Logo size

Max number of logo stacks per line

"Stacks" refers to the vertical columns of the logo. Long logos may be broken by line breaks, and this option allows you to choose the number of positions per line.

Figure title

See illustration at right.

Figure label

See illustration at right.

X-axis label and stack annotations

By default, the columns are labeled by the position numbers you chose. You can override these numbers by providing a comma-separated list of characters to replace the default.

Y-axis units

Choose the units for the Y-axis.

Y-axis label

By default, the Y-axis is labeled according to your choice of Y-axis units. You can override this label with your own.

Color schemes for WebLogo

The WebLogo software provides several predefined options for how to color the residues in your logos. Instead of using the standard options, you can choose colors:

Paste or upload custom color scheme.
Use this option to specify a highlight color for a few specific residues. In the image above, the C at position 8 and T/C at position 9 were colored red (#FF0000 in hexadecimal format, see below), by inserting this custom color scheme:

8 C
9 TC

22 A
23 C

23 G

You can define the colors using

See this for information about color names and RGB hex codes. Black is assigned to all unspecified symbols, and gray is assigned to all unspecified gaps.

Specify symbol colors.
Input custom colors for any specific symbols. To define a color, either click the color text box and choose from the palette that appears, or type an RGB triplet as explained in the paragraph above.


Remove symbols from logo

This option changes the appearance of the sequence logo image(s) by omitting selected residues from each stack.

show all
If the default 'show all' is chosen, the logo is presented as usual, showing the abundance of each residue in the alignment. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.

remove: consensus of alignment
If this option is chosen, the images omit the most common residue at each position of the alignment. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.

remove: consensus of seq group
This option is similar, but removes consensus of each sequence group, rather than the consensus of the whole alignment.
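The two consensus options can be sketched as follows. This is an illustration only, not the tool's code: it assumes the consensus is the most common symbol in each column, breaks ties in favor of the symbol seen first, and marks removed residues with "." in the sequences rather than in a logo image. All function names and the example sequences are hypothetical.

```python
from collections import Counter

def column_consensus(seqs):
    """Most common symbol in each column; ties go to the symbol seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def remove_symbols(seqs, template, blank="."):
    """Blank out every residue that equals the template at its column."""
    return ["".join(blank if s == t else s for s, t in zip(seq, template))
            for seq in seqs]

def remove_alignment_consensus(seqs):
    """'remove: consensus of alignment' applied to all sequences at once."""
    return remove_symbols(seqs, column_consensus(seqs))

def remove_group_consensus(groups):
    """'remove: consensus of seq group': each group uses its own consensus."""
    return {name: remove_symbols(seqs, column_consensus(seqs))
            for name, seqs in groups.items()}

seqs = ["RPNNNTRKSI", "RPNNNTRKSI", "RPSNNTRKGI", "RPNNNTRRSI"]
print(column_consensus(seqs))            # RPNNNTRKSI
print(remove_alignment_consensus(seqs))
```

Only the residues that differ from the consensus survive, which is what makes the rarer variants easier to see.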

remove: residues of 1st sequence
If this option is chosen, the images omit the residue that occurs in the first sequence of the alignment. For the LANL database alignments, the first sequence is always the reference sequence. For user alignments, the first sequence might be a vaccine sequence, for example, being compared to the alignment it was derived from. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.

remove: residues of 1st seq of seq group
This option is similar, but removes the residues of the first sequence of each sequence group, rather than the first sequence of the whole alignment.

remove: user-specified
Remove residues corresponding to a user-specified sequence. Enter a single sequence in raw format (no sequence name). This sequence must have exactly the same number of residues as defined in "Position/range to analyze".
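Removing the residues of a reference sequence, whether it is the first sequence or a user-specified one, can be sketched in the same spirit. The function below is a hypothetical illustration: it assumes removed residues are shown as "." and enforces the length requirement stated above.

```python
def remove_reference_residues(seqs, reference=None, blank="."):
    """Blank out residues that match the reference at each position.

    With reference=None the first sequence is used, mirroring
    'remove: residues of 1st sequence'; pass a string to mimic
    'remove: user-specified'.
    """
    ref = seqs[0] if reference is None else reference
    if any(len(seq) != len(ref) for seq in seqs):
        raise ValueError("reference must match the length of the analyzed range")
    return ["".join(blank if s == r else s for s, r in zip(seq, ref))
            for seq in seqs]

seqs = ["RPNNNTRKSI", "RPSNNTRKGI", "RPNNNTRRSI"]
print(remove_reference_residues(seqs))                # first sequence as reference
print(remove_reference_residues(seqs, "RPSNNTRKSI"))  # user-specified reference
```

To apply this per sequence group, call the function once for each group so that each group's own first sequence serves as the reference.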


Delete Gaps and Shift

If the "Delete gaps and shift" option is selected, gaps placed to bring sequences into alignment are squeezed out and the remaining residues are shifted rightward (toward the C-terminal end). For example, suppose your query has a one-amino-acid insertion relative to most other sequences. Then the following alignment:

seq2   VAR-LHP
seq3   VAR-LFP
seq4   VAR-LMP

would be presented like this with gaps deleted:

seq2   QVARLHP
seq3   QVARLFP
seq4   QVARLMP

Q is the amino acid one position to the left of the V. As a result of squeezing gaps and shifting characters rightward, alignments in gappy regions will look "bad."

NOTE: The delete gaps option is useful for aligning immunologically reactive epitopes, because in such cases it is particularly important to maintain the alignment of the C-terminal anchor residues.
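The squeeze-and-shift behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the window is given as 0-based, end-exclusive column indices; residues are pulled in from the left of the window to fill the space freed by deleted gaps; and any shortfall at the start of a sequence is padded with "-". The query sequence and the upstream "M" are hypothetical; only seq2 to seq4 and the upstream "Q" come from the example above.

```python
def delete_gaps_and_shift(aligned_seq, start, end, gap="-"):
    """Return the window [start, end) with gaps removed and residues
    shifted rightward, so the C-terminal end stays anchored."""
    width = end - start
    if width < 1:
        raise ValueError("window must contain at least one column")
    # Everything up to the right edge of the window, without gaps.
    ungapped = aligned_seq[:end].replace(gap, "")
    # Keep the rightmost `width` residues; pad on the left if too few.
    return ungapped[-width:].rjust(width, gap)

alignment = {
    "query": "MQVARALHP",   # hypothetical query with a one-residue insertion
    "seq2":  "MQVAR-LHP",
    "seq3":  "MQVAR-LFP",
    "seq4":  "MQVAR-LMP",
}
for name, seq in alignment.items():
    print(name, delete_gaps_and_shift(seq, 2, 9))
```

For seq2 this prints QVARLHP: the gap is dropped and the Q to the left of the V slides into the window, while the C-terminal residues LHP stay in the same columns.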


Mark potential N-linked glycosylation sites

When 'yes' is chosen, any asparagine (N) occurring in the pattern NxS or NxT (x = any amino acid except proline) will appear as "O". For more information about N-linked glycosylation, see N-GlycoSite.
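The NxS/NxT rule can be expressed as a short regular expression. The sketch below is an illustration, not the tool's code; it assumes an ungapped, upper-case amino acid sequence, and the function name is hypothetical. A lookahead is used so that overlapping sites are each marked.

```python
import re

# N followed by any residue except P, then S or T. The lookahead checks
# the two following residues without consuming them, so adjacent and
# overlapping sequons are all detected.
SEQUON = re.compile(r"N(?=[^P][ST])")

def mark_glycosylation_sites(seq):
    """Replace the N of every NxS/NxT sequon (x != P) with 'O'."""
    return SEQUON.sub("O", seq)

print(mark_glycosylation_sites("RPNNNTRKSI"))  # RPNONTRKSI
print(mark_glycosylation_sites("NPSA"))        # NPSA (proline blocks the site)
```

Note that `[^P]` would also accept a gap character, so gapped sequences would need gaps removed or handled first.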


Cut-off for calculating frequency by position

The tool calculates the frequency of each nucleotide or amino acid at each position. The frequency table shows only the most common residues, as determined by a cutoff. If the cutoff is 100%, all residue frequencies are shown. If the cutoff is 95%, the most frequent residues are shown, up to a cumulative total of 95%, and all others are presented together as "other". Lumping together the infrequent residues can be a useful simplification, particularly for protein sequences.
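The cutoff logic for a single alignment column can be sketched like this. It is an illustration under one stated assumption: residues are added in order of decreasing frequency until the cumulative total reaches the cutoff, so the residue that crosses the cutoff is still shown. The tool may treat that boundary differently. The function name is hypothetical.

```python
from collections import Counter

def column_frequencies(column, cutoff=95.0):
    """Return [(residue, percent), ...] for one column, most frequent
    first, with everything beyond the cutoff lumped into 'other'."""
    counts = Counter(column)
    total = sum(counts.values())
    shown, shown_count = [], 0
    for residue, count in counts.most_common():
        # Integer comparison avoids floating-point drift in the running sum.
        if shown_count * 100 >= cutoff * total:
            break
        shown.append((residue, 100.0 * count / total))
        shown_count += count
    if shown_count < total:
        shown.append(("other", 100.0 * (total - shown_count) / total))
    return shown

column = "N" * 90 + "S" * 6 + "T" * 3 + "K" * 1
print(column_frequencies(column, 95))   # N 90%, S 6%, other 4%
print(column_frequencies(column, 100))  # every residue listed
```

Applying the function to each column of the analyzed range, for example with `zip(*seqs)`, gives one row of the frequency table per position.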


Master sequence for finding variants

Variants are defined with one "master" sequence used as the basis of comparison. You can choose which sequence will be the master.

To choose a "user-selected" sequence, enter a single sequence in raw format (no sequence name). This sequence must have exactly the same number of residues as defined in "Position/range to analyze".

This option does not affect the logo or the calculation of frequency by position.
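One way to picture a comparison against a master sequence is to tally each distinct sequence in the analyzed range and show where it differs from the master. The sketch below is hypothetical: the output layout, the use of "." for positions that match the master, and the function name are assumptions, not the tool's actual report format.

```python
from collections import Counter

def tally_variants(seqs, master):
    """Return [(variant_relative_to_master, count), ...], most common first.

    Positions identical to the master are written as '.', so only the
    differences stand out.
    """
    if any(len(seq) != len(master) for seq in seqs):
        raise ValueError("all sequences must match the master's length")
    rows = []
    for variant, count in Counter(seqs).most_common():
        diff = "".join("." if v == m else v for v, m in zip(variant, master))
        rows.append((diff, count))
    return rows

seqs = ["RPNNNTRKSI"] * 3 + ["RPSNNTRKGI"] * 2 + ["RPNNNTRRSI"]
for diff, count in tally_variants(seqs, master="RPNNNTRKSI"):
    print(diff, count)
```

Choosing a different master changes only how the variants are displayed, not how often each one occurs.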


last modified: Tue Jun 19 10:46 2018

Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy