HIV sequence database

Highlighter Explanation




Data input

Mismatches

This analysis finds the nucleotides/amino acids in a query sequence that do not match those in a SINGLE master sequence. Each mismatched nucleotide/amino acid is marked in a color that depends on the character present in the query. The example below describes mismatches in a nucleotide sequence:

Legend: T, A, C, and G in the query are each shown in their own color.

Consider the following sequences:


In the above example, the query sequence differs from the master at positions 1, 2, 3, and 4. Each of these changes is marked by a "|" whose color depends on which nucleotide the query has at that position.
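The basic mismatch comparison can be sketched in a few lines of Python. This is a minimal illustration, not Highlighter's code: it assumes the master and query are already aligned to the same length, reports 1-based positions, and leaves out the gap and IUPAC options described further down. The function name `find_mismatches` is hypothetical.

```python
def find_mismatches(master: str, query: str) -> list[tuple[int, str]]:
    """Return (1-based position, query character) for every position where
    the query differs from the master. Assumes equal-length aligned strings."""
    if len(master) != len(query):
        raise ValueError("master and query must be aligned to the same length")
    return [
        (pos, q)
        for pos, (m, q) in enumerate(zip(master.upper(), query.upper()), start=1)
        if m != q
    ]


# The color of each "|" would then be chosen from the query character.
print(find_mismatches("ACGTAC", "GTACAC"))
# [(1, 'G'), (2, 'T'), (3, 'A'), (4, 'C')]
```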

Mark APOBEC signatures (Mismatches only; nucleotides only)

This option is available only for the Mismatches analysis, and its purpose is to visualize hypermutation. If selected, it adds pink symbols marking APOBEC signatures and G->A conversions in the sequences. For example:

The above master and query will produce the following plot. Pink filled circles denote APOBEC signatures, and open diamonds represent G->A conversions.
To identify APOBEC signatures, whenever the program encounters a G->A change, it looks at the query nucleotides at the next two positions. If the first of these contains an A or a G, and the second does not contain a C, the position is marked as an APOBEC signature. For example, in the above plot, position 1 is marked APOBEC because the query has a G->A change at position 1, an A at position 2, and an A (not a C) at position 3. Similarly, position 3 is also marked as an APOBEC signature. If there is a G->A change at a position but the next two positions do not fit the APOBEC pattern, the position is marked as a G->A conversion, as shown at position 5.
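The rule above can be expressed as a short Python sketch. It follows the description literally (the next two aligned query characters, with no gap skipping) and treats a G->A change too close to the end of the sequence to have two downstream neighbours as a plain G->A conversion; both choices are assumptions of this sketch, not documented Highlighter behavior. The function name `classify_g_to_a` is hypothetical.

```python
def classify_g_to_a(master: str, query: str) -> dict[int, str]:
    """Label each G->A position (1-based) as 'apobec' or 'g_to_a'.

    After a G->A change, the next query character must be A or G and the
    one after it must not be C. Gaps are not skipped, and positions without
    two downstream neighbours count as plain G->A (assumptions)."""
    master, query = master.upper(), query.upper()
    labels = {}
    for i, (m, q) in enumerate(zip(master, query)):
        if m != "G" or q != "A":
            continue
        nxt = query[i + 1 : i + 3]  # the next two query characters
        if len(nxt) == 2 and nxt[0] in "AG" and nxt[1] != "C":
            labels[i + 1] = "apobec"
        else:
            labels[i + 1] = "g_to_a"
    return labels


# Positions 1 and 3 fit the APOBEC context; position 5 is followed by a C.
print(classify_g_to_a("GAGAGC", "AAAAAC"))
# {1: 'apobec', 3: 'apobec', 5: 'g_to_a'}
```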

In a nutshell, all changes to A are marked light green. G->A changes that are not APOBEC signatures have an open diamond in addition to the green bar, and G->A changes that are APOBEC signatures have a closed circle in addition to the green bar. To learn more about APOBEC signatures, see Hypermutation Explanation.

Mark potential glycosylation sites (Mismatches only; amino acids only)

In a Mismatches analysis with amino acid sequences, this option marks glycosylation motifs. A pink dot marks glycosylation motifs in the master sequence; a pink diamond marks sites where the query has gained a glycosylation motif; a blue diamond marks sites where the query has lost a glycosylation motif, compared to the master.
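The gained/lost comparison can be sketched as a set difference over motif positions. This page does not spell out the motif Highlighter searches for, so the sketch assumes the common N-linked sequon N-X-S/T with X not proline, and it reports alignment columns where a motif starts. A gap character inside a motif breaks it here, which is also an assumption. The names `glyco_sites` and `compare_glyco` are hypothetical.

```python
import re

# Assumed motif: N, then any residue except P (or a gap), then S or T.
# The lookahead lets overlapping motifs be found.
SEQUON = re.compile(r"(?=N[^P\-][ST])")


def glyco_sites(seq: str) -> set[int]:
    """1-based alignment positions where a potential motif starts."""
    return {m.start() + 1 for m in SEQUON.finditer(seq.upper())}


def compare_glyco(master: str, query: str) -> dict[str, set[int]]:
    m_sites, q_sites = glyco_sites(master), glyco_sites(query)
    return {
        "master": m_sites,            # pink dots
        "gained": q_sites - m_sites,  # pink diamonds
        "lost": m_sites - q_sites,    # blue diamonds
    }


print(compare_glyco("NASTNPT", "NAPTNGS"))
# {'master': {1}, 'gained': {5}, 'lost': {1}}
```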

Transitions and transversions

This analysis compares your NUCLEOTIDE query sequences with a single master sequence and highlights transitions and transversions.

Legend: gold = transitions, pink = transversions

Consider the following sequences:


In the above example, all transitions (A<->G or C<->T) are marked in gold, and all transversions are marked in pink.

For positions with IUPAC codes (such as the last position in the example above), the result at that position depends on the option you select for handling IUPAC codes. If "Use codes to compare" is selected and the position could be either a transition or a transversion, it is marked as a transition, as above. See below for details on IUPAC handling.
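A Python sketch of the transition/transversion call, including the "Use codes to compare" tie-break described above: each IUPAC code is expanded to the bases it stands for, and if any pairing of possible bases is a transition, the position is called a transition. Treating any two differing characters as a substitution, and accepting only A, C, G, T and the IUPAC letters (no gaps), are assumptions of this sketch. The function names are hypothetical.

```python
from itertools import product
from typing import Optional

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def is_transition(a: str, b: str) -> bool:
    """True for A<->G or C<->T (both purines or both pyrimidines)."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def classify_substitution(master_base: str, query_base: str) -> Optional[str]:
    """'transition', 'transversion', or None when there is no change."""
    m, q = master_base.upper(), query_base.upper()
    if m == q:
        return None
    # Every pairing of the bases the two codes could stand for.
    pairs = [(a, b) for a, b in product(IUPAC[m], IUPAC[q]) if a != b]
    if not pairs:
        return None
    if any(is_transition(a, b) for a, b in pairs):
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"), classify_substitution("A", "T"),
      classify_substitution("A", "N"))
# transition transversion transition
```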

Silent and non-silent mutations

This analysis enables you to compare query NUCLEOTIDE sequences with the master and highlight silent and non-silent mutations with the following colors.

Legend: silent and non-silent mutations are each shown in their own color.

Consider the following sequences:


The tool translates the nucleotide sequences into amino acid sequences and highlights silent and non-silent mutations. In the above example, the second codon in the master (ACT) encodes threonine, and the corresponding query codon (ACC) also encodes threonine, so this is shown as a silent mutation. In contrast, the third codon in the master (AAT) encodes asparagine, while the corresponding query codon (GTT) encodes valine, so this is shown as a non-silent mutation.
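The per-codon silent/non-silent call can be sketched in Python with the standard genetic code. This only illustrates the labeling step; it is not the SNAP calculation mentioned below. It assumes both sequences are in frame from the first base, and it skips codons containing gaps or IUPAC codes, which is an assumption of this sketch. `silent_nonsilent` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_nonsilent(master: str, query: str) -> dict[int, str]:
    """Compare in-frame codons; return {codon number (1-based): label}.

    Identical codons are skipped, as are codons with gaps or IUPAC codes."""
    master, query = master.upper(), query.upper()
    labels = {}
    for i in range(0, min(len(master), len(query)) - 2, 3):
        m_codon, q_codon = master[i:i + 3], query[i:i + 3]
        if m_codon == q_codon:
            continue
        m_aa, q_aa = CODON_TABLE.get(m_codon), CODON_TABLE.get(q_codon)
        if m_aa is None or q_aa is None:
            continue
        labels[i // 3 + 1] = "silent" if m_aa == q_aa else "non-silent"
    return labels


# ACT -> ACC stays threonine; AAT (Asn) -> GTT (Val) changes the amino acid.
print(silent_nonsilent("ATGACTAAT", "ATGACCGTT"))
# {2: 'silent', 3: 'non-silent'}
```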

This tool uses SNAP to calculate the statistics. It compares only the master sequence with each of the other sequences, not all pairs of sequences. Note that the SNAP module cannot handle IUPAC codes, so if your sequences contain them, the SNAP statistics may be faulty. For a more detailed analysis of silent and non-silent mutations, please use SNAP.

Matches

This analysis identifies the nucleotides/amino acids in a query sequence that match one or more master sequences. If the number of masters is entered as 2, the top 2 sequences in the file are used as master sequences.

Each master is assigned a unique color and compared with each query sequence. Matching nucleotides/amino acids in the query are highlighted in the color of the master they match.

Consider the following example:


In the above example, the G at position 2 of the query matches only master2 and is indicated by a green "|" at that position in the result. At position 3, the query matches only the T of master1, indicated by the red "|". At position 1, the query matches both master1 and master2, so this position is left uncolored: only unique matches are displayed in the result. At positions 4 and 6, the query matches neither master, so these positions are treated as unique to the query; depending on the "Mark black for unique" option, they are either left uncolored or colored black. For more on marking black for unique, see below.
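The marking logic just described can be sketched in Python. The sketch assumes aligned sequences of equal length and plain character equality (no gap or IUPAC options), and it leaves out the gray option. The function name `match_marks` and its return labels are hypothetical; the two masters below are made-up sequences that reproduce the pattern described above.

```python
def match_marks(masters: list[str], query: str, mark_black: bool = False) -> list:
    """One mark per aligned position of the query:
    - k (1-based master number) if the query matches master k only,
    - "unique" if it matches no master and mark_black is set,
    - None otherwise (several or all masters match, or unmarked unique site).
    """
    marks = []
    for pos, q in enumerate(query.upper()):
        hits = [k for k, m in enumerate(masters, start=1) if m[pos].upper() == q]
        if len(hits) == 1:
            marks.append(hits[0])
        elif not hits and mark_black:
            marks.append("unique")
        else:
            marks.append(None)
    return marks


masters = ["ACTAAC", "AGCCAT"]  # master1, master2 (illustrative only)
print(match_marks(masters, "AGTGAA", mark_black=True))
# [None, 2, 1, 'unique', None, 'unique']
```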

Mark potential glycosylation sites (Matches only; amino acids only)

In a Matches analysis with amino acid sequences, this option marks motifs that match potential N-linked glycosylation sites, compared with the master sequence(s). A pink dot marks sites where the glycosylation motif is unique to the query; a pink diamond marks sites where the glycosylation motif is shared by the query and at least one master.

Mark black for unique (Matches only)

If selected, this option will make a black tic mark for each position where the query is unique, i.e., does not match ANY of the masters.

Mark gray for a match to multiple masters (Matches only)

This option may be selected only when you have 3 or more masters. If selected, a gray tic mark will appear when the query matches 2 or more masters (but not if it matches ALL masters). For example, if you have 4 masters and this option is selected, the gray mark will appear when the query matches 2 or 3 of them.

Consider the following example:

In the above example, the last two positions in the query match two masters, and this option causes them to be marked with gray bars. (Without this option selected, they would be unmarked.) The first position matches ALL masters and is unmarked.
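The condition for a gray mark fits in a one-line predicate. The sketch below takes the number of masters the query matches at a position and the total number of masters; the name `gets_gray_mark` is hypothetical.

```python
def gets_gray_mark(n_matching: int, n_masters: int) -> bool:
    """True when "Mark gray" applies at a position: there are at least
    3 masters, and the query matches 2 or more of them but not all."""
    return n_masters >= 3 and 2 <= n_matching < n_masters


# With 4 masters, only positions matching 2 or 3 of them are marked gray.
print([n for n in range(5) if gets_gray_mark(n, 4)])
# [2, 3]
```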

Change masters

This feature enables you to select the sequence(s) that will act as master.

By default, the top sequence is taken as the master. When multiple masters are required (for Matches analysis), the top n sequences are taken as masters, where n is the number you specify. To select different masters, check the box for "Change masters", and you will be given a list to select from. All master sequences must be the same length.

Multiple Masters

Highlighter allows you to compare the alignment with multiple masters. When more than one master is selected, the Mismatches, Silent/non-silent, and Transitions/transversions analyses mark only mutations that are not present in any of the selected masters.

For example, when using a single master, you will get the following result:


However, if there are two masters, the result will change as follows:


In this case, the second position (A) in the query is no longer considered a mutation because it is found in the second master. For silent and non-silent mutations, positions where the amino acid change is silent compared with one master and non-silent compared with another are marked in blue. For transitions and transversions, a mutation that is not found in any of the masters is marked as a transition if it is a transition compared with at least one master; otherwise it is marked as a transversion.
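The multiple-master rules for mutations and for transitions/transversions can be sketched together in Python. The sketch assumes aligned, equal-length sequences containing only A, C, G, and T, and it does not cover the blue silent/non-silent case. `multi_master_marks` is a hypothetical name.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def is_transition(a: str, b: str) -> bool:
    """True for A<->G or C<->T."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def multi_master_marks(masters: list[str], query: str) -> dict[int, str]:
    """{1-based position: 'transition' | 'transversion'} for query bases
    that are found in none of the masters at that position."""
    marks = {}
    for pos, q in enumerate(query.upper(), start=1):
        column = [m[pos - 1].upper() for m in masters]
        if q in column:
            continue  # present in at least one master: not a mutation
        # A transition against any one master is enough.
        if any(is_transition(m, q) for m in column):
            marks[pos] = "transition"
        else:
            marks[pos] = "transversion"
    return marks


# Position 2 (A) is not marked because master 2 has an A there.
print(multi_master_marks(["ACGT", "AAGT"], "GATC"))
# {1: 'transition', 3: 'transversion', 4: 'transition'}
```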

Ignore alphabet validation (amino acids only)

If unchecked, each sequence is classified as nucleotide, amino acid, or indeterminate. If more than 2% of the characters are unambiguous amino acids [QEILFP], the sequence is classified as protein. If more than 94% of the characters are ATGCUNRY, it is classified as a nucleotide sequence. Otherwise, it is considered ambiguous.
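The classification thresholds can be sketched as follows. The page does not say whether gap characters count toward the percentages; this sketch excludes '-' and '.' from the count, which is an assumption. `guess_alphabet` is a hypothetical name.

```python
def guess_alphabet(seq: str) -> str:
    """Return 'protein', 'nucleotide' or 'ambiguous' using the 2% and 94%
    thresholds above. Gap characters are excluded (an assumption)."""
    chars = [c for c in seq.upper() if c not in "-."]
    if not chars:
        return "ambiguous"
    n = len(chars)
    if sum(c in "QEILFP" for c in chars) / n > 0.02:
        return "protein"
    if sum(c in "ATGCUNRY" for c in chars) / n > 0.94:
        return "nucleotide"
    return "ambiguous"


print([guess_alphabet(s) for s in ("ATGCATGCAA", "MKVLIQEF", "XXXXBBBB")])
# ['nucleotide', 'protein', 'ambiguous']
```

The protein test runs first, as in the description, so a sequence with even a few unambiguous amino acid letters is never classified as nucleotide.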

Checking the box allows you to submit sequences that have dash marks to indicate identity. For example:

This input can be used only if 'Ignore alphabet validation' is checked.

Treat gaps as character

When a Matches analysis is done with the option "treat gaps as character", the gaps are treated as a "5th nucleotide" (or "21st amino acid"), and a gap in the query is matched with a gap in the master in the same position.

Consider the following example:


In the above example, when "treat gaps as character" is selected, the gap at the 1st position of the query is matched with the gap at the 1st position of master 1. However, the gap at the 6th position is ignored because it matches more than one master.

Mismatch, Transition & transversion, and Silent & non-silent analyses:
When any of these analyses is run with "treat gaps as character", a gap IN THE QUERY is highlighted if the master has no gap at the same position. If the "treat gaps as character" option is not chosen, such a gap is ignored.
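The query-gap rule for these analyses can be sketched in Python, assuming '-' is the gap character and the sequences are aligned. The function name `gap_mismatches` is hypothetical.

```python
def gap_mismatches(master: str, query: str, treat_gaps_as_character: bool,
                   gap: str = "-") -> list[int]:
    """1-based positions of query gaps that would be highlighted."""
    if not treat_gaps_as_character:
        return []  # gaps in the query are ignored
    return [pos for pos, (m, q) in enumerate(zip(master, query), start=1)
            if q == gap and m != gap]


# Position 3 is a gap in both sequences, so it is not highlighted.
print(gap_mismatches("AC-GT", "-C-G-", treat_gaps_as_character=True))
# [1, 5]
```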

Handling of IUPAC codes

IUPAC codes may occur in your nucleotide sequences:

M: A and C
R: A and G
W: A and T
S: C and G
Y: C and T
K: G and T
B: C and G and T
H: A and C and T
V: A and C and G
D: A and G and T
N: A or C or G or T
?: Any state or nothing

Ignore

When the ignore option is selected, the tool skips over the IUPAC code and does not perform any comparison at that position.

Use codes to compare

Treat IUPAC codes as characters

With this option, the tool treats IUPAC codes as ordinary characters and does not use the nucleotides they stand for when comparing. For example, although R stands for A or G, with this option it matches ONLY another R.

Mark as unknown

When this option is selected, all positions with an IUPAC code in either the master or the query are marked with a black dot.
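The four IUPAC options can be contrasted in a single comparison function. This is a sketch under stated assumptions: the mode names are hypothetical, inputs are assumed to be A, C, G, T or one of the letters in the table above (no gaps or '?'), and for "Use codes to compare" the sketch calls two characters a match when the base sets they stand for overlap. This page does not spell out the exact rule Highlighter uses for that option.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def compare_position(m: str, q: str, mode: str) -> str:
    """Return 'match', 'mismatch', 'skip' or 'unknown' for one position.

    mode: 'ignore', 'use_codes', 'as_characters' or 'mark_unknown'
    (hypothetical names for the four options above)."""
    m, q = m.upper(), q.upper()
    has_code = m not in "ACGT" or q not in "ACGT"
    if not has_code or mode == "as_characters":
        return "match" if m == q else "mismatch"
    if mode == "ignore":
        return "skip"  # no comparison at this position
    if mode == "mark_unknown":
        return "unknown"  # drawn as a black dot
    if mode == "use_codes":
        # Assumed rule: match if the codes share at least one base.
        return "match" if set(IUPAC[m]) & set(IUPAC[q]) else "mismatch"
    raise ValueError(f"unknown mode: {mode}")


print([compare_position("A", "R", mode)
       for mode in ("ignore", "use_codes", "as_characters", "mark_unknown")])
# ['skip', 'match', 'mismatch', 'unknown']
```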

Sort sequences

By similarity

When this option is chosen, the sequences are sorted based on the number of tic marks (for mismatches, transition/transversion, and silent/non-silent), or by the number of matches (for matches analysis). The most similar sequence is placed at the top of the result graph, and the least similar at the bottom. When there are multiple masters, the sequences are sorted according to their similarity with ALL masters, not just to the first master.
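A sketch of sorting by similarity, assuming the per-sequence counts have already been computed: fewer tic marks means more similar for the mismatch-type analyses, and, as an assumption of this sketch, more matches means more similar for the Matches analysis. Python's `sorted` is stable, so ties keep their alignment order; that tie-breaking is a choice of this sketch, not documented behavior. `sort_by_similarity` is a hypothetical name.

```python
def sort_by_similarity(results: list[tuple[str, int]], analysis: str) -> list[tuple[str, int]]:
    """results: (sequence name, count) pairs, where count is the number of
    tic marks or, for the Matches analysis, the number of matches.
    Returns the most similar sequence first."""
    if analysis == "matches":
        return sorted(results, key=lambda r: r[1], reverse=True)
    return sorted(results, key=lambda r: r[1])


print(sort_by_similarity([("q1", 5), ("q2", 1), ("q3", 3)], "mismatches"))
# [('q2', 1), ('q3', 3), ('q1', 5)]
```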


By tree

When this option is chosen, the sequences are sorted based on their evolutionary relationship. You may supply your own tree file, or the program will generate one using PAUP*.

Do not sort

When this option is chosen, the sequences in the result set will appear in the same order they were in the alignment file.

Options for coloring matches/mismatches (amino acids only)

1. Standard

Asp, Glu
Lys, Asn, Gln, Arg
Ile, Leu, Val
Phe, Trp, Tyr
Ala, Gly, Ser, Thr

2. Se-Al (default)

Ala, Gly, Pro, Ser, Thr
His, Lys, Arg
Asp, Glu, Asn, Gln
Ile, Leu, Met, Val
Phe, Trp, Tyr

3. Se-Al (polar/non-polar)

Ala, Phe, Ile, Leu, Met, Pro, Val, Trp
Cys, Gly, Asn, Gln, Ser, Thr, Tyr
Asp, Glu
His, Lys, Arg

4. BioEdit

Ala Gly Pro Ser
Asp Glu Trp Tyr
His Lys Arg Ile
Leu Met Val Asn
Gln Thr Phe Cys
Gap Other

Compress mutations into one sequence

When producing a highlighter plot under Mismatches, you can create a compressed sequence that summarizes all mutations found across the entire alignment. On the results page, click the button "Compress mutations into one sequence." This produces a highlighter plot with the master sequence(s) at the top and a single sequence containing all the mutations found across the entire alignment (IUPAC codes are used when more than one mutation is found at the same position). Both the figure and the FASTA alignment can be downloaded from the links at the bottom of the graph. This "compressed" alignment can be used as input to the Hypermut tool to test for overall APOBEC enrichment.
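The compression step can be sketched in Python. At each column, the sketch collects the plain bases (A, C, G, T) seen in the queries that differ from the master; if there are none it keeps the master base, if there is one it uses that base, and if there are several it uses the IUPAC code for the set. Ignoring gaps and IUPAC codes already present in the queries is an assumption of this sketch, and `compress_mutations` is a hypothetical name.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC letter.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def compress_mutations(master: str, queries: list[str]) -> str:
    """One sequence holding every mutation seen at each aligned column."""
    master = master.upper()
    compressed = []
    for pos, m in enumerate(master):
        # Plain bases observed in the queries at this column, minus the master base.
        muts = {q[pos].upper() for q in queries} & set("ACGT")
        muts.discard(m)
        compressed.append(CODE_FOR[frozenset(muts)] if muts else m)
    return "".join(compressed)


# Column 3 has two different mutations (A and T), so it becomes W.
print(compress_mutations("ACGT", ["ACAT", "GCGT", "ACTT"]))
# GCWT
```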

How to cite this tool

When referencing Highlighter in publications, please cite the tool name and the following reference:

Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, Shaw GM.
Proc Natl Acad Sci U S A. 2008 May 27;105(21):7552-7.
PMID: 18490657

last modified: Tue Aug 23 12:41 2016


Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2012 LANS LLC. All rights reserved.
