HIV molecular immunology database


HIV Binding Motif Scanner Help


The HLA binding motif scanner lets you find HLA anchor residue motifs within protein sequences for specified HLA serotypes, genotypes, or supertypes. The potential epitopes are included in the output. Two major motif libraries were used: The HLA Factsbook (Marsh, Parham, and Barber, 2000) and the SYFPEITHI database of MHC ligands and peptide motifs.

The motifs presented are linked to their sources, and you can choose which source to use for scanning the sequences. We also regularly search the literature for new motifs that are not listed in these two major sources; what we find is presented as an additional source. You can also use your own custom motif, composed from the information we present and from your own data.

The supermotifs and supertypes classification is taken from

Supermotifs indicate the residues defining supertype specificities. The supermotifs incorporate residues that are recognized by multiple alleles within the supertype.

This tool searches for anchor motifs only. If you want additional information on auxiliary amino acids, please look at the original motif libraries. However, you can still use our tool with the auxiliary amino acids if you compose your own custom motif using the information from these sources.

View or Download Data Dictionaries

View or download the HLA genotype/serotype dictionary.

View or download the HLA genotype/motif dictionary.

View or download the HLA supertype dictionary: Sette & Sidney 1999.

View or download the HLA supertype dictionary: Sidney et al. 2008.

Search Fields

Select the HLAs for which you want to find binding motifs. You may select as many HLAs as you like. The database will be searched for known motifs for these HLAs. HLAs may be specified by
genotype: the specified genotypes will be searched for motifs;
serotype: all genotypes with the specified serotypes will be searched for motifs; and
supertype: the specified supertypes will be searched for supermotifs.
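The three ways of specifying HLAs can be sketched as a small lookup. This is only an illustration: the dictionary below is a toy stand-in for the downloadable genotype/serotype dictionary, and the function name resolve_selection is hypothetical, not part of the tool.

```python
# Toy stand-in for the genotype/serotype dictionary (illustrative entries only).
GENOTYPE_TO_SEROTYPE = {"A*0201": "A2", "A*0202": "A2", "B*2705": "B27"}


def resolve_selection(kind, value):
    """Return (what_is_searched, names) for one user selection.

    genotype  -> motifs for that genotype
    serotype  -> motifs for every genotype with that serotype
    supertype -> supermotifs for that supertype
    """
    if kind == "genotype":
        return "motifs", [value]
    if kind == "serotype":
        genotypes = sorted(
            g for g, s in GENOTYPE_TO_SEROTYPE.items() if s == value
        )
        return "motifs", genotypes
    if kind == "supertype":
        return "supermotifs", [value]
    raise ValueError("kind must be genotype, serotype, or supertype")
```

Under these assumptions, a serotype selection expands to all of its genotypes, while a supertype selection is looked up directly in the supermotif list.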
Motif Source
Please select the sources from which you want to find motifs. If no motifs are found in your selected sources, then all sources will be searched.
Marsh2000 S. G. E. Marsh, P. Parham, and L. D. Barber. The HLA Factsbook. Academic Press, San Diego, 2000. URL:
SYFPEITHI The SYFPEITHI Database of MHC Ligands, Peptide Motifs and Epitope Prediction. Jan. 2003. URL:
Others All other motifs reported in the literature or on the WWW.
Supermotifs are from Sette1999 A. Sette and J. Sidney. Nine major HLA Class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics 50(3-4): 201-212, Nov 1999.
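The fallback rule for motif sources (search the selected sources first, and search all sources only if nothing is found) might look like the following sketch. The function name and the shape of motifs_by_source are assumptions made for illustration; the tool's actual storage is not described here.

```python
def motifs_from_sources(motifs_by_source, selected_sources):
    """Collect motifs from the selected sources, falling back to all sources.

    motifs_by_source: hypothetical mapping of source name -> list of motif
    strings for the HLAs being searched.
    """
    found = [
        motif
        for source in selected_sources
        for motif in motifs_by_source.get(source, [])
    ]
    if found:
        return found
    # Nothing in the selected sources: search every source instead.
    return [motif for motifs in motifs_by_source.values() for motif in motifs]
```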
Motif Syntax
The anchor residues are shown in square brackets. The preferred but not dominant amino acids in the anchor positions are shown in parentheses. For example, the motif for A*2602 in the SYFPEITHI library is x-[VTILF]-x-x-x-x-x-x-[YF(ML)]. This means that the second and C-terminal positions are anchor positions. The dominant amino acids at the second position are V, T, I, L, and F; at the C-terminal anchor position the dominant amino acids are Y and F, while M and L are preferred but not dominant. Note that by default, unless you specify your own motif, we search on all anchor position amino acids, both dominant and preferred but not dominant, so the information on which amino acids are less dominant is presented for your information only. However, if you want to search on the dominant amino acids only, you can compose your own motif using the information we present. If you have any questions about how it was decided which amino acids are dominant and which are not, please address them to the authors who published these motifs.
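One way to read this syntax is as a regular expression. The sketch below is not the tool's implementation; motif_to_regex and its dominant_only flag are hypothetical names. It converts a motif string into a Python regex, either keeping all anchor residues (the default search behavior described above) or only the dominant ones.

```python
import re

# A motif position is either "x" or a bracketed anchor such as [YF(ML)].
_POSITION = re.compile(r"x|\[([A-Z()]+)\]")


def motif_to_regex(motif, dominant_only=False):
    """Translate motif syntax into a regular expression string."""
    parts = []
    for match in _POSITION.finditer(motif.replace("-", "")):
        if match.group(0) == "x":
            parts.append(".")  # any amino acid
            continue
        residues = match.group(1)
        if dominant_only:
            # Drop the parenthesized, preferred-but-not-dominant residues.
            residues = re.sub(r"\([A-Z]*\)", "", residues)
        else:
            # Keep every residue; only the parentheses are removed.
            residues = residues.replace("(", "").replace(")", "")
        parts.append("[" + residues + "]")
    return "".join(parts)
```

For the A*2602 example, the default conversion accepts Y, F, M, or L at the C terminus, while dominant_only=True accepts only Y or F.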
Supermotif Syntax
Residues within brackets are additional residues also predicted to be tolerated by multiple alleles within the putative supertype.
Motif Length
Please select the lengths of the binding motifs you wish to use. Motifs are stored in the database with a length of 9 amino acids, and the other lengths are computed on the fly by adding or removing amino acids before the C terminus. Lengths are adjusted only for motifs from Class I genotypes and supertypes. Motifs from Class II HLAs and custom motifs are not adjusted.
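The length adjustment can be pictured as inserting or dropping unconstrained positions just before the C-terminal position. The following is a minimal sketch with a hypothetical adjust_length function. The help text does not say what happens if shortening would remove an anchor, so raising an error in that case is purely an assumption of this sketch.

```python
import re


def adjust_length(motif, length):
    """Return the motif resized to `length` positions.

    Positions are added or removed immediately before the C-terminal one.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    positions = re.findall(r"x|\[[^\]]+\]", motif.replace("-", ""))
    body, c_terminal = positions[:-1], positions[-1]
    while len(body) + 1 < length:
        body.append("x")  # pad with an unconstrained position
    while len(body) + 1 > length:
        if body[-1] != "x":
            # Assumption: refuse to drop an anchor position.
            raise ValueError("shortening would remove an anchor position")
        body.pop()
    return "-".join(body + [c_terminal])
```

For example, resizing the 9-residue A*2602 motif to 10 adds one x before the C-terminal anchor, and resizing it to 8 removes one.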
Custom Motif
You may enter your own custom motif to be searched. Enter the anchor residues within square brackets [], and enter arbitrary residues with an x. You may optionally use a dash (-) to separate the residues. For example, x[LM]xxx[K]xx[V] or x-[LM]-x-x-x-[K]-x-x-[V].
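Because the dash is optional, both spellings of a custom motif describe the same positions. A small sketch of how such input could be validated and split into positions is shown below; parse_custom_motif is a hypothetical helper, and it assumes custom anchors contain only uppercase amino acid letters.

```python
import re


def parse_custom_motif(text):
    """Split a custom motif into positions, ignoring optional dashes."""
    compact = text.strip().replace("-", "")
    # Every position must be "x" or a bracketed set of residues.
    if not re.fullmatch(r"(?:x|\[[A-Z]+\])+", compact):
        raise ValueError("motif must consist of x and [RESIDUES] positions")
    return re.findall(r"x|\[[A-Z]+\]", compact)
```

With this helper, x[LM]xxx[K]xx[V] and x-[LM]-x-x-x-[K]-x-x-[V] both parse to the same nine positions.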
Select from predefined HIV protein sequences or enter or upload your own. Sequences are stripped of gaps before processing. The predefined sequences are the 2002 Consensus and Ancestral Sequences for M and O Groups from the LANL HIV Sequence Database.

Sequence Formats

The sequences consist of the amino acid codes: ACDEFGHIKLMNPQRSTVWYBZX and the gap code -. All other characters are removed and ignored. Gaps are ignored unless the input sequences form an alignment. Two sequence formats are accepted, FASTA and Table, and examples of these formats are shown below. For more information about sequence formats, see Common Sequence Formats.
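The character filtering described above can be approximated as follows. This sketch covers only the cleaning rule and a simple FASTA reader; the function names are hypothetical, the Table format is not handled, and converting input to uppercase before filtering is an assumption rather than documented behavior.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBZX-")


def clean_sequence(raw, keep_gaps=False):
    """Drop every character outside the accepted codes.

    Gaps are kept only when keep_gaps is True (for aligned input).
    """
    kept = [c for c in raw.upper() if c in ALLOWED]
    if not keep_gaps:
        kept = [c for c in kept if c != "-"]
    return "".join(kept)


def parse_fasta(text, keep_gaps=False):
    """Read FASTA text into a dict of name -> cleaned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += clean_sequence(line, keep_gaps)
    return records
```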



The result of the program is presented in several ways. First, the motifs corresponding to the input HLA type are presented. Then, you choose which motifs to scan against, choose motif length, load your sequences or choose predefined sequences, and scan these sequences for the respective motifs.

The final output is organized by search pattern: all motifs with identical search patterns are grouped together. The matching binding motifs are presented on the input sequences in two colors: C-terminal anchor amino acids are shown in magenta, and anchor amino acids in the other positions are shown in cyan. If a given amino acid is matched by more than one motif, it is highlighted as a C-terminal anchor amino acid if any of the motifs match it at the C-terminal anchor. All anchor amino acids are shown in uppercase and non-anchors in lowercase. Following the sequences is a list of potential epitopes showing their positions in the input sequences.
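The highlighting rules can be illustrated with a short sketch for a single search pattern. The function scan and its return values are hypothetical. The mark "C" stands in for magenta (C-terminal anchor) and "A" for cyan (other anchors), and overlapping matches are found with a lookahead so that every start position is tested.

```python
import re


def scan(sequence, positions):
    """Scan one sequence with a motif given as a list of positions.

    positions: e.g. ["x", "[LM]", "x", "[V]"].
    Returns (display, marks, epitopes), where display has anchors in
    uppercase and everything else in lowercase.
    """
    pattern = "".join("." if p == "x" else p for p in positions)
    anchor_offsets = [i for i, p in enumerate(positions) if p != "x"]
    last = len(positions) - 1
    seq = sequence.upper()
    marks = [None] * len(seq)
    epitopes = []
    # A lookahead match is zero-width, so overlapping hits are all reported.
    for match in re.finditer("(?=(" + pattern + "))", seq):
        start = match.start()
        epitopes.append((start + 1, match.group(1)))  # 1-based position
        for offset in anchor_offsets:
            pos = start + offset
            if offset == last:
                marks[pos] = "C"  # C-terminal anchor always wins
            elif marks[pos] is None:
                marks[pos] = "A"
    display = "".join(c if m else c.lower() for c, m in zip(seq, marks))
    return display, marks, epitopes
```

With the toy four-position motif x-[LM]-x-[V] and the sequence ALAVLMAV, this returns the display string aLaVlMaV and two potential epitopes, ALAV at position 1 and LMAV at position 5.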

You can also view and download the resulting sequences in FASTA format, where the anchor amino acids are presented in uppercase and all the remaining ones in lowercase. The potential epitopes can also be downloaded in CSV (comma-separated values) format, which can be read into a spreadsheet. This output is convenient for further analysis.
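As an illustration of the CSV output, the sketch below writes a list of potential epitopes with Python's csv module. The column names and the epitopes_to_csv function are assumptions; the actual columns in the tool's download may differ.

```python
import csv
import io


def epitopes_to_csv(rows):
    """Serialize (sequence_name, start, peptide) rows as CSV text.

    start is the 1-based position of the peptide in its sequence.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sequence", "start", "end", "epitope"])  # assumed header
    for name, start, peptide in rows:
        writer.writerow([name, start, start + len(peptide) - 1, peptide])
    return buffer.getvalue()
```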

Last modified: Thu Jun 9 09:04:55 MDT 2005