HIV sequence database



Viral Epidemiology Signature Pattern Analysis

This program quickly detects the amino acids that characterize differences between two groups of sequences. It compares the two groups and looks for a "signature" pattern: the set of amino acids that are conserved within each set but differ between the sets. It picks out those distinguishing amino acids and calculates their frequencies in each set. (Nucleotide alignments can also be used; in the following discussion, however, amino acids are used as representative examples.)

ALIGNMENTS

Align all of your sequences and divide them into two sets for comparison. The sequences must all be the same length, so if some sequences are shorter than others, insert stars (*) in positions where no information was available. Positions with stars are excluded from frequency calculations. Gaps inserted to maintain the alignment should be dashes (-); positions with dashes are counted and included in the signature pattern analysis. Example:

alphabet-OK ABCDEFGHIJKLMNOPQRSTUVWXYZ
mutant-ABCs ZBCDEFGH-JKLMNOPQRSTVVWX**

In the above sequence alignment, the sequence names are alphabet-OK and mutant-ABCs. For the second sequence, no sequence information was available for the last two positions. The "I" in the first sequence was deleted in the second sequence. U has "mutated" to V, and A to Z. Hence the signature pattern for mutant-ABCs relative to alphabet-OK is:

signature   Z.......-...........V...**, or 3/24 characters.

The periods (.) in the above signature indicate that the two sequences agree in those positions. The Z, -, and V show where the sequences disagree, defining a signature for the "mutant-ABCs" sequence. The denominator for the three-character signature is 24, not 26, because no sequence information was available for the last two positions.
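The worked example above can be reproduced with a short Python sketch. This is only an illustration of the two-sequence case as described in the text, not the program's actual code; the function name `pairwise_signature` is hypothetical.

```python
def pairwise_signature(reference: str, query: str) -> tuple[str, int, int]:
    """Return (signature, differing positions, informative positions).

    Hypothetical helper: mirrors the two-sequence example in the text.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    differing = 0
    informative = 0
    for ref_char, query_char in zip(reference, query):
        # A star in either sequence means no information at this position,
        # so the position is left out of the denominator.
        if "*" in (ref_char, query_char):
            marks.append("*")
            continue
        informative += 1
        if ref_char == query_char:
            marks.append(".")
        else:
            # Dashes are ordinary characters here, so a deletion counts.
            marks.append(query_char)
            differing += 1
    return "".join(marks), differing, informative


signature, differing, informative = pairwise_signature(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",  # alphabet-OK
    "ZBCDEFGH-JKLMNOPQRSTVVWX**",  # mutant-ABCs
)
print(signature, f"{differing}/{informative}")
# prints: Z.......-...........V...** 3/24
```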

The allowed characters in an alignment are A-Z, -, and *; a-z can be used but will be treated as equivalent to uppercase letters, i.e., A = a. Any other character will be treated as a star and not counted in the signature pattern tally. Therefore, if you have a stop codon and you label it with a dollar sign ($), it will be treated as if you have no information at that site. If, on the other hand, you label it with a Z, it will be included in the signature pattern analysis.
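If you want to check how your own labels will be read before submitting an alignment, the character rules above can be sketched as follows. The names `ALLOWED` and `normalize` are hypothetical, and the sketch assumes the rules are exactly as stated in the preceding paragraph.

```python
import string

# Characters the text says are accepted as-is: A-Z, dash, and star.
ALLOWED = set(string.ascii_uppercase) | {"-", "*"}


def normalize(sequence: str) -> str:
    """Uppercase letters and turn every other unsupported character into a star."""
    cleaned = []
    for ch in sequence:
        upper = ch.upper()
        # Anything outside A-Z, '-', '*' is treated as "no information".
        cleaned.append(upper if upper in ALLOWED else "*")
    return "".join(cleaned)


print(normalize("mkv$-*x"))  # prints: MKV*-*X
```

In this example the dollar sign becomes a star, so a stop codon labelled `$` would drop out of the tally, while one labelled `Z` would be kept.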

GETTING STARTED

  1. Select your sequences carefully and make sure that the query sequences and background sequences are aligned and of the same length, as described above. You may either paste your sequences into the respective boxes or upload a file from your computer with the browse button.
  2. Show amino acid frequencies?

    Checking the box answers "yes". Leaving it unchecked gives a short output: just the signatures and the frequencies of the signature amino acids in the query and background sets. Checking it gives a long output with the signatures AND the count of every amino acid found at every position in both alignments. The long output can be useful if your sequence sets have positions that are split between amino acids, for example 50% one amino acid and 50% another.

  3. Choosing a threshold (between 0 and 1.0)

    Choose a specific threshold (0 to 1.0) or run the program with the default threshold of 0. If you do not set a threshold, the majority signature will be used. To count only the most conserved signature amino acids, set a threshold for the minimum degree of conservation that a signature amino acid must reach in the query set.

    A threshold of 1.0 requires the signature amino acid to be present in every sequence in the query set. A threshold of 0.9 requires it to be present in at least 90% of the sequences in the query set. The default (0) simply uses the majority consensus.
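The per-position counts and the threshold can be illustrated with a small Python sketch. It is one plausible reading of the description above, not the program's actual implementation: the function names, the toy alignments, and the tie-breaking (the first most common character seen wins) are all assumptions, and input is assumed to be already uppercase.

```python
from collections import Counter


def column_counts(alignment: list[str]) -> list[Counter]:
    """Count characters at each position, skipping stars (no information)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    return [
        Counter(seq[i] for seq in alignment if seq[i] != "*")
        for i in range(length)
    ]


def signature_sites(query: list[str], background: list[str], threshold: float = 0.0):
    """Return (position, query char, query freq, background freq of that char).

    A site is reported when the most common query character differs from the
    most common background character and its query frequency meets the threshold.
    """
    query_cols = column_counts(query)
    background_cols = column_counts(background)
    if len(query_cols) != len(background_cols):
        raise ValueError("query and background must have the same length")
    sites = []
    for pos, (q, b) in enumerate(zip(query_cols, background_cols), start=1):
        if not q or not b:
            continue  # no information at this position in one of the sets
        q_char, q_n = q.most_common(1)[0]
        b_char, _ = b.most_common(1)[0]
        q_freq = q_n / sum(q.values())
        if q_char != b_char and q_freq >= threshold:
            sites.append((pos, q_char, q_freq, b[q_char] / sum(b.values())))
    return sites


# Toy alignments, invented for illustration only.
query = ["MKVT", "MKVS", "MRVS", "MKV*"]
background = ["MRIS", "MRIS", "MKIS", "MRLS"]

print(signature_sites(query, background))
# prints: [(2, 'K', 0.75, 0.25), (3, 'V', 1.0, 0.0)]
print(signature_sites(query, background, threshold=0.9))
# prints: [(3, 'V', 1.0, 0.0)]
```

With the default threshold both differing positions are reported; raising the threshold to 0.9 keeps only position 3, where V appears in every query sequence. The `Counter` objects returned by `column_counts` correspond to the per-position tallies shown in the long output.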

REFERENCES

  1. Ou C-Y, Ciesielski CA, Myers G, Bandea CI, Luo C-C, Korber BTM, Mullins JI, Schochetman G, Berkelman RL, Economou AN, Witte J, Furman LJ, Satten GA, MacInnes KA, Curran JW, and Jaffe HW: Molecular epidemiology of HIV transmission in a dental practice. Science, 1992 May; 256(5060):1165-71.

  2. Korber B and Myers G: Signature pattern analysis: a method for assessing viral sequence relatedness. AIDS Research and Human Retroviruses, 1992 Sep; 8(9):1549-60.
last modified: Wed Aug 1 15:41 2012


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2012 LANS LLC All rights reserved | Disclaimer/Privacy
