HIV sequence database


**Variability**

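Variability at each alignment position is measured as the Shannon entropy, H = -Σ P·ln(P), where the sum runs over the characters observed at that position and P is the frequency of each character. Entropy-Two looks for positions where the entropy differs between the two sets of sequences: the background sequences and the query sequences.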
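The per-position calculation can be sketched in Python as follows. This is a minimal illustration, not Entropy-Two's own code: the function names `column_entropy` and `entropy_differences` are hypothetical, the inputs are assumed to be two lists of already aligned, equal-length sequences, and the difference is assumed to be taken as query minus background.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy H = -sum(P * ln P) of one alignment column (natural log)."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_differences(background, query):
    """Per-position entropy difference (assumed sign: query minus background).

    Both arguments are lists of aligned sequences of the same length.
    """
    length = len(background[0])
    diffs = []
    for pos in range(length):
        bg_col = [seq[pos] for seq in background]
        q_col = [seq[pos] for seq in query]
        diffs.append(column_entropy(q_col) - column_entropy(bg_col))
    return diffs


# Toy example: invariant columns contribute 0; variable columns raise the entropy.
background = ["ACDE", "ACDE", "ACKE", "ACRE"]
query = ["ACDE", "GCDE", "TCDE", "ACDE"]
print(entropy_differences(background, query))
```

A fully conserved column has entropy 0, so a large positive difference marks a position that is more variable in the query set, and a large negative one a position that is more conserved there.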

**Randomization**

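In Entropy-Two, to test whether an observed difference is statistically significant, the pooled input data at each position can be randomized with or without replacement, and the entropy difference is recomputed for each randomization. You can choose a limit, say 5 times out of 1000 randomizations, as the cut-off for your "conserved signature": the maximum number of randomizations allowed to produce a difference at least as large as the observed one.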
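A randomization test of this kind might look like the sketch below. The helper names, the use of the absolute entropy difference, and the small floating-point tolerance are assumptions for illustration; Entropy-Two's own procedure may differ in detail. With `replace=False` the pooled column is shuffled and re-split (a permutation); with `replace=True` each residue is drawn independently from the pool (a bootstrap).

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy H = -sum(P * ln P) of one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def randomization_count(bg_col, q_col, n_rand=1000, replace=False, seed=None):
    """Count randomizations whose |entropy difference| reaches the observed one.

    Hypothetical helper: bg_col and q_col are the characters at one position
    in the background and query sets.
    """
    rng = random.Random(seed)
    observed = abs(column_entropy(q_col) - column_entropy(bg_col))
    pooled = list(bg_col) + list(q_col)
    n_bg = len(bg_col)
    hits = 0
    for _ in range(n_rand):
        if replace:
            # With replacement: draw every residue independently from the pool.
            sample = rng.choices(pooled, k=len(pooled))
        else:
            # Without replacement: shuffle the pool, then split it back into two sets.
            sample = pooled[:]
            rng.shuffle(sample)
        diff = abs(column_entropy(sample[n_bg:]) - column_entropy(sample[:n_bg]))
        # Small tolerance so rounding noise does not hide exact ties (assumption).
        if diff >= observed - 1e-12:
            hits += 1
    return hits


background = "D" * 30
query = "ACDEFGHIKLMNPQRSTVWY"
hits = randomization_count(background, query, n_rand=1000, seed=1)
print(hits, "of 1000 randomizations reached the observed difference")
```

Under a cut-off of 5 out of 1000, a position would be reported only if the returned count is 5 or fewer.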

**Amino Acid Class Equivalents**

You can either use the amino acids as they are in the calculations, or group them by chemical similarity into the following classes:

| In input sequence | In entropy calculations |
|---|---|
| D and E | a |
| R and K | b |
| I and V | i |
| L and M | l |
| F, W and Y | f |
| N\* and Q | n |
| S and T | s |

All other amino acids keep their own one-letter code, written in lowercase; for example, "C" becomes "c".

N\*: N-linked glycosylation sites are treated separately from the N and Q "n" grouping above, and are designated "g". For N-linked glycosylation site analysis, please use the N-Glycosite program.
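The class recoding amounts to a simple lookup table. In this sketch the names `CLASS_MAP`, `is_sequon` and `to_classes` are hypothetical, and the glycosylation rule is an assumption: it flags an N that begins the usual N-X-S/T sequon (X not proline), skipping gap characters, which may not match exactly how Entropy-Two or N-Glycosite identifies sites.

```python
from itertools import islice

# Class letters taken from the table above; unlisted residues fall back to lowercase.
CLASS_MAP = {
    "D": "a", "E": "a",
    "R": "b", "K": "b",
    "I": "i", "V": "i",
    "L": "l", "M": "l",
    "F": "f", "W": "f", "Y": "f",
    "N": "n", "Q": "n",
    "S": "s", "T": "s",
}


def is_sequon(seq, i):
    """Assumed rule: seq[i] is N followed by X (not P) then S or T, ignoring '-' gaps."""
    if seq[i] != "N":
        return False
    following = list(islice((c for c in seq[i + 1:] if c != "-"), 2))
    return len(following) == 2 and following[0] != "P" and following[1] in "ST"


def to_classes(seq):
    """Recode an aligned amino acid sequence into class letters."""
    out = []
    for i, aa in enumerate(seq):
        if is_sequon(seq, i):
            out.append("g")  # glycosylation site, kept apart from the "n" class
        elif aa in ("-", "*"):
            out.append(aa)  # gap and unknown characters pass through unchanged
        else:
            out.append(CLASS_MAP.get(aa, aa.lower()))
    return "".join(out)


print(to_classes("CNDTQ-K"))  # prints "cgasn-b"
```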

**Characters `*` and `-` in Sequences**

These two characters are treated differently from each other in our entropy calculation. The asterisk (*) represents unknown or missing information and is excluded from the entropy calculation. The dash (-) represents an insertion or deletion and is included in the calculation.
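Treating the two characters differently comes down to filtering before counting, as in this sketch. The function name is hypothetical, and returning 0 for a column made up entirely of unknowns is an assumption the text does not address.

```python
import math
from collections import Counter


def column_entropy(column):
    """Entropy of one column: '*' (unknown) is dropped, '-' (gap) counts as a character."""
    counts = Counter(c for c in column if c != "*")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an all-unknown column contributes no entropy
    return -sum((n / total) * math.log(n / total) for n in counts.values())


print(column_entropy("A*-"))  # the '*' is ignored, leaving one A and one gap
```

Because the gap is counted, a position where some sequences carry an insertion or deletion is more variable than one where every sequence has the same residue, while missing data does not affect the result.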

last modified: Tue Nov 27 16:59 2007