Hypermut 2.0 searches for mutations that fit a pattern you specify. The default pattern is designed to detect hypermutation by APOBEC3G or APOBEC3F; it will not detect, for example, excess substitutions caused by an excess of adenine in PCR pools. The program finds the positions that match the upstream context pattern, followed by the specified mutation, followed by the downstream context pattern. Mutations are scored relative to the reference sequence, which is assumed to be the first sequence entered and is treated as ancestral. Matches to the Control pattern are shown likewise for comparison. With the default settings, the program detects an excess of APOBEC-induced hypermutation relative to control (non-APOBEC) mutations. The context requirements can be enforced on the reference sequence, on the mutated sequence (recommended, especially if the reference is distant), or on both. Normally only the contexts should differ between the control pattern and the test pattern. Fisher's exact test is then used to detect any increase in mutation for the specified context compared to the control context.
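The counting and testing steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Hypermut's actual implementation: the two sequences are assumed to be aligned, gap-free, and of equal length; the context is enforced on the query (mutated) sequence; the upstream context is left blank; and the function names `count_sites` and `fisher_one_sided` are hypothetical. The one-sided Fisher's exact test is computed directly from the hypergeometric distribution.

```python
import re
from math import comb

# Subset of IUPAC codes needed for the default patterns (assumption: DNA only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "D": "[AGT]", "N": "[ACGT]"}


def to_regex(pattern):
    """Translate an IUPAC pattern such as 'YN|RC' into a regex string."""
    return "".join(IUPAC.get(ch, ch) for ch in pattern)


def count_sites(reference, query, downstream, ref_base="G", mut_base="A"):
    """Return (mutated, potential) counts for one downstream context.

    A potential site is a position where the reference has ref_base and the
    query matches the downstream context immediately after that position.
    """
    context = re.compile("(?:" + to_regex(downstream) + ")")
    mutated = potential = 0
    for i, base in enumerate(reference):
        if base != ref_base or not context.match(query, i + 1):
            continue
        potential += 1
        if query[i] == mut_base:
            mutated += 1
    return mutated, potential


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)


# Toy example (made-up sequences, for illustration only).
reference = "GGAGCTGAT"
query = "AGAGCTAAT"
test_mut, test_sites = count_sites(reference, query, "RD")
ctrl_mut, ctrl_sites = count_sites(reference, query, "YN|RC")
p_value = fisher_one_sided(test_mut, test_sites - test_mut,
                           ctrl_mut, ctrl_sites - ctrl_mut)
print(test_mut, test_sites, ctrl_mut, ctrl_sites, p_value)
```

With real data the counts would be far larger than in this toy example; the point is only to show how matched sites and mutated sites feed the 2x2 table.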
Patterns you might want to try are below. (Note the Upstream field is always blank here; although APOBECs do not like an upstream C, CGs are rare enough that this does not eliminate very many sites.)
For the p-values to be meaningful, the control pattern should be a mutation that would happen with the same rate constant as the tested mutation in the absence of hypermutation. This is roughly true when these patterns differ only in the context positions, and generally not true otherwise. The current program is not set up to compute valid p-values for the G→A/A→G ratio.
|Mutation||Downstream||Control Mutation||Control Downstream||Purpose||Notes|
|G→A||RD||G→A||YN|RC||Detect hypermutation.||The default setting.|
|G→A||GD||G→A||YN|RC||Detect mainly APOBEC3G hypermutation.||We have not validated this.|
|G→A||AD||G→A||YN|RC||Detect mainly APOBEC3F hypermutation.||We have not validated this.|
|G→A|| ||A→G|| ||Count G→A versus A→G as in original hypermut.||The p-values are not calculated for these counts because they are not meaningful. The tested mutation and control do not have the same change in nucleotide content.|
Patterns may use IUPAC codes (e.g., R means G or A). As in regular expressions, a vertical bar (“|”) means OR, so GGT|GAA matches GGT or GAA. Parentheses can be used for grouping, so you could also write G(GT|AA).
All of the IUPAC nucleotide codes are supported:
|R||[AG] (i.e., A or G)|
|B||[^A] (i.e., not A)|
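To make the pattern syntax concrete, here is a small Python sketch that translates a pattern into an ordinary regular expression. The function name `iupac_to_regex` is hypothetical, and the translation is an assumption about how such patterns can be interpreted, not a description of Hypermut's internals. Each code is written as an explicit character class; over the four-letter DNA alphabet, [CGT] is equivalent to [^A].

```python
import re

# Standard IUPAC nucleotide codes written as regex character classes.
IUPAC_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate IUPAC codes, '|' and '()' into a regex string."""
    parts = []
    for ch in pattern:
        if ch in "|()":
            parts.append(ch)  # OR and grouping pass through unchanged
        elif ch.upper() in IUPAC_TO_CLASS:
            parts.append(IUPAC_TO_CLASS[ch.upper()])
        else:
            raise ValueError("unsupported character in pattern: %r" % ch)
    return "".join(parts)


# GGT|GAA and G(GT|AA) accept exactly the same strings.
flat = re.compile(iupac_to_regex("GGT|GAA"))
grouped = re.compile(iupac_to_regex("G(GT|AA)"))
for seq in ("GGT", "GAA", "GGA"):
    print(seq, bool(flat.fullmatch(seq)), bool(grouped.fullmatch(seq)))
```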
For technical reasons, the upstream context pattern must always match a fixed number of nucleotides. For example, A|(TC) is not allowed as an upstream pattern because it could match either 1 or 2 nucleotides. The same requirement holds for the mutation pattern, which is normally just one character anyway, although longer fixed-length patterns (of reasonable length) should work fine.
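If you want to check a pattern for fixed length before submitting it, one possible approach is sketched below in Python. It relies on a property of Python's `re` module (look-behind assertions must be fixed-width) and says nothing about how Hypermut itself performs the check. The helper name `has_fixed_length` is hypothetical, and the input is assumed to be a syntactically valid regular expression, since any compile error is reported as "not fixed length".

```python
import re


def has_fixed_length(regex):
    """Return True if every string matched by regex has the same length.

    Python rejects variable-width patterns inside a look-behind, so wrapping
    the pattern in (?<=...) and compiling it serves as the test.
    """
    try:
        re.compile("(?<=" + regex + ")")
    except re.error:
        return False
    return True


for candidate in ("A|(TC)", "A|T", "G(GT|AA)"):
    print(candidate, has_fixed_length(candidate))
```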