HIV molecular immunology database
HLA Class I-Driven Evolution of Human Immunodeficiency Virus Type 1 Subtype C: Immune Escape and Viral Load. PMID: 18434400
Christine M. Rousseau, Marcus G. Daniels, Jonathan M. Carlson, Carl Kadie, Hayley Crawford, Andrew Prendergast, Philippa Matthews, Rebecca Payne, Morgane Rolland, Dana N. Raugi, Brandon S. Maust, Gerald H. Learn, David C. Nickle, Hoosen Coovadia, Thumbi Ndung'u, Nicole Frahm, Christian Brander, Bruce D. Walker, Philip J. R. Goulder, Tanmoy Bhattacharya, David E. Heckerman, Bette T. Korber, James I. Mullins
HLA-driven viral evolution was assessed using three phylogenetic correction methods across full HIV-1 subtype C proteomes from a cohort of 261 South Africans. Amino acids conferring susceptibility or resistance to CTL were identified using single residue-HLA associations as well as 9-mer-HLA associations. Five hundred and fifty-eight CTL-susceptible and resistant HLA-amino acid associations were detected and organized into 310 immunological sets (groups of individual associations related to a single HLA/epitope combination). The epitope map of HIV-1 subtype C proteins displays significant (q < 0.2) individual HLA-amino acid associations and immunological sets. The reference sequence was the consensus of the full-length HIV-1 sequences from South Africa. The individual associations were categorized as susceptible (the amino acid is enriched when the HLA is not present in the host) or resistant (the amino acid is enriched when the HLA is present in the host) to the HLA-mediated cellular immune response and are shown below the consensus sequence. All unique HLA/amino acid associations are shown. Blue amino acids reflect the susceptible form and red the resistant form. Shaded boxes indicate immunological sets.
The size of the immunological set was determined by both the distance between individual associations with related HLAs and validation data (epitopes optimally defined based on a set of stringent criteria and assembled in an annual review article, known as the "A-list", and the full database based on all of the epitopes that can be assembled from the experimental literature, known as the "B-list"). These sets are meant to reflect the minimum number of epitopes that would be required to explain the observed data, taking into account linkage disequilibrium. When HLAs in linkage disequilibrium were associated with the same amino acid and site, the HLA with the lowest p-value is shown, and moving the cursor over the HLA will reveal the additional associated HLAs that were in linkage disequilibrium. If related 2-digit and 4-digit HLAs (for example B18 and B1801) were both associated, again the displayed HLA is the one with the lowest p-value, and the related HLA can be identified by placing the cursor over the HLA. Occasionally, a contradiction is observed in that two HLAs in linkage disequilibrium are associated with the same amino acid, but in one case it is susceptible and in the other it is resistant; for self-consistency, a single amino acid cannot represent both forms if the statistical significance of the association in one of the cases is an artifact of linkage disequilibrium. Thus, when this is found in the association data, we introduce a new event.
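The display rule for linked or related HLAs (show the one with the lowest p-value, reveal the rest on hover) can be sketched as follows. This is a minimal illustration, not the database's own code: the `Association` record, the `pick_displayed` function, and the site, residue, and p-values in the example are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Association(NamedTuple):
    hla: str          # e.g. "B18" (2-digit) or "B1801" (4-digit)
    site: int         # alignment position (hypothetical numbering)
    amino_acid: str   # associated residue
    p_value: float    # significance of the association


def pick_displayed(linked: List[Association]) -> Tuple[Association, List[str]]:
    """Return the association to display and the HLAs revealed on hover.

    `linked` is assumed to hold associations at the same site and amino acid
    whose HLAs are in linkage disequilibrium or are 2-/4-digit relatives.
    """
    best = min(linked, key=lambda a: a.p_value)          # lowest p-value wins
    others = [a.hla for a in linked if a is not best]    # shown on mouse-over
    return best, others


# Made-up values, for illustration only.
group = [
    Association("B18", 242, "T", 0.004),
    Association("B1801", 242, "T", 0.0007),
]
shown, hover = pick_displayed(group)
print(shown.hla, hover)
```

The sketch does not handle the contradictory case described above (same amino acid, opposite susceptible/resistant calls), which the text resolves by introducing a new event.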
Only significant 9-mer regions with two or more associations are shown, and 9-mer overlap is taken into account. In other words, if more than one site was involved in the variation that makes the 9-mer association distinctive, it is shown. If there were conserved amino acids between the HLA-associated variant sites within the 9-mer, they are indicated by small grey letters. Single amino acid associations are always considered and kept in the immunological events.
An HLA in green indicates that there was an IFNγ-ELISpot reactivity pattern associated with that HLA that overlapped the associated amino acid change. In the validation area (above the consensus sequence), light blue shading over the HLA label indicates motifs, orange indicates B-list epitopes, red indicates A-list epitopes, and dark blue indicates predicted epitopes. Similar to the individual associations, epitopes and motifs that were inferred to be susceptible or resistant are also labeled with blue or red letters, respectively. In the association area (below the consensus sequence), a gray background means the association came from linkage disequilibrium. Associations in sites with >90% gaps are not shown.
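The gap rule in the last sentence can be illustrated with a small sketch. It assumes the alignment is a list of equal-length strings with "-" as the gap character; the function names and the toy alignment are hypothetical.

```python
from typing import List


def gap_fraction(alignment: List[str], site: int, gap: str = "-") -> float:
    """Fraction of sequences that carry a gap at the given column."""
    column = [seq[site] for seq in alignment]
    return column.count(gap) / len(column)


def displayable_sites(alignment: List[str], threshold: float = 0.9) -> List[int]:
    """Columns whose gap fraction does not exceed the threshold (>90% is hidden)."""
    width = len(alignment[0])
    return [i for i in range(width) if gap_fraction(alignment, i) <= threshold]


# Toy alignment: the middle column is all gaps, so it is dropped.
toy = ["A-C"] * 9 + ["A-T"]
print(displayable_sites(toy))
```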
Immunological set identification occurs in three stages. A sliding window 17 amino acids wide is dragged over the alignment. For a window to be considered, it must contain at least one association. Those windows that are considered are ranked based on the number of self-consistent associations, epitopes (predicted or real), and motifs found in that window. Rankings are from the reference point of a given 'founding' HLA, and thus to make a ranking all alternative 'founding' HLAs in a cluster must be considered. A window that has epitope or motif support for the association will always outrank a window that has none. The best ranked cluster, i.e. the most self-consistent, dense, and validated one, is chosen to be split into sets. Ideally a cluster will represent just one set, but if there are others in the window that do not fit with the founding HLA (by linkage disequilibrium or by 2-digit match), these are broken out into separate sets. Further, a set must be narrower than the cluster window: a set is limited to 12 amino acids in width, but overlap is considered an adequate constraint for inclusion. Once a cluster is split into sets, the information that went into that cluster is removed from consideration and a new set of windows is evaluated from the remaining data, along with new immunological sets split out of those windows. This process continues until there are no more associations and no more validation information. Note that it is entirely possible at this point for events to be composed only of validation information.
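The window scan and ranking described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: associations and validation records are reduced to single alignment positions, the 'founding' HLA and self-consistency checks are omitted, and all names (`Window`, `candidate_windows`, `rank_key`, `best_window`) are hypothetical. Only the window width of 17, the requirement of at least one association, and the rule that supported windows outrank unsupported ones come from the text.

```python
from typing import List, NamedTuple, Optional

WINDOW = 17  # window width in amino acids, as stated in the text


class Window(NamedTuple):
    start: int    # 0-based alignment position where the window begins
    n_assoc: int  # associations falling inside the window
    n_valid: int  # validation records (epitopes, motifs) inside the window


def candidate_windows(assoc_sites: List[int], valid_sites: List[int],
                      length: int) -> List[Window]:
    """Slide a 17-residue window and keep windows with at least one association."""
    windows = []
    for start in range(max(length - WINDOW, 0) + 1):
        end = start + WINDOW
        n_assoc = sum(start <= s < end for s in assoc_sites)
        if n_assoc == 0:
            continue  # a window must contain at least one association
        n_valid = sum(start <= s < end for s in valid_sites)
        windows.append(Window(start, n_assoc, n_valid))
    return windows


def rank_key(w: Window):
    # Windows with validation support always outrank those without;
    # among equally supported windows, higher counts rank first.
    return (w.n_valid > 0, w.n_assoc + w.n_valid)


def best_window(assoc_sites: List[int], valid_sites: List[int],
                length: int) -> Optional[Window]:
    windows = candidate_windows(assoc_sites, valid_sites, length)
    return max(windows, key=rank_key) if windows else None


# Made-up positions: two unsupported associations near the start,
# one association at 40 with a validation record at 42.
best = best_window(assoc_sites=[5, 8, 40], valid_sites=[42], length=60)
print(best)
```

In the full procedure, the chosen cluster would then be split into sets of at most 12 amino acids, its data removed, and the scan repeated on what remains; that loop is left out here.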
The second stage is to take the sets we have identified and reduce the validation information down to a single preferred validation record. Here A-list epitopes are preferred to B-list epitopes, which are preferred to predicted epitopes, which are preferred to motifs. Only the highest ranked type of evidence is retained in the event at this stage of processing. This step narrows the scope of the event to the best available supporting evidence. Without this step there are often dozens of motifs matching to the left and right of an association; we argue this confuses the picture of what is actually happening.
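The evidence preference in this stage amounts to keeping only the records of the best available type. A minimal sketch, assuming each validation record is a dictionary with a `kind` field taking one of four labels; the labels, field names, and example records are hypothetical. The text does not say how several records of the same top-ranked type are reduced further, so this sketch keeps all of them.

```python
from typing import Dict, List

# Order of preference from the text: A-list > B-list > predicted > motif.
PREFERENCE = ["A-list", "B-list", "predicted", "motif"]


def keep_best_evidence(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Retain only the records of the highest-ranked evidence type present."""
    if not records:
        return []
    best_rank = min(PREFERENCE.index(r["kind"]) for r in records)
    return [r for r in records if PREFERENCE.index(r["kind"]) == best_rank]


# Made-up validation records for one immunological set.
records = [
    {"kind": "motif", "label": "motif-1"},
    {"kind": "B-list", "label": "epitope-x"},
    {"kind": "motif", "label": "motif-2"},
]
print(keep_best_evidence(records))
```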
The first and second stages described above are now iterated to refine where events are located and remove noise from the event space. This has two goals: one is to split or merge overlapping 9-mer association windows into different sets or the same set, and the other, as mentioned above, is to clarify the picture of validation relating to these events. During the first and second iterations, the unused available validation information is carried along for opportunistic use in forming clusters. Once the second iteration is complete, a third, similar 'noise reduction' iteration is made, but this time the unused validation information is discarded. We now consider the clusters and the sets within them finalized.
When the first and second stage iterations are finished, a third post-processing step is performed using the full set of original validation data. The purpose of this step is to attach associations to validation information that was lost as a result of the locations chosen for the sliding window. The clusters evaluate trade-offs of self-consistency, but this does not always serve to validate the secondary events contained in a cluster. Here we fill in potentially relevant additional available validation information, using the same exclusive preferences for evidence (e.g. A-list epitopes are better than B-list epitopes, etc.).
Last modified: Wed Mar 25 13:48:01 MDT 2009