HIV sequence database

Contamination Examples

Contamination is a common problem, and can be a serious threat to the integrity of sequence data. The conclusions drawn from bad data will be misleading, and can cause fundamental misconceptions in the way we understand HIV biology. Published papers based on contaminated sequences that were missed both by the authors and in the review process have led to scattered erroneous reports regarding virtually all aspects of HIV biology that involve sequences: viral clearance, transmission patterns, rapid and slow progression, drug resistance, central nervous system tropism, immune escape, and variability in populations. Contamination can happen in anyone's laboratory, and is not a sign of sloppy work. It is a fact of life with HIV culture, PCR amplification, and sequencing.

We have selected a few datasets to illustrate the problems that can arise, and how they can be recognized. The sets are anonymous and unrecognizable; the purpose is to show real examples of contamination, not to cast blame on any particular person or group.

  Example 1: A set of C1-C3 sequences containing LAI/HXB2 contamination and sample mix-ups (partial set, published). Tree available; no alignment.
  Example 2: In vitro recombination of patient DNA with contaminating LAI/HXB2 DNA, especially clear in the alignment (complete set of V3 sequences, published). Tree and alignment available.
  Example 3: A set generated to study CTL epitope variation, consisting of partially overlapping sequence fragments of variable length. Phylogenetic analysis was impossible, but a BLAST search and an alignment with the most similar GenBank sequence showed extensive contamination with pNL43. No tree; alignment available.
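
The screening idea behind these examples, comparing each new sequence against common laboratory strains, can be illustrated with a minimal Python sketch. It is not the procedure used for the datasets above, which relied on phylogenetic trees, BLAST searches, and alignments; it substitutes a simple pairwise percent-identity check on sequences that are assumed to be already aligned. The function names, the 98% threshold, and the toy sequences are all hypothetical and chosen only for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_suspects(samples, lab_strains, threshold=98.0):
    """Return (sample, strain, identity) for every pair at or above threshold.

    The 98% default is an illustrative assumption, not a published cutoff.
    """
    hits = []
    for sample_name, sample_seq in samples.items():
        for strain_name, strain_seq in lab_strains.items():
            identity = percent_identity(sample_seq, strain_seq)
            if identity >= threshold:
                hits.append((sample_name, strain_name, identity))
    return hits


# Toy, made-up aligned fragments (NOT real HIV sequences).
lab_strains = {"lab_strain_A": "ACGTACGTACGTACGTACGT"}
samples = {
    "patient_1": "ACGTACGTACGTACGTACGT",  # identical to the lab strain
    "patient_2": "ACGTTCGAACGTACCTAGGT",  # four mismatches
    "patient_3": "ACGTACGTAC--ACGTACGA",  # two gap columns, one mismatch
}

for name, strain, identity in flag_suspects(samples, lab_strains):
    print(f"{name} is {identity:.1f}% identical to {strain}")
```

A hit from a check like this is only a prompt for closer inspection. Near-identity to a laboratory strain over a short or conserved region does not by itself prove contamination, and recombinants such as those in Example 2 may only stand out when the alignment is examined region by region.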

Selected references on contamination and its consequences:

Maintaining the integrity of human immunodeficiency virus sequence databases. Learn GH Jr, Korber BT, Foley B, Hahn BH, Wolinsky SM, Mullins JI. J Virol 1996 Aug;70(8):5720-5730.

Protecting HIV databases. Korber BT, Learn G, Mullins JI, Hahn BH, Wolinsky S. Nature 1995 Nov 16;378(6554):242-244.

Genetic Evaluation of Suspected Cases of Transient HIV-1 Infection of Infants. Frenkel LM, Mullins JI, Learn GH et al. Science 1998 May 15;280(5366):1073-1077.

HIV clearance in an infant? McClure MO, Bieniasz PD, Weber JN, Tedder RS, O'Shea S, Banatvala JE, Tudor-Williams G, Simmonds P, Holmes EC. Nature 1995 Jun 22;375(6533):637-638.

World-wide Evaluation of DNA Sequencing Approaches for the Identification of Drug Resistance Mutations in the HIV-1 Reverse Transcriptase. Schuurman R, Demeter L, Reichelderfer P, Tijnagel J, de Groot T, Boucher C, on behalf of the ENVA laboratories, the Sequencing Working Group and participating laboratories. Proceedings of the 5th annual Conference on retroviruses, Abstract # 532.

last modified: Tue Apr 22 11:38 2008

Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration