HIV sequence database



ElimDupes: Duplicate Sequence Removal

Purpose: Given an alignment or set of unaligned nucleotide or protein sequences, this tool compares the sequences and eliminates any duplicates, thus producing a set of unique sequences.

Details: By default, the program removes all non-letter characters from the sequences, converts all letters to uppercase, and treats as a "duplicate" any sequence that is a subsequence of a longer sequence (e.g., the sequence ATG is a duplicate of the sequence CATGCC). Each of these three default behaviors can be changed with the first three options shown below. The fourth option restores, in the output, any gaps or non-uppercase characters that were present in the input. The final option lets you analyze your input sequences automatically as a series of sequence groups. The results page summarizes the duplicate and unique sequence sets and lets you view and download both the resulting unique sequences file and the duplicate sequences file.
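The default behavior described above can be illustrated with a minimal Python sketch. This is not the ElimDupes source code: the function names `normalize` and `elim_dupes`, the keyword arguments, and the rule for choosing which copy of an identical sequence to keep (the one that appears first in the input) are assumptions made for illustration only. The sketch also reads "subsequence" as a contiguous stretch, as in the ATG/CATGCC example.

```python
import re


def normalize(seq, strip_nonletters=True, uppercase=True):
    """Apply the first two options: drop non-letters, then uppercase."""
    if strip_nonletters:
        seq = re.sub(r"[^A-Za-z]", "", seq)
    if uppercase:
        seq = seq.upper()
    return seq


def elim_dupes(records, strip_nonletters=True, uppercase=True,
               subsequences=True):
    """Split (name, sequence) records into unique and duplicate sets.

    Returns (unique, duplicate_of): `unique` is a list of
    (name, normalized sequence) in input order, and `duplicate_of`
    maps each removed name to the name of the sequence that was kept.
    """
    normalized = [(name, normalize(seq, strip_nonletters, uppercase))
                  for name, seq in records]

    # Visit longer sequences first so that a shorter sequence is always
    # compared against the longer ones already kept. The sort is stable,
    # so equal-length sequences keep their input order (an assumption).
    order = sorted(range(len(normalized)),
                   key=lambda i: -len(normalized[i][1]))

    kept = []           # indices of sequences kept as unique
    duplicate_of = {}   # removed name -> kept name
    for i in order:
        name, seq = normalized[i]
        match = None
        for j in kept:
            kept_seq = normalized[j][1]
            is_dup = (seq in kept_seq) if subsequences else (seq == kept_seq)
            if is_dup:
                match = j
                break
        if match is None:
            kept.append(i)
        else:
            duplicate_of[name] = normalized[match][0]

    kept.sort()  # report unique sequences in their original input order
    return [normalized[i] for i in kept], duplicate_of


records = [("a", "CATGCC"), ("b", "atg"), ("c", "CAT-GCC"), ("d", "GGG")]
unique, duplicate_of = elim_dupes(records)
print(unique)
print(duplicate_of)
```

With the defaults, `b` (ATG) and `c` (CATGCC once the gap is removed) are both reported as duplicates of `a`. Passing `subsequences=False` keeps `b` as unique, because only exact matches then count as duplicates.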

For more details, see ElimDupes Explanation.

Input
Paste your sequences here, or upload your file

Options
Remove extraneous characters from sequences: yes / no
Make all letters uppercase: yes / no
Consider subsequences as duplicates: yes / no
Restore original sequences in output: yes / no
To analyze input by groups, enter number of leading digits
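
This page does not spell out how the grouping option forms its groups. One plausible reading, assumed here and not confirmed by the page, is that sequences whose names share the same leading characters are placed in the same group and each group is then analyzed separately. The helper `group_by_prefix` and the sample names below are hypothetical.

```python
from collections import defaultdict


def group_by_prefix(records, n_leading):
    """Group (name, sequence) records by the first n_leading characters
    of each name. The grouping rule itself is an assumption."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[name[:n_leading]].append((name, seq))
    return dict(groups)


records = [("01_A", "ATG"), ("01_B", "CATGCC"), ("02_A", "ATG")]
for prefix, members in group_by_prefix(records, 2).items():
    print(prefix, [name for name, _ in members])
```

Under this reading, duplicate removal would run once per group, so the ATG in group `02` would not be compared against CATGCC in group `01`.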
last modified: Mon Apr 14 11:13 2008


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2006 LANSLLC All rights reserved | Disclaimer/Privacy
