HIV Databases
HIV sequence database



ElimDupes

Duplicate Sequence Removal

Purpose: compare the sequences in an alignment and identify or eliminate duplicates or very similar sequences.
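The core idea of exact-duplicate elimination can be sketched in a few lines of Python. This is a minimal illustration, not the ElimDupes implementation. The function name `collapse_duplicates`, the (name, sequence) input format, and the rule that "extraneous characters" are anything other than letters and the gap character `-` are all assumptions made for the example. Identical sequences are collapsed onto the first name seen, and each surviving sequence carries its occurrence count.

```python
def collapse_duplicates(records, uppercase=True, strip_extraneous=True):
    """Collapse identical sequences onto the first name seen (hypothetical helper).

    records: list of (name, sequence) pairs.
    Returns (name, normalized_sequence, count) tuples in first-seen order.
    """
    groups = {}  # normalized sequence -> [first name, count]; dicts keep insertion order
    for name, seq in records:
        if strip_extraneous:
            # Assumption: only letters and the gap character '-' are meaningful.
            seq = "".join(c for c in seq if c.isalpha() or c == "-")
        if uppercase:
            seq = seq.upper()
        if seq in groups:
            groups[seq][1] += 1  # another copy of a sequence already seen
        else:
            groups[seq] = [name, 1]  # first occurrence keeps its name
    return [(name, seq, count) for seq, (name, count) in groups.items()]


records = [("s1", "acgt-a"), ("s2", "ACGT-A"), ("s3", "ACGA-A")]
print(collapse_duplicates(records))
# [('s1', 'ACGT-A', 2), ('s3', 'ACGA-A', 1)]
```

Normalizing before comparing matters: without the uppercase step, "acgt-a" and "ACGT-A" would be counted as two distinct sequences.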

For details, see ElimDupes Explanation.

Input
Paste your sequences here, or upload your file.
Yes, sequences are aligned (uncheck this box if your sequences are not aligned; the tool will be much slower)

Options
Remove extraneous characters from sequences: Yes / No
Make all letters uppercase: Yes / No
Consider subsequences as duplicates: Yes / No
Restore original sequences in output: Yes / No
Eliminate sequences more similar than ___ %
Analyze input by groups (enter number of leading digits)
Create file of unique sequences with _count added to sequence names: Yes / No
Include rank in sequence names: Yes / No
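The similarity, subsequence, and grouping options can be illustrated with a greedy filter that keeps a sequence unless it matches one already kept. This is a sketch under stated assumptions, not documented ElimDupes behavior: percent identity is computed over aligned columns, skipping columns where both sequences have a gap; "subsequence" is read as the ungapped sequence occurring inside a kept ungapped sequence; and "groups" are taken to be names sharing their first N characters. The functions `percent_identity`, `is_subsequence`, and `eliminate_similar` are hypothetical names.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 100.0


def is_subsequence(short, long, gap="-"):
    """Assumption: the ungapped short sequence occurs contiguously in the ungapped long one."""
    return short.replace(gap, "") in long.replace(gap, "")


def eliminate_similar(records, threshold=None, subsequences=False, group_digits=0):
    """Greedy filter over (name, aligned sequence) pairs (hypothetical helper).

    threshold: drop a sequence whose identity to a kept one exceeds this percentage.
    group_digits: only compare sequences whose names share this many leading
    characters; 0 puts every sequence in a single group.
    """
    kept = []
    for name, seq in records:
        group = name[:group_digits]
        duplicate = False
        for kname, kseq in kept:
            if kname[:group_digits] != group:
                continue  # different group: never compared
            if (seq == kseq
                    or (threshold is not None and percent_identity(seq, kseq) > threshold)
                    or (subsequences and is_subsequence(seq, kseq))):
                duplicate = True
                break
        if not duplicate:
            kept.append((name, seq))
    return kept


recs = [("01a", "ACGTACGTAC"), ("01b", "ACGTACGTAA"),
        ("02a", "ACGTACGTAA"), ("01c", "--GTACGT--")]
print([n for n, _ in eliminate_similar(recs, threshold=85)])                  # ['01a', '01c']
print([n for n, _ in eliminate_similar(recs, threshold=85, group_digits=2)])  # ['01a', '02a', '01c']
```

Because the pass is greedy, the result depends on input order: a short sequence listed before its longer match is kept, since only the incoming sequence is tested as a subsequence of kept ones.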

WARNING: If you choose "Create file of unique sequences with _count added to sequence names" and any of your sequence names end in "_" followed by digits (for example, "NC_123456"), those digits will be treated as occurrence counts. If the digits do not actually represent counts, add an extra character to the end of such names (e.g., "NC_123456x").
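The warning describes a simple naming convention: a trailing "_" plus digits is read back as a count. The sketch below assumes the rule is exactly "underscore followed by one or more digits at the end of the name"; the precise parsing ElimDupes uses is not stated here, and `split_count` and `add_count` are hypothetical helpers. It shows how an accession such as "NC_123456" is misread as a count and why appending "x" avoids it.

```python
import re

# Assumption: a trailing "_" followed by one or more digits is read as a count.
_COUNT_SUFFIX = re.compile(r"(.*)_(\d+)")


def split_count(name):
    """Split 'base_N' into ('base', N); names without such a suffix count once."""
    m = _COUNT_SUFFIX.fullmatch(name)
    if m:
        return m.group(1), int(m.group(2))
    return name, 1


def add_count(name, count):
    """Append an occurrence count to a sequence name, e.g. 'seqA' -> 'seqA_3'."""
    return f"{name}_{count}"


print(split_count("seqA_3"))      # ('seqA', 3)
print(split_count("NC_123456"))   # ('NC', 123456): an accession misread as a count
print(split_count("NC_123456x"))  # ('NC_123456x', 1): the suggested workaround
print(add_count("seqA", 3))       # seqA_3

# Summing counts across names shows how one misread accession inflates a total.
names = ["seqA_3", "seqB", "NC_123456"]
print(sum(split_count(n)[1] for n in names))  # 123460
```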

last modified: Wed Jan 21 11:03 2015


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2012 LANS LLC All rights reserved | Disclaimer/Privacy
