HIV Databases
HIV Sequence Database



Gap Strip/Squeeze Explanation

Background

Purpose
The tool works for both nucleotide and amino acid sequences and deletes columns that contain an "intolerable" number of gaps. You can set the gap tolerance to any value between 0% and 100%. A value of 0% deletes every column that contains even a single gap ("gap stripping"), while a value of 100% deletes only columns that consist entirely of gaps ("gap squeezing").
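The tolerance rule can be illustrated with a minimal Python sketch. This is not the tool's own code: the function names are hypothetical, and the sketch assumes that a column is deleted when its gap percentage is strictly greater than the tolerance, with a column made up entirely of gaps always deleted so that a tolerance of 100% still performs gap squeezing.

```python
def gap_fractions(seqs, gap_chars="-"):
    """Return the fraction of gap characters in each alignment column."""
    gaps = set(gap_chars)
    n = len(seqs)
    return [sum(ch in gaps for ch in col) / n for col in zip(*seqs)]


def strip_columns(seqs, tolerance=100.0, gap_chars="-"):
    """Drop columns whose gap percentage exceeds `tolerance` (0 to 100).

    Assumption: an all-gap column is always dropped, so tolerance=100
    behaves as gap squeezing and tolerance=0 as gap stripping.
    """
    keep = [
        i
        for i, frac in enumerate(gap_fractions(seqs, gap_chars))
        if not (frac == 1.0 or frac * 100.0 > tolerance)
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment: sequences must all have the same length.
alignment = ["AC-G-", "A--G-", "ACTG-"]
print(strip_columns(alignment, tolerance=100))  # gap squeezing
print(strip_columns(alignment, tolerance=0))    # gap stripping
```

In this toy alignment, squeezing removes only the last column, while stripping removes every column that has a gap in any sequence.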

Functions
This tool can perform these 3 functions separately or in combination:

See examples of usage.

Input

Paste or upload your alignment file in the space provided. The program will automatically identify any standard format. Each sequence must have an associated name, so you cannot submit raw sequence files.

While the tool is used most often with nucleotide sequences, it will perform gap stripping and squeezing for amino acid sequences as well.

Options

Gap character(s)
Specify the gap character if it is not a dash (-). To specify more than one gap character, enter them as one continuous string with no spaces or other separators between the characters. You can also specify ordinary letters to be treated as gaps. This is useful if, for example, you want to remove all columns containing IUPAC ambiguity codes from your alignment, thereby preserving only columns with ATGCU.
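As an illustration of treating ordinary letters as gaps, the following Python sketch removes every column that contains a dash or an IUPAC nucleotide ambiguity code, which corresponds to a tolerance of 0%. The function name is hypothetical, and exact, case-sensitive character matching is an assumption rather than documented behavior of the tool.

```python
IUPAC_AMBIGUITY = "RYSWKMBDHVN"  # standard IUPAC nucleotide ambiguity codes


def columns_without(seqs, gap_chars):
    """Keep only columns in which no sequence has a character from gap_chars."""
    # An unseparated string such as "-RYSWKMBDHVN" becomes a set of characters.
    gaps = set(gap_chars)
    keep = [i for i, col in enumerate(zip(*seqs)) if not gaps.intersection(col)]
    return ["".join(s[i] for i in keep) for s in seqs]


alignment = ["ACRT-G", "ACGTNG", "ACGTAG"]
cleaned = columns_without(alignment, "-" + IUPAC_AMBIGUITY)
print(cleaned)
```

Here the third column (containing R) and the fifth column (containing a dash and an N) are dropped, leaving only unambiguous bases.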

Gap tolerance
The default tolerance is 100%, so that only columns consisting entirely of gaps are removed. Lower the value to remove columns that contain fewer gaps.

Show deleted gaps
If you select the "Show deleted gaps" box, your output will include the first sequence in your alignment with marks showing columns that were deleted in the stripped alignment that follows.
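The idea of marking deleted columns against the first sequence can be sketched in Python as follows. The marker character, the layout (a marker line to be printed under the sequence), and the function name are all hypothetical; the tool's actual output format may differ. The deletion rule assumed here is "gap percentage strictly greater than the tolerance, or the column is entirely gaps".

```python
def deleted_column_marks(seqs, tolerance=100.0, gap_chars="-", mark="x"):
    """Return the first sequence and a marker line flagging deleted columns."""
    gaps = set(gap_chars)
    n = len(seqs)
    marks = []
    for col in zip(*seqs):
        count = sum(ch in gaps for ch in col)
        # Assumed rule: delete all-gap columns and columns over the tolerance.
        deleted = count == n or count * 100.0 / n > tolerance
        marks.append(mark if deleted else " ")
    return seqs[0], "".join(marks)


alignment = ["AC-G-", "A--G-", "ACTG-"]
first, marks = deleted_column_marks(alignment, tolerance=50)
print(first)
print(marks)
```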

Preserve codons
If the sequences in your alignment are codon-aligned nucleotides, you can choose to remove columns in groups of three (columns that comprise a codon). This is done by checking the "Preserve codons" box, specifying the number of columns within each codon that can exceed the tolerance value and specifying the reading frame of your alignment.

Use tolerance for [1,2,3] positions in the codon
This setting controls how many positions of a codon must exceed the gap tolerance before the whole codon is deleted. If the tolerance is calculated over 1 codon position, the codon is deleted as soon as 1 of its positions contains more gaps than the tolerance level, even if the other two positions are within the limit. If it is calculated over 3 codon positions, all 3 positions must exceed the limit for the codon to be removed; otherwise the codon is preserved. The default is 1.
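A minimal Python sketch of codon-preserving removal, under stated assumptions, is shown below. The function and parameter names are hypothetical. The sketch assumes that a codon is removed when at least `positions` of its three columns exceed the tolerance (the source describes only the settings 1 and 3; treating 2 as "at least two" is an extrapolation), that `frame` is 1, 2, or 3, and that columns outside complete codons are left untouched.

```python
def strip_codons(seqs, tolerance=100.0, positions=1, frame=1, gap_chars="-"):
    """Remove whole codons when enough of their columns exceed the tolerance."""
    gaps = set(gap_chars)
    n = len(seqs)
    length = len(seqs[0])

    # True where a column is over tolerance (all-gap columns always count).
    over = []
    for col in zip(*seqs):
        count = sum(ch in gaps for ch in col)
        over.append(count == n or count * 100.0 / n > tolerance)

    # Columns before the first codon are kept as-is (assumption).
    start = min(frame - 1, length)
    keep = list(range(start))
    while start + 3 <= length:
        # Keep the codon unless at least `positions` of its columns are over.
        if sum(over[start:start + 3]) < positions:
            keep.extend(range(start, start + 3))
        start += 3
    # A trailing incomplete codon is kept as-is (assumption).
    keep.extend(range(start, length))
    return ["".join(s[i] for i in keep) for s in seqs]


alignment = ["ATG-CAGGA", "ATG--AGGA", "ATGCCAGGA"]
print(strip_codons(alignment, tolerance=50, positions=1))
print(strip_codons(alignment, tolerance=50, positions=2))
```

With a 50% tolerance, only the first column of the second codon is over the limit, so the codon is removed when one position suffices and preserved when two are required.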

Return to Gap Strip/Squeeze Input Page.

last modified: Tue Mar 5 16:00 2013


Questions or comments? Contact us at seq-info@lanl.gov.