AnalyzeAlign will perform several analyses on any alignment.
You may choose to provide your own alignment. Requirements:
Alternatively, you may use a premade alignment. See the Alignments page for details on the types of premade alignments offered.
You must indicate what part of the alignment to analyze.
If you are specifying only part of your alignment to be analyzed, indicate what your coordinate values refer to. These values can be based on:
The default is to analyze all sequences together as a single group. However, your sequences may contain groups that you want to analyze separately by species, clade, origin, etc.
Group sequences by: there are 3 ways to specify the groupings.
Example:
Group name Foo: B.FR.83.HXB2_LAI_IIIB_BRU.K03455 A1.AU.03.PS1044_Day0.DQ676872 C.ZM.11.DEMC11ZM006.KF716467 D.CD.83.ELI.K03454
Group name Bar: O.BE.87.ANT70.L20587 N.CM.97.YBF106.AJ271370 N.FR.11.N1_FR_2011.JN572926 P.CM.06.U14788.HQ179987 P.FR.09.RBF168.GU111555
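If you prepare the grouping text outside the web form, a quick local check can catch typos in sequence names before submitting. The Python sketch below assumes each group is written as `Group name <name>:` followed by whitespace-separated sequence names, as in the example above; `parse_groups` is a hypothetical helper, not part of AnalyzeAlign.

```python
import re

# Hypothetical helper: the input layout is assumed from the example above,
# not taken from AnalyzeAlign's own parser.
GROUP_RE = re.compile(r"Group name\s+([^:]+):\s*(.*?)(?=Group name|\Z)", re.S)


def parse_groups(spec: str) -> dict[str, list[str]]:
    """Map each group name to its list of sequence names."""
    groups: dict[str, list[str]] = {}
    for name, members in GROUP_RE.findall(spec):
        # Sequence names are separated by spaces or newlines.
        groups[name.strip()] = members.split()
    return groups


example = (
    "Group name Foo: B.FR.83.HXB2_LAI_IIIB_BRU.K03455 D.CD.83.ELI.K03454\n"
    "Group name Bar: O.BE.87.ANT70.L20587 N.CM.97.YBF106.AJ271370"
)
print(parse_groups(example))
```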
Combine logos into a page: allows you to download all your logo images in a single PDF file. Specify EITHER the length OR width of the display matrix, and its orientation. The other matrix number (length or width) will be determined by the script, based on the actual number of groups in the output. If you aren't sure how to use these options, just keep the default concatenation (1 x __) and orientation; this will always work.
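The arithmetic behind the page layout is a ceiling division: once one dimension is fixed, the number of groups determines the other. This Python sketch assumes that "orientation" means whether logos fill the grid row by row or column by column; that reading and the function names are illustrative assumptions, not the script's actual code.

```python
import math


def page_layout(n_groups, rows=None, cols=None):
    """Return (rows, cols) when exactly one dimension is fixed (hypothetical helper)."""
    if (rows is None) == (cols is None):
        raise ValueError("specify exactly one of rows or cols")
    if rows is not None:
        return rows, math.ceil(n_groups / rows)
    return math.ceil(n_groups / cols), cols


def cell_positions(n_groups, rows, cols, by_row=True):
    """(row, col) cell for each logo, filling row by row or column by column."""
    if by_row:
        return [(i // cols, i % cols) for i in range(n_groups)]
    return [(i % rows, i // rows) for i in range(n_groups)]


rows, cols = page_layout(5, rows=2)  # 5 groups on 2 rows need 3 columns
print(rows, cols, cell_positions(5, rows, cols))
```

With a single row, `page_layout(n, rows=1)` returns `(1, n)` for any number of groups, consistent with the note that the default 1 x __ setting always works.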
"Stacks" refers to the vertical columns of the logo. Long logos may be broken by line breaks, and this option allows you to choose the number of positions per line.
See illustration at right.
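Breaking a long logo into lines amounts to splitting the list of positions into consecutive chunks. A minimal Python sketch, with `wrap_stacks` as a hypothetical name:

```python
def wrap_stacks(positions, stacks_per_line):
    """Split logo positions into consecutive lines of at most stacks_per_line stacks."""
    if stacks_per_line < 1:
        raise ValueError("stacks_per_line must be at least 1")
    return [
        positions[i:i + stacks_per_line]
        for i in range(0, len(positions), stacks_per_line)
    ]


print(wrap_stacks(list(range(1, 11)), 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```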
By default, the columns are labeled by the position numbers you chose. You can override these numbers by providing a comma-separated list of characters to replace the default.
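The label override can be pictured as follows. This Python sketch assumes that whitespace around each comma-separated entry is ignored and that exactly one label per column is required; both are assumptions, and `column_labels` is a hypothetical name.

```python
def column_labels(positions, override=None):
    """Default to position numbers; otherwise use a comma-separated replacement list."""
    if not override:
        return [str(p) for p in positions]
    labels = [label.strip() for label in override.split(",")]
    # Assumption: one label per column is required.
    if len(labels) != len(positions):
        raise ValueError(f"expected {len(positions)} labels, got {len(labels)}")
    return labels


print(column_labels([8, 9, 10]))             # ['8', '9', '10']
print(column_labels([8, 9, 10], "R, P, N"))  # ['R', 'P', 'N']
```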
Choose the units for the Y-axis.
By default, the y-axis will be labeled based on your choice of y-axis units. You can override this label with your own.
The WebLogo software provides several predefined options for how to color the residues in your logos. Instead of using the standard options, you can choose colors:
Paste or upload custom color scheme.
Use this option to specify a highlight color for a few specific residues. In the image above, the C at position 8 and the T and C at position 9 were colored red (#FF0000 in hexadecimal format; see below) by inserting this custom color scheme:
#FF0000: 8 C 9 TC
Green: 22 A 23 C
Magenta: 23 G
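Reading the example as a color label ending in a colon, followed by pairs of a position and the residues to color at that position, the scheme can be checked with a small parser. The layout is inferred from the example only; this Python sketch is not the WebLogo or AnalyzeAlign parser, and `parse_color_scheme` is a hypothetical name.

```python
def parse_color_scheme(text):
    """Map (position, residue) -> color from 'COLOR: pos residues pos residues ...'."""
    scheme = {}
    color = None
    tokens = text.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.endswith(":"):
            # A new color block starts, e.g. "#FF0000:" or "Green:".
            color = token[:-1]
            i += 1
            continue
        if color is None or i + 1 >= len(tokens) or tokens[i + 1].endswith(":"):
            raise ValueError(f"malformed color scheme near {token!r}")
        position, residues = int(token), tokens[i + 1]
        # "9 TC" colors both T and C at position 9.
        for residue in residues:
            scheme[(position, residue)] = color
        i += 2
    return scheme


scheme = parse_color_scheme("#FF0000: 8 C 9 TC Green: 22 A 23 C Magenta: 23 G")
print(scheme[(9, "T")], scheme[(23, "G")])  # #FF0000 Magenta
```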
You can define the colors using
Specify symbol colors.
Input custom colors for any specific symbols. Colors can be defined by clicking the color text box to display a color palette and choosing one, or by typing an RGB triplet as explained in the paragraph above.
This option allows you to change the appearance of the sequence logo image(s).
If the default 'show all' is chosen, the logo is presented as usual, showing the abundance of each residue in the alignment. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.
remove: consensus of alignment
If this option is chosen, the images omit the most common residue at each position of the alignment. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.
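Conceptually, this option counts residues per column and drops the consensus residue before the logo is drawn. The Python sketch below assumes that gap characters ('-') are not counted and breaks ties by whichever residue `Counter` lists first; the tool may handle gaps and ties differently. The "consensus of seq group" option would correspond to applying the same function to each group's sequences separately.

```python
from collections import Counter


def column_counts(alignment):
    """One Counter per column; gap characters ('-') are not counted (assumption)."""
    return [Counter(c for c in column if c != "-") for column in zip(*alignment)]


def remove_consensus(alignment):
    """Drop the most common residue from each column's counts."""
    trimmed = []
    for counts in column_counts(alignment):
        if counts:
            # Tie-breaking follows Counter.most_common and may differ from the tool.
            consensus, _ = counts.most_common(1)[0]
            del counts[consensus]
        trimmed.append(counts)
    return trimmed


print(remove_consensus(["RPN", "RPD", "KPN"]))
```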
remove: consensus of seq group
This option is similar, but removes consensus of each sequence group, rather than the consensus of the whole alignment.
remove: residues of 1st sequence
If this option is chosen, the images omit the residue that occurs in the first sequence of the alignment. For the LANL database alignments, the first sequence is always the reference sequence. For user alignments, the first sequence might be a vaccine sequence, for example, being compared to the alignment it was derived from. At right, the epitope RPNNNTRKSI was aligned to the LANL HIV-1 filtered web alignment.
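The first-sequence variant differs only in which residue is dropped: at each column, the residue of the first sequence rather than the most common one. A Python sketch under the same assumption that gaps are not counted; for the per-group option, the same function would be applied to each group with that group's first sequence listed first.

```python
from collections import Counter


def remove_first_sequence_residues(alignment):
    """Per column, drop the residue found in the first (reference) sequence."""
    reference = alignment[0]
    result = []
    for ref_residue, column in zip(reference, zip(*alignment)):
        counts = Counter(c for c in column if c != "-")  # gaps ignored (assumption)
        counts.pop(ref_residue, None)
        result.append(counts)
    return result


print(remove_first_sequence_residues(["RPNN", "RPKN", "KPNS"]))
```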
remove: residues of 1st seq of seq group
This option is similar, but removes the residues of the first sequence of each sequence group, rather than the first sequence of the whole alignment.
Remove residues corresponding to a user-specified sequence. Enter a single sequence in raw format (no sequence name). This sequence must have exactly the same number of residues as defined in "Positions/range to analyze".
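The length requirement makes the user sequence line up column for column with the analyzed range. This Python sketch validates a raw sequence and then removes its residues column by column; stripping whitespace, rejecting a FASTA-style name line, and converting to uppercase are assumptions about reasonable input handling, and both function names are hypothetical.

```python
from collections import Counter


def read_raw_sequence(text, expected_length):
    """Validate a raw (unnamed) sequence against the length of the analyzed range."""
    if text.lstrip().startswith(">"):
        raise ValueError("enter the sequence in raw format, without a name line")
    # Drop whitespace and newlines; uppercasing is an assumption.
    sequence = "".join(text.split()).upper()
    if len(sequence) != expected_length:
        raise ValueError(
            f"sequence has {len(sequence)} residues but the range has {expected_length}"
        )
    return sequence


def remove_user_residues(alignment, user_sequence):
    """Per column, drop the residue found at that position of the user sequence."""
    result = []
    for user_residue, column in zip(user_sequence, zip(*alignment)):
        counts = Counter(c for c in column if c != "-")
        counts.pop(user_residue, None)
        result.append(counts)
    return result


user = read_raw_sequence("rp\n", 2)
print(remove_user_residues(["RP", "KP"], user))
```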
If the "Delete gaps and shift" option is selected, gaps placed to bring sequences into alignment will be squeezed out and the alignment shifted rightward (toward the C-terminal end). For example, suppose your query has a one-amino-acid insertion relative to most other sequences, as in the following alignment:
QUERY  VARELHP
REF    VAR-LHP
seq2   VAR-LHP
seq3   VAR-LFP
seq4   VAR-LMP

With gaps deleted, it would be presented like this:

QUERY  VARELHP
REF    QVARLHP
seq2   QVARLHP
seq3   QVARLFP
seq4   QVARLMP
Q is the amino acid one position to the left of the V. As a result of squeezing gaps and shifting characters rightward, alignments in gappy regions will look "bad."
NOTE: The delete gaps option is useful for aligning immunologically reactive epitopes, because in such cases it is particularly important to maintain the alignment of the C-terminal anchor residues.
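One way to reproduce the example above: for each sequence, strip gaps from everything up to the right edge of the analyzed window and keep the last residues that fit, so that upstream residues (such as the Q) slide in from the left. The Python sketch assumes the window is given as 0-based, end-exclusive column indices into the full alignment and that '-' marks gaps. The Q column preceding the window is added here only for illustration. The sketch reproduces the example but is not necessarily how AnalyzeAlign implements the option.

```python
def delete_gaps_and_shift(aligned, start, end, gap="-"):
    """Return columns [start, end) of an aligned sequence with gaps squeezed out.

    Residues are right-justified in the window; room freed on the left is
    filled with upstream residues from the same sequence (assumed behavior).
    """
    width = end - start
    if width <= 0:
        return ""
    upstream_and_window = aligned[:end].replace(gap, "")
    window = upstream_and_window[-width:]
    # Pad with gaps only if the sequence has too few upstream residues.
    return window.rjust(width, gap)


alignment = {
    "QUERY": "QVARELHP",
    "REF": "QVAR-LHP",
    "seq3": "QVAR-LFP",
}
for name, seq in alignment.items():
    print(name, delete_gaps_and_shift(seq, 1, 8))
```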
When 'yes' is chosen, any asparagine (N) occurring in the pattern NxS or NxT (x = any amino acid except proline) will appear as "O". For more information about N-linked glycosylation, see N-GlycoSite.
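The sequon rule can be expressed as a regular expression. This Python sketch assumes an ungapped, uppercase sequence: in an aligned sequence, a gap between N and S/T would need to be removed first, and '-' would otherwise count as the "x" residue. The lookahead consumes only the N, so overlapping sequons such as NNST are both marked.

```python
import re

# N, then any residue except proline, then S or T. Only the N is consumed; the
# lookahead lets overlapping sequons such as NNST both be marked.
SEQUON = re.compile(r"N(?=[^P][ST])")


def mark_glycosylation_sites(sequence):
    """Replace the N of each NxS/NxT sequon (x != P) with 'O'."""
    return SEQUON.sub("O", sequence)


print(mark_glycosylation_sites("ANATNPSNNST"))
```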
The tool will calculate the frequency of each nucleotide or amino acid at each position. The frequency table will show only the residues with the highest representation(s), as determined by a cutoff. If the cutoff is 100%, all residue frequencies will be shown. If the cutoff is 95%, the most frequent residues will be shown, up to a cumulative total of 95%, then all others will be presented as "other". Lumping together the infrequent residues can be a useful simplification, particularly for protein sequences.
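The cutoff rule can be sketched as follows. This Python version assumes that residues are added from most to least frequent until their cumulative share reaches the cutoff, so the residue that crosses the threshold is still listed; gaps are counted like any other character. Both choices are assumptions about the tool's behavior.

```python
from collections import Counter


def frequency_table(column, cutoff_percent=95):
    """Percent frequency of the top residues, lumping the rest into 'other'.

    Residues are added from most to least frequent until their cumulative
    share reaches cutoff_percent (assumed reading of the cutoff rule).
    """
    counts = Counter(column)
    total = sum(counts.values())
    shown = {}
    covered = 0
    for residue, n in counts.most_common():
        # Integer comparison avoids floating-point rounding at the cutoff.
        if covered * 100 >= cutoff_percent * total:
            break
        shown[residue] = n
        covered += n
    if covered < total:
        shown["other"] = total - covered
    return {residue: 100 * n / total for residue, n in shown.items()}


column = "A" * 18 + "G" + "C"
print(frequency_table(column, 95))   # {'A': 90.0, 'G': 5.0, 'other': 5.0}
print(frequency_table(column, 100))  # {'A': 90.0, 'G': 5.0, 'C': 5.0}
```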
Variants are defined with one "master" sequence used as the basis of comparison. You can choose which sequence will be the master.
To choose a "user-selected" sequence, enter a single sequence in raw format (no sequence name). This sequence must have exactly the same number of residues as defined in "Positions/range to analyze".
This option does not affect the logo or the calculation of frequency by position.
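To make the master/variant relationship concrete, this Python sketch reports, for every sequence, the positions where it differs from the master. The output format (position, master residue, variant residue) and the function name are illustrative assumptions; AnalyzeAlign's own variant report may be laid out differently.

```python
def variants_from_master(master, sequences, positions):
    """List (position, master residue, variant residue) for each mismatch."""
    if len(positions) != len(master):
        raise ValueError("positions and master must have the same length")
    result = {}
    for name, seq in sequences.items():
        if len(seq) != len(master):
            raise ValueError(f"{name} does not match the master's length")
        result[name] = [
            (pos, m, s) for pos, m, s in zip(positions, master, seq) if m != s
        ]
    return result


print(variants_from_master("RPN", {"a": "RPN", "b": "KPD"}, [1, 2, 3]))
```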