HIV Databases
HIV sequence database



N-GlycoSite

Purpose: Highlight and tally predicted N-linked glycosylation sites (Nx[ST] patterns, where x can be any amino acid; by default, patterns with proline at x are excluded).

Input
Paste your input here
[Sample Input]
or upload your file

Options
Exclude NP[ST] pattern
Group sequences: Do not group
Summarize results by grouped sequences according to:
first character(s) in sequence names
the column in field of sequence names delimited by
paste or upload grouped sequence names (see example below)

Details:
During glycosylation, an oligosaccharide chain is attached to asparagine (N) occurring in the tripeptide sequence N-X-S or N-X-T, where X can be any amino acid except Pro. This sequence is called a glycosylation sequon. The N-GlycoSite tool marks and tallies the locations where this pattern occurs.

The likelihood of N-linked glycosylation at a particular site can be influenced by the context in which the sequon is embedded. The sequon can therefore be extended to a four-amino-acid NX[ST]Z pattern, in which the amino acids at positions X and Z can be important determinants of glycosylation efficiency. For example, a proline at position X or Z strongly disfavors N-linked glycosylation.
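As an illustration of the pattern described above, the following Python sketch scans a single ungapped amino acid sequence for N-X-[ST] sequons (X not Pro) and also reports the residue at position Z. It is not the N-GlycoSite implementation; the function name find_sequons and the output layout are assumptions made for this example.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
# [^P] rejects proline at position X.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (position, sequon, z_residue, z_is_pro) for each sequon.

    position is the 1-based index of the asparagine. z_residue is the
    residue following the sequon, or "" at the end of the sequence.
    Hypothetical helper, not part of the N-GlycoSite tool.
    """
    seq = seq.upper()
    hits = []
    for match in SEQUON.finditer(seq):
        start = match.start()
        z = seq[start + 3] if start + 3 < len(seq) else ""
        hits.append((start + 1, match.group(1), z, z == "P"))
    return hits


if __name__ == "__main__":
    for hit in find_sequons("MNGTPANPSNNTS"):
        print(hit)
```

A site flagged with z_is_pro set to True still matches the three-residue sequon, but according to the text above it is likely to be glycosylated inefficiently.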

O-linked glycosylation signals are more difficult to predict, but one can estimate their positions using the NetOGlyc program at the Center for Biological Sequence Analysis.

Input:
Input can be one amino acid sequence, or an alignment of amino acid sequences. If you just want to tally the number of N-glycosylation sites, the protein sequences do not need to be aligned. Standard sequence alignment formats are recognized.
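The tally for unaligned or aligned input can be sketched as follows. This example assumes FASTA input only and assumes that gap characters ('-') are removed before matching; how N-GlycoSite itself treats gaps is not described here. The names read_fasta and tally_sites are hypothetical.

```python
import re

# N-X-[ST] with X not Pro; lookahead allows overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")


def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def tally_sites(text):
    """Count sequons per sequence after removing alignment gaps.

    Gap removal is an assumption of this sketch.
    """
    counts = {}
    for name, seq in read_fasta(text).items():
        ungapped = seq.upper().replace("-", "")
        counts[name] = sum(1 for _ in SEQUON.finditer(ungapped))
    return counts
```

Because gaps are stripped first, the same function gives the same count whether or not the sequences are aligned.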

Exclude NP[ST] pattern:
A proline at the second position (site pattern NP[ST]) is strongly disfavored for glycosylation, so the default option excludes these patterns. You may uncheck the box to include them.
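The effect of this option can be pictured as a switch between two patterns. The sketch below is only an illustration; the function count_sites and its exclude_np parameter are assumed names, not the tool's interface.

```python
import re


def count_sites(seq, exclude_np=True):
    """Count N-X-[ST] sites; optionally skip those with proline at X.

    exclude_np=True mirrors the default (checked) option described above.
    """
    x = "[^P]" if exclude_np else "."
    pattern = re.compile(rf"(?=N{x}[ST])")
    return sum(1 for _ in pattern.finditer(seq.upper()))
```

For a sequence such as "NPSANGT", the default setting counts only the NGT site, while exclude_np=False also counts NPS.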

Grouped Sequence Names:
If you are analyzing multiple sequences, you can choose how to group them in the analysis. If you are analyzing a single sequence, or you do not want to group your sequences, just ignore these options. Your sequences can be grouped by the first character(s) of the sequence names, by a field of the sequence names split on a delimiter character, or by providing a list of groups.
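The two name-based grouping options can be sketched as below. This is a minimal illustration under assumptions: the function names, the 1-based column numbering, and the fallback to 'Others' when a name has too few fields are choices made for this example and may differ from the tool's behavior.

```python
from collections import defaultdict


def group_by_prefix(names, n_chars=1):
    """Group sequence names by their first n_chars characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:n_chars]].append(name)
    return dict(groups)


def group_by_field(names, delimiter=".", column=1):
    """Group names by one delimited field (column is 1-based).

    Names with fewer fields than `column` go to "Others" (assumption).
    """
    groups = defaultdict(list)
    for name in names:
        fields = name.split(delimiter)
        key = fields[column - 1] if column <= len(fields) else "Others"
        groups[key].append(name)
    return dict(groups)
```

With names such as O.CM.-.ANT70 and CPZ.CM.-.CAM3, grouping by the second '.'-delimited field places both in a group keyed by CM.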

When providing a list of groups, each sequence name must be on a separate line, and groups are separated by an empty line. The first item ending in ':' in a group will be taken as the group name, but this line is optional. If group names are omitted, names will be assigned as Group-1, Group-2, etc. Sequences that are not present in any group will be placed in a group named 'Others' and colored gray. This is useful for highlighting some groups of sequences out of a target set.

The following can be pasted in as the "grouped sequence names" for testing with the Sample Input:

Non-recombinants:
A1.KE.93.Q23-17
B.FR.HXB2
C.BR.92.92BR025
D.UG.94.94UG1141
O.CM.-.ANT70
CPZ.CM.-.CAM3

Recombinants:
01_AE.CF.90.90CF11697
02_AG.CM.97.97CM-MP807
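The grouped-names format shown above can be read with a few lines of Python. The sketch below is an illustration only: parse_groups and assign_group are hypothetical names, and numbering unnamed groups by their position in the list (Group-1, Group-2, ...) is an assumption about how the default names are assigned.

```python
import re


def parse_groups(text):
    """Parse grouped sequence names into {group_name: [sequence names]}.

    Groups are separated by empty lines. The first line ending in ':'
    within a group is taken as its name; otherwise Group-<n> is used.
    """
    groups = {}
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for index, block in enumerate(blocks, start=1):
        name = f"Group-{index}"
        members = []
        named = False
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.endswith(":") and not named:
                name = line[:-1].strip()
                named = True
            else:
                members.append(line)
        groups[name] = members
    return groups


def assign_group(seq_name, groups):
    """Return the group containing seq_name, or 'Others' if none does."""
    for group_name, members in groups.items():
        if seq_name in members:
            return group_name
    return "Others"
```

Any sequence in the input that is not listed, for example one left out of both example groups, would be reported under 'Others'.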

References:

  1. Zhang M et al., Glycobiology. 14(12):1229-46 (2004) -- please cite this reference if you use our tool in a publication.
  2. Marshall RD, Biochem Soc Symp. 40:17-26 (1974)
  3. Kasturi et al., Biochem J. 323 (Pt 2):415-9 (1997)
  4. Mellquist JL et al., Biochemistry. 37(19):6833-7 (1998)
last modified: Wed Jan 28 13:57 2015


Questions or comments? Contact us at seq-info@lanl.gov.

 