HIV Databases
HIV Sequence Database



GenBank Entry Generation

Make a Sequin file for HIV-1, HIV-2, or SIV sequences

Purpose: To prepare HIV-1, HIV-2, or SIV sequence sets, together with related data, for submission to GenBank.

Before you start: (1) Verify that the sequences have no sample mix-ups, contaminants, or hypermutants; we strongly recommend using the Quality Control tool first. (2) Certain sequence names cause errors; see details below.

Sequences
Paste your sequence set
[Sample Input]
Or upload sequence set

Features
Choose organism
Molecule
Molecule type
Host
Gene/CDS location: Generate automatically | No gene/CDS features
Additional annotation data
(*.csv, comma-delimited file)
Upload annotation data (example of csv annotation file)
     
No annotation available

Contact/Authors/Reference
Upload template file
If you don't have a template file, make one using the GenBank Submission Template tool, then upload it.

Job info
Job title
Your email for job results

 


Details

Required information

  • HIV-1, HIV-2, or SIV nucleotide sequences in Fasta format.
  • Author and manuscript information.

Sequence names

This tool uses GenBank's table2asn script. Certain sequence names cause errors in this script. Please avoid:

  • any sequence name that is entirely contained within another sequence name (e.g., seqname, seqname1)
  • any sequence name consisting solely of numbers (e.g., 12345)
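Both naming rules can be checked before uploading. The following is a minimal sketch in Python, not part of this tool or of table2asn; the function name find_problem_names and the sample names are hypothetical, and "contained within" is interpreted here as a plain substring match.

```python
def find_problem_names(names):
    """Return (name, reason) pairs for names that break either rule above.

    Assumption: "entirely contained within another sequence name" is
    treated as a substring match; table2asn's own behavior may differ.
    """
    problems = []
    for name in names:
        if not name:
            continue  # skip empty names; they are a separate problem
        if name.isdigit():
            problems.append((name, "consists solely of numbers"))
        # Any other, different name that includes this one as a substring.
        containers = [other for other in names if other != name and name in other]
        if containers:
            problems.append((name, "contained within " + ", ".join(containers)))
    return problems


if __name__ == "__main__":
    sample = ["seqname", "seqname1", "12345", "patientA_env"]
    for name, reason in find_problem_names(sample):
        print(f"{name}: {reason}")
```

Renaming the shorter name (for example, seqname to seqname0) or adding a letter prefix to an all-numeric name is usually enough to clear both kinds of warning.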

Additional annotation data

You will be prompted to enter annotation information from a comma-delimited (.csv) file. We strongly recommend including, at a minimum, viral subtype, sample date, and sample country. To save your data from Excel, go to File → Save As → Comma Separated Values (.csv); do NOT select CSV UTF-8.

Each row in the comma-delimited (.csv) file should correspond to a sequence in the Fasta file. The first column should contain the sequence names exactly as they appear in the Fasta file; any difference in a sequence name will lead to errors. The rows need not be in the same order as the Fasta file, because annotation data are associated with sequences by matching names, not by position in the files.
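Because matching is by exact name, it can help to compare the two files before uploading. The sketch below is illustrative only and is not the tool's own code. It assumes that the sequence name is the full Fasta header line after ">", and that the CSV file begins with a header row (set has_header=False if it does not); the function names are hypothetical.

```python
import csv
import io


def fasta_names(fasta_text):
    """Collect the text after '>' on every Fasta header line."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def csv_names(csv_text, has_header=True):
    """Collect the first-column value of every non-empty CSV row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        rows = rows[1:]  # assumption: the first row holds column titles
    return [row[0] for row in rows]


def compare_names(fasta_text, csv_text, has_header=True):
    """Report names present in one file but not the other (order is ignored)."""
    in_fasta = set(fasta_names(fasta_text))
    in_csv = set(csv_names(csv_text, has_header))
    return {
        "missing_from_csv": sorted(in_fasta - in_csv),
        "missing_from_fasta": sorted(in_csv - in_fasta),
    }


if __name__ == "__main__":
    fasta = ">s1\nACGT\n>s2\nGGCC\n"
    annotations = "name,country\ns2,US\ns3,ZA\n"
    print(compare_names(fasta, annotations))
```

CSV values are compared without trimming, so a stray trailing space in the first column is reported as a mismatch rather than silently accepted.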

Each column contains one field of sequence annotation data. For details about the supported annotation fields and their required formats, see the Annotation fields help. This information will be stored in human-readable form in the comment field of the GenBank entry, making it available to researchers worldwide. Once the GenBank record is released, the information will also be included in the Los Alamos Sequence Database, where it can be searched through our search interface.

See an example of Fasta sequences and their CSV annotation data.

Please note

  • This tool does not deposit your sequences; it only prepares them for deposit. Your results e-mail will contain instructions for submission to GenBank.
  • After successful deposit to GenBank, your entries will automatically appear in the Los Alamos Sequence Database, typically a few weeks after release by GenBank.
  • This tool may not find the correct protein translations for some SIV sequences. It works well for SIVmac and SIVsmm, but less well for some other SIVs.

Related Links:
Quality Control
QC/GenBank Tool Explanation
Annotation fields help

 

last modified: Fri Feb 16 13:40 2024


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy
