Purpose: To prepare HIV-1, HIV-2, or SIV sequence sets, together with related data, for submission to GenBank.
Before you start: (1) Verify that the sequences have no sample mix-ups, contaminants, or hypermutants; we strongly recommend using the Quality Control tool first. (2) Certain sequence names cause errors; see details below.
Required information
Sequence names
This tool uses GenBank's table2asn script. Certain sequence names cause errors in this script. Please avoid:
Additional annotation data
You will be prompted to enter annotation information from a comma-delimited (.csv) file. We strongly recommend including, at a minimum, viral subtype, sample date, and sample country. To save your data from Excel, go to File → Save As → Comma Separated Values (.csv); do NOT select CSV UTF-8.
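If you are unsure how a file was saved, a quick local check can help before uploading. The sketch below is illustrative and not part of this tool. It assumes that the concern with the CSV UTF-8 option is the byte-order mark that Excel writes at the start of such files, and the helper names `has_utf8_bom` and `non_ascii_lines` are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the file content starts with the UTF-8 byte-order mark."""
    return data.startswith(codecs.BOM_UTF8)


def non_ascii_lines(data: bytes) -> list:
    """Return 1-based numbers of lines that contain any non-ASCII byte."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte > 127 for byte in line)
    ]


# Example with in-memory content; for a real file, read it in binary mode:
#     with open("annotations.csv", "rb") as handle:   # hypothetical file name
#         data = handle.read()
plain = b"name,subtype\nseqA,B\n"
with_bom = codecs.BOM_UTF8 + plain

print(has_utf8_bom(plain))
print(has_utf8_bom(with_bom))
print(non_ascii_lines(with_bom))
```

A file that reports a byte-order mark was most likely saved with the CSV UTF-8 option and can be re-saved from Excel as plain Comma Separated Values (.csv).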
Each row in the comma-delimited (.csv) file should correspond to a sequence in the Fasta file. The first column should contain the sequence names exactly as they appear in the Fasta file; any difference in a name will lead to errors. The order of rows need not match the order of sequences in the Fasta file, because annotation data are associated with sequences by name rather than by position.
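Because names must match exactly, it can be useful to compare the two files before submitting. The following is a minimal sketch of such a check, not the tool's own code. It assumes that the sequence name is the entire Fasta header line after `>`, and that the CSV file begins with a header row (set `has_header=False` if yours does not). The function names and the column labels in the example data are hypothetical.

```python
import csv
import io


def fasta_names(fasta_text):
    """Return sequence names: the text after '>' on each Fasta header line."""
    return [line[1:] for line in fasta_text.splitlines() if line.startswith(">")]


def csv_names(csv_text, has_header=True):
    """Return the first-column value of each non-empty CSV row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        rows = rows[1:]  # assumption: the first row holds column labels
    return [row[0] for row in rows]


def compare_names(fasta_text, csv_text, has_header=True):
    """Report names found in only one file; the order of rows is ignored."""
    in_fasta = set(fasta_names(fasta_text))
    in_csv = set(csv_names(csv_text, has_header))
    return {
        "missing_from_csv": sorted(in_fasta - in_csv),
        "missing_from_fasta": sorted(in_csv - in_fasta),
    }


# Rows are in a different order than the sequences, which is fine,
# but "seqa" differs from "seqA" in case, so it is reported as a mismatch.
fasta = ">seqA\nACGT\n>seqB\nGGCC\n"
table = "name,subtype,country\nseqB,B,US\nseqa,C,ZA\n"
report = compare_names(fasta, table)
print(report)
```

The comparison uses sets, so only the names matter and not their position in either file. Both lists in the report should be empty before you submit.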
Each column contains one field of sequence annotation data. For details about the supported annotation fields and their required format, see the Annotation fields help. This information will be stored in human-readable form in the comment field of the GenBank entry, making it available to researchers worldwide. Once the GenBank record is released, this information will be included in the Los Alamos Sequence Database, making the data searchable through our search interface.
See an example of Fasta sequences and their CSV annotation data.
Please note
Related Links: Quality Control, QC/GenBank Tool Explanation, Annotation fields help