HIV sequence database

Data Fields

The following fields may be used to add HIV-specific data to sequences for GenBank deposit. If you have questions, please contact



Required field

Sequence name:
Must match EXACTLY the name of the sequence in the FASTA file!
Valid Values: Text string. Valid characters A-Z, a-z, 0-9, _, -
Example: 05UG102.124d is an informative name describing a Ugandan sample taken in 2005 from subject 102 at 124 days post-seroconversion.
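
A minimal Python sketch of how a submitter might pre-check sequence names before deposit. The function names are hypothetical and not part of any database tool. It assumes the FASTA name is the full header line after ">", and it follows the listed character set strictly, so a name containing a period (as in the example above) would be flagged.

```python
import re

# Characters listed as valid for a sequence name: A-Z, a-z, 0-9, "_" and "-".
SEQUENCE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_sequence_name(name: str) -> bool:
    """Return True if the name uses only the listed valid characters."""
    return SEQUENCE_NAME_RE.fullmatch(name) is not None


def names_missing_from_fasta(csv_names, fasta_text):
    """Return the CSV names that have no exactly matching FASTA header.

    Assumption: the sequence name is the whole header line after ">".
    """
    fasta_names = {
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }
    return [name for name in csv_names if name not in fasta_names]
```

For example, names_missing_from_fasta(["a", "b"], ">a\nACGT\n") reports "b" as missing, because the comparison is exact and case-sensitive.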



Strongly recommended fields

Sample country:
Country where the sample was obtained. Note: this may be different from the infection country!
Valid Values: either 2-letter ISO codes or English country names as specified by GenBank.

Sample date:
Date the sample was collected.
Preferred format: DD-Mmm-YYYY
Examples: 22-Jun-2001, Jun-2001 (no day given), 2001 (no month or day given).

Alternative format: MM/DD/YYYY (USA standard format also accepted)
Examples: 6/22/2001, 01/01/2007
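
The accepted date formats can be checked with a short Python sketch like the one below. The function name parse_sample_date is hypothetical, and the sketch assumes an English locale so that month abbreviations such as "Jun" are recognized. It returns None for a month or day that was not given.

```python
from datetime import datetime

# (format, number of date parts present) for each accepted layout.
_DATE_FORMATS = [
    ("%d-%b-%Y", 3),  # 22-Jun-2001
    ("%b-%Y", 2),     # Jun-2001 (no day given)
    ("%Y", 1),        # 2001 (no month or day given)
    ("%m/%d/%Y", 3),  # 6/22/2001 (USA format)
]


def parse_sample_date(text):
    """Return (year, month, day); month and day are None when not given."""
    text = text.strip()
    for fmt, parts in _DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next accepted layout
        month = parsed.month if parts >= 2 else None
        day = parsed.day if parts == 3 else None
        return (parsed.year, month, day)
    raise ValueError(f"unrecognized date format: {text!r}")
```

Keeping the missing parts as None avoids silently turning "2001" into 1 January 2001.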

Subtype:
If sequences go through the QC process, subtype will be automatically provided by QC. Subtypes entered in the CSV file will override the subtypes provided by the QC tool.
Values: All valid subtypes and CRFs
Examples: B, F2, CRF_01, A1C, 02B. Unknown subtypes or regions that have no identifiable subtype are classified as U.
For additional details, see How the HIV Database Classifies Sequence Subtypes.



Optional fields

Amplification Strategy:
The method used to amplify the sequence.
Valid values: bulk, SGA, vector cloning, limiting dilution PCR.

Indicates that the sequence was sampled prior to seroconversion, when the patient had detectable viral RNA but not yet antibodies.
Valid values: yes, no.

CD4 count:
The patient's CD4 count at the time a sample was taken.

CD8 count:
The patient's CD8 count at the time a sample was taken.

Coreceptor:
The experimentally-determined coreceptor for the viral isolate. Do not enter inferred usage based on sequence analysis.
Common values: CCR5, CXCR4, CCR3, or a list such as "CCR2b CCR3 CCR5".

Culture Method:
Valid values: cultured, uncultured, primary, expanded.

Days from seroconversion:
An estimate of the number of days between seroconversion and sampling.
Valid Values: An integer representing the number of days from seroconversion. If the study follows patients over time, assume that the day of seroconversion was the midpoint between the last negative sample date and the first positive sample date. If the patient is sampled during acute infection, enter 0. If the sample was taken before seroconversion, enter a negative number (or the word "preseroconversion").
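
The midpoint rule can be sketched in Python as follows. The function name is hypothetical, and rounding an odd-length interval down to whole days is an assumption of this sketch, not something the field definition specifies.

```python
from datetime import date, timedelta


def days_from_seroconversion(sample_date, last_negative, first_positive):
    """Estimate days between seroconversion and sampling.

    Seroconversion is taken as the midpoint between the last negative
    and first positive sample dates (half-days rounded down: assumption).
    A negative result means the sample predates the estimated midpoint.
    """
    interval_days = (first_positive - last_negative).days
    midpoint = last_negative + timedelta(days=interval_days // 2)
    return (sample_date - midpoint).days


# Last negative 1 Jan 2001, first positive 31 Jan 2001: midpoint is 16 Jan 2001.
print(days_from_seroconversion(date(2001, 5, 20), date(2001, 1, 1), date(2001, 1, 31)))
# prints 124
```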

Days from last seronegative sample:
Valid Values: integer.

Days from onset of symptoms:
Valid Values: integer.

Days post-infection:
Same as Days from Seroconversion, but used if the study is estimating time from the date of infection rather than time from seroconversion.
Valid Values: Same as for days from seroconversion, except that negative numbers are not valid.

Drug naive:
Sequences that were sampled prior to the patient receiving any type of antiretroviral treatment.
Valid Values: yes or no

Fiebig stage:
The Fiebig staging system precisely denotes a patient's stage of infection. For definitions of Fiebig stages, see Search Help - Fiebig.
Valid values: 1, 2, 3, 4, 5, or 6. It is OK to enter "1 or 2" when the stage is not known exactly.

HLA type:
HLA data for the individual sampled.
Valid Values: Text string. Both 2-digit and 4-digit entries are accepted, as are incomplete HLA types.
Example: A02 A34 B*4403 B*4403 Cw*0401 Cw*0701

The animal species from which the viral isolate was derived. For HIV-1, the host is Homo sapiens by default.

Infection city:
The city, province, region, or state where the patient was infected.
Valid Values: text field

Infection country:
The country where the individual was infected. This may be different from the sampling country.
Valid Values: either 2-letter ISO codes or English country names as specified by GenBank.

Infection date:
The year the patient was infected with the virus. If the infection year is uncertain, it should not be entered. In most cases, infection dates are not known with enough certainty to provide a specific date, so include just the year or month-year.
Preferred format: DD-Mmm-YYYY
Examples: Jun-2001 (known month and year), 2001 (year only).

Alternative format: MM/DD/YYYY (USA standard format also accepted)
Examples: 6/01/2001, 01/01/2001

Isolate name:
A name that describes the sample. If no isolate name is provided, the Sequence Name will appear as the isolate in the GenBank record.
Valid Values: Text string.
Valid characters A-Z, a-z, 0-9 , -

The nucleic acid source from which the sequence was obtained. For example, plasma-derived HIV sequences are sequences of mature virus, which is RNA. Genomic proviral sequences are DNA. Note: if plasma is given as the sample tissue, you will receive an error message if you specify DNA.
Valid Values: DNA, RNA
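
The plasma rule above could be checked before submission with a sketch like this one. The function name and the case-insensitive comparison are assumptions of the sketch; the actual error message issued by the submission tool is not reproduced here.

```python
def check_nucleic_acid(nucleic_acid, sample_tissue=""):
    """Return a list of problems; an empty list means the value passes."""
    problems = []
    value = nucleic_acid.strip().upper()  # assumption: case-insensitive
    if value not in {"DNA", "RNA"}:
        problems.append(f"nucleic acid must be DNA or RNA, got {nucleic_acid!r}")
    elif value == "DNA" and sample_tissue.strip().lower() == "plasma":
        # Plasma-derived virus is mature virus, which is RNA.
        problems.append("sample tissue is plasma, so DNA is not accepted")
    return problems
```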

Molecule type:
In most cases, the molecule type for a natural viral sequence is "genomic".
Valid Values: genomic

Text field describing any information related to the sample or the sequence, but not included in other data fields.
Values: Text string.

Patient age:
Age of patient at time of sampling.
Values: Integer. Be sure to include units (days, months, years).

Patient code:
Unique identifier of the patient from whom the sample was taken. Do not use patient names or initials.
Valid Values: Text string. Valid characters A-Z, a-z, 0-9, _ , -

Patient cohort:
The cohort or study group in which the patient was recruited.
Valid Values: Text string.

Patient comment:
Text field describing information relating to patient.
Values: Text string.
Valid characters: A-Z, a-z, 0-9, - , .
Example: Patient infected between 1998 and 2000. Linked transmission with patient 569.

Patient ethnicity:
Description: Study-appropriate description of ethnicity, usually self-described by the patient.
Valid Values: Text string. Valid characters A-Z, a-z

Patient health status:
Description: HIV-related health status of the individual when the sample was taken.
Valid Values: acute infection, asymptomatic, symptomatic, AIDS, deceased.

Patient sex:
Sex of the patient.
Valid values: M or F

Patient progression:
Rate of disease progression of the patient.
Valid values: SP, P, RP, LTNP, EC (slow progressor, progressor, rapid progressor, long-term non-progressor, and elite controller)

Phenotype:
Description: The experimentally-determined phenotype of the virus.
Valid values: NSI, SI (non-syncytium-inducing and syncytium-inducing)

Risk factor:
Valid Values: any of the codes below. If the patient has more than one risk factor, leave blank.

SG - homosexual
SB - bisexual
SM - male sex with male
SH - heterosexual
SW - sex worker
SU - sexual transmission, unspecified type
PH - hemophiliac
PB - Blood transfusion
PI - IV drug use
MB - Mother-baby
NO - Nosocomial
EX - Experimental
NR - not recorded (or unknown)
OT - other
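
The code list and the "more than one risk factor" rule can be sketched in Python as below. The dictionary mirrors the codes listed above; the function name and the choice to raise an error on an unknown code are assumptions of this sketch.

```python
RISK_FACTORS = {
    "SG": "homosexual",
    "SB": "bisexual",
    "SM": "male sex with male",
    "SH": "heterosexual",
    "SW": "sex worker",
    "SU": "sexual transmission, unspecified type",
    "PH": "hemophiliac",
    "PB": "blood transfusion",
    "PI": "IV drug use",
    "MB": "mother-baby",
    "NO": "nosocomial",
    "EX": "experimental",
    "NR": "not recorded (or unknown)",
    "OT": "other",
}


def risk_factor_field(codes):
    """Return the value to enter in the risk factor field.

    Exactly one known code is returned as-is; zero or several
    codes yield an empty string, since the field is left blank
    when the patient has more than one risk factor.
    """
    cleaned = [code.strip().upper() for code in codes]
    for code in cleaned:
        if code not in RISK_FACTORS:
            raise ValueError(f"unknown risk factor code: {code!r}")
    return cleaned[0] if len(cleaned) == 1 else ""
```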

Sample city:
The city, province, or region in which the sample was taken.
Valid Values: Text string. Valid characters A-Z, a-z

Sample timepoint:
Used to indicate the sample timepoint relative to any arbitrary starting point.
Valid Values: integers with units.
Example: 4 sequences from the same patient could have timepoints of 0 weeks, 2 weeks, 6 weeks, 12 weeks.

Sample tissue:
The tissue from which the sample was derived.
Valid Values: Text string. Use a noun, not an adjective (e.g., vagina, not vaginal).
Examples: plasma, PBMC, semen, CSF, brain, spleen, feces, etc.

The viral strain from which the sequence was derived. In some cases, this is the same as the isolate. In some cases, one isolate may be used to produce multiple strains in vitro.

Viral load:
The patient's viral load when the sample was taken.
Valid Values: Integer.


last modified: Thu Aug 21 11:00 2014


Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy
