HIV Databases
HIV sequence database



Data Fields

The following are descriptions of the data fields that appear in the spreadsheet used to prepare sequence sets for GenBank deposit. If you have questions about any of these fields, please write to us at the e-mail address at the bottom of the page.

 

CD4 count:
The patient's CD4 count at the time a sample was taken.
Valid values: integers only. Do not include values with a < or > sign.

CD8 count:
The patient's CD8 count at the time a sample was taken.
Valid values: integers only. Do not include values with a < or > sign.
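A spreadsheet column can be checked against this integer-only rule before submission. The Python function below is a minimal sketch, not part of any database tool; the name `validate_count` is hypothetical, and it assumes values arrive as strings read from the spreadsheet. The same check applies to the Viral load field.

```python
import re

# Plain non-negative integer: ASCII digits only, no sign, decimal point or exponent.
_PLAIN_INTEGER = re.compile(r"[0-9]+")


def validate_count(raw: str) -> int:
    """Return the field value as an int, or raise ValueError.

    Censored values such as "<50" or ">750000" are rejected rather than
    silently stripped, because these fields do not accept < or > signs.
    """
    value = raw.strip()
    if _PLAIN_INTEGER.fullmatch(value) is None:
        raise ValueError(f"not a plain integer: {raw!r}")
    return int(value)
```

Rejecting `<50` instead of converting it to `50` leaves the decision about censored measurements to the submitter.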

Coreceptor:
The experimentally-determined coreceptor for the viral isolate. Do not enter inferred usage based on sequence.
Valid values: free text.
Common values: CCR5, CXCR4, CCR3, or a list: "CCR2b CCR3 CCR5" or "CCR5 CXCR4"

Culture method:
Valid values: uncultured, primary, expanded, co-cultured.

Days from seroconversion:
An estimate of the number of days between seroconversion and sampling.
Valid Values: An integer representing the number of days from seroconversion. If the study follows patients over time, assume that seroconversion occurred at the midpoint between the last negative and the first positive sampling dates. If the patient is sampled during acute infection, enter 0. If the patient tested positive before seroconversion, a negative number may be entered.
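The midpoint rule can be computed directly from the sampling dates. This is a minimal sketch using Python's standard `datetime` module; the function names are hypothetical, and rounding an odd-length interval down to a whole day is an assumption the field description does not specify.

```python
from datetime import date, timedelta


def estimated_seroconversion(last_negative: date, first_positive: date) -> date:
    """Midpoint between the last negative and first positive sampling dates.

    Assumption: odd-length intervals are rounded down to a whole day.
    """
    if first_positive < last_negative:
        raise ValueError("first positive date precedes last negative date")
    half = (first_positive - last_negative).days // 2
    return last_negative + timedelta(days=half)


def days_from_seroconversion(sampling: date, last_negative: date,
                             first_positive: date) -> int:
    """Days from estimated seroconversion to sampling (negative if sampled earlier)."""
    midpoint = estimated_seroconversion(last_negative, first_positive)
    return (sampling - midpoint).days
```

For a last negative sample on 01/01/2005 and a first positive sample on 03/02/2005, the estimated seroconversion date is 01/31/2005, so a sample taken on 06/01/2005 gets the value 121.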

Days post-infection:
Same as Days from seroconversion, but used when the study estimates time from infection rather than time from seroconversion.
Valid Values: Same as for days from seroconversion, except that negative numbers are not accepted.

Description (notes):
Text field describing information related to the sample or the sequence, but not included in other data fields.
Values: Text string.
Valid characters: A-Z, a-z, 0-9, - , .

Drug naive:
Whether the sequence was sampled before the patient received any type of antiretroviral treatment.
Valid Values: yes or no

HLA type:
HLA data for the individual sampled.
Valid Values: Text string; 2- or 4-digit entries are acceptable.
Example: A02 A34 B*4403 B*4403 Cw*0401 Cw*0701
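An HLA string in the style of the example above can be split into locus and allele tokens to catch malformed entries. This is a minimal sketch; the function name is hypothetical, and the accepted pattern (loci A, B and Cw only, an optional asterisk, then 2 or 4 digits) is an assumption inferred from the example rather than a published specification.

```python
import re
from typing import List, Tuple

# Assumption: loci limited to those in the example (A, B, Cw); the asterisk
# is optional and the allele has either 2 or 4 digits.
HLA_TOKEN = re.compile(r"(A|B|Cw)\*?([0-9]{4}|[0-9]{2})")


def parse_hla(text: str) -> List[Tuple[str, str]]:
    """Split a space-separated HLA string into (locus, allele digits) pairs."""
    alleles = []
    for token in text.split():
        match = HLA_TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised HLA entry: {token!r}")
        alleles.append((match.group(1), match.group(2)))
    return alleles
```

Entries for other loci would need the pattern extended before use.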

Infection country:
The country where the individual was infected. This may be different from the sampling country.
Valid Values: 2-letter ISO 3166-1 alpha-2 codes only
Examples: UG for Uganda, BR for Brazil, ZA for South Africa, etc.

Infection date:
Estimated date (often known only to the year) on which the patient was infected.
Valid format: MM/DD/YYYY
Note that this is the American format, i.e., month/day/year. If only the year is known, enter 01 for both the month and the day; if only the month and year are known, enter 01 for the day.
Examples: 06/01/1999 is June 1999 (no day given); 01/01/1999 is 1999 (no month or day given).
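Partial dates can be written in this MM/DD/YYYY convention programmatically. The sketch below uses only Python's standard `datetime` module; the function name `format_field_date` is hypothetical. It applies equally to the Sampling date field.

```python
from datetime import date
from typing import Optional


def format_field_date(year: int, month: Optional[int] = None,
                      day: Optional[int] = None) -> str:
    """Format a possibly partial date as MM/DD/YYYY, filling unknown parts with 01."""
    if month is None and day is not None:
        raise ValueError("a day cannot be given without a month")
    # date() raises ValueError for impossible values such as month 13.
    d = date(year, 1 if month is None else month, 1 if day is None else day)
    return d.strftime("%m/%d/%Y")
```

For example, `format_field_date(1999, 6)` gives `06/01/1999` and `format_field_date(1999)` gives `01/01/1999`. Because a true 1 June and "June, day unknown" produce the same string, the convention cannot record how precise the date was.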

Isolate name:
A name that describes the sample. This name is usually identical to the Sequence Name (see below).
Valid Values: Text string.
Valid characters: A-Z, a-z, 0-9, -
Example: 05UG102.124d is an informative name describing a Ugandan sample taken in 2005 from subject 102 at 124 days post-seroconversion.

Molecule:
The molecule from which the sequence was obtained.
Valid Values: DNA, RNA, PCR, UNK

Organism:
Valid Values: HIV-1, HIV-2, SIV, SHIV. Default is HIV-1.
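Fields with a fixed list of values, such as Culture method, Molecule and Organism, can share one lookup-based check. This is a minimal sketch with hypothetical names; the value lists and the HIV-1 default come from this page, while case-sensitive matching and leaving other blank fields blank are assumptions.

```python
CONTROLLED_VALUES = {
    "Culture method": {"uncultured", "primary", "expanded", "co-cultured"},
    "Molecule": {"DNA", "RNA", "PCR", "UNK"},
    "Organism": {"HIV-1", "HIV-2", "SIV", "SHIV"},
}

# The page states that Organism defaults to HIV-1.
DEFAULTS = {"Organism": "HIV-1"}


def normalise_controlled(field: str, value: str) -> str:
    """Return a checked value for a controlled-vocabulary field.

    Assumptions: matching is case-sensitive, and a blank value stays blank
    unless the field has a documented default.
    """
    value = value.strip()
    if not value:
        return DEFAULTS.get(field, "")
    if value not in CONTROLLED_VALUES[field]:
        raise ValueError(f"{value!r} is not a valid {field} value")
    return value
```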

Patient age:
Age of the patient, in days, at the time of sampling.
Values: Integer (do not include > or <). Be sure to convert ages recorded in other units to days.
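The conversion to days can be done exactly when the birth date is known, or approximately when only an age in years was recorded. This is a minimal sketch with hypothetical function names; the 365.25-day year used for the approximation is an assumption, not a rule stated on this page.

```python
from datetime import date


def patient_age_days(birth_date: date, sampling_date: date) -> int:
    """Exact age in days when both dates are known."""
    if sampling_date < birth_date:
        raise ValueError("sampling date precedes birth date")
    return (sampling_date - birth_date).days


def approx_age_days(age_years: float) -> int:
    """Approximate conversion when only an age in years is recorded.

    Assumption: 365.25 days per year, rounded to the nearest day.
    """
    return round(age_years * 365.25)
```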

Patient code:
Unique identifier of the patient from whom the sample was taken. Do not use patient names or initials!
Valid Values: Text string. Valid characters: A-Z, a-z, 0-9, _, -
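The allowed character sets can be enforced with regular expressions. This is a minimal sketch; the dictionary and function names are hypothetical, and the two patterns transcribe the character lists given on this page for Patient code and Sequence name.

```python
import re

# Character sets copied from the field descriptions on this page.
FIELD_PATTERNS = {
    "Patient code": re.compile(r"[A-Za-z0-9_-]+"),
    "Sequence name": re.compile(r"[A-Za-z0-9-]+"),
}


def has_valid_characters(field: str, value: str) -> bool:
    """True if value is non-empty and uses only the field's allowed characters."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None
```

Note that an underscore is accepted in a patient code but rejected in a sequence name.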

Patient comment:
Text field for information about the patient that is not captured in other data fields.
Values: Text string.
Valid characters: A-Z, a-z, 0-9, - , .
Example: Patient infected between 1998 and 2000. Linked transmission with patient 569.

Patient ethnicity:
Description: Study-appropriate description of ethnicity, usually self-reported by the patient.
Valid Values: Text string. Valid characters A-Z, a-z

Patient health status:
Description: HIV-related health status of the individual when the sample was taken.
Valid Values: acute infection, asymptomatic, symptomatic, AIDS, deceased. Please convert other health status categorizations (e.g., CDC IIA) into these 5 categories.

Patient sex:
Sex of the patient.
Valid values: M or F

Patient progression:
Rate of disease progression of the patient.
Valid values: SP, P, RP, LTNP, EC (slow progressor, progressor, rapid progressor, long-term non-progressor, and elite controller)

Phenotype:
Description: Largely obsolete; in current studies it has been replaced by coreceptor usage.
Valid values: NSI, SI (non-syncytium-inducing and syncytium-inducing, respectively)

Project:
Unique identifier of the cohort the patient belongs to.
Valid Values: Text string. Valid characters A-Z, a-z, 0-9, _ , -
Example: Amsterdam Cohort of Homosexual Men

Risk factor:
Valid Values: a single 2-letter code only. If the patient has more than one risk factor, leave the field blank.

SG - homosexual
SB - bisexual
SH - heterosexual
SW - sex worker
PH - haemophiliac
PB - Blood transfusion
PI - IV drug user
MB - Perinatal, mother -> baby
NO - Nosocomial
EX - Experimental
NR - not recorded (or unknown)
OT - other
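The single-code rule can be applied when a study records several risk factors per patient. This is a minimal sketch with hypothetical names; the code table is copied from the list above, and mapping a patient with no recorded codes to NR is an assumption.

```python
from typing import Iterable

RISK_FACTOR_CODES = {
    "SG": "homosexual",
    "SB": "bisexual",
    "SH": "heterosexual",
    "SW": "sex worker",
    "PH": "haemophiliac",
    "PB": "blood transfusion",
    "PI": "IV drug user",
    "MB": "perinatal, mother -> baby",
    "NO": "nosocomial",
    "EX": "experimental",
    "NR": "not recorded (or unknown)",
    "OT": "other",
}


def risk_factor_entry(codes: Iterable[str]) -> str:
    """Value for the Risk factor column given a patient's risk-factor codes.

    One distinct code is returned as-is; more than one yields a blank, as
    the field description requires. Assumption: no codes at all maps to NR.
    """
    distinct = set(codes)
    unknown = distinct.difference(RISK_FACTOR_CODES)
    if unknown:
        raise ValueError(f"unknown risk factor code(s): {sorted(unknown)}")
    if not distinct:
        return "NR"
    if len(distinct) > 1:
        return ""
    return distinct.pop()
```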

Sampling city:
The city, province, or region in which the sample was taken.
Valid Values: Text string. Valid characters A-Z, a-z

Sampling country:
Country where the sample was taken. This may be different from the country of infection.
Valid Values: 2-letter ISO codes only
Examples: UG for Uganda, BR for Brazil, ZA for South Africa, etc.

Sampling date:
Date the sample was taken.
Valid format: MM/DD/YYYY
Note that this is the American format, i.e., month/day/year. If only the year is known, enter 01 for both the month and the day; if only the month and year are known, enter 01 for the day.
Examples: 06/01/1999 is June 1999 (no day given); 01/01/1999 is 1999 (no month or day given).

Sequence name:
Required field! Must exactly match the name of the sequence in the FASTA file.
Valid Values: Text string. Valid characters A-Z, a-z, 0-9, -
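Mismatches between the spreadsheet and the FASTA file can be found before submission by comparing the two sets of names. This is a minimal sketch with hypothetical function names; it assumes the sequence name is the text after `>` up to the first whitespace on each header line, and that the comparison is exact and case-sensitive.

```python
from typing import Iterable, List, Tuple


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Sequence names from FASTA header lines (text after '>' up to whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def compare_names(sheet_names: Iterable[str],
                  fasta_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names only in the spreadsheet, names only in the FASTA file)."""
    in_sheet = set(sheet_names)
    in_fasta = set(fasta_names(fasta_lines))
    return sorted(in_sheet - in_fasta), sorted(in_fasta - in_sheet)
```

An open file can be passed directly as `fasta_lines`, e.g. `with open("sequences.fasta") as fh: compare_names(names, fh)` (the file name is illustrative). Both returned lists should be empty before the set is deposited.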

Sample tissue:
The tissue from which the sample was derived.
Valid Values: Text string. Use the noun, not the adjective (e.g., vagina, not vaginal).
Examples: plasma, PBMC, semen, CSF, brain, etc.

Subtype:
The sequence subtype is assigned automatically by the QC process. A subtype entered here manually overrides the subtype assigned by the QC tool.
Values: All valid subtypes and CRFs
Examples: B, F2, CRF_01, AC, 02B. Unknown subtypes or regions that have no identifiable subtype are classified as U.
For additional details, see How the HIV Database Classifies Sequence Subtypes.

Viral load:
The patient's viral load when the sample was taken.
Valid Values: Integer only. Do not include values with < or > sign.

 

last modified: Tue Nov 27 08:33 2007


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2006 LANSLLC All rights reserved | Disclaimer/Privacy
