HIV Databases
HIV sequence database



ODprep and ODfit


----------------------------------------------------------------------------
Written by: Aaron Halpern, T10 MS K710, LANL, Los Alamos, NM 87545
----------------------------------------------------------------------------
January 1997
----------------------------------------------------------------------------

These programs calculate antibody titers based on concentration and optical density data.

REQUIREMENTS:
For Mac version only: Requires a Power Macintosh with more than 16 MB of free RAM.
----------------------------------------------------------------------------
CREDITS:

"ODprep" was written by Aaron Halpern to simplify user interactions with "ODfit", and to establish various options for the fitting of ELIZA optical densities.
"ODfit" is the work of several people. Byron Goldstein and Ron Beattie wrote a very general program for doing non-linear least squares data fitting, which is the heart of the program. Avidan Neumann wrote a specific procedure which specifies that the fit should involve Sips/Hill functions, a fairly standard approach to fitting binding data, and developed his strategy and program options through discussions with John Moore. Rich Posner and colleagues adapted the Goldstein/Beattie code for the MacIntosh, and were kind enough to compile the MacIntosh version of "ODfit".
At least during the relevant periods, Halpern, Goldstein, Beattie and Neumann were members of the Theoretical Biology and Biophysics group (T-10), Theoretical Division, Los Alamos National Laboratory. Rich Posner was at the Department of Chemistry, Northern Arizona University.

This project was sponsored by the Correlates of HIV Immune Protection Project, an interagency agreement between the NIH and the DOE.

An application of the ODprep/ODfit package in HIV-1 research is described in a paper by R. Connor et al., J Virol 72: 1552-1576 (1998).
----------------------------------------------------------------------------
FTP site:
ODprep/ODfit README
ODprep/ODfit ftp site

----------------------------------------------------------------------------
DATA FORMAT:
----------------------------------------------------------------------------

Observations from a given sample must be in one file; observations from different samples must be in different files. The format of the files should be one line per concentration, with each line giving the concentration (inverse dilution) and the net OD, in that order, e.g.

concentration   net OD
1.000000e-05    0.014000
3.333330e-05    0.006000
1.000000e-04    0.075000
3.333330e-04    0.273000
1.000000e-03    0.920000
3.333330e-03    1.117000

There shouldn't be any blank lines at the beginning or at the end of the file, although the last line should end with a carriage return.
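The format rules above are simple enough to check before running ODprep. Below is a minimal Python sketch of such a checker and reader; it is not part of ODprep or ODfit, the function name `parse_od_data` is hypothetical, and it assumes that columns are separated by whitespace and that the column labels shown in the example are for illustration only (not a line of the file).

```python
from typing import List, Tuple


def parse_od_data(text: str) -> List[Tuple[float, float]]:
    """Parse one sample's data: one line per concentration, each giving the
    concentration (inverse dilution) and the net OD, in that order.

    Raises ValueError for blank lines, a wrong number of columns, a
    non-numeric value, or a last line that does not end with a newline.
    """
    if not text.endswith("\n"):
        raise ValueError("the last line must end with a newline")
    points = []
    # Dropping the empty string after the final newline leaves one entry per line,
    # so any blank line at the start or end shows up as a line with 0 columns.
    for lineno, line in enumerate(text.split("\n")[:-1], start=1):
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, found {len(fields)}")
        points.append((float(fields[0]), float(fields[1])))
    return points


# Hypothetical usage on a data file such as "in1":
# with open("in1") as fh:
#     points = parse_od_data(fh.read())
```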

In the directory "SampleFiles" there are two files called "in1" and "in2"; these are examples of data files.
----------------------------------------------------------------------------
LOCATIONS OF FILES
----------------------------------------------------------------------------

All of the files from samples run as a single experimental batch (generally, all of the samples from a single patient, or, all of the files for which certain parameters such as maximum OD should be the same) should be in a single folder or directory. Preferably, this folder should contain only files to be analyzed together, e.g., files for different patients should be in separate folders.

The input files should be in the same folder as the programs "ODprep" and "ODfit". One could either copy the programs into the data folder or copy the data files into the program folder. However, since the programs aren't clever about renaming output files from different runs, it is probably best to copy the programs into the data folder. After analyzing data in one folder, the copies of the executables can be moved out of that folder and into another for analysis of data from another patient.

An additional "control" file may be located in the same directory. See below for more.
File names must not contain spaces or slashes ("/")!!!
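A quick pre-flight check of file names can be sketched in Python as follows. The helper names are hypothetical, and the check covers only the two characters named above (spaces and "/"); it does not know about any other names ODprep might reject.

```python
from typing import Iterable, List


def is_valid_data_filename(name: str) -> bool:
    """True if the name is non-empty and contains no spaces or slashes ("/")."""
    return bool(name) and " " not in name and "/" not in name


def invalid_names(names: Iterable[str]) -> List[str]:
    """Return the names that break the rule, in their original order."""
    return [n for n in names if not is_valid_data_filename(n)]
```

For example, `invalid_names(os.listdir("."))` would list the offending files in the current data folder.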
----------------------------------------------------------------------------
RUNNING THE PROGRAMS: ODprep and ODfit
----------------------------------------------------------------------------
SUMMARY:

1) Execute or double-click on ODprep.
2) Answer the questions that appear (type in your response and press enter, for each question).
3) When no new questions appear, hit the return key to exit.
4) Execute or double-click on ODfit.
5) When a random number is asked for, type in a number between 0 and 1 and press return.
6) Examine the output files for the relevant estimates.

Running ODprep:

The program "ODprep" must be run first. This program may be run in two fashions, one fully interactive and one more automatic. To start the program, double-click on "ODprep". A window will open in which various questions about how the data is to be analyzed will appear. You will first be asked whether you wish to respond to questions via the keyboard (fully interactive) or through the contents of a file (automatic). You should type "k" (+ return) or "f" (+ return) at this point. Do not type the quotation marks; they are included here only to highlight the user's response. For the fully interactive version, you must respond to a series of questions. For each, type your response and hit return. For more discussion of the various options, see "DATA FITTING OPTIONS", below. For the automated version, you will be asked for the name of the file containing your responses.

In the SampleFiles directory is a file called "control", which illustrates what an input file for the automatic mode should look like; "control" can be used to analyze the sample data files "in1" and "in2". Each line of the file should correspond to one answer, so if you keep track of your responses during an interactive session, you can create a file of these answers and use it in an automated session. You may, of course, modify the file for use on other sets of data. However, be aware that the list of questions can change depending on your earlier answers, so changes to fitting options should probably be tried by hand before being automated.

Among other questions, you must type in the names of the files containing the data. It is important to keep track of the order in which you enter them (generally, it would make sense to order them in chronological order if these are longitudinal samples). If you type clearly unreasonable responses (e.g. you type a letter when being asked for a number), the program will ask you whether you want to correct your response or quit; if at some point, you want to abort the program, type in some garbage in response to the next question and then reply "y" when asked if you want to quit. When the program is complete, hit the return key once more to remove the ODprep window. The effect of ODprep is to create two sets of files; the first "set" is actually the single file "odfittp", and the second set is a series of files named "fitdata1", "fitdata2", etc.

"odfittp" contains instructions to the "ODfit" program; the file is an ascii file, so in principle you can edit it with your favorite text editor, but the format and the content are somewhat arcane, so you will probably want to leave it alone. "fitdata1" etc are simply copies of your data files, one file for each one you entered during "ODprep". (The reason for copying is that "ODfit" only looks for files with the right names.)

Running ODfit:

"ODfit" (credit of various kinds goes to Byron Goldstein, Ron Beattie, Rich Posner, and Avidan Neumann) fits a Sips/Hill function to the data, with parameters corresponding to the maximum height of the function (maxod), the midpoint (mpt), and the steepness of the curve (nhill). The program requires the files "odfittp" and "fitdata1", etc., described above. These files can in principle be prepared by hand, but they are more easily prepared by the "ODprep" program. To run "ODfit", double-click on its icon or execute it from the command line. Progress will be reported in a window. At the beginning (after a delay for startup), you will be asked for a random number; type in any value between 0 and 1.0. The exact value is not important, but you should not always use the same one. If you get very strange results for an analysis, it is worth repeating the analysis with a different random number: sometimes the curve fitting does not converge on a stable answer. Once you type in your number, the analysis begins. The actual data fitting is quite fast; the simulations take somewhat longer. You will see various messages, possibly including some apparent error messages about problems with simulations, which you can generally ignore. When the program is done, the window disappears.

----------------------- DATA FITTING OPTIONS: -----------------------

The data for a given sample is fitted according to the following equation:

$$
\mathrm{OD} = \mathrm{maxod} \cdot \frac{\mathrm{CONC}^{\mathrm{nhill}}}{\mathrm{CONC}^{\mathrm{nhill}} + (1/\mathrm{mpt})^{\mathrm{nhill}}}
$$

where CONC is the concentration (inverse dilution, first column in data files), OD is the net optical density (experimental vs. control), and maxod, nhill and mpt are the three parameters for the fit.
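The model translates directly into code. The small Python function below is a sketch (the name `sips_hill_od` is ours, not ODfit's). It also makes the meaning of mpt concrete: when CONC = 1/mpt the two terms in the denominator are equal, so OD = maxod/2; that is, mpt is the dilution factor (titer) at which half of the maximum OD is reached.

```python
def sips_hill_od(conc: float, maxod: float, mpt: float, nhill: float) -> float:
    """Net OD predicted by the Sips/Hill model:
    OD = maxod * CONC^nhill / (CONC^nhill + (1/mpt)^nhill),
    where conc is the concentration (inverse dilution) column of a data file."""
    c = conc ** nhill
    return maxod * c / (c + (1.0 / mpt) ** nhill)


# At CONC = 1/mpt the denominator terms are equal, so OD = maxod / 2 (prints 0.8).
print(sips_hill_od(1.0 / 500.0, maxod=1.6, mpt=500.0, nhill=0.7))
# First point of sample 1 in the sample output below (calculated value 1.44190E-02).
print(sips_hill_od(1e-05, maxod=1.6, mpt=909.385, nhill=1.0))
```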

The primary set of options concerns how the three parameters are determined. We treat each separately:

maxod: This is an estimate of the maximum OD reading which would be observed under ideal conditions (i.e., no error in any measurements or methods).
Three methods for determining this value are offered by the program:
- specified by user
- maximum value observed in any of the data files
- value which gives the optimal fit
The user-specified option has two primary uses: exploring the effect of varying this parameter, or forcing the program to use the same value across sets of data which are analyzed at different times. The maximum-observed value may be justified when few of the samples being analyzed show signs of leveling off (saturating, reaching a plateau). However, it also may correspond to a value which is excessively high, due to experimental variability. The optimal-fit value allows the data-fitting program to determine the single value of maxod which results in the best overall fit to the set of samples which are being analyzed. If several (but not necessarily all) of the samples appear to approach saturation, this method should give good results; samples which don't show signs of leveling off will not greatly affect the estimate of maxod, and so are not a problem so long as some (minimally, one) do level off.
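The two options that need no fitting can be sketched in a few lines of Python. The function name `choose_maxod` is hypothetical; the third option (the value giving the optimal fit) requires a joint fit across samples and is not covered by this sketch.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (concentration, net OD)


def choose_maxod(samples: List[Sequence[Point]], user_value: Optional[float] = None) -> float:
    """maxod for the 'specified by user' option if user_value is given,
    otherwise for the 'maximum value observed in any of the data files' option."""
    if user_value is not None:
        return user_value
    return max(od for points in samples for _, od in points)
```

For the two samples in the sample output below, the maximum observed net OD is 1.117, whereas that run used a user-specified maxod of 1.6.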

nhill: This is the sharpness of the saturation curve. For monovalent, monoclonal binding, the theoretically appropriate value is 1.0. Polyclonal monovalent sera should, due to the heterogeneity of binding constants, lead to a shallower curve, corresponding to a value less than 1.0. Four methods are offered by the program:
- specified by user
- monoclonal monovalent value of 1.0
- single value which gives best overall fit
- values which give best fit for each sample

User specification again is appropriate for exploring the effect of this parameter, or for forcing the same value across experiments. The value 1.0, which could also be specified by the user, is a reasonable default in the absence of a detailed model. The single optimal-fit value allows the data to determine a common value for this parameter across samples from a single experiment; the corresponding hypothesis is that the samples have similar variances of binding constants even though the average binding constant may differ from sample to sample. The separate optimal-fit values option estimates an optimal value for each of multiple samples and does not require the homogeneity assumption of the single optimal-fit option, but it does introduce extra degrees of freedom into the fits. The user should confirm that the resulting estimates are sensible (a sanity test might be to check whether the estimated value for each sample lies between 0.5 and 1.5).
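How the maxod and nhill options translate into the number of free parameters can be sketched in Python. The option labels ("user", "max_observed", "optimal", "one", "single", "per_sample") are hypothetical shorthand for the menu choices above, not ODprep's actual prompts; mpt always contributes one parameter per sample.

```python
def count_free_parameters(n_samples: int, maxod_option: str, nhill_option: str) -> int:
    """Number of parameters fitted jointly across n_samples files.

    maxod_option: "user" or "max_observed" (fixed) or "optimal" (one shared value).
    nhill_option: "user" or "one" (fixed), "single" (one shared value),
                  or "per_sample" (one value per file).
    """
    n = n_samples  # mpt is always estimated separately for each sample
    if maxod_option == "optimal":
        n += 1
    elif maxod_option not in ("user", "max_observed"):
        raise ValueError(f"unknown maxod option: {maxod_option}")
    if nhill_option == "single":
        n += 1
    elif nhill_option == "per_sample":
        n += n_samples
    elif nhill_option not in ("user", "one"):
        raise ValueError(f"unknown nhill option: {nhill_option}")
    return n


def nhill_looks_sensible(nhill: float, low: float = 0.5, high: float = 1.5) -> bool:
    """The sanity test suggested above: is the estimate between 0.5 and 1.5?"""
    return low <= nhill <= high
```

For the sample run shown under RESULTS (two files, maxod and nhill both fixed) this gives 2, matching "problem has 2 parameters"; with 11 points, the m-n in "ssq/(m-n)" is then 9.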

mpt: mpt is the "mid point titer"; this value is estimated separately, from the data, for each sample.
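When maxod and nhill are both fixed, each sample's mpt is a one-dimensional least-squares problem, independent of the other samples. ODfit itself uses the general Goldstein/Beattie non-linear least squares code; the Python sketch below swaps in a simpler technique (a coarse grid over log10(mpt) followed by golden-section search) and assumes the sum of squares has a single minimum within the best grid cell. The unweighted objective is consistent with the sums of squares in the sample output; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (concentration, net OD)


def sips_hill_od(conc: float, maxod: float, mpt: float, nhill: float) -> float:
    """OD = maxod * CONC^nhill / (CONC^nhill + (1/mpt)^nhill)."""
    c = conc ** nhill
    return maxod * c / (c + (1.0 / mpt) ** nhill)


def fit_mpt(points: Sequence[Point], maxod: float, nhill: float,
            lo: float = 0.0, hi: float = 7.0, step: float = 0.05) -> float:
    """Unweighted least-squares estimate of mpt for ONE sample, with maxod and
    nhill held fixed. Searches t = log10(mpt) on a coarse grid over [lo, hi],
    then refines around the best grid point by golden-section search."""
    def ssq(t: float) -> float:
        mpt = 10.0 ** t
        return sum((od - sips_hill_od(x, maxod, mpt, nhill)) ** 2 for x, od in points)

    n_steps = int(round((hi - lo) / step))
    best = min((lo + i * step for i in range(n_steps + 1)), key=ssq)
    a, b = max(lo, best - step), min(hi, best + step)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio shrink factor
    while b - a > 1e-9:
        c, d = b - g * (b - a), a + g * (b - a)
        if ssq(c) < ssq(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# First sample of the sample run under RESULTS, with maxod = 1.6 and nhill = 1.0.
exp1 = [(1e-05, 0.014), (3.33333e-05, 0.006), (1e-04, 0.075),
        (3.33333e-04, 0.273), (1e-03, 0.92), (3.33333e-03, 1.117)]
print(fit_mpt(exp1, maxod=1.6, nhill=1.0))
```

For this data the estimate should agree closely with the 9.09385E+02 reported for "mpt1" in the sample output, since with maxod and nhill fixed the two samples' mpt values do not interact.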

Other fitting options:
Number of points to use.

Because of the problem of non-specific binding at high antigen levels, it is sometimes necessary to omit some of the data points for higher concentrations. The user is asked to select from among several options to decide which points to use:
- use all points in each file
- use constant number of points, specified by user, from each file
- test for decline in OD with increasing serum concentration
- use variable number of points, interactively specified by user.
Use all points in each file if you are confident that all data points are valid.
Specify a constant number of points if, e.g., the same concentrations were tested for all samples, and you decide not to use points from, e.g., the highest concentration. Note that if you say to use 5 points, it will be the first five points in each file which are used.
Test for decline if you prefer the program to omit points if and only if they show ODs lower than values seen for lower concentrations. If you choose this option, you will be asked for a fraction, between 0 and 1, of the maximum value for a given sample to use as the cutoff; that is, if you specify .9, any point which has an OD smaller than .9 times the maximum value observed for the sample AND has a higher serum concentration will be omitted.
Specify the number of points interactively if you prefer to examine the data in each file and decide for yourself which points to include. However, beware of the possibility that you are introducing bias into the analysis. Again, if you specify that 5 points should be used for a given sample, it will be the first 5 points that are used. To delete specific points believed to be problematic, remove them from your data file by hand.
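The point-selection rules can be sketched in Python. The function names are hypothetical, and the decline test reflects our reading of the description above: a point is dropped when its OD is below the chosen fraction of the sample's maximum OD and its concentration is higher than that of the point where the maximum was observed.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (concentration, net OD), in file order


def first_n_points(points: List[Point], n: int) -> List[Point]:
    """'Constant number' and 'interactive' options: keep the first n lines of the file."""
    return points[:n]


def drop_declining_points(points: List[Point], fraction: float) -> List[Point]:
    """'Test for decline' option: omit a point if its OD is below
    fraction * (maximum OD of the sample) AND its concentration is higher
    than the concentration at which that maximum was observed."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    conc_at_max, max_od = max(points, key=lambda p: p[1])
    cutoff = fraction * max_od
    return [(c, od) for c, od in points if not (c > conc_at_max and od < cutoff)]
```

With a fraction of 0.9, a hypothetical sample whose ODs run 0.3, 1.0, 0.95, 0.7 over increasing concentrations keeps the first three points and drops only the last.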

Number of files.

You will be asked how many files are to be analyzed together. Type in the appropriate number.

Names of files.

You must type out the names of the files to be analyzed. N.B. File names must not contain spaces or slashes ("/")!

Initial values for mpt, maxod, and nhill: ODfit performs an iterative estimation of the parameters; it needs to be given a starting set of parameter values. These values should be reasonable initial guesses. You will be prompted for the values; in general, the suggested values should be adequate.

--------------------------- RESULTS: ----------------------------

The results include two files of interest, "output" and "simout". "output" contains the final estimates of the parameters (more on this below) and shows the differences between the observed data and the values predicted by the fitted curve. "simout" shows the results of bootstrapping simulations to give an indication of the likely errors of the estimates.


A sample copy of an "output" file is given below, with comments interspersed on lines beginning with ">".
----------------------------------------------------------------------------
SAMPLE output FILE:
----------------------------------------------------------------------------

info for data set # 1
------------------------
summary of input variables
problem has 2 parameters and 11 points

>This run was done on two samples, from which 5+6=11 points were included.

2 user defined constants read from tinp
maxod = 1.60000E+00
nhill = 1.00000E+00

> Two parameters were fixed, leaving only "mpt" as a free parameter for
> a given sample, but "mpt" was allowed to be different for the two samples,
> leaving two free parameters in total, named "mpt1" and "mpt2" for the
> first and second samples, respectively.

relative error in SSQ is at most TOL
var= ssq/npts= 4.3411E-03
sum of squares = 4.77523E-02 ssq/(m-n) = 5.30581E-03

> Overall goodness of fit may be assessed by ssq

k name of guess for final value
k-th param k-th param k-th param
1 mpt1 1.0000E+03 9.09385E+02
2 mpt2 1.0000E+03 6.22660E+02

> "mpt1" and "mpt2" were initialized with the value 1000; final estimates
> were 909.4 and 622.7 respectively. These are generally the values of interest.

> Note that the order of parameter numbering ("mpt1", "mpt2", etc) will
> correspond to the order in which you specified data files via "ODprep".

exp 1
i indep var data value calculated value residual
1 1.00000E-05 1.40000E-02 1.44190E-02 -4.19038E-04
2 3.33333E-05 6.00000E-03 4.70736E-02 -4.10736E-02
3 1.00000E-04 7.50000E-02 1.33373E-01 -5.83729E-02
4 3.33333E-04 2.73000E-01 3.72185E-01 -9.91852E-02
5 1.00000E-03 9.20000E-01 7.62034E-01 1.57966E-01
6 3.33333E-03 1.11700E+00 1.20310E+00 -8.61038E-02

> For the first sample ("experiment 1"), a comparison of the observed data
> and the value expected given the final fit.

exp 2
i indep var data value calculated value residual
1 1.00000E-05 1.40000E-02 9.90092E-03 4.09908E-03
2 3.33333E-05 2.60000E-02 3.25333E-02 -6.53328E-03
3 1.00000E-04 7.50000E-02 9.37860E-02 -1.87860E-02
4 3.33333E-04 2.73000E-01 2.75007E-01 -2.00664E-03
5 1.00000E-03 6.20000E-01 6.13965E-01 6.03510E-03

> For the second sample ("experiment 2"), a comparison of the observed data
> and the value expected given the final fit.
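The goodness-of-fit lines in the sample output above ("sum of squares", "var= ssq/npts" and "ssq/(m-n)") can be recomputed from the data and the fitted parameters. The Python sketch below assumes unweighted residuals (observed minus calculated OD, as in the "residual" column), which is consistent with the numbers shown; `fit_statistics` is a hypothetical name, and `n_free` is the number of fitted parameters (the n in m-n).

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (concentration, net OD)


def sips_hill_od(conc: float, maxod: float, mpt: float, nhill: float) -> float:
    """Sips/Hill model from DATA FITTING OPTIONS."""
    c = conc ** nhill
    return maxod * c / (c + (1.0 / mpt) ** nhill)


def fit_statistics(samples: Sequence[List[Point]], mpts: Sequence[float],
                   maxod: float, nhill: float, n_free: int) -> Dict[str, float]:
    """Residuals are data minus calculated value, pooled over all samples."""
    residuals = [od - sips_hill_od(x, maxod, mpt, nhill)
                 for points, mpt in zip(samples, mpts)
                 for x, od in points]
    m = len(residuals)  # total number of points used in the fit
    ssq = sum(r * r for r in residuals)
    return {"ssq": ssq, "var": ssq / m, "ssq_per_dof": ssq / (m - n_free)}


exp1 = [(1e-05, 0.014), (3.33333e-05, 0.006), (1e-04, 0.075),
        (3.33333e-04, 0.273), (1e-03, 0.92), (3.33333e-03, 1.117)]
exp2 = [(1e-05, 0.014), (3.33333e-05, 0.026), (1e-04, 0.075),
        (3.33333e-04, 0.273), (1e-03, 0.62)]
print(fit_statistics([exp1, exp2], [909.385, 622.66], maxod=1.6, nhill=1.0, n_free=2))
```

With the values from the sample run, this reproduces a sum of squares of about 4.775E-02 and the corresponding ssq/npts and ssq/(m-n) figures to the precision printed above.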


A sample copy of a "simout" file is given below, with comments interspersed on lines beginning with ">".
----------------------------------------------------------------------------
SAMPLE simout FILE:
----------------------------------------------------------------------------
user defined constants read from tinp
maxod = 1.60000E+00
nhill = 1.00000E+00

fit variance (ssq/npts) = 4.3411E-03

k name of guess for final value
k-th param k-th param k-th param
1 mpt1 1.00000E+03 9.09385E+02 +- 1.01323E+02
2 mpt2 1.00000E+03 6.22660E+02 +- 9.82692E+01

> Summary of initial and fitted values of parameters; should match
> values in "output", except that a 68% confidence interval is
> indicated for the final (fitted) values, based on the results of
> the bootstrap simulations shown below.

S I M U L A T I O N S
There were 100 successful simulations
mpt1

average - 8.99374E+02
sigma - 1.01323E+02
orig. fit - 9.09385E+02
68% confid. lower - 8.00411E+02
upper - 1.00567E+03

mpt2
average - 5.84904E+02
sigma - 9.82692E+01
orig. fit - 6.22660E+02
68% confid. lower - 5.03413E+02
upper - 6.42328E+02


> Summary of results of simulations.

sim.# 1 variance (ssq/npts) = 5.1390E-03

> For each simulation, the variance of the fit is given.

sim.# 2 variance (ssq/npts) = 4.5249E-03
sim.# 3 variance (ssq/npts) = 3.3746E-03
sim.# 4 variance (ssq/npts) = 2.1457E-03
sim.# 5 variance (ssq/npts) = 4.5487E-03
sim.# 6 variance (ssq/npts) = 3.9214E-03
sim.# 7 variance (ssq/npts) = 4.5064E-03
sim.# 8 variance (ssq/npts) = 4.6758E-03
sim.# 9 variance (ssq/npts) = 2.2339E-03
sim.# 10 variance (ssq/npts) = 6.8980E-03
...
> some simulations cut out for brevity
...
sim.# 99 variance (ssq/npts) = 1.7464E-03
sim.#100 variance (ssq/npts) = 2.9140E-03
average = 3.6430E-03

sim. # final value
mpt1
1 - 8.18434E+02

> For each simulation, the result of the fit to the first parameter is given.

2 - 9.92668E+02
3 - 8.48807E+02
4 - 8.00411E+02
5 - 9.66069E+02
6 - 1.02524E+03
7 - 7.77622E+02
8 - 7.85415E+02
9 - 8.65783E+02
10 - 9.50281E+02
...
> some simulations cut out for brevity
...
99 - 8.32195E+02
100 - 1.01862E+03
average - 8.99374E+02
sigma - 1.01323E+02
orig. fit - 9.09385E+02
68% confid. lower - 8.00411E+02
upper - 1.00567E+03

mpt2
1 - 7.46605E+02

> For each simulation, the result of the fit to the second parameter is given.

2 - 4.93670E+02
3 - 5.99571E+02
4 - 6.07142E+02
5 - 6.09218E+02
6 - 5.91164E+02
7 - 4.54853E+02
8 - 7.83728E+02
9 - 4.47978E+02
10 - 5.34751E+02
...
> some simulations cut out for brevity
...
99 - 5.92447E+02
100 - 5.70378E+02
average - 5.84904E+02
sigma - 9.82692E+01
orig. fit - 6.22660E+02
68% confid. lower - 5.03413E+02
upper - 6.42328E+02
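This README does not specify how ODfit generates its bootstrap data sets, so the Python sketch below assumes a residual bootstrap (synthetic data made by adding resampled residuals to the fitted curve, then refitting) for one sample with maxod and nhill fixed. The 68% interval is taken from the sorted simulated values; this is suggested by the sample simout, where the lower bound for mpt1 (8.00411E+02) equals the value from simulation 4. Using the sample standard deviation for "sigma" is also an assumption, and all function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (concentration, net OD)


def sips_hill_od(conc: float, maxod: float, mpt: float, nhill: float) -> float:
    c = conc ** nhill
    return maxod * c / (c + (1.0 / mpt) ** nhill)


def fit_mpt(points: Sequence[Point], maxod: float, nhill: float) -> float:
    """Least-squares mpt for one sample with maxod and nhill fixed
    (grid over log10(mpt) from 0 to 7, then golden-section refinement)."""
    def ssq(t: float) -> float:
        return sum((od - sips_hill_od(x, maxod, 10.0 ** t, nhill)) ** 2
                   for x, od in points)

    best = min((i * 0.05 for i in range(141)), key=ssq)
    a, b = best - 0.05, best + 0.05
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > 1e-9:
        c, d = b - g * (b - a), a + g * (b - a)
        if ssq(c) < ssq(d):
            b = d
        else:
            a = c
    return 10.0 ** ((a + b) / 2.0)


def residual_bootstrap(points: Sequence[Point], maxod: float, nhill: float,
                       n_sims: int = 100, seed: int = 0) -> Tuple[float, List[float]]:
    """Refit mpt to n_sims synthetic data sets built by adding resampled
    residuals to the fitted curve. Returns (original fit, simulated fits)."""
    rng = random.Random(seed)  # plays the role of ODfit's random-number prompt
    mpt_hat = fit_mpt(points, maxod, nhill)
    fitted = [sips_hill_od(x, maxod, mpt_hat, nhill) for x, _ in points]
    residuals = [od - f for (_, od), f in zip(points, fitted)]
    sims = []
    for _ in range(n_sims):
        fake = [(x, f + rng.choice(residuals)) for (x, _), f in zip(points, fitted)]
        sims.append(fit_mpt(fake, maxod, nhill))
    return mpt_hat, sims


def summarize(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """average, sigma (sample standard deviation), and a 68% interval taken as
    the nearest-rank 16th and 84th percentiles of the simulated values."""
    s = sorted(values)
    n = len(s)
    lower = s[(16 * n + 99) // 100 - 1]
    upper = s[(84 * n + 99) // 100 - 1]
    return statistics.mean(s), statistics.stdev(s), lower, upper
```

Because the resampling scheme is assumed, intervals from this sketch will not reproduce the sample simout exactly; they only illustrate how "average", "sigma" and the 68% bounds are related to the list of simulated values.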

----------------------------------------------------------------------------
Notice:

Unless otherwise indicated, this information, consisting of source code, documentation, and executable programs, has been authored by an employee or employees of the University of California under LACC # ______ , operator of the Los Alamos National Laboratory under Contract No. W-7405-ENG-36 with the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this information. The public may copy and use this information without charge, make derivative works, distribute, and publicly display provided that this Notice and any statement of authorship are reproduced on all copies. However, the public may not incorporate this information in any commercial or proprietary product. Neither the Government nor the University makes any warranty, express or implied, or assumes any liability or responsibility for the use of this information.

----------------------------------------------------------------------------
last modified: Wed Oct 10 16:16 2007


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2006 LANSLLC All rights reserved