ODprep and ODfit
----------------------------------------------------------------------------
Written by: Aaron Halpern, T10 MS K710, LANL, Los Alamos, NM 87545
----------------------------------------------------------------------------
January 1997
----------------------------------------------------------------------------
These programs calculate antibody titers based on concentration and optical
density data.
REQUIREMENTS:
For Mac version only: Requires a Power Macintosh with more than 16 MB of free RAM.
----------------------------------------------------------------------------
CREDITS:
"ODprep" was written by Aaron Halpern to simplify user interactions with "ODfit",
and to establish various options for the fitting of ELISA optical densities.
"ODfit" is the work of several people. Byron Goldstein and Ron Beattie wrote
a very general program for doing non-linear least squares data fitting,
which is the heart of the program.
Avidan Neumann wrote the procedure that specializes the fit to Sips/Hill functions,
a fairly standard approach to fitting binding data, and developed his strategy and program options through
discussions with John Moore. Rich Posner and colleagues adapted the Goldstein/Beattie code for the Macintosh,
and were kind enough to compile the Macintosh version of "ODfit".
At least during the relevant periods, Halpern, Goldstein, Beattie and Neumann
were members of the Theoretical Biology and Biophysics group (T-10), Theoretical
Division, Los Alamos National Laboratory. Rich Posner was at the Department
of Chemistry, Northern Arizona University.
This project was sponsored by the Correlates of HIV Immune Protection Project,
an interagency agreement between the NIH and the DOE.
An application of the ODprep/ODfit package in HIV-1 research is described in a
paper by R. Connor et al., J Virol 72: 1552-1576 (1998).
----------------------------------------------------------------------------
DATA FORMAT:
----------------------------------------------------------------------------
Observations from a given sample must be in one file; observations from
different samples must be in different files. The format of the files should
be one line per concentration, with each line giving the concentration
(inverse dilution, so a 1:1000 dilution is entered as 1.000000e-03) and the
net OD, in that order. For example:
    1.000000e-05   0.014000
    3.333330e-05   0.006000
    1.000000e-04   0.075000
    3.333330e-04   0.273000
    1.000000e-03   0.920000
    3.333330e-03   1.117000
There shouldn't be any blank lines at the beginning or at the end of the
file, although the last line should end with a carriage return.
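As an illustration of these format rules, here is a minimal Python sketch that reads one data file and rejects the problems described above before the file is given to ODprep. It assumes the two columns are separated by whitespace (spaces or tabs), which this README does not state explicitly; `read_od_file` is a hypothetical helper, not part of the package.

```python
from typing import List, Tuple


def read_od_file(text: str) -> List[Tuple[float, float]]:
    """Parse and check the contents of one ODprep data file.

    Hypothetical helper. Assumes whitespace-separated columns:
    concentration first, net OD second. Returns (concentration, od) pairs.
    """
    # The last line must end with a line break; accept Unix ("\n"),
    # DOS ("\r\n") or classic Mac ("\r") line endings.
    if not text.endswith(("\n", "\r")):
        raise ValueError("the last line must end with a line break")
    points = []
    # splitlines() yields no empty entry for the final line break, so any
    # empty entry seen here is a genuine blank line.
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if len(fields) != 2:
            # Catches blank lines at the start or end as well as malformed rows.
            raise ValueError(f"line {lineno}: expected 2 columns, found {len(fields)}")
        points.append((float(fields[0]), float(fields[1])))
    return points


# Usage: the first three lines of the example above.
sample = "1.000000e-05 0.014000\n3.333330e-05 0.006000\n1.000000e-04 0.075000\n"
print(read_od_file(sample))
```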
In the directory "SampleFiles" there are two files called "in1" and
"in2"; these are examples of data files.
----------------------------------------------------------------------------
LOCATIONS OF FILES
----------------------------------------------------------------------------
All of the files from samples run as a single experimental batch (generally,
all of the samples from a single patient, or, all of the files for which
certain parameters such as maximum OD should be the same) should be in
a single folder or directory. Preferably, this folder should contain only files to be
analyzed together, e.g., files for different patients should be in separate
folders.
The input files should be in the same folder as the programs "ODprep"
and "ODfit". One could either copy the programs into the data
folder or copy the data files into the program folder. However, since the
programs aren't clever about renaming output files from different runs,
it is probably best to copy the programs into the data folder. After analyzing
data in one folder, the copies of the executables can be moved out of that
folder and into another for analysis of data from another patient.
An additional "control" file may be located in the same directory.
See below for more.
File names must not contain spaces or slashes ("/")!!!
----------------------------------------------------------------------------
RUNNING THE PROGRAMS: ODprep and ODfit
----------------------------------------------------------------------------
SUMMARY:
1) Execute or double-click on ODprep.
2) Answer the questions that appear (type in your
response and press enter, for each question).
3) When no new questions appear, hit the return key to exit.
4) Execute or double-click on ODfit.
5) When a random number is asked for, type in a number between
0 and 1 and press return.
6) Examine the output files for the relevant estimates.
Running ODprep:
The program "ODprep" must be run first. This program may be run
in two fashions, one fully interactive and one more automatic. To start
the program, double-click on "ODprep". A window will open in
which various questions about how the data is to be analyzed will appear.
You will first be asked whether you wish to respond to questions via the
keyboard (fully interactive) or through the contents of a file (automatic).
You should type "k" (+ return) or "f" (+ return) at
this point. Do not type the quotation marks; they are included here only
to highlight the user's response. For the fully interactive version, you
must respond to a series of questions. For each, type your response and
hit return. For more discussion of the various options, see "DATA
FITTING OPTIONS", below. For the automated version, you will be asked
for the name of the file containing your responses.
In the SampleFiles directory is a file called "control"
which illustrates what
an input file for the automatic mode should look like. "control"
can be used to analyze the sample data files
"in1" and "in2". Each line of the file should correspond
to one answer, so if you keep track of your responses during an interactive
session, you can create a file of these answers and use that in an automated
session. And, of course, you may modify the file for use on other sets
of data. However, be aware that the list of questions can change depending
on your earlier answers, so changes to fitting options should probably
be tried by hand before being automated.
Among other questions, you must
type in the names of the files containing the data. It is important to
keep track of the order in which you enter them (generally, it would make
sense to order them in chronological order if these are longitudinal samples).
If you type clearly unreasonable responses (e.g. you type a letter when
being asked for a number), the program will ask you whether you want to
correct your response or quit; if at some point, you want to abort the
program, type in some garbage in response to the next question and then
reply "y" when asked if you want to quit. When the program is
complete, hit the return key once more to remove the ODprep window. The
effect of ODprep is to create two sets of files; the first "set"
is actually the single file "odfittp", and the second set is
a series of files named "fitdata1", "fitdata2", etc.
"odfittp" contains instructions to the "ODfit" program;
the file is an ASCII (plain text) file, so in principle you can edit it with your favorite
text editor, but the format and the content are somewhat arcane, so you
will probably want to leave it alone. "fitdata1", "fitdata2", etc. are simply
copies of your data files, one file for each one you entered during "ODprep".
(The reason for copying is that "ODfit" only looks for files
with the right names.)
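Since "ODfit" only looks for files with the right names, the numbered data copies can also be made by hand. A minimal Python sketch, assuming the data files and the programs share one folder; `make_fitdata_copies` is hypothetical, and it does not write "odfittp", which still has to come from ODprep.

```python
import shutil
from pathlib import Path


def make_fitdata_copies(folder, data_files):
    """Copy data files to fitdata1, fitdata2, ... in the order given.

    Hypothetical helper mirroring the data-copy step of ODprep; the order of
    data_files determines the fitdata numbering. Returns the new file names.
    """
    folder = Path(folder)
    created = []
    for i, name in enumerate(data_files, start=1):
        # Same restriction as ODprep: no spaces or slashes in file names.
        if " " in name or "/" in name:
            raise ValueError(f"file name {name!r} must not contain spaces or slashes")
        target = folder / f"fitdata{i}"
        shutil.copyfile(folder / name, target)
        created.append(target.name)
    return created


# Example (run inside the SampleFiles folder):
#     make_fitdata_copies(".", ["in1", "in2"])
```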
Running ODfit:
"ODfit" (credit of various kinds goes to Byron Goldstein, Ron Beattie,
Rich Posner, and Avidan Neumann) fits a Sips/Hill function to the data, with parameters corresponding to
the maximum height of the function (maxod), the midpoint (mpt), and the
steepness of the curve (nhill). The program requires the files "odfittp"
and "fitdata1", etc. described above. These files can in principle
be prepared by hand, but they may be prepared in a more automated fashion
by the "ODprep" program.
To run "ODfit", double-click
on the icon or execute it from the command line. Progress will be reported in a window.
At the beginning
(after a delay for startup), you will be asked for a random number; type
in any value between 0 and 1.0. This is not vital, but you should not always
use the same value. If you get very strange results for an analysis, it is worth trying the
analysis again with a different random number: sometimes
the curve fitting does not converge on a stable answer. Once you type in
your number, the analysis begins. The actual data fitting is quite fast;
the simulations take somewhat longer. You will see various messages, possibly
including some apparent error messages about problems with simulations,
which you can generally ignore. When the program is done, the window disappears.
----------------------- DATA FITTING OPTIONS: -----------------------
The data for a given sample is fitted according to the following equation:
    OD = maxod * CONC^nhill / ( CONC^nhill + (1/mpt)^nhill )
where CONC is the concentration (inverse dilution, first column in data
files), OD is the net optical density (experimental vs. control), and maxod,
nhill and mpt are the three parameters for the fit.
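The fitting equation translates directly into code. The Python sketch below evaluates it and, by simple algebra on the same equation (this inversion is not a documented feature of ODfit), finds the concentration at which the curve reaches a given OD. The function names are hypothetical.

```python
def sips_hill_od(conc, maxod, mpt, nhill):
    """Net OD predicted by the fitting equation:
    OD = maxod * CONC^nhill / (CONC^nhill + (1/mpt)^nhill)
    """
    c_n = conc ** nhill
    return maxod * c_n / (c_n + (1.0 / mpt) ** nhill)


def conc_for_od(od, maxod, mpt, nhill):
    """Concentration at which the curve reaches `od` (requires 0 < od < maxod).

    Solving the equation for CONC gives
    CONC = (1/mpt) * (od / (maxod - od))^(1/nhill).
    """
    if not 0.0 < od < maxod:
        raise ValueError("od must lie strictly between 0 and maxod")
    return (1.0 / mpt) * (od / (maxod - od)) ** (1.0 / nhill)


# Parameters from the sample "output" file below (maxod = 1.6, nhill = 1,
# mpt1 = 909.385). At CONC = 1.0e-03 this reproduces the tabulated
# calculated value of about 0.762; at OD = maxod/2 it returns CONC = 1/mpt.
print(sips_hill_od(1.0e-03, 1.6, 909.385, 1.0))
print(conc_for_od(0.8, 1.6, 909.385, 1.0))
```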
The primary set of
options concerns how the three parameters are determined. We treat
each separately:
maxod: This is an estimate of the maximum OD reading
which would be observed under ideal conditions (i.e., no error in any measurements
or methods).
Three methods for determining this value are offered by the program:
- specified by user
- maximum value observed in any of the data files
- value which gives the optimal fit
The user-specified option has
two primary uses: exploring the effect of varying this parameter, or forcing
the program to use the same value across sets of data which are analyzed
at different times. The maximum-observed value may be justified when few
of the samples being analyzed show signs of leveling off (saturating, reaching
a plateau). However, it also may correspond to a value which is excessively
high, due to experimental variability. The optimal-fit value allows the
data-fitting program to determine the single value of maxod which results
in the best overall fit to the set of samples which are being analyzed.
If several (but not necessarily all) of the samples appear to approach
saturation, this method should give good results; samples which don't show
signs of leveling off will not greatly affect the estimate of maxod, and
so are not a problem so long as some (minimally, one) do level off.
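To make the optimal-fit idea concrete: with maxod shared across samples, the quantity minimized is presumably the total sum of squared residuals over all samples, consistent with the single "sum of squares" reported in the sample "output" file. The Python sketch below computes that total and the maximum-observed alternative. It does not reproduce the Goldstein/Beattie optimizer, and the function names are hypothetical.

```python
def sips_hill_od(conc, maxod, mpt, nhill):
    """OD = maxod * CONC^nhill / (CONC^nhill + (1/mpt)^nhill)."""
    c_n = conc ** nhill
    return maxod * c_n / (c_n + (1.0 / mpt) ** nhill)


def max_observed_od(samples):
    """'Maximum value observed in any of the data files' option."""
    return max(od for points in samples for _, od in points)


def total_ssq(samples, maxod, nhill, mpts):
    """Sum of squared residuals over all samples.

    maxod and nhill are shared by every sample; each sample has its own mpt.
    An optimal-fit maxod is the value that minimizes this total.
    """
    ssq = 0.0
    for points, mpt in zip(samples, mpts):
        for conc, od in points:
            residual = od - sips_hill_od(conc, maxod, mpt, nhill)
            ssq += residual * residual
    return ssq


# The two samples from the sample "output" file (6 and 5 points).
exp1 = [(1e-05, 0.014), (3.33333e-05, 0.006), (1e-04, 0.075),
        (3.33333e-04, 0.273), (1e-03, 0.920), (3.33333e-03, 1.117)]
exp2 = [(1e-05, 0.014), (3.33333e-05, 0.026), (1e-04, 0.075),
        (3.33333e-04, 0.273), (1e-03, 0.620)]
print(max_observed_od([exp1, exp2]))
print(total_ssq([exp1, exp2], maxod=1.6, nhill=1.0, mpts=[909.385, 622.660]))
```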
nhill: This is the sharpness of the saturation curve. For monovalent, monoclonal
binding, the theoretically appropriate value is 1.0. Polyclonal monovalent
sera should, due to the heterogeneity of binding constants, lead to a shallower
curve, corresponding to a value less than 1.0. Four methods are offered
by the program:
- specified by user
- monoclonal monovalent value of 1.0
- single value which gives best overall fit
- values which give best fit for each sample
User-specification again is appropriate for exploring the
effect of this parameter, or for forcing the same value across experiments.
A value of 1.0, which could also be specified by the user, is a reasonable
default in the absence of a detailed model. The single optimal-fit value allows the
data to determine a common value for this parameter across samples from
a single experiment. The corresponding hypothesis is that the samples should
have similar variances of binding constants even though the average binding
constant might differ from sample to sample. The separate optimal-fit values
option estimates an optimal value for each of multiple samples, and does
not require the homogeneity assumptions of the single optimal-fit option,
but it does introduce an extra degree of freedom into the fits. The user
should confirm that the resulting estimates are sensible (a sanity test
might be to determine whether the estimated value for each sample is between
0.5 and 1.5).
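One way to see what nhill means for the shape of the curve: from the fitting equation, OD/maxod = x/(1 + x) with x = (CONC * mpt)^nhill, so OD reaches 10% of maxod at x = 1/9 and 90% at x = 9. The concentration must therefore rise by a factor of 81^(1/nhill) to go from 10% to 90% of maxod (81-fold for nhill = 1, 6561-fold for nhill = 0.5). The Python sketch below computes that factor and applies the 0.5 to 1.5 sanity range suggested above; the function names are hypothetical.

```python
def fold_change_10_to_90(nhill):
    """Factor by which CONC must increase for OD to go from 10% to 90% of maxod.

    With x = (CONC * mpt)^nhill, OD/maxod = x / (1 + x): x = 1/9 at 10% and
    x = 9 at 90%, so the ratio of the two concentrations is 81^(1/nhill).
    """
    return 81.0 ** (1.0 / nhill)


def flag_unusual_nhill(estimates, low=0.5, high=1.5):
    """Return the per-sample nhill estimates that fall outside [low, high]."""
    return {name: n for name, n in estimates.items() if not low <= n <= high}


for n in (0.5, 1.0, 1.5):
    print(n, fold_change_10_to_90(n))
print(flag_unusual_nhill({"sample1": 0.8, "sample2": 0.3, "sample3": 1.1}))
```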
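mpt: mpt is the "mid point titer", the titer at which the fitted curve reaches
half of maxod (setting CONC = 1/mpt in the equation above gives OD = maxod/2);
this value is estimated separately, from the data, for each sample.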
Other fitting options:
Number of points to use.
Because of the problem of non-specific binding
at high antigen levels, it is sometimes necessary to omit some of the data
points for higher concentrations. The user is asked to select from among
several options to decide which points to use:
- use all points in each file
- use constant number of points, specified by user, from each file
- test for decline in OD with increasing serum concentration
- use variable number of points, interactively specified by user.
Use all points in each file if you are confident that all data points are valid.
Specify a constant
number of points if, e.g., the same concentrations were tested for all
samples, and you decide not to use points from, e.g., the highest concentration.
Note that if you say to use 5 points, it will be the first five points
in each file which are used.
Test for decline if you prefer the program
to omit points if and only if they show ODs lower than values seen for
lower concentrations. If you choose this option, you will be asked for
a fraction, between 0 and 1, of the maximum value for a given sample to
use as the cutoff; that is, if you specify .9, any point which has an OD
smaller than .9 times the maximum value observed for the sample AND has
a higher serum concentration will be omitted.
Specify the number of points
interactively if you prefer to examine the data in each file and decide
for yourself which points to include. However, beware of the possibility that
you are introducing bias into the analysis. Again, if you specify that
5 points should be used for a given sample, it will be the first 5 points
that are used. To delete specific points believed to be problematic, remove
them from your data file by hand.
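The point-selection rules above can be sketched in Python as follows, taking each sample as a list of (concentration, net OD) pairs in file order. The decline test is written from the description above (drop a point only if its OD is below the chosen fraction of the sample's maximum AND its concentration is higher than that of the maximum); ODfit's internal implementation may differ in details such as ties. The function names and the example sample are hypothetical.

```python
def first_n_points(points, n):
    """Constant or interactive number of points: the FIRST n points are used."""
    return points[:n]


def drop_declining_points(points, fraction):
    """Test for decline in OD with increasing serum concentration.

    Omit a point if its OD < fraction * (maximum OD of the sample) and its
    concentration is higher than the concentration at which that maximum
    was observed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    conc_at_max, max_od = max(points, key=lambda p: p[1])
    cutoff = fraction * max_od
    return [(c, od) for c, od in points
            if not (c > conc_at_max and od < cutoff)]


# Hypothetical sample whose OD falls at the highest concentration.
sample = [(1e-05, 0.01), (1e-04, 0.30), (1e-03, 1.00),
          (3e-03, 0.95), (1e-02, 0.70)]
print(drop_declining_points(sample, 0.9))   # the 0.70 point is omitted
print(first_n_points(sample, 3))
```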
Number of files.
You will be asked how many files are to be analyzed together. Type in the appropriate number.
Names of files.
You must type out the names of the files to be analyzed.
N.B. File names must not contain spaces or slashes ("/")!
Initial values for mpt, maxod, and nhill: ODfit performs an iterative estimation
of the parameters; it needs to be given a starting set of parameter values.
These values should be reasonable initial guesses. You will be prompted
for the values; in general, the suggested values should be adequate.
--------------------------- RESULTS: ----------------------------
The results include two files of interest, "output" and "simout".
"output" contains the final estimates of the parameters (more on this below) and
shows the differences between the observed data and the values predicted
by the fitted curve. "simout" shows the results of bootstrapping
simulations to give an indication of the likely errors of the estimates.
A sample copy of an "output" file is given below, with comments
interspersed on lines beginning with ">".
----------------------------------------------------------------------------
SAMPLE output FILE:
----------------------------------------------------------------------------
info for data set # 1
------------------------
summary of input variables
problem has 2 parameters and 11 points
>This run was done on two samples, from which 6+5=11 points were included.
2 user defined constants read from tinp
maxod = 1.60000E+00
nhill = 1.00000E+00
> Two parameters were fixed, leaving only "mpt" as a free parameter for
> a given sample, but "mpt" was allowed to be different for the two samples,
> leaving two free parameters in total, named "mpt1" and "mpt2" for the
> first and second samples, respectively.
relative error in SSQ is at most TOL
var= ssq/npts= 4.3411E-03
sum of squares = 4.77523E-02 ssq/(m-n) = 5.30581E-03
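> Overall goodness of fit may be assessed by the sum of squares (ssq);
> smaller is better. "var" divides ssq by the number of points (m = 11),
> and ssq/(m-n) divides it by the degrees of freedom (11 points minus
> n = 2 free parameters).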
  k      name of       guess for      final value
         k-th param    k-th param     k-th param
  1      mpt1          1.0000E+03     9.09385E+02
  2      mpt2          1.0000E+03     6.22660E+02
> "mpt1" and "mpt2" were initialized with the value 1000; final estimates
> were 909.4 and 622.7 respectively. These are generally the values of interest.
> Note that the order of parameter numbering ("mpt1", "mpt2", etc) will
> correspond to the order in which you specified data files via "ODprep".
exp 1
  i    indep var      data value     calculated value    residual
  1    1.00000E-05    1.40000E-02    1.44190E-02        -4.19038E-04
  2    3.33333E-05    6.00000E-03    4.70736E-02        -4.10736E-02
  3    1.00000E-04    7.50000E-02    1.33373E-01        -5.83729E-02
  4    3.33333E-04    2.73000E-01    3.72185E-01        -9.91852E-02
  5    1.00000E-03    9.20000E-01    7.62034E-01         1.57966E-01
  6    3.33333E-03    1.11700E+00    1.20310E+00        -8.61038E-02
> For the first sample ("experiment 1"), a comparison of the observed data
> with the values predicted by the final fit (residual = data value minus
> calculated value).
exp 2
  i    indep var      data value     calculated value    residual
  1    1.00000E-05    1.40000E-02    9.90092E-03         4.09908E-03
  2    3.33333E-05    2.60000E-02    3.25333E-02        -6.53328E-03
  3    1.00000E-04    7.50000E-02    9.37860E-02        -1.87860E-02
  4    3.33333E-04    2.73000E-01    2.75007E-01        -2.00664E-03
  5    1.00000E-03    6.20000E-01    6.13965E-01         6.03510E-03
> For the second sample ("experiment 2"), a comparison of the observed data
> and the value expected given the final fit.
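The goodness-of-fit figures in the sample file can be recomputed from the residual column. A minimal Python sketch, with m the number of points used (11 here) and n the number of free parameters (2 here); `fit_summary` is a hypothetical helper, and because the residuals are the rounded values printed above, agreement is only to about six significant figures.

```python
def fit_summary(residuals, n_params):
    """Return (ssq, ssq/npts, ssq/(m-n)) as printed in the "output" file.

    residuals: data value minus calculated value for every point used.
    n_params: number of free parameters (n); m is the number of points.
    """
    m = len(residuals)
    if m <= n_params:
        raise ValueError("need more points than free parameters")
    ssq = sum(r * r for r in residuals)
    return ssq, ssq / m, ssq / (m - n_params)


residuals = [
    # exp 1
    -4.19038e-04, -4.10736e-02, -5.83729e-02, -9.91852e-02, 1.57966e-01, -8.61038e-02,
    # exp 2
    4.09908e-03, -6.53328e-03, -1.87860e-02, -2.00664e-03, 6.03510e-03,
]
ssq, var, ssq_per_dof = fit_summary(residuals, n_params=2)
print(f"sum of squares = {ssq:.5E}  var = {var:.4E}  ssq/(m-n) = {ssq_per_dof:.5E}")
```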
A sample copy of a "simout" file is given below, with comments interspersed on lines beginning with ">".
----------------------------------------------------------------------------
SAMPLE simout FILE:
----------------------------------------------------------------------------
user defined constants read from tinp
maxod = 1.60000E+00
nhill = 1.00000E+00
fit variance (ssq/npts) = 4.3411E-03
k name of guess for final value
k-th param k-th param k-th param
1 mpt1 1.00000E+03 9.09385E+02 +- 1.01323E+02
2 mpt2 1.00000E+03 6.22660E+02 +- 9.82692E+01
> Summary of initial and fitted values of parameters; these should match
> the values in "output", except that each final (fitted) value is followed
> by "+-" and the standard deviation (sigma) of that parameter across the
> bootstrap simulations shown below, a rough 68% confidence range.
S I M U L A T I O N S
There were 100 successful simulations
mpt1
average - 8.99374E+02
sigma - 1.01323E+02
orig. fit - 9.09385E+02
68% confid. lower - 8.00411E+02
upper - 1.00567E+03
mpt2
average - 5.84904E+02
sigma - 9.82692E+01
orig. fit - 6.22660E+02
68% confid. lower - 5.03413E+02
upper - 6.42328E+02
> Summary of results of simulations.
sim.# 1 variance (ssq/npts) = 5.1390E-03
> For each simulation, the variance of the fit is given.
sim.# 2 variance (ssq/npts) = 4.5249E-03
sim.# 3 variance (ssq/npts) = 3.3746E-03
sim.# 4 variance (ssq/npts) = 2.1457E-03
sim.# 5 variance (ssq/npts) = 4.5487E-03
sim.# 6 variance (ssq/npts) = 3.9214E-03
sim.# 7 variance (ssq/npts) = 4.5064E-03
sim.# 8 variance (ssq/npts) = 4.6758E-03
sim.# 9 variance (ssq/npts) = 2.2339E-03
sim.# 10 variance (ssq/npts) = 6.8980E-03
...
> some simulations cut out for brevity
...
sim.# 99 variance (ssq/npts) = 1.7464E-03
sim.#100 variance (ssq/npts) = 2.9140E-03 average = 3.6430E-03
sim. # final value mpt1 1 - 8.18434E+02
> For each simulation, the result of the fit to the first parameter is given.
2 - 9.92668E+02
3 - 8.48807E+02
4 - 8.00411E+02
5 - 9.66069E+02
6 - 1.02524E+03
7 - 7.77622E+02
8 - 7.85415E+02
9 - 8.65783E+02
10 - 9.50281E+02
...
> some simulations cut out for brevity
...
99 - 8.32195E+02
100 - 1.01862E+03
average - 8.99374E+02
sigma - 1.01323E+02
orig. fit - 9.09385E+02
68% confid. lower - 8.00411E+02
upper - 1.00567E+03
mpt2
> For each simulation, the result of the fit to the second parameter is given.
2 - 4.93670E+02
3 - 5.99571E+02
4 - 6.07142E+02
5 - 6.09218E+02
6 - 5.91164E+02
7 - 4.54853E+02
8 - 7.83728E+02
9 - 4.47978E+02
10 - 5.34751E+02
...
> some simulations cut out for brevity
...
99 - 5.92447E+02
100 - 5.70378E+02
average - 5.84904E+02
sigma - 9.82692E+01
orig. fit - 6.22660E+02
68% confid. lower - 5.03413E+02
upper - 6.42328E+02
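The summary lines in "simout" (average, sigma, 68% confidence bounds) can be computed from the list of per-simulation estimates. The Python sketch below is one plausible way to do it, not ODfit's documented method: it assumes sigma is the sample standard deviation and that the 68% bounds are nearest-rank 16th and 84th percentiles of the simulated values. (In the sample file, the lower bound for mpt1, 8.00411E+02, coincides with the estimate from simulation 4, which is consistent with, but does not prove, an interval taken from the sorted simulated values.) `summarize_simulations` is hypothetical, and the usage line feeds it toy input rather than real simulation results.

```python
import math


def _nearest_rank(ordered, percent):
    """Nearest-rank percentile of an already sorted list (integer arithmetic)."""
    rank = -(-percent * len(ordered) // 100)  # ceil(percent * n / 100)
    return ordered[max(rank, 1) - 1]


def summarize_simulations(estimates, orig_fit):
    """Average, sigma and an approximate 68% interval for one parameter.

    Assumptions (not confirmed for ODfit): sigma uses the n-1 denominator,
    and the bounds are the nearest-rank 16th and 84th percentiles.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two successful simulations")
    average = sum(estimates) / n
    sigma = math.sqrt(sum((x - average) ** 2 for x in estimates) / (n - 1))
    ordered = sorted(estimates)
    return {
        "average": average,
        "sigma": sigma,
        "orig. fit": orig_fit,
        "68% lower": _nearest_rank(ordered, 16),
        "68% upper": _nearest_rank(ordered, 84),
    }


# Toy input: 100 "simulated" estimates 1.0, 2.0, ..., 100.0.
print(summarize_simulations([float(x) for x in range(1, 101)], orig_fit=50.0))
```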
----------------------------------------------------------------------------
Notice:
Unless otherwise indicated, this information, consisting of source code,
documentation, and executable programs, has been authored by an employee
or employees of the University of California under LACC # ______ , operator
of the Los Alamos National Laboratory under Contract No. W-7405-ENG-36
with the U.S. Department of Energy. The U.S. Government has rights to use,
reproduce, and distribute this information. The public may copy and use
this information without charge, make derivative works, distribute, and
publicly display provided that this Notice and any statement of authorship
are reproduced on all copies. However, the public may not incorporate this
information in any commercial or proprietary product. Neither the Government
nor the University makes any warranty, express or implied, or assumes any
liability or responsibility for the use of this information.
----------------------------------------------------------------------------
last modified: Wed Oct 10 16:16 2007