HIV sequence database



HDdist ReadMe


About HDdist:


    This is the front end for the command-line version
of a program to calculate various measures of difference 
between the lanes in heteroduplex assays.  Input should
be in the same form as the HDdist sample input file.

    The program reads a file and stores all the scans of
lanes that it finds there.  Each scan should be recorded
as a list of (pixel number, intensity) pairs.  The program
normalizes these lists so the total intensities are 1.0, 
then regards the results as probability distributions and 
computes various measures of the difference between them:

    L_1 distance:   This is the default measure of
                    distance; it is the pixel-by-pixel 
                    sum of the absolute values of the 
                    differences in normalized intensity.  
                    This sum has a maximum value of 2.0,
                    so we report (L_1 distance / 2.0)
                    in order to get a result in the
                    range [0.0, 1.0].

    L_2 distance:   Euclidean distance between the
                    two distributions when each is
                    rescaled to a unit vector (unit
                    Euclidean length). It has a
                    maximum value of sqrt(2.0).
                    Here again we divide the distance
                    by its maximum value in order to
                    report a result between 0 and 1.

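    Cosine dist.:   This measure is closely related
                    to the L_2 distance:

                    cos_dist = (L_2_dist^2) / 2.0

                    For unit vectors this quantity
                    equals 1 - cos(angle between them)
                    when L_2_dist is taken before its
                    division by sqrt(2.0).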
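
    The three measures described above can be sketched in a
few lines of Python.  This is not the HDdist source, only an
illustration under stated assumptions: both scans are taken
to be sampled on the same pixel grid, the L_2 and cosine
measures rescale each scan to unit Euclidean length, and the
function names (normalize_sum, normalize_unit, l1_diff,
l2_diff, cosine_diff) are hypothetical.

```python
import math


def normalize_sum(intensities):
    """Scale a scan so its intensities sum to 1.0."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("scan has no positive total intensity")
    return [x / total for x in intensities]


def normalize_unit(intensities):
    """Scale a scan to unit Euclidean length (assumed for L_2 and cosine)."""
    norm = math.sqrt(sum(x * x for x in intensities))
    if norm == 0:
        raise ValueError("scan has zero length as a vector")
    return [x / norm for x in intensities]


def _same_grid(a, b):
    # Assumption: both lanes were scanned over the same pixels.
    if len(a) != len(b):
        raise ValueError("scans must have the same number of pixels")


def l1_diff(a, b):
    """Sum of absolute differences of normalized intensities, divided by 2.0."""
    _same_grid(a, b)
    p, q = normalize_sum(a), normalize_sum(b)
    return sum(abs(x - y) for x, y in zip(p, q)) / 2.0


def l2_diff(a, b):
    """Euclidean distance between unit vectors, divided by sqrt(2.0)."""
    _same_grid(a, b)
    u, v = normalize_unit(a), normalize_unit(b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return dist / math.sqrt(2.0)


def cosine_diff(a, b):
    """(L_2 distance)^2 / 2.0, using the distance before division by sqrt(2.0)."""
    _same_grid(a, b)
    u, v = normalize_unit(a), normalize_unit(b)
    return sum((x - y) ** 2 for x, y in zip(u, v)) / 2.0


if __name__ == "__main__":
    lane_a = [1.0, 1.0]
    lane_b = [1.0, 0.0]
    print(l1_diff(lane_a, lane_b), l2_diff(lane_a, lane_b),
          cosine_diff(lane_a, lane_b))
```

    Two scans with no overlapping signal give 1.0 for all
three measures in this sketch, and identical scans give 0.0.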

                    
The output lines will be of the form:

year[[C]*] (n_pixels): [L1 diff] [L2 diff] [cosine diff]

where the * marks the sample that corresponds to the
probe data.  Which distances are reported depends on the
command-line options.  By default the program smooths
each scan with a kernel whose width is 1.5% of the length
of the scan and reports nothing; to see any output one
*must* select at least one difference measure with the -d
command-line option.
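
    To give a feel for what a kernel width of 1.5% of the
scan length means, here is a small Python sketch.  The shape
of the kernel HDdist uses is not stated in this ReadMe, so
the moving average below is only an illustrative stand-in,
and the names smooth_scan and width_fraction are
hypothetical.

```python
def smooth_scan(intensities, width_fraction=0.015):
    """Moving-average smoothing with a width set as a fraction of scan length.

    Assumption: a flat (boxcar) kernel stands in for whatever kernel
    HDdist applies; only the 1.5% default width comes from the ReadMe.
    """
    n = len(intensities)
    if n == 0:
        return []
    # Kernel width in pixels: at least 1, and odd so it is centred on a pixel.
    width = max(1, round(n * width_fraction))
    if width % 2 == 0:
        width += 1
    half = width // 2
    smoothed = []
    for i in range(n):
        # The window is truncated at the ends of the scan, so total
        # intensity is not exactly conserved near the edges.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    scan = [0.0] * 200
    scan[100] = 3.0          # a single sharp band
    out = smooth_scan(scan)  # 200 pixels -> 3-pixel kernel
    print(out[98:103])
```

    With a 200-pixel scan the kernel in this sketch spans 3
pixels, so a one-pixel spike is spread evenly over its two
neighbours.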

See HDdist sample output.

****************************************************************

Acknowledgements:
    This program is the work of many people: Bette Korber,
David Wolpert, Avidan Neumann, Eric Delwart and Jim Mullins
developed the original version applied in Delwart et al.;
James Theiler wrote opt-3.5, the package used to handle
command-line arguments; and Mark Muldoon wrote the code you
are reading.
This website was created by Satish Pillai.

		


last modified: Tue Oct 9 16:41 2007


Questions or comments? Contact us at seq-info@lanl.gov.

 