HIV sequence database

Rooting Methods

Choose this option to calculate branch lengths between user-defined nodes. This function is equivalent to our former BranchLength tool.

Choose this option to find the least squares optimized midpoint to root trees when all samples are taken from a single time point.

The minimum sum of variance optimization finds the root that gives the most homogeneous (“clocklike”) rate in a tree with samples from at least two different time points.

If you choose this option without providing a time point text file, the output will show the distances and variances for all possible rooting points, displayed in a table. If you provide the optional time point data, the table will also include the evolutionary rate between chosen time points in the tree.

Input

The tree input file should be in standard Newick format.

Note: The beta version of TreeRate cannot accept Newick files that contain __internal node names__ or __branch lengths expressed as exponents__. In the future, we hope to modify TreeRate to automatically remove internal node names and interpret exponents.

Example of internal node names:

Newick file without internal node names (accepted):

(((((B.US.00.5_:0.080568,B.US.90.5_:0.033786):0.000000,B.US.90.2_:0.077736):0.008048,
(B.US.00.7_:0.106131,B.US.00.2_:0.165907):0.025352):0.009527,B.US.00.1_:0.122455):0.023859,
B.US.00.8_:0.036894,B.US.00.6_:0.137676);

Newick file with internal node names (not accepted):

(((((B.US.00.5_:0.080568,B.US.90.5_:0.033786)330:0.000000,B.US.90.2_:0.077736)100:0.008048,
(B.US.00.7_:0.106131,B.US.00.2_:0.165907)1000:0.025352)550:0.009527,B.US.00.1_:0.122455)330:0.023859,
B.US.00.8_:0.036894,B.US.00.6_:0.137676);
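Until TreeRate handles these cases itself, a tree file can be cleaned before upload. The following is a minimal Python sketch and is not part of TreeRate. The function names `strip_internal_labels` and `expand_exponents` are hypothetical, the regular expressions assume unquoted taxon names and no bracketed comments, and the choice of ten decimal places is an arbitrary assumption.

```python
import re


def strip_internal_labels(newick: str) -> str:
    """Remove any label that directly follows a closing parenthesis.

    Assumption: taxon names are unquoted, so a ')' always ends a clade.
    """
    # An internal label runs from ')' up to the next ':', ',', ')', ';' or space.
    return re.sub(r"\)[^:,;()\s]+", ")", newick)


def expand_exponents(newick: str, decimals: int = 10) -> str:
    """Rewrite branch lengths such as 1.2e-05 in fixed-point notation."""

    def to_fixed(match: re.Match) -> str:
        return ":" + format(float(match.group(1)), f".{decimals}f")

    # Only branch lengths written with an exponent are touched.
    return re.sub(r":([0-9.]+[eE][-+]?[0-9]+)", to_fixed, newick)


tree = "((A:0.1,B:1.2e-05)330:0.0,C:0.3);"
cleaned = expand_exponents(strip_internal_labels(tree))
print(cleaned)
```

Branch lengths smaller than the chosen number of decimals would be rounded to zero, so `decimals` should be raised for very short branches.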

To be able to calculate an evolutionary rate between chosen time points in the tree, you must also load a file of time point data. This file is uploaded under Optional “Upload dates file”. The file should consist of two columns, the first being the taxa names (exactly as they appear in the tree file), and the second containing the time points. A single space should separate the two columns. The time points may be in any unit (years, months, days) and must be whole numbers or decimals. See timepoint example file.
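As an illustration of the two-column layout described above, here is a minimal Python sketch that reads such a file into a dictionary. It is not TreeRate's own parser: the function name `parse_dates` is hypothetical, the sample taxa and dates are made up for illustration, and the sketch accepts any run of whitespace between the columns, which is more lenient than the single space the tool expects.

```python
def parse_dates(text: str) -> dict[str, float]:
    """Map each taxon name to its time point (whole number or decimal)."""
    dates: dict[str, float] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        name, value = fields
        dates[name] = float(value)  # raises ValueError if not numeric
    return dates


# Illustrative content only; not the tool's example file.
sample = "taxonA 1981\ntaxonB 1990.5\ntaxonC 2000\n"
print(parse_dates(sample))
```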

Note that you can upload a tree file whose taxa come from more than two time points. However, the tool can only perform calculations for two time points at once. At the “group selection step”, described below, you will have the option to set aside any taxa coming from additional time points.

Group selection step

All taxa in the uploaded tree are visible in the Timepoint 1 window. Move at least two taxa into the Timepoint 2 window. Any taxa that are present in the tree, but are not being considered in the calculation, should be moved to Discard. See Figure 1, below.

Note that it is possible at this step to sort the taxa into time points that differ from those specified in the optional “dates file”. Take care that your group selection and your dates file are in agreement, or the calculated rates will be erroneous.

**Fig.1.** We have a phylogenetic tree inferred from sequences sampled at three time points: 1981, 1990, and 2000. We wish to calculate the evolutionary rate between 1981 and 2000. All taxa belonging to sequences from 1981 are left in the Timepoint 1 window. All taxa belonging to sequences from 2000 are moved to the Timepoint 2 window. Taxa belonging to sequences from 1990 are moved to Discard.
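The sorting shown in Fig.1 is done by hand in the web interface. For readers who want to check their selection against the dates file, the same partition can be sketched in Python. The function name `split_groups` and the taxon names are hypothetical, and the sketch assumes every taxon in a group has exactly the chosen time point.

```python
def split_groups(dates: dict[str, float], t1: float, t2: float):
    """Partition taxa into Timepoint 1, Timepoint 2, and Discard lists."""
    group1, group2, discard = [], [], []
    for taxon, time_point in dates.items():
        if time_point == t1:
            group1.append(taxon)
        elif time_point == t2:
            group2.append(taxon)
        else:
            discard.append(taxon)  # taxa from any other time point
    return group1, group2, discard


# Illustrative names only; the time points mirror the Fig.1 scenario.
dates = {"s81a": 1981, "s81b": 1981, "s90a": 1990, "s00a": 2000, "s00b": 2000}
timepoint1, timepoint2, discarded = split_groups(dates, 1981, 2000)
print(timepoint1, timepoint2, discarded)
```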

Calculations

Based on the user input, the tool roots the input tree in all possible ways. For each rooting point, the tool calculates the average distance from the root to the Timepoint 1 taxa (x_{1}) and the average distance from the root to the Timepoint 2 taxa (x_{2}). The difference between the two averages (x_{2} - x_{1}) gives a Δd value for each rooting point. The tool then calculates the sum of variances for the taxa in Timepoint 1 and Timepoint 2 for each rooting point. The Δd from the rooting point with the minimum sum of variances gives the best estimate of the evolutionary rate for the chosen time points in the tree.
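The selection rule can be sketched in Python under some stated assumptions. The sketch does not re-root a tree; it takes root-to-taxon distances that have already been computed for each candidate rooting point. The names `score_root` and `best_root` are hypothetical, the distances are made up, and the use of the population variance (`statistics.pvariance`) is an assumption, since the variance estimator is not specified here.

```python
from statistics import mean, pvariance


def score_root(d1: list[float], d2: list[float]) -> tuple[float, float]:
    """Return (delta_d, sum_of_variances) for one rooting point.

    d1, d2: root-to-taxon distances for the Timepoint 1 and Timepoint 2 taxa.
    """
    delta_d = mean(d2) - mean(d1)  # x2 - x1
    sum_var = pvariance(d1) + pvariance(d2)  # assumed: population variance
    return delta_d, sum_var


def best_root(candidates: dict[str, tuple[list[float], list[float]]]):
    """Pick the rooting point with the minimum sum of variances."""
    scores = {root: score_root(d1, d2) for root, (d1, d2) in candidates.items()}
    winner = min(scores, key=lambda root: scores[root][1])
    return winner, scores[winner][0]


# Made-up distances for two candidate rooting points.
candidates = {
    "root_a": ([0.10, 0.12], [0.20, 0.22]),
    "root_b": ([0.05, 0.15], [0.10, 0.30]),
}
root, delta_d = best_root(candidates)
print(root, delta_d)
```

Here `root_a` wins because the distances within each group are more tightly clustered around their group average.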

**Fig.2.** Schematic figure of the Δd calculation.

To calculate the evolutionary rate, the tool computes the average time for the taxa in the Timepoint 1 group and the average time for the taxa in the Timepoint 2 group. The difference between the Timepoint 2 average and the Timepoint 1 average gives a Δt value. The evolutionary rate for the chosen time period is calculated by dividing Δd by Δt, and is presented as substitutions per site per unit time (in whatever units were used in the dates input file). The evolutionary rate is calculated for every rooting point of the input tree, but the best estimate is the one calculated from the Δd with the minimum sum of variances.
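The rate calculation itself is a single division, sketched below in Python. The function name `evolutionary_rate` is hypothetical and the input values are made up for illustration; they are not results from the tool.

```python
from statistics import mean


def evolutionary_rate(delta_d: float, times_1: list[float], times_2: list[float]) -> float:
    """Return delta_d / delta_t, in substitutions per site per unit time."""
    delta_t = mean(times_2) - mean(times_1)  # average time difference
    if delta_t == 0:
        raise ValueError("the two groups have the same average time point")
    return delta_d / delta_t


# Made-up example: delta_d of 0.095 between samples from 1981 and 2000.
rate = evolutionary_rate(0.095, [1981, 1981], [2000, 2000])
print(rate)  # about 0.005 substitutions per site per year
```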

Output

The results for all possible rooting points are displayed in a table. The rooting point with the Δd that gives the minimum sum of variances between the two taxa groups, and thus the best estimate of the evolutionary rate for the chosen time period, is marked in orange. It is important to note that the rooting point that gives the best estimate of the evolutionary rate is not necessarily the best root for the whole tree.

**Fig.3.** Sample output page. The root giving the best evolutionary rate estimate for the time period between 1981 and 2000 is marked in orange.

All the re-rooted trees can be viewed by clicking on the node of the desired tree in the output table. The table and all the trees can be downloaded.

If you wish to estimate an evolutionary rate for a different time period from the tree, click the back button, and it will take you back to the group selection step. Here, the taxa can be re-arranged between the Timepoint 1, Timepoint 2, and the Discard windows in the desired way, and a new calculation can be obtained. There is no need to re-import the dates and the tree files.

last modified: Thu May 17 10:05 2012