When using RIP in a publication, please cite:
A computer program designed to screen rapidly for HIV type 1 intersubtype recombinant sequences.
Siepel AC, Halpern AL, Macken C, Korber BT.
AIDS Res Hum Retroviruses. 1995 Nov;11(11):1413-6.
Background alignment options:
The RIP default background consists of a single representative sequence for each of subtypes A1, A2, B, C, D, F1, F2, G, H, J, and K. Optionally, you may check the box to include CRF01_AE. Most are consensus sequences; some are single reference sequences of the subtype. The consensus sequences were created using Consensus Maker; details are available on request.
The RIP custom background contains consensus and subtype reference sequences for HIV-1, including M-group subtypes; CRFs; and groups N, O, and P.
Choose as few sequences as possible to get cleaner results. RIP cannot include more than 26 sequences, as it has only 26 colors for the plot.
The RIP Custom Background alignment is available for download on the Alignments page. It is updated periodically with new CRFs and updated consensus sequences.
The Use your own alignment option allows you to submit your own background alignment. The background sequences must be aligned to each other, but do not need to be aligned with the query sequence. The query must not be included in the background file. Using this option, RIP can be used to analyze non-HIV sequences. The total number of background sequences is limited to 26.
Download: HIV-2 reference sequences. This file of reference sequences can be used as a background set for RIP to examine HIV-2 recombination: HIV-2 reference set (Fasta format).
RIP aligns your query sequence to the background using the program "align0". This usually works well, but if your RIP output looks unusual, check the alignment for errors.
You can download the alignment of your query with the background, fine-tune it, and then resubmit this alignment to RIP.
The RIP "window" is moved in increments of one residue from left to right in the alignment, and a Hamming distance (p-distance) is calculated for each window.
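The moving-window calculation described above can be sketched as follows. This is a minimal illustration, not RIP's own code: the function name `window_distances` is hypothetical, and the rule used for the plotted position (1-based window start plus half the window size, rounded up for odd windows) is an assumption chosen so that a window of 3 is plotted at position 2 and a window of 400 at position 200, as stated elsewhere on this page.

```python
def window_distances(query, reference, window_size):
    """Slide a window one residue at a time over two aligned sequences.

    Returns a list of (center_position, p_distance) pairs, where
    center_position is 1-based and p_distance is the fraction of
    mismatching columns inside the window (Hamming distance / window size).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(len(query) - window_size + 1):
        q = query[start:start + window_size]
        r = reference[start:start + window_size]
        mismatches = sum(1 for a, b in zip(q, r) if a != b)
        # Assumed plotting rule: window 3 -> position 2, window 400 -> position 200
        center = start + (window_size + 1) // 2
        results.append((center, mismatches / window_size))
    return results


if __name__ == "__main__":
    for center, dist in window_distances("AATCGT", "AATCTT", 3):
        print(center, round(dist, 2))
```

Similarity, as plotted by RIP, is 1 minus the distance returned here.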
Choice of window size is important, as it will affect the sensitivity of detection of recombinants!
The user has 4 options for handling gaps in the alignment:
Examples of Gap Handling Options
Position  1234567890123456789012
Query     AATCGTAAA---TGGCATAGTA
Ref 1     AATCTTAAA---TGAAACGATA
Ref 2     AAA---ATTACCTGGCATAGTA
Window1   ---
Window2    ---
Window3     ---
Window4      ---
Window10           ---
With a window size of 3 nucleotides, the first point in the plot will be in position 2. All three gap/window handling options would give the same result, i.e., the query will be a perfect match to Ref 1 and distance = 1/3 away from Ref 2 (one mismatch out of three positions compared).
With options 1 and 2, window 2 will compare the query sequence ATC with Ref 1's ATC and Ref 2's AA-, and plot the corresponding similarity values in position 3 of the graph (Similarity = Match Fraction = 1 - distance). Here, gaps are treated as a 5th nucleotide character. Hence, window 2 will have distance = 0 to Ref 1 and distance = 2/3 to Ref 2. Similarly, windows 3 and 4 will have values plotted in positions 4 and 5.
For window 10, option 1 will continue to plot the similarity value (perfect match to Ref 1), while option 2 would leave a blank in the graph to indicate that there is a gap in the query sequence. Also windows 9 and 11 would be blank with option 2.
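The behavior of options 1 and 2 on the example alignment can be reproduced with a short sketch. The function name `similarity_profile` and the `blank_query_gaps` flag are hypothetical. The blanking rule used for option 2 (leave a blank whenever the query has a gap at the window's center column) is an assumption; it is consistent with windows 9, 10, and 11 being blank in the example, but it is not taken from RIP's source.

```python
QUERY = "AATCGTAAA---TGGCATAGTA"
REF_1 = "AATCTTAAA---TGAAACGATA"
REF_2 = "AAA---ATTACCTGGCATAGTA"


def similarity_profile(query, reference, window_size, blank_query_gaps=False):
    """Return {plot_position: similarity or None} for each window.

    Gaps count as a fifth character, so gap-vs-gap is a match and
    gap-vs-nucleotide is a mismatch. With blank_query_gaps=True
    (assumed option 2 behavior), windows centered on a query gap
    are left blank (None).
    """
    profile = {}
    for start in range(len(query) - window_size + 1):
        center = start + (window_size + 1) // 2  # 1-based plot position
        if blank_query_gaps and query[center - 1] == "-":
            profile[center] = None
            continue
        q = query[start:start + window_size]
        r = reference[start:start + window_size]
        matches = sum(1 for a, b in zip(q, r) if a == b)
        profile[center] = matches / window_size
    return profile


if __name__ == "__main__":
    option_1 = similarity_profile(QUERY, REF_2, 3)
    option_2 = similarity_profile(QUERY, REF_1, 3, blank_query_gaps=True)
    print(option_1[2], option_1[3])   # windows 1 and 2 against Ref 2
    print([pos for pos, s in option_2.items() if s is None])
```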
Options 3 and 4 gapstrip the above alignment. The resulting alignment looks like this:
Query     AATAAATGGCATAGTA
Ref 1     AATAAATGAAACGATA
Ref 2     AAAATTTGGCATAGTA
In option 3, windows scan the remaining alignment and plot similarity values throughout.
With option 4, the regions that were stripped out will be reinserted with blanks in the graph. The resulting alignment would look like this:
Query     AAT---AAA---TGGCATAGTA
Ref 1     AAT---AAA---TGAAACGATA
Ref 2     AAA---ATT---TGGCATAGTA
Window1   ---
Window2    --   -
Window3     -   --
Window4         ---
Note that windows that include gaps will not use gaps as information; instead the next residue after the gap will be used (which is the next residue in the gapstripped alignment).
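Gapstripping, as used by options 3 and 4, can be sketched as below. The helper name `gapstrip` is hypothetical; it removes every column in which any sequence has a gap, which reproduces the stripped alignment shown above. Keeping the list of surviving column indices is one way (an assumption, not necessarily RIP's approach) to map a window in the stripped alignment back to the original columns, as option 4 requires.

```python
def gapstrip(sequences, gap="-"):
    """Remove every alignment column that contains a gap in any sequence.

    Returns (stripped_sequences, kept_columns), where kept_columns holds
    the 0-based original index of each surviving column.
    """
    length = len(sequences[0])
    kept = [i for i in range(length) if all(s[i] != gap for s in sequences)]
    stripped = ["".join(s[i] for i in kept) for s in sequences]
    return stripped, kept


if __name__ == "__main__":
    alignment = [
        "AATCGTAAA---TGGCATAGTA",  # Query
        "AATCTTAAA---TGAAACGATA",  # Ref 1
        "AAA---ATTACCTGGCATAGTA",  # Ref 2
    ]
    stripped, kept = gapstrip(alignment)
    for seq in stripped:
        print(seq)
    # Window 2 of size 3 covers stripped columns 2-4 (1-based);
    # print the original 1-based columns it corresponds to.
    print([kept[i] + 1 for i in range(1, 4)])
```

In this example, window 2 spans original columns 2, 3, and 7, which matches the broken window shown in the option 4 diagram.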
The scoring matrix for partial matches looks like this:
    A    C    G    T    M    R    W    S    Y    K    B    D    H    V    N
A   1    -    -    -    .50  .50  .50  -    -    -    -    .33  .33  .33  1
C   -    1    -    -    .50  -    -    .50  .50  -    .33  -    .33  .33  1
G   -    -    1    -    -    .50  -    .50  -    .50  .33  .33  -    .33  1
T   -    -    -    1    -    -    .50  -    .50  .50  .33  .33  .33  -    1
M   .50  .50  -    -    1    .50  .50  .50  .50  -    .33  .33  .66  .66  1
R   .50  -    .50  -    .50  1    .50  .50  -    .50  .33  .66  .33  .66  1
W   .50  -    -    .50  .50  .50  1    -    .50  .50  .33  .66  .66  .33  1
S   -    .50  .50  -    .50  .50  -    1    .50  .50  .66  .33  .33  .66  1
Y   -    .50  -    .50  .50  -    .50  .50  1    .50  .66  .33  .66  .33  1
K   -    -    .50  .50  -    .50  .50  .50  .50  1    .66  .66  .33  .33  1
B   -    .33  .33  .33  .33  .33  .33  .66  .66  .66  1    .66  .66  .66  1
D   .33  -    .33  .33  .33  .66  .66  .33  .33  .66  .66  1    .66  .66  1
H   .33  .33  -    .33  .66  .33  .66  .33  .66  .33  .66  .66  1    .66  1
V   .33  .33  .33  -    .66  .66  .33  .66  .33  .33  .66  .66  .66  1    1
N   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1

where
M = AC
R = AG
W = AT
S = GC
Y = CT
K = GT
B = CGT (not A)
D = AGT (not C)
H = ACT (not G)
V = ACG (not T/U)
N = ACGT
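The entries of the partial-match matrix are consistent with a simple rule: the number of nucleotides shared by the two codes, divided by the size of the larger code, with N always scoring 1 and "-" meaning 0. That rule is inferred from the tabulated values, not taken from RIP's source, and the function name `partial_match_score` is hypothetical. A sketch under that assumption:

```python
# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def partial_match_score(a, b):
    """Score two IUPAC codes between 0 and 1.

    Inferred rule: shared bases / size of the larger code;
    N matches everything with a score of 1.
    """
    if a == "N" or b == "N":
        return 1.0
    bases_a, bases_b = set(IUPAC[a]), set(IUPAC[b])
    return len(bases_a & bases_b) / max(len(bases_a), len(bases_b))


if __name__ == "__main__":
    for pair in [("A", "M"), ("M", "H"), ("B", "M"), ("A", "C"), ("N", "V")]:
        print(pair, round(partial_match_score(*pair), 2))
```

Note that the table shows two-thirds truncated to .66, whereas this function returns the unrounded fraction.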
The results page begins with a summary of the parameters used in this RIP run:
WindowSize = 400, Significance threshold = 0.9, GapOption = 1, Multistate characters = yes
Next is a Download button that allows you to retrieve your query aligned to the background sequences.
Three graphs showing different distance measurements between the query and the various background sequences are presented. A typical similarity plot might look like this:
The x-axis (k) represents the query sequence position at the center of the moving window. That is why the first point is at position 200, half the window size of 400.
The y-axis, s(k), shows the similarity between that window of the query sequence and each of the background sequences. In the sample plot, CONSENSUS G (dark blue) is the background sequence with the highest similarity to the query; it begins with a similarity of 0.9 and falls to a similarity of about 0.77 near position 600.
The two bars across the top of the graph represent the "best match" (lower bar), and the significance of this match (upper bar). The "best match" sequence is the background sequence with the highest similarity to the query. The upper bar is also colored at a position when the best match is significantly better than the second match. In the example above, you can see that around position 1700 the best match switches from "red" (A1) to "green" (J); however, there are several positions where neither sequence is significantly the best match.
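Selecting the "best match" at each plotted position amounts to picking the background sequence with the highest similarity there. The sketch below illustrates only that selection; the function name `best_matches` and the input layout are hypothetical, the similarity values are made up for illustration, and the significance test behind the upper bar is not reproduced because this page does not describe how it is computed.

```python
def best_matches(similarities):
    """Pick the best-matching background sequence at each position.

    similarities maps a background name to a list of similarity values,
    one per plotted window position (all lists the same length).
    Returns a list with the name of the highest-scoring background
    at each position.
    """
    names = list(similarities)
    n_positions = len(similarities[names[0]])
    return [
        max(names, key=lambda name: similarities[name][k])
        for k in range(n_positions)
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from a real RIP run
    sims = {
        "A1": [0.90, 0.82, 0.70],
        "J":  [0.85, 0.80, 0.75],
    }
    print(best_matches(sims))
```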
Following the graphical output is an alignment of the query to the background, one block of which might look like this:
                             841: 900 [ 799: 855]
query: 11_cpx.NG.94.NG3670b: AATGGCAGTCTAGCAGAAGAAGAGGTAAGGAT...TAGATCTGAAAACATCACAAACAAT
a    : CON_A1              : ----------------------------T---...------------T------R-----
b    : A2.CY.94CY017_41    : -------------------G--G-AA--TA--GAT------------T--T---------
c    : CON_B               : ---------------------------GTA--...---------C--TT----GG-----
d    : CON_C               : -----T--C---------------A---TA--...------------TC-G---------
e    : CON_D               : ------------------------A---TA--...------------TC-------T---
f    : CON_F1              : --------C--------------TA---TA--...C------C----T---T--G-T---
g    : CON_F2              : --------C--------------TA---TA--...------------T---T--G-T---
h    : CON_G               : ---------T-------------AA---TA--...------------T------G-----
i    : CON_H               : -----A--C----------D-C----C-TA--...-------A----T---T--G-----
j    : J.SE.94.SE7022      : ---------G---------G---CA---TA--...------------T---T--G-----
k    : K.CD.97.EQTB11C     : --------C---------------A---TT--...---G------G-T--T-----G---
l    : CON_01_AE           : ------------------------A---TA--...C-----------TC-----------
best match                 : aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
significant                : ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This style of output is called "output-aligned" because the background sequences (labeled a through l) are shown aligned to the query, with a letter shown only at positions where they differ from it. When a background position agrees with the query, a "-" character is shown. Gaps are represented by "." characters. Sixty characters of the alignment are shown in each block; this example shows alignment positions 841 to 900. The second set of numbers, "[ 799: 855]", shows the absolute position in the query sequence, i.e., the position not counting gaps. At the bottom of the alignment is the "best match" line. In this example, the window centered at every position from 841 to 900 was a best match to sequence "a", which is "CON_A1", the A1 consensus sequence in RIP's standard background. You can get a sense that this is true from the CON_A1 line, which shows fewer differences from the query than the other background sequences do. But note that you are seeing only 60 characters in this block, whereas the window itself was 400 characters wide. The final line in this block shows that the match between the query and the CON_A1 sequence was significantly better than the match score with any other background sequence. When the match is not significant, the "^" symbol disappears.
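The display convention for background lines can be illustrated with a short sketch. The function name `render_difference_line` is hypothetical, and the rule that a gap in the background is always shown as "." (even where the query also has a gap) is read off the example block rather than taken from RIP's source. Input sequences are assumed to use "-" for alignment gaps.

```python
def render_difference_line(query, background, gap="-"):
    """Render an aligned background sequence relative to the query.

    "." marks a gap in the background, "-" marks agreement with the
    query, and any other position shows the background's own character.
    """
    rendered = []
    for q, b in zip(query, background):
        if b == gap:
            rendered.append(".")
        elif b == q:
            rendered.append("-")
        else:
            rendered.append(b)
    return "".join(rendered)


if __name__ == "__main__":
    query = "AGGAT---TAGA"
    print(render_difference_line(query, "AGTAT---TAGA"))
    print(render_difference_line(query, "TAGATGATTAGA"))
```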