Analysis of the HO & SHAW Papers

By Mark Craddock

I have completed a preliminary analysis of the papers by Ho and Shaw which appeared in Nature, January 12, 1995. My considered opinion is that they are total rubbish. I seriously doubt that the two groups really have any idea what they are doing when they construct their supposed models of the interaction of the virus and the immune system. The models when analysed properly do not do what they think they do.

To begin with, a few oddities and yet another remark about QC-PCR. In the Shaw paper (Wei et al) they study 22 patients with CD4 counts between 18 and 251. They claim that plasma viral RNA levels in the 22 subjects at baseline ranged from 10^4.6 to 10^7.2, with geometric mean 10^5.5. (The notation 10^4.6 means 10 to the power of 4.6, which you can work out on any scientific calculator. Each virion carries two RNA molecules, so the virion count is half the RNA count. 10^4.6 = 39,810, or 19,905 virions per ml of blood. 10^7.2 = 15,848,932, or 7,924,466 virions per ml. 10^5.5 = 316,228, or 158,114 virions per ml. Of course the accuracy given here is ludicrous. They can't really mean that they can measure things this accurately.)

Note that they say geometric mean. What is the geometric mean, I hear you ask? Well, this is obtained by multiplying the numbers together and taking the nth root. So for two numbers you multiply them together and take the square root. For three numbers you take the cube root of the product, and so on. For positive numbers the geometric mean is always less than the arithmetic mean, or average, except when the numbers are identical, in which case the two means are equal. Why have they used the geometric mean here? Well, the only reason we could think of (myself and a colleague) is that the geometric mean smooths out ratio changes. If they are estimating changes in viral load by taking ratios of QC-PCR measurements at different times, then the geometric mean of these variations will show less variability than the arithmetic mean. It makes their results look more consistent than they really are.
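
To make the arithmetic concrete, here is a minimal Python sketch. The only figures taken from the text are the exponents 4.6, 5.5 and 7.2 and the halving from RNA copies to virions. The three-value sample is made up for illustration and is not data from the Wei et al paper.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Ordinary average."""
    return sum(values) / len(values)

# Powers of ten quoted in the text, with the halved figures (RNA copies -> virions).
for exponent in (4.6, 5.5, 7.2):
    copies = 10 ** exponent
    print(f"10^{exponent} = {copies:,.0f} RNA copies/ml, about {copies / 2:,.0f} virions/ml")

# Hypothetical sample spanning the quoted range (NOT data from the paper).
sample = [10 ** 4.6, 10 ** 5.5, 10 ** 7.2]
print(f"geometric mean  = {geometric_mean(sample):,.0f}")
print(f"arithmetic mean = {arithmetic_mean(sample):,.0f}")
```

When values are spread over several orders of magnitude, as here, the arithmetic mean is dominated by the largest value while the geometric mean is not, which is why the choice of mean matters.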

The idea behind QC-PCR is to amplify (mass produce) target DNA together with some DNA which acts as a control, and is used to estimate the size of the unknown target. So if there is an amount X of target present, which you do not know, you add an amount Y of the control, and amplify the two together. After n PCR cycles, you end up with Xn target DNA and Yn control DNA. The assumption is that Xn/Yn = X/Y for all n. Because you can now measure Xn/Yn, and you know what Y is already, you can work out X. The critical assumption is that these two ratios are the same. But as Todd Miller has explained, this assumption is not correct, and data obtained by this method will be wrong. How wrong? Well, the formulas for Xn and Yn give you the way to work this out.

Xn = X(1+Ex)^n

Yn = Y(1+Ey)^n

Ex and Ey are the amplification efficiencies of the target and the control. Using QC-PCR assumes these two numbers are EXACTLY equal. Close is not good enough. Dividing the two formulas gives Xn/Yn = (X/Y)((1+Ex)/(1+Ey))^n, so the QC-PCR estimate of X is off by a factor of ((1+Ex)/(1+Ey))^n. This factor moves further from 1 the more cycles n you choose. In a paper by Piatak et al in Science in 1993, it was claimed that QC-PCR was detecting millions of HIV RNA molecules per ml of blood plasma. They used 45 PCR cycles. Well, if in one experiment (1+Ex)/(1+Ey) = 1.25, say, then after 45 cycles you overestimate the size of the unknown by a factor of about 23,000. If the ratio is less than 1, you underestimate the size of X instead. As Luc Raeymaekers pointed out, the published results on QC-PCR actually contain evidence that Ex and Ey are not equal. I suspect that in the reconstruction experiments of Piatak et al (the senior author of that paper was none other than G. Shaw) they fitted a bunch of straight lines to their data, got an answer that was close to the true value, and so assumed that the process would always work. There are very good reasons to believe that it does not work at all.
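
The size of this error is easy to check numerically. The sketch below uses the formulas Xn = X(1+Ex)^n and Yn = Y(1+Ey)^n from the text. The ratio 1.25 and the 45 cycles are the figures quoted above. The efficiencies 0.90 and 0.85 and the starting amounts are hypothetical values chosen only for illustration.

```python
def amplified(initial, efficiency, cycles):
    """Amount after `cycles` PCR cycles: initial * (1 + efficiency)^cycles."""
    return initial * (1.0 + efficiency) ** cycles

def qc_pcr_error_factor(ex, ey, cycles):
    """Factor by which the QC-PCR estimate of X is off: ((1+Ex)/(1+Ey))^n."""
    return ((1.0 + ex) / (1.0 + ey)) ** cycles

# The case in the text: (1+Ex)/(1+Ey) = 1.25 over 45 cycles.
print(f"error factor for ratio 1.25, 45 cycles: {1.25 ** 45:,.0f}")

# Hypothetical efficiencies that differ only slightly (assumed values).
ex, ey, n = 0.90, 0.85, 45
x_true, y_known = 1000.0, 1000.0
ratio_measured = amplified(x_true, ex, n) / amplified(y_known, ey, n)
x_estimated = ratio_measured * y_known   # what QC-PCR would report for X
print(f"true X = {x_true:.0f}, estimated X = {x_estimated:.0f}")
print(f"error factor = {qc_pcr_error_factor(ex, ey, n):.2f}")
```

Even with these assumed efficiencies, which differ by only 0.05, the estimate comes out more than three times too large after 45 cycles.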

Now to return to the Shaw paper in Nature. I suspect that their QC-PCR estimates bounced around all over the place, and so they used the geometric mean as a way of smoothing this variation out. I can't prove that since they supply no data at all. But it would be good to see their actual numbers.

Another point to make before moving to the main theme is that they do silly things with data, like drawing straight lines through clouds of data points and pretending that their straight lines have some meaning. Have they never heard of polynomial interpolation, or time series analysis? Obviously not. The point is that there are advanced mathematical techniques for handling this kind of data, but they seem wedded to the idea of sticking a straight line through any old collection of points and calling it a regression analysis.

Now to the new "mathematical understanding of the immune system that their work provides" (to paraphrase Maddox in an English newspaper). Well, it's bollocks, pure and simple. Anybody who doesn't like mathematics be advised that there is some coming up. I will try to make this as simple as possible, and I will concentrate on the Ho paper. Similar comments can be made about the Wei paper.

Ho et al estimate the rate of viral clearance by studying the equation

dV/dt = P - cV.

V is the amount of virus, P is the rate of viral production and c is the rate constant for viral clearance, so that cV is the rate at which virus is cleared. dV/dt is the rate at which V changes with time. It is a differential equation, and by a happy coincidence my area of research is differential equations. They make the fundamental assumption that the virus is in a steady state, meaning that dV/dt = 0. This means that P = cV. In other words they are assuming that, before drugs are introduced, the rate of virus production is exactly equal to the rate of viral clearance. A more correct approach would be to model viral production by

dV/dt = (a-c)V

where a is the rate of viral production per unit of virus, and treat c as a parameter. When they come to study the interaction of the virus and T cells, you have a problem in what is called bifurcation theory. This studies what happens when parameters used in equations are changed. The interaction of HIV and T cells in this problem depends upon the behaviour of c. c cannot be a constant, because it varies depending upon antibody production and the state of the immune system. Clearly as the immune system declines, the ability to fight the virus must weaken, and so c must decrease. And in the 3 months or more before antibody production, c would be very small, perhaps close to zero. What Ho et al have done is assume that c always matches the rate of viral production, which is impossible.

Then they model the behaviour of the T cells by

dT/dt = P - mu T

P here is the rate of T cell production, and is not to be confused with P above. They should use different notation. This is a very badly written paper. mu (that's the Greek letter mu) represents the cell decay rate.

This is a very odd equation. It predicts that the T cell count approaches P/mu exponentially quickly, declining to that level if it starts above it. Presumably they are assuming that this models T cell behaviour when drugs are being administered, but why is not clear. There is also a major problem: where is HIV? If they are assuming that the T cells are declining because of the effects of HIV, then this equation must contain some term involving the amount of virus present. It does not. So what the hell is it supposed to mean? Well, I would guess that they are assuming that the amount of virus is constant, and so the effect of V is constant. So presumably the term mu represents the constant effects of the virus on the T cell population.

But this bears no resemblance to what actually happens in AIDS patients. Here you have an exponential decline in T cell numbers to a steady state P/mu, which could be quite high depending on mu. In AIDS we have a slow decline over ten years or more to close to zero. So Ho's model does not describe what actually happens in AIDS patients.
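
The exponential approach to P/mu can be seen from the closed-form solution of dT/dt = P - mu T, which is T(t) = P/mu + (T(0) - P/mu)exp(-mu t). The following sketch evaluates it. The numerical values of P, mu and T(0) are hypothetical, picked only to show the shape of the curve, and are not taken from the Ho paper.

```python
import math

def t_cells(t, p, mu, t0):
    """Closed-form solution of dT/dt = P - mu*T with T(0) = t0."""
    steady = p / mu                      # steady state P/mu
    return steady + (t0 - steady) * math.exp(-mu * t)

# Hypothetical values (assumptions for illustration only).
P, MU, T0 = 10.0, 0.05, 1000.0           # steady state P/mu = 200
for day in (0, 20, 50, 100, 200):
    print(f"day {day:3d}: T = {t_cells(day, P, MU, T0):7.1f}")
```

Whatever the starting value, the curve settles at P/mu and stays there. It never drifts slowly towards zero.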

However, it is actually a good deal worse than this, because the parameter c does not always match the viral production rate a. If c > a, the virus is rapidly cleared from the body and the T cell count remains high. In other words the patient recovers. If a > c, which would be the case before antibodies appear (and remember we have 3 months or more in which this is the case), then we have

V(t) = V(0)exp(bt), where b = a - c > 0

Now let us look at our equation for T. If we put the effect of V into the equation, we must have

dT/dt = P - mu(V)T - f(V).

So the rate of cell decline mu depends on the amount of virus present. This must be the case. You can think of this term as perhaps modelling the effects of apoptosis. The term f(V) represents the decline caused by direct killing of T cells by HIV. If V = V(0)exp(bt), then what does this say about the behaviour of T over time? To model this you need an expression for mu(V) and f(V). Choose f(V) = 0, and approximate mu(V) by a Taylor series to first order.

mu(V) = mu(0) + mu'(0)V

Substituting into the equation for T, we get

dT/dt = P - (mu(0) + mu'(0)V(0)exp(bt))T

Solving this equation with b = (log 2)/4 (Ho and Shaw estimate that a is twice this value in their paper, so this is a conservative estimate), with V(0) = 1, so that there is only one virus to start with, and with some other very conservative values, you find that T drops to less than 5% of its original value in 20 days. In other words you should have AIDS in 20 days with this rate of viral production if T cells are dying from apoptosis.
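
A calculation of this kind can be sketched numerically as follows. Only b = (log 2)/4 and V(0) = 1 come from the text. The values used for P, mu(0), mu'(0) and T(0) are hypothetical, since the other parameter values are not listed here, so this shows the method and is not expected to reproduce the 20-day figure exactly.

```python
import math

B = math.log(2) / 4      # growth rate from the text: virus doubles every 4 days
V0 = 1.0                 # a single virus at t = 0, as in the text

def simulate(p, mu0, mu1, t0, days, dt=0.01):
    """Integrate dT/dt = P - (mu0 + mu1*V0*exp(B*t))*T with classical RK4.

    mu0 stands for mu(0) and mu1 for mu'(0); returns T at time `days`.
    """
    def rhs(t, T):
        return p - (mu0 + mu1 * V0 * math.exp(B * t)) * T

    t, T = 0.0, t0
    for _ in range(int(round(days / dt))):
        k1 = rhs(t, T)
        k2 = rhs(t + dt / 2, T + dt * k1 / 2)
        k3 = rhs(t + dt / 2, T + dt * k2 / 2)
        k4 = rhs(t + dt, T + dt * k3)
        T += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return T

# Hypothetical parameters (assumed): T starts at its virus-free steady state P/mu0.
P, MU0, MU1, T_START = 10.0, 0.01, 0.02, 1000.0
for day in (5, 10, 15, 20):
    remaining = simulate(P, MU0, MU1, T_START, day) / T_START
    print(f"day {day:2d}: {100 * remaining:5.1f}% of the starting T cell count")
```

Setting mu1 to zero recovers the virus-free equation, which is a quick way to check the integrator against the steady state P/mu0.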

If we assume there is no apoptosis and that direct killing is responsible, we take mu constant and pick some form for f(V). I chose the simplest, most conservative form, f(V) = kV. That is, the killing is directly proportional to the amount of virus present. With the same parameters as above, and picking k very small (meaning that a lot of virus is needed to kill one cell), AIDS develops within 60 days of infection. By AIDS I mean it takes about 60 days for every single T cell in the body to be killed.
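
The direct-killing case, dT/dt = P - mu T - kV with V = V(0)exp(bt), can be sketched in the same way. Again only b = (log 2)/4 and V(0) = 1 come from the text. The values of P, mu, T(0) and both values of k are hypothetical, so the days printed illustrate the behaviour and are not the figure quoted above.

```python
import math

B = math.log(2) / 4   # growth rate from the text (doubling every 4 days)
V0 = 1.0              # a single virus at t = 0

def days_until_depleted(k, p=10.0, mu=0.01, t0=1000.0, dt=0.01, max_days=400.0):
    """Forward-Euler integration of dT/dt = P - mu*T - k*V(t).

    Returns the first time T reaches zero, or None if that does not
    happen within max_days. Default parameter values are assumptions.
    """
    t, T = 0.0, t0
    while t < max_days:
        T += dt * (p - mu * T - k * V0 * math.exp(B * t))
        t += dt
        if T <= 0.0:
            return t
    return None

for k in (1e-3, 1e-6):
    days = days_until_depleted(k)
    print(f"k = {k:g}: T cells exhausted after about {days:.0f} days")
```

Because V doubles every four days under this assumption, making k a thousand times smaller delays the collapse by only about ten doublings, roughly 40 days. The choice of k barely matters against exponential growth.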

Ho uses the analogy of a sink with water pouring into it while the drain is open. He argues that the virus is killing slightly more cells than the body can replace and so you get a slow decline in T cells. In terms of his analogy, the water flows out of the sink slightly faster than it flows in. A better analogy would be that as the water level drops, the drain gets bigger, so the process speeds up. Ho and Shaw's data, if you read it correctly, predicts that AIDS should develop in a matter of days after infection, or at most a few months. This is what exponential growth is all about. The virus grows exponentially, doubling in number every 2 days in the absence of an immune response, they say. So when the immune response is weakest, before antibody production, it should kill every T cell in the body quickly. This does not happen. I wonder how they explain this?

Mark Craddock PhD
School of Mathematics
University of New South Wales, Australia