**What has the weakest link to do with fallacies in medical statistics?**

**Theory**

*A chain is as strong as its weakest link* – a truism that we learn in childhood, yet one that is ignored in a wide range of human activities. The concept leads to a branch of extreme value statistics that applies to many classes of problem, particularly those involving failure, such as mechanical structures, electrical insulation and human life.

If we make up a chain from a number, *n*, of links whose probability of failure is a variable *F _{1}*, then the probability of the chain failing (under given conditions of tension and time) is given by:

*F _{n}* = 1 − (1 − *F _{1}*)*^{n}*

This is known as the **smallest value transformation**. It can be derived directly from the binomial distribution (the easy way is to subtract the probability of no failures from 1).
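As a quick numerical check, the transformation is a one-liner; here is a minimal Python sketch (the function name is my own, not from the source):

```python
def chain_failure_probability(f1: float, n: int) -> float:
    """Smallest value transformation: F_n = 1 - (1 - F_1)**n,
    the probability that a chain of n independent links fails
    when each link fails with probability f1."""
    return 1.0 - (1.0 - f1) ** n

# A single link failing with probability 0.05; a 12-link chain:
print(round(chain_failure_probability(0.05, 12), 3))  # → 0.46
```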

*F* will be a function of stress and time, but for the present purpose we will regard the stress as constant and treat *F* as a function of time alone, i.e. the distribution *F*(*t*).

**Data dredge**

In an epidemiological survey, if the researchers look at one disease and one potential cause, they determine the incidence of the disease and then quote a level of significance either as the probability of the number having occurred by accident or as a confidence interval.

If they are looking at two diseases and count either as an event, they are statistically testing a chain of two links: whichever crosses the given threshold first determines that the event has occurred. By reference to the general population or to a control population they determine the probability that the observed rate arose by chance, requiring it to be less than a predetermined threshold. More often than not this threshold is chosen to be the rather unsatisfactory value of 0.05, and we get the iconic P<0.05.

The data dredge fallacy arises from looking at more than one disease, but treating them all as though each were part of an independent survey. Only the ones that cross the significance threshold are counted; the rest are discarded. The researchers claim a particular value of P, but the reality is that a larger value applies, determined by the smallest value transformation. Likewise the confidence interval, which in a sense is the mirror image of P, is affected, being subject to the largest value transformation. P and CI are simply different ways of expressing the (often dubious) claim of a one in twenty chance of being wrong. Using the formula above we can tabulate what P and CI ought to be for any number of diseases, *n*, rather than the 0.05 or 95% respectively that are almost invariably quoted.

| *n* | P | CI% |
|----:|------:|------:|
| 1 | 0.050 | 95.00 |
| 2 | 0.098 | 90.25 |
| 3 | 0.143 | 85.74 |
| 4 | 0.185 | 81.45 |
| 5 | 0.226 | 77.38 |
| 6 | 0.265 | 73.51 |
| 7 | 0.302 | 69.83 |
| 8 | 0.337 | 66.34 |
| 9 | 0.370 | 63.02 |
| 10 | 0.401 | 59.87 |
| 11 | 0.431 | 56.88 |
| 12 | 0.460 | 54.04 |
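The table can be reproduced directly from the smallest value transformation; a short Python sketch (using the nominal 0.05 threshold from the text):

```python
# True P and CI when n outcomes are each tested at a nominal P < 0.05:
# P_n from the smallest value transformation, CI_n from the largest.
nominal_p = 0.05
for n in range(1, 13):
    true_p = 1 - (1 - nominal_p) ** n
    true_ci = 100 * (1 - nominal_p) ** n
    print(f"{n:2d}  P = {true_p:.3f}  CI = {true_ci:.2f}%")
```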

And so it goes on. In a very large data dredge, such as the Harvard Nurses' Health Study, with hundreds of combinations of disease and potential cause, the chance of getting one accidental correlation is tantamount to a certainty, as indeed is the probability of dozens; yet each result is trotted out with P<0.05 or a CI of 95% as though they were all parts of independent trials.

**Premature termination**

When the Tamoxifen trial was prematurely closed and unblinded by the American contingent, it was treated as a scandal and an outrage by the European participants. Now they are all at it. In drug trials several diseases are monitored, and as soon as one crosses the threshold the trial is abandoned. They claim the same old one in twenty chance of accident, but the actual probability depends on the number of diseases being monitored.

Here is a comment from The Epidemiologists:

*The
policy of cancelling arouses a number of concerns. First, there are security
worries. Do we really believe in the efficacy of those “Chinese walls” that
are supposed to stop different departments in financial institutions from
leaking information to each other? Are there no innuendoes exchanged in the
lap-dancing clubs after work? Second, the progress of such a trial in terms of,
say, relative risk is a random
walk, wandering up and down but, if the trial is long enough, gradually settling
down to an equilibrium value. If it is terminated before that equilibrium is
reached, can the result be regarded as significant? Would the trend have drifted
the other way given time? Third, who prescribes the standards by which the
action to terminate will be judged? It is rather disturbing that in at least one
case the terminating condition involved a 95% Confidence Interval that embraced
the value of relative risk of 1.0, which means there is no effect. Fourth, the
very act of termination endows a study with much greater significance than it
would otherwise be granted. Following the 2004 announcement of the termination
of a Scandinavian HRT and breast cancer trial, the headline was Breast cancer
fears force doctors to axe second trial. Yet this trial involved a mere 174
women. Furthermore, it was formed from the combination of two trials, one of
which was producing “evidence” that HRT protected against cancer.
Fifth, the whole thing involves the extreme value fallacy. If a dozen diseases
are being monitored it only needs one of them to cross the arbitrary threshold
for the trial to be terminated, yet for one of the others the treatment could
have turned out to be wonderfully beneficial or devastatingly malign.*

*The next HRT trial was abandoned on the grounds of
risk of stroke. Breast cancer did not figure. By April 2004, more than half the
women on HRT had abandoned it, when yet another study appeared, exonerating it.*

**Asymptotic distributions**

The other question of relevance is the shape of distributions of extreme values. Just as averages tend towards the normal distribution by virtue of the central limit theorem, so extreme value distributions tend towards certain shapes. These may be derived from an idea known as the stability postulate, that the asymptotic distributions are such that they do not change their shape under the smallest (or largest) value transformation. It turns out that there are only six possible types. In medical statistics the two important ones are the exponential distribution (also known as the Poisson traffic law) that governs time to failure or initiation of disease (as in the Vioxx trial) and the Gompertz distribution, which applies to human life duration.
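The stability postulate can be seen numerically for the exponential case: the minimum of *n* independent exponential variables is itself exponential, with its rate multiplied by *n*. A small Python sketch (the parameter values are arbitrary choices for illustration):

```python
import random

# Minimum of n Exponential(lam) variables is Exponential(n*lam):
# the shape survives the smallest value transformation, so the
# mean of the minima should be close to 1/(n*lam).
random.seed(0)
lam, n, samples = 0.5, 8, 50_000
minima = [min(random.expovariate(lam) for _ in range(n)) for _ in range(samples)]
mean_min = sum(minima) / samples
print(f"sample mean of minima:  {mean_min:.4f}")
print(f"theoretical 1/(n*lam): {1 / (n * lam):.4f}")
```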

**Simple worked example**

**An electronic component has a constant failure rate of *a* per hour; what is the failure rate of a system with *n* such essential components?**

The system has failed if one component fails.

The distribution of times to failure for one component is given by *F _{1}*(*t*) = 1 − e*^{−at}*.

Plugging this into the *smallest value transformation* above, we get *F _{n}*(*t*) = 1 − (1 − *F _{1}*(*t*))*^{n}* = 1 − e*^{−nat}*, i.e. the system has a constant failure rate of *na* per hour.

The calculation is the same for *n* diseases, assuming the statistical properties are similar.
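The worked example is easy to verify numerically; a minimal Python sketch (the rate and time are illustrative values of my own):

```python
import math

def system_failure_cdf(a: float, n: int, t: float) -> float:
    """F_n(t) = 1 - exp(-n*a*t): probability that a system of n
    essential components, each failing at constant rate a per hour,
    has failed by time t."""
    return 1.0 - math.exp(-n * a * t)

def via_transformation(a: float, n: int, t: float) -> float:
    """Same quantity via the smallest value transformation applied
    to the one-component distribution F_1(t) = 1 - exp(-a*t)."""
    f1 = 1.0 - math.exp(-a * t)
    return 1.0 - (1.0 - f1) ** n

# The two routes agree; e.g. a = 0.001 per hour, n = 10, t = 100 hours:
print(round(system_failure_cdf(0.001, 10, 100.0), 4))   # → 0.6321
print(round(via_transformation(0.001, 10, 100.0), 4))   # → 0.6321
```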