**Distributions**

Most variables in science are, in fact, random variables
(or variates), if only because they are subject to measurement errors. This means
that individual measurements will occupy random positions within a *range*,
the boundaries of which will not, in general, be precisely defined. Variates may
be grouped into two distinct classes, discrete and continuous. An example of a
discrete variate is the score obtained on a single roll of a die, while an
example of a continuous variate is the age of men at death.

In order to describe the properties of a given variate we imagine that our samples are taken from a theoretical population of infinite number; so, while our die might come up with a score of five, we take it to be one representative of an infinite number of dice that all share common properties.

The primary way of mathematically describing the properties
of this imaginary parent population is the *distribution function*, which
may be defined in words by:

The distribution function *F(x)*
of a variate *v* is the probability of *v* taking on a value less than
or equal to the number *x*.

In mathematical notation we write this as:

*F*(*x*) = Prob {*v* ≤ *x*}

For the example of the score of a single roll of the die, the distribution function is of the form:

The properties of *F(x)* are that it starts from zero
at the left, it never decreases towards the right and ends up at 1.
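As a minimal sketch of the die example, the distribution function can be evaluated directly from its definition. The function name `die_distribution` and the `sides` parameter are illustrative assumptions, and the die is assumed to be fair.

```python
import math
from fractions import Fraction


def die_distribution(x, sides=6):
    """Return F(x) = Prob{v <= x} for one roll of a fair die (assumed fair)."""
    if x < 1:
        # No score can be below 1, so F(x) is zero at the left.
        return Fraction(0)
    # Count the faces whose score is <= x, capped at the number of faces.
    favourable = min(math.floor(x), sides)
    return Fraction(favourable, sides)


# F(x) is a staircase: it steps up by 1/6 at each whole score.
for x in [0, 1, 2.5, 5, 6, 7]:
    print(x, die_distribution(x))
```

Exact fractions are used so that the three properties (starting at zero, never decreasing, ending at 1) can be checked without rounding error.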

An example of *F(x)* for a continuous variate is the age
at death of UK males:

A more familiar way of depicting the distribution is in the
form of the density (or frequency) function, which better illustrates how the
random numbers are concentrated. It is known as *f(x)* and is the slope of *F(x).*
There are clear difficulties in representing *f(x)* for discrete variables
without the aid of more advanced mathematical concepts (delta functions) but for
the death rates above *f(x)* is of the form:

Properties of the density function are that the total area under the curve is unity and that it tends to zero to the left and right. We can get over the difficulties of representing the density function for discrete variates by using a bar chart format and adopting the convention that the numbers against the vertical axis apply to the area of the bar rather than the height. We can then treat discrete and continuous variables in the same way. Thus the density function for the fair die can be drawn as:

In mathematical terms the properties of the functions are:

*F(x)* is monotonic (it never decreases)
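Collecting the properties stated in words above, one way to write them is sketched below. This uses standard notation, which is an assumption and not necessarily the notation originally used here.

```latex
\begin{align*}
  F(-\infty) &= 0, \qquad F(+\infty) = 1, \\
  F(x_1) &\le F(x_2) \quad \text{whenever } x_1 < x_2, \\
  f(x) &= \frac{dF(x)}{dx}, \\
  \int_{-\infty}^{\infty} f(x)\,dx &= 1, \qquad f(x) \to 0 \ \text{as } x \to \pm\infty .
\end{align*}
```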

The ideal function *f(x)* is estimated by a normalised
histogram of observations, where the term normalised means that we divide by the
total number in the sample, so that the area is unity. In presenting data it is
better not to normalise, in order to preserve the information about the
magnitude of the numbers involved.
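A minimal sketch of this normalisation, using simulated rolls of a fair die, is given below. The helper name `normalised_histogram`, the sample size and the random seed are arbitrary assumptions made for illustration. Because the bars for a die have unit width, each bar's area equals the relative frequency of that score.

```python
import random
from collections import Counter


def normalised_histogram(observations):
    """Divide each count by the sample size so the bar areas sum to unity."""
    counts = Counter(observations)
    total = len(observations)
    return {score: counts[score] / total for score in sorted(counts)}


# Simulated sample: 6000 rolls of a fair die (seed fixed only for repeatability).
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(6000)]

estimate = normalised_histogram(rolls)
for score, area in estimate.items():
    # Each area should lie close to the ideal value 1/6.
    print(score, round(area, 3))
```

The raw counts in `Counter(rolls)` are the un-normalised histogram, which is the form that preserves the information about how many observations were involved.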

Here is a histogram (from *Sorry, wrong number!*) of
breast cancer mortalities in 99 different UK hospitals, with a fitted normal
curve. It shows just normal random variation and nothing can be deduced
from it, but that did not stop them from trying.

**Footnote**: it seems that the less-than-or-equals sign (≤) does not display correctly in some operating systems.