The Mean and Standard Deviation

1. First a few notes on computing averages.
The mean, µ, or average of a set of N measurements is found by dividing the sum over the entire set of measurements by the total number of measurements N.
For example, the mean (average) of {5, 15, 25, 10, 15} is µ = (5+15+25+10+15)/5 = 70/5 = 14. In general,
µ = (sum of all N measurements) / N
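
As a minimal sketch of this definition (the function name `mean` and the variable `measurements` are chosen here for illustration and are not part of the original notes), the calculation might look like this in Python:

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


measurements = [5, 15, 25, 10, 15]
print(mean(measurements))  # 14.0
```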

Notice that 15 happened to appear twice in the list of numbers being averaged above.  We could have written
µ = (5 + 10 + 2×15 + 25)/5 = 14
by keeping track of the number of 15s (all other values occurred just once).
Consider this table of measurements:
Heights, in inches, of a sample of 100 adult males
61  64  67  70  70  71  75  72  72  69  68  70  65  67  62  59  62  66  66  68
68  73  71  72  73  71  71  68  69  68  69  65  63  70  70  76  71  72  74  60
56  74  75  79  72  72  69  68  68  68  62  66  66  66  61  77  75  74  63  72

63  62  65  65  66  65  67  67  65  67  68  62  67  60  68  65  70  70  69  70
68  73  64  71  71  68  70  69  73  72  70  69  67  64  67  58  66  69  76  73

When rounded off to the nearest inch, many of the numbers are repeated.  A frequency table can keep track of how often each individual value occurs (see the table below).  A histogram gives a visual representation of the same counts.  56 is the smallest height recorded in this sample.  79 is the largest.  68 is the most common height observed, and most values in fact crowd close to this common height.  68 is known as the mode, while 68.20 is the mean.
Frequency Table of Heights
    Height (inches)     Count
    56     1
    57     0
    58     1
    59     1
    60     2
    61     2
    62     5
    63     3
    64     3
    65     7
    66     7
    67     8
    68     12
    69     8
    70     10
    71     7
    72     8
    73     5
    74     3
    75     3
    76     2
    77     1
    78     0
    79     1
[Figure: histogram, "Distribution of Adult Male Heights"; mean = 68.20]

Notice that the mean could be written as
µ = [ (Number of 56s)×56 + (Number of 57s)×57 + ... + (Number of 79s)×79 ] / N
or, writing N56 for the number of 56s, N57 for the number of 57s, and so on,
µ = (N56×56 + N57×57 + ... + N79×79) / N
where N56 + N57 + N58 + ... + N78 + N79 = N, the total number (100) of men in the sample.
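
The frequency-weighted form can be checked directly against the frequency table. The sketch below simply copies the table's counts into a Python dictionary; the names `counts`, `mean_height`, and `mode_height` are illustrative choices, not part of the original notes.

```python
# Frequency table from the text: height in inches -> number of men.
counts = {
    56: 1, 57: 0, 58: 1, 59: 1, 60: 2, 61: 2, 62: 5, 63: 3,
    64: 3, 65: 7, 66: 7, 67: 8, 68: 12, 69: 8, 70: 10, 71: 7,
    72: 8, 73: 5, 74: 3, 75: 3, 76: 2, 77: 1, 78: 0, 79: 1,
}

n = sum(counts.values())  # N56 + N57 + ... + N79
weighted_sum = sum(height * k for height, k in counts.items())
mean_height = weighted_sum / n
mode_height = max(counts, key=counts.get)  # the most frequent height

print(n)            # 100
print(mean_height)  # 68.2
print(mode_height)  # 68
```

Working from the counts rather than the 100 raw values gives the same mean, because each repeated height is simply added once per occurrence.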

Here's an interesting example.
The average of N rolls of a pair of dice would be
µ = [ (Number of twos)×2 + (Number of threes)×3 + ... + (Number of elevens)×11 + (Number of twelves)×12 ] / N
where N = N2 + N3 + N4 + ... + N11 + N12 is the total number of rolls. Equivalently,
µ = 2×(N2/N) + 3×(N3/N) + ... + 11×(N11/N) + 12×(N12/N)

But N2/N is the fraction of rolls that come up 2, N7/N the fraction that come up 7, and so on.  We can predict those!
There are 36 possible outcomes to any single toss of two dice, only 1 of which gives a 2 (snake eyes: 1 and 1).  So we expect N2/N = 1/36.  (The probability of a 2 = 1/36.)
There are two ways to score a 3: acey-deucey (1 then 2) or deucey-acey (2 then 1).  So N3/N = 2/36 = 1/18.
Similarly N4/N = 3/36 = 1/12, up to the peak at N7/N = 6/36 = 1/6 (the probability of a 7 = 1/6), and then back down, all the way through the one way to score 12 (boxcars: 6 and 6): N12/N = 1/36.

  Thus the average of any large number of dice rolls can be predicted (quite accurately) by
µ = 2×(1/36) + 3×(2/36) + 4×(3/36) + ... + 11×(2/36) + 12×(1/36) = 252/36 = 7
Note two things:
i.  The definition of average can be rewritten in terms of probabilities:
     µ = Σ x × P(x),
     the sum of all possible measurements, each weighted by its individual probability!
ii.  The average value may not always represent any real possible measurement. After all, the average value for a large number of rolls of a single die is 3.5 (and you'll never throw a 3.5)!
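
The probability-weighted sum for a pair of dice can be worked out by listing all 36 equally likely outcomes. This is only a sketch: the names `ways`, `prob`, `expected`, and `single_die` are illustrative, and exact fractions are used so that no rounding enters the result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each total, kept as an exact fraction.
prob = {total: Fraction(n, 36) for total, n in ways.items()}

# Mean = sum over all possible totals of (total x probability of that total).
expected = sum(total * p for total, p in prob.items())

# The same sum for a single die, where each face has probability 1/6.
single_die = sum(face * Fraction(1, 6) for face in range(1, 7))

print(prob[2], prob[3], prob[7], prob[12])  # 1/36 1/18 1/6 1/36
print(expected)                             # 7
print(single_die)                           # 7/2
```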

2.  Averages identify a "central value" for any distribution.  But they do not indicate how tightly clumped together the data might be.  The range can give you an idea of how spread out the data are:
Range = xmax - xmin
For the 100 men sampled above, the range in heights was 79 - 56 = 23 inches.

If a sample contains some rare, extreme measurements, however, the range can be very misleading!
[Figure: narrow distribution]
In fact, the accompanying figure shows 3 data samples which all have exactly the same mean and range!  The data at the top, however, are much more consistent than those at the bottom: almost all of the measurements have come out very close to one another.  The data in the center actually display a somewhat larger spread, despite the statistical accident of having a range that matches the top data set.  The data set at the bottom shows an enormous amount of variation.

We need a better way of describing this variation.
[Figure: 3 quite different distributions]
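
To make the point concrete, here is a small sketch using three hypothetical samples. These numbers are invented purely for illustration and are not the data plotted in the figure. All three share the same mean and the same range, yet they differ sharply in how many values sit close to the mean.

```python
# Hypothetical samples, invented for illustration only (not the figure's data).
samples = {
    "top": [0, 5, 5, 5, 5, 5, 10],
    "center": [0, 3, 4, 5, 6, 7, 10],
    "bottom": [0, 0, 0, 5, 10, 10, 10],
}


def summarize(values):
    """Return (mean, range, number of values within 1 unit of the mean)."""
    mu = sum(values) / len(values)
    spread = max(values) - min(values)
    near = sum(1 for x in values if abs(x - mu) <= 1)
    return mu, spread, near


for name, values in samples.items():
    print(name, summarize(values))
# top (5.0, 10, 5)
# center (5.0, 10, 3)
# bottom (5.0, 10, 1)
```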


3.  The statistical standard deviation, σ (sigma), is the square root of the variance.  The variance is often loosely described as the average difference from the mean; more precisely, it is the average of the squared differences from the mean:
σ = √[ Σ (x - µ)² / N ]
A data series like 1, 2, 3, 6 has a mean equal to (1+2+3+6)/4 = 3. How far is each data point from the mean? The individual differences from the mean are: 1-3 = -2, 2-3 = -1, 3-3 = 0, and 6-3 = +3. The average difference is just
(-2 + -1 + 0 + 3)/4 = 0/4 = 0
(the differences below the mean always cancel those above it).  The average difference, then, can never be very informative.  The variance instead averages the squares of these differences (avoiding the problem introduced by the negative numbers). For this example: {(-2)² + (-1)² + 0² + 3²}/4 = 14/4 = 3.5. Finally, the standard deviation is equal to the square root of the variance: √3.5 = 1.87.
[Figure: illustrating the definition of the standard deviation]
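
The same steps can be followed in a few lines of Python. This is a sketch of the definition as stated in these notes (dividing by N); the variable names are illustrative.

```python
import math

data = [1, 2, 3, 6]

mu = sum(data) / len(data)                     # mean: 3.0
deviations = [x - mu for x in data]            # [-2.0, -1.0, 0.0, 3.0]
mean_deviation = sum(deviations) / len(data)   # the deviations cancel out
variance = sum(d ** 2 for d in deviations) / len(data)
sigma = math.sqrt(variance)

print(mean_deviation)   # 0.0
print(variance)         # 3.5
print(round(sigma, 2))  # 1.87
```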

4.  Another example:
               X     X-µ    (X-µ)²
               1     -3       9
               3     -1       1
               4      0       0
               4      0       0
               5      1       1
               7      3       9
    SUM       24             20
    N          6              6
    SUM/N      4 (mean)       3.3333 (variance)
    SQRT                      1.8257 (standard deviation)
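
For comparison, Python's standard `statistics` module can reproduce this table. The notes divide by N, which corresponds to the "population" functions `pvariance` and `pstdev`. The module's `variance` and `stdev` functions divide by N - 1 instead and so give different numbers; that variant is not discussed in these notes and is shown here only as a caution when choosing a library call.

```python
import statistics

data = [1, 3, 4, 4, 5, 7]

mu = statistics.mean(data)
variance = statistics.pvariance(data)  # divides by N, as in the table
sigma = statistics.pstdev(data)        # square root of the value above
sample_sigma = statistics.stdev(data)  # divides by N - 1 instead

print(mu)                      # 4
print(round(variance, 4))      # 3.3333
print(round(sigma, 4))         # 1.8257
print(round(sample_sigma, 4))  # 2.0
```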

5.  Final observations.  The mean µ gives you the central value of a distribution (usually its peak).  The standard deviation σ tells you, on average, how far the data vary from the mean.  Some data points are of course farther from the mean, some closer to it.  When many crowd very close to the mean (as in a "bell-shaped" distribution), only a few can range far from the mean at all (since the average must still work out to be µ).