Assignment #2: Confirming the Stable Run Periods
You have seen the rates for random events fluctuate, at least slightly, about an otherwise stable average.  Now let's study such fluctuations in more detail.   A few calculations should lead us through the statistics that govern this behavior.


At right once again is the histogram displaying, minute by minute, 2-fold coincidences at the Base Station.  Note the zero is "suppressed", i.e., the vertical axis starts at 500, not 0.  We have focused in on the very top of the graph.  These are actually minor fluctuations riding atop a base of nearly 600 counts.  We see one bin as low as 562, and one as high as 657; most are close to the 609.5 counts/minute average (marked by a blue horizontal line).  We don't see sudden spikes up to 800, or drops down to 400.  So is this good data?  How do we decide whether the fluctuations are reasonable or jumping too wildly?

Minute-by-minute 2-fold Coincidences
Within Excel click an empty cell below the column of data rates you want to check. Click on  fx  to the left of the input field at the top of the Excel window.  Select the AVERAGE function (highlight the function name and click OK).  Then highlight (or type in) the range of cells above that you want averaged.
 
In another empty cell, select the function STDEV, which measures how far the entries typically lie from the average.  Strictly, the standard deviation is the root-mean-square distance of the data points from the mean, not a simple average of the distances.  If every reading were identical, the mean would be obvious and the standard deviation simply zero.  





11/17/2004  19:46:00  660  645
11/17/2004  19:47:00  683  622
11/17/2004  19:48:00  656  677
11/17/2004  19:49:00  647  658
11/17/2004  19:50:00  650  668
11/17/2004  19:51:00  673  620
11/17/2004  19:52:00  659  634
11/17/2004  19:53:00  649  647
11/17/2004  19:54:00  642  682
11/17/2004  19:55:00  632  642
11/17/2004  19:56:00  688  640

=AVERAGE(C2:C1145)
=STDEV(
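
For anyone working outside Excel, here is a minimal Python sketch of the same two calculations. It assumes the first count listed after each timestamp in the spreadsheet excerpt is the column being checked, and it uses only those 11 rows, so its numbers will not match the full-column results. The variable names are illustrative only.

```python
import statistics

# First count listed after each timestamp in the spreadsheet excerpt
# (assumed here to be the column being checked; only 11 of the rows).
counts = [660, 683, 656, 647, 650, 673, 659, 649, 642, 632, 688]

mean = statistics.mean(counts)     # counterpart of Excel's AVERAGE
stdev = statistics.stdev(counts)   # sample standard deviation, like Excel's STDEV

print(f"mean  = {mean:.1f} counts/minute")
print(f"stdev = {stdev:.1f} counts/minute")
```

Note that `statistics.stdev` divides by n-1, as Excel's STDEV does; `statistics.pstdev` would divide by n instead.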


 
The added lines mark distances one and two standard deviations above and below the mean.  For this data set Excel reported a STDEV of 20.  So lines have been added at
609.5 ± 20.0 = 589.5, 629.5  and  609.5 ± 40.0 = 569.5, 649.5.
Most fluctuations appear to lie within ±1 STDEV of the mean.  A few data points fall between 1 and 2 STDEV.  A very small number (here 4) lie more than 2 STDEV away from the mean.  None lie beyond 3 STDEV.  The STDEV tells us how tightly clustered the fluctuations are about the mean, and sets a practical limit on how widely they are likely to range.
With Standard Deviation Markings
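
The band arithmetic above can be sketched in a few lines of Python. The helper names `sigma_bands` and `count_outside` are made up for this illustration; the mean of 609.5 and STDEV of 20.0 are the values quoted in the text.

```python
def sigma_bands(mean, stdev, max_k=2):
    """Return {k: (low, high)} for bands k = 1..max_k standard deviations wide."""
    return {k: (mean - k * stdev, mean + k * stdev) for k in range(1, max_k + 1)}


def count_outside(counts, mean, stdev, k):
    """Count the readings lying more than k standard deviations from the mean."""
    return sum(1 for c in counts if abs(c - mean) > k * stdev)


bands = sigma_bands(609.5, 20.0)
for k, (low, high) in bands.items():
    print(f"±{k} STDEV: {low} to {high}")

# The two extreme bins mentioned earlier (562 and 657) both fall outside ±2 STDEV.
extremes_outside = count_outside([562, 657], 609.5, 20.0, k=2)
print("extreme bins beyond 2 STDEV:", extremes_outside)
```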
Here's an alternate way of presenting that data.  Pictured at right is a frequency distribution of the counts recorded in one-minute intervals, now over a three-hour period (same Base Station detectors, but the next day).
Click here for a summary of how to
build frequency distributions in Excel.
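
As an alternative to the Excel recipe, a frequency distribution can be sketched in Python as below. This is only an illustration: the function name `frequency_distribution` and the bin width of 10 are assumptions, and the sample readings are the first count from each row of the spreadsheet excerpt earlier on this page, not the three-hour data set in the plot.

```python
from collections import Counter


def frequency_distribution(counts, bin_width=10):
    """Map each bin's lower edge to the number of readings falling in that bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return dict(sorted(bins.items()))


# Illustrative sample: 11 one-minute readings from the spreadsheet excerpt.
sample = [660, 683, 656, 647, 650, 673, 659, 649, 642, 632, 688]
distribution = frequency_distribution(sample)

# Crude text histogram: one star per reading in each bin.
for low, n in distribution.items():
    print(f"{low}-{low + 9}: {'*' * n}")
```

With only 11 readings the shape is ragged; the bell shape only emerges once hundreds of readings are binned.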
Notice how the readings are bunched closely about the mean of 615?  Again zero has been suppressed; there was no (or very little) data below 500 (or above 700).   Now  vertical lines mark ranges that are ±1, 2 STDEV from the mean.  The rounded peak shows you most of the data is within ±1 STDEV of the mean.  Very rarely will counts be recorded  >3 STDEV from the mean.
Gaussian distribution of data
This is a beautiful example of a Gaussian distribution, which you probably know simply as the "bell-shaped" curve.  It is fit very well by the functional form:
Gaussian Function
 where µ is fixed at the mean and sigma at the measured standard deviation.
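
For reference, the standard normalized Gaussian with these two parameters is written out below. Treat it as an assumption about what the fit uses: to overlay it on a frequency distribution, the curve would presumably also be scaled by the total number of readings times the bin width.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)
```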
Characteristic of this shape is that the region between µ-sigma and µ+sigma contains 68% of the total area under the curve. In other words:
68% of the events fall within ±1 STDEV of the mean.
95% of the events fall within ±2 STDEV of the mean
99.7% of the events fall within ±3 STDEV of the mean
Gaussian Curve
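
The three percentages can be checked numerically. For a Gaussian, the fraction of the area within ±k sigma of the mean equals erf(k/√2), which Python's standard library can evaluate directly. The function name `fraction_within` is just an illustrative choice.

```python
import math


def fraction_within(k):
    """Fraction of a Gaussian's area lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} STDEV: {100 * fraction_within(k):.1f}%")
```

The printed values round to the 68%, 95% and 99.7% quoted above.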
   
You will need to check that the distributions of data are nicely Gaussian (bell-shaped).  There should be no multiple peaks - just one, centered in the distribution.  But we have not yet agreed upon how large a sigma can be and still be considered acceptable.  As surprising as it may seem at first, events that occur purely at random can have predictable sigmas!

Random events occurring at a reproducible average rate are well understood statistically.  These include coin flips, dice rolls, automobile accidents, radioactive decays, and cosmic rays.  The probability of counting n occurrences of a purely random event (heads in repeated coin tosses, a particular face value in repeated rolls of a die, or cosmic rays in a 5-minute run) is given by the Poisson distribution

$$P(n) = \frac{\mu^{n} e^{-\mu}}{n!}$$       Show me HOW!

where μ is the true average or mean count of an infinite number of such experiments.  This distribution has an exactly calculable STDEV:

$$\sigma = \sqrt{\mu}$$         Show me HOW!
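
The square-root rule can be checked numerically by summing over the Poisson probabilities directly. This is a minimal sketch, not a derivation: the function name `poisson_pmf` and the cutoff of 2000 terms are illustrative choices, and µ is set to the 609.5 counts/minute average quoted earlier.

```python
import math


def poisson_pmf(n, mu):
    """Poisson probability of counting exactly n events when the true mean is mu."""
    # Work with logarithms so that large n and mu do not overflow.
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


mu = 609.5            # mean counts/minute quoted earlier in this assignment
ns = range(0, 2000)   # wide enough that the neglected tail is negligible
probs = [poisson_pmf(n, mu) for n in ns]

total = sum(probs)                                    # should be very close to 1
mean = sum(n * p for n, p in zip(ns, probs))          # should reproduce mu
variance = sum((n - mean) ** 2 * p for n, p in zip(ns, probs))
sigma = math.sqrt(variance)

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.3f}, sigma = {sigma:.3f}, sqrt(mu) = {math.sqrt(mu):.3f}")
```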

Complicating things slightly is the fact that, with many cosmic rays coming from our own sun, the rates can vary slightly with the time of day.  Here the random scatter of fluctuating readings is not around a flat average, but around a baseline that regularly winds from a slightly lower to a slightly higher value every 24 hours.

Plot from the University of Rochester PARTICLE   project.
Rates varying with time of day
Weather patterns can bring varying barometric pressure.  Pressure is an indication of the depth of the overhead atmosphere. The atmosphere filters out some of the cosmic ray muons, so the barometric pressure does indeed affect rates.

Plot from Joseph Willie's Mendon High School Pittsford, NY Muon Research Project. Mendon High is a participant in the University of Rochester's PARTICLE Project. Rates are plotted at 5-minute intervals.
Rates varying throughout day
In addition, magnetic solar storms (associated with sunspot activity and solar flares) also affect the rate of cosmic ray muons detected at Earth's surface.  Statistical fluctuations will appear as jumpy dot-to-dot data points within a band around a smoothly varying baseline (due to barometric pressure, time of day, etc.).  Any real baseline changes should of course appear in every channel (not just one or two).  Their effect will tend to slightly broaden the STDEV above SQRT(µ).
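
One way to see why a drifting baseline broadens the distribution: if each interval's count is Poisson about a slowly varying mean (an idealization assumed here, not something measured in this data), the variances add. Here µ is the long-run average rate and sigma_baseline, a symbol introduced only for this illustration, is the spread of the baseline itself.

```latex
\sigma_{\text{total}}^{2} = \mu + \sigma_{\text{baseline}}^{2}
\quad\Longrightarrow\quad
\sigma_{\text{total}} = \sqrt{\mu + \sigma_{\text{baseline}}^{2}} \;\ge\; \sqrt{\mu}
```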

Performing the analysis

1. Following the directions linked to above, make frequency distributions for the 01 and 23 coincidences.
 
2. Confirm that the distributions are nice bell-shaped curves: no long tails of "outliers" and no secondary peaks (which could mean sources of excess noise).

3. Calculate the AVERAGE (mean), STDEV (sigma) and SQRT(AVERAGE) for each of the columns of data you are studying.

4. Recall that ideally sigma = SQRT(µ), though a varying baseline may broaden it.  Let's flag as suspicious any distributions that have a STDEV > 3× SQRT(µ).

5. Post your plots on a webserver, and circulate the URL (email me and all the SALTA schools)...or...paste the plots in a document and circulate that.  In either case, I will provide links to the plots from the main Henderson Project page.
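
Steps 3 and 4 can also be sketched in Python for anyone not using Excel. The function name `check_column` and the dictionary keys are hypothetical, and the two sample columns are just the 11 rows of the spreadsheet excerpt earlier on this page (which coincidence channels they hold is not assumed here).

```python
import math
import statistics


def check_column(counts, threshold=3.0):
    """Summarize one column of counts and flag it if STDEV > threshold * SQRT(mean)."""
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)      # sample STDEV, like Excel's STDEV
    sqrt_mean = math.sqrt(mean)           # the ideal Poisson sigma
    return {
        "mean": mean,
        "stdev": stdev,
        "sqrt_mean": sqrt_mean,
        "suspicious": stdev > threshold * sqrt_mean,
    }


# Illustrative input: the two count columns from the spreadsheet excerpt.
columns = {
    "first": [660, 683, 656, 647, 650, 673, 659, 649, 642, 632, 688],
    "second": [645, 622, 677, 658, 668, 620, 634, 647, 682, 642, 640],
}

reports = {name: check_column(values) for name, values in columns.items()}
for name, r in reports.items():
    flag = "SUSPICIOUS" if r["suspicious"] else "ok"
    print(f"{name}: mean={r['mean']:.1f} stdev={r['stdev']:.1f} "
          f"sqrt(mean)={r['sqrt_mean']:.1f} -> {flag}")
```

A real check would run on the full column of readings rather than an 11-row excerpt.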

Having everyone check the data is a good way to cross-check problems, and let everyone learn what good data looks like!
If you have any questions, suggestions, or difficulties, contact me!  dclaes@unlhep.unl.edu
I'll post the 3rd assignment soon!
Dan