As I live in the Boston area, I'm very interested in this question, and was thinking about trying an experiment like this myself! I'd love to hear about your results.
There are two ways you could "prove" that there was a connection between Mass weather and El Nino. The first way is to run a controlled experiment using two planets which were identical except one was in an active El Nino phase while the other was in the opposite ("La Nina") phase. Since we don't have two identical copies of Earth, scientists generally use complicated computer simulations which run on supercomputers to do this sort of comparison. Such an experiment is probably too difficult, expensive, and time-consuming for a high-school project.
The second way is to establish a "statistical correlation" between El Nino and Mass weather. Here's how it works. Suppose I flip both a penny and a nickel, and for some weird reason, they tend to match: when the penny comes up heads, the nickel does too. Now, if I just flip them each once and get heads on both, that could well have happened by chance. However, if I flip them 100 times and they match 99% of the time, I'd be pretty sure that there was a connection. Of course, the correlation alone wouldn't tell me the mechanism that produces that connection.
Here's what you do. Form an "index time series" for both El Nino and your Mass weather data: a sequence of numbers as a function of time. Make sure your data points are far enough apart in time that each one does not depend on the previous one; taking yearly averages should work. The annual-average ENSO (El Nino-Southern Oscillation) index and the annual-average Worcester temperature are good time series. Find the average value of each time series and subtract it from each of the data points in that series to form an "anomaly time series". Now compute the standard deviation of each anomaly time series as follows:
$$\sigma_x = \sqrt{\frac{\sum x^2}{N-1}}$$

where $\sigma_x$ is the sample standard deviation of the variable $x$ (the anomaly values), $\sum x^2$ means to add up the squares of all the values of $x$, and $N$ is the number of data points you have.
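The anomaly and standard-deviation steps can be sketched in a few lines of Python. This is a minimal illustration with made-up yearly numbers (not real Worcester data), and the helper names `anomalies` and `sample_std` are just illustrative choices, not from any library.

```python
import math


def anomalies(series):
    """Subtract the series mean from each value to form an anomaly time series."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def sample_std(anom):
    """Sample standard deviation sigma_x = sqrt(sum(x^2) / (N - 1)).

    Assumes `anom` is already an anomaly series (mean removed).
    """
    n = len(anom)
    return math.sqrt(sum(v * v for v in anom) / (n - 1))


# Made-up yearly average temperatures, for illustration only.
yearly_temps = [49.0, 51.0, 50.0, 52.0, 48.0]
x = anomalies(yearly_temps)
print(x)              # [-1.0, 1.0, 0.0, 2.0, -2.0]
print(sample_std(x))  # sqrt(10 / 4), about 1.581
```

Python's `statistics.stdev` uses the same N-1 formula and makes a handy cross-check.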
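Now compute the "cross-covariance" of the two time series:

$$\mathrm{Cov}(x,y) = \frac{\sum x\,y}{N-1}$$

That is, multiply each value of $x$ by the value of $y$ that occurred at the same time, add them all up, and divide by $N-1$. If this value is large and positive, then $x$ tends to be positive when $y$ is positive, and negative when $y$ is negative; if it is large and negative, $x$ and $y$ tend to have opposite signs. If it is nearly zero, there is no (linear) connection between $x$ and $y$. What counts as "large" depends on the units and spread of the data, which is why the next step normalizes it.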
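Here is a small sketch of the covariance formula, using made-up numbers and an illustrative helper named `covariance`. It also shows why covariance alone is hard to interpret: multiplying y by 10 makes the covariance ten times bigger even though the relationship is the same.

```python
def anomalies(series):
    """Subtract the series mean from each value."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def covariance(x, y):
    """Cov(x, y) = sum(x * y) / (N - 1) for two equal-length anomaly series."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of points")
    return sum(a * b for a, b in zip(x, y)) / (len(x) - 1)


# Made-up numbers: y rises whenever x rises.
x = anomalies([1.0, 2.0, 3.0, 4.0])
y = anomalies([2.0, 4.0, 6.0, 8.0])
print(covariance(x, y))         # positive: 10 / 3

# Same relationship, with y simply multiplied by 10:
y_scaled = [10 * v for v in y]
print(covariance(x, y_scaled))  # ten times larger: 100 / 3
```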
Now compute the "cross-correlation" between x and y by dividing the covariance by the product of the standard deviations:
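$$\mathrm{corr}(x,y) = \frac{\mathrm{Cov}(x,y)}{\sigma_x\,\sigma_y}$$

If the two time series track each other exactly ($x = a\,y$ for some positive constant $a$), the correlation equals +1. If they are exactly opposite ($x = -a\,y$), it equals -1. If they are unrelated, the correlation will be close to 0. Because of the division, the correlation always lies between -1 and +1, whatever units $x$ and $y$ are measured in.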
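Putting the three formulas together, a minimal correlation function might look like this (the name `correlation` is illustrative). It starts from the raw series, forms the anomalies itself, and the three test cases show the +1, -1 and 0 behavior described above.

```python
import math


def correlation(xs, ys):
    """corr(x, y) = Cov(x, y) / (sigma_x * sigma_y), computed from raw series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]  # anomaly series
    y = [v - my for v in ys]
    cov = sum(a * b for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum(a * a for a in x) / (n - 1))
    sy = math.sqrt(sum(b * b for b in y) / (n - 1))
    return cov / (sx * sy)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))      # about +1 (y = 2x)
print(correlation([1, 2, 3, 4], [-2, -4, -6, -8]))  # about -1 (y = -2x)
print(correlation([1, 2, 3, 4], [1, -1, -1, 1]))    # 0.0: no linear relation
```

The first two results may differ from exactly +1 and -1 in the last decimal place because of floating-point rounding. Python 3.10 and later also provide `statistics.correlation`, which is useful as a cross-check.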
For example, suppose I have these two time series (which I just made up):
| Month | Monthly ave. temp (°F) | My gas bill ($) |
|---|---|---|
| Jun | 70 | 30 |
| Jul | 75 | 25 |
| Aug | 80 | 37 |
| Sep | 72 | 32 |
| Oct | 60 | 40 |
| Nov | 45 | 75 |
| Dec | 40 | 100 |
| Jan | 35 | 120 |
| Feb | 37 | 125 |
| Mar | 50 | 90 |
| Apr | 60 | 60 |
| May | 67 | 40 |
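Plugging these made-up numbers into the formulas gives a strongly negative correlation, which makes sense: the gas bill goes up as the temperature goes down. A quick Python check (the `correlation` helper is illustrative, not from any library):

```python
import math

# The made-up monthly values from the table, June through May.
temps = [70, 75, 80, 72, 60, 45, 40, 35, 37, 50, 60, 67]
bills = [30, 25, 37, 32, 40, 75, 100, 120, 125, 90, 60, 40]


def correlation(xs, ys):
    """corr(x, y) = Cov(x, y) / (sigma_x * sigma_y), with the N-1 factors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / (n - 1))
    return cov / (sx * sy)


r = correlation(temps, bills)
print(f"corr = {r:.3f}")  # corr = -0.945
```

Consecutive months like these are not independent of one another, so this only illustrates the arithmetic; for a real test, use data spaced far enough apart in time, such as yearly averages.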
An important question, though, is whether a correlation is "significant". There's always a chance that a correlation could show up between two completely independent random variables. For example, it's possible (though very unlikely) that if I flip a penny and a nickel 100 times, they would land the same side up every time.
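The first part of this sketch computes exactly how unlikely the coin example is, assuming fair, independent coins. The second part simulates two completely unrelated random series (Gaussian noise, chosen here for convenience) and counts how often the size of their correlation exceeds 0.3640, the 95% entry for N = 30 in the correlation table. The helper names and the seed are arbitrary choices for illustration, and the test is assumed to be two-sided (it looks at |corr|).

```python
import math
import random

# Two fair, independent coins match with probability 1/2 on each flip.
p_all_100 = 0.5 ** 100
p_99_or_more = (math.comb(100, 99) + math.comb(100, 100)) / 2 ** 100
print(f"{p_all_100:.2e}")     # 7.89e-31
print(f"{p_99_or_more:.2e}")  # 7.97e-29


def correlation(xs, ys):
    """Correlation coefficient; the (N-1) factors cancel, so they are omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def chance_of_exceeding(n, threshold, trials=5000, seed=1):
    """Fraction of trials in which two unrelated random series of length n
    give |corr| > threshold purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(correlation(xs, ys)) > threshold:
            hits += 1
    return hits / trials


print(chance_of_exceeding(30, 0.3640))  # roughly 0.05, i.e. about 1 in 20
```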
Scientists are generally convinced there's a connection when there's only a 1-in-20 or a 1-in-100 probability that the correlation could have happened by chance. This means the size of the correlation must exceed the threshold for the 95% or 99% "significance level". That threshold depends on the number of observations: the odds of getting a high correlation by chance are much lower when you take lots of measurements. Here's a table of the minimum correlation needed at each level:
| N (number of data points) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90% confidence | 0.5483 | 0.3774 | 0.3054 | 0.2634 | 0.2350 | 0.2141 | 0.1980 | 0.1851 | 0.1744 | 0.1653 | 0.1348 | 0.1166 |
| 95% confidence | 0.6533 | 0.4496 | 0.3640 | 0.3138 | 0.2800 | 0.2552 | 0.2360 | 0.2205 | 0.2078 | 0.1970 | 0.1606 | 0.1389 |
| 99% confidence | 0.8586 | 0.5909 | 0.4783 | 0.4125 | 0.3680 | 0.3353 | 0.3101 | 0.2898 | 0.2730 | 0.2589 | 0.2110 | 0.1826 |
I computed the significance values myself: you might want to check them against those in a statistics textbook. The values given above are really only good for N>20 or so.
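One rough way to check significance thresholds like these is Fisher's z-transformation, a standard approximation (not necessarily the method used to build the table above). Under the assumption of no real connection, atanh(corr) is roughly normal with standard deviation 1/sqrt(N-3). This sketch uses only Python's standard library (`statistics.NormalDist` needs Python 3.8+), assumes a two-sided test, and the function name `critical_r` is illustrative.

```python
import math
from statistics import NormalDist


def critical_r(n, confidence):
    """Approximate two-sided critical |corr| for n points via Fisher's z.

    Under no real connection, atanh(r) is roughly normal with
    standard deviation 1 / sqrt(n - 3).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z / math.sqrt(n - 3))


for n in (10, 20, 30, 50, 100, 200):
    row = [round(critical_r(n, c), 4) for c in (0.90, 0.95, 0.99)]
    print(n, row)
```

Compare the printed rows with the table; some differences are expected, especially at small N where this approximation is rougher, and a textbook table based on the t distribution remains the better reference.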
For more details, find yourself a textbook on statistics; one written for a college undergraduate statistics course should be fine for you.
Try the Earth Sciences links in the MadSci Library for more information on Earth Sciences.