My intention for the last post, The Effect of Elections on Gasoline Prices, was to be as thorough and quantitative as possible. A friend who is properly trained in statistics pointed out the need to run significance tests on the results. This is good advice and the analysis will be complete with its inclusion.
That last post ended with a visualization of the non-seasonal changes in gasoline prices in the months leading up to the election (August to November) for election years (Presidential or midterm), and used the same data in the same timeframe in non-election years as a control. We used inflation-adjusted, constant 2008 dollars to properly subtract the real seasonal changes and discover real trends in the analysis. That final figure (below) clearly showed that there is no trend of election-related price decreases. In fact, prices have tended to increase somewhat as the election nears. But the question that I failed to adequately address last time is: Are the price changes in election years significantly different from those of non-election years? This is the definitive question.
Because any sampled data set will suffer from sampling errors (it would be extremely difficult for every gas station in the country to be included in the BLS study each month), the sampled distribution will differ somewhat from the actual distribution. This is important because we frequently represent and compare data sets using their composite statistical values, like their mean values. And two independent samplings of the same distribution will produce two sets with different mean values; this makes understanding significant differences between them an important problem. What we need is a way to determine how different the datasets are, and if these differences are meaningful or if they are simply sampling errors (errors of chance).
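To make the sampling-error point concrete, here is a minimal R sketch. It uses simulated numbers, not the BLS gasoline data, and the sample size of 12, the standard deviation of 0.2, and the seed are arbitrary assumptions chosen only for illustration. Both samples are drawn from exactly the same distribution, yet their means do not agree.

```r
# Simulated illustration only: these numbers are NOT the BLS gasoline data.
set.seed(42)  # arbitrary seed so the run is repeatable

# Two independent samples of 12 values from the same normal distribution
sample_a <- rnorm(12, mean = 0, sd = 0.2)
sample_b <- rnorm(12, mean = 0, sd = 0.2)

# Same underlying distribution, yet the sample means differ
mean_a <- mean(sample_a)
mean_b <- mean(sample_b)

cat("mean of sample a:", mean_a, "\n")
cat("mean of sample b:", mean_b, "\n")
cat("difference:      ", mean_a - mean_b, "\n")
```

The difference printed here is pure sampling error, which is exactly the kind of difference a significance test has to rule out before we call anything a real effect.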
Fortunately we are not the first to need such a tool. Mathematicians have developed ways to compare datasets and determine whether their differences are significant. These are “tests of significance.” The t-test is one of these tests: it asks how likely a difference in means at least as large as the one we observed would be if the two samples actually came from distributions with the same mean.

The first thing we should do is look at the distributions of these price changes. The two large election-year price drops (2006, 2008) are very clearly seen to be outliers, and the significant overlap of the two distributions of price changes is readily visible.
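For readers who want to see what the t-test does under the hood, the sketch below computes Welch's version of the statistic by hand and compares it with R's built-in `t.test`. The vectors `x` and `y` are simulated placeholders (assumed sizes, means, and spreads), not the gasoline price changes, and the test ultimately reports a probability called the p-value.

```r
# Simulated placeholder data, NOT the gasoline price changes from the post.
set.seed(1)
x <- rnorm(15, mean = 0.0, sd = 0.3)
y <- rnorm(10, mean = 0.1, sd = 0.1)

# Each sample's contribution to the variance of the difference in means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch's t statistic: difference in means divided by its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# R's t.test uses the Welch variant by default (var.equal = FALSE)
res <- t.test(x, y)

cat("manual: t =", t_manual, " df =", df_manual, " p =", p_manual, "\n")
cat("t.test: t =", res$statistic, " df =", res$parameter,
    " p =", res$p.value, "\n")
```

The two lines of output should agree, which shows that the test is nothing more than a difference in means scaled by how noisy that difference is expected to be.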
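It is clear that were it not for the outliers in the election year data, these distributions would be considered to be very nearly identical. But to characterize the significance of their differences, we’ll run an independent two-sample t-test (R's default is Welch's variant, which does not assume that the two groups have equal variances). The primary output of the test that we are concerned with is the p-value. This is the probability of seeing a difference in means at least as large as the observed one if the two groups really had the same mean, that is, if chance alone were at work. Recall that the maximum value of a probability is 1, and by convention a p-value below 0.05 is taken as evidence of a significant difference. If it matters, I’m using R for data analysis.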
    Welch Two Sample t-test
    data:  electionyear$changes and nonelectionyear$changes
    t = -0.6427, df = 21.385, p-value = 0.5273
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.2530637  0.1334810
    sample estimates:
      mean of x   mean of y
    -0.02367507  0.03611627
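This p-value tells us that a difference in means at least this large would show up about 52.7% of the time even if election years and non-election years had the same true mean price change. That is far above the conventional 0.05 threshold, so we fail to reject the null hypothesis that the true difference in means is 0. The 95 percent confidence interval (-0.253 to 0.133) also contains 0, which says the same thing. This does not prove that the two means are identical, but it answers the question that we posed: the changes in gas prices in election years are not significantly different from those of non-election years.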
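As a sanity check, the reported numbers can be shown to hang together using nothing but the values printed in the t-test output. The sketch below is a cross-check under that assumption, not a re-run of the analysis. It backs out the standard error from the t statistic and then rebuilds the p-value and the confidence interval. The variable names are my own.

```r
# Values copied from the t-test output reported in the post
mean_x <- -0.02367507   # electionyear$changes
mean_y <-  0.03611627   # nonelectionyear$changes
t_stat <- -0.6427
df     <- 21.385

# Difference in means, and the standard error implied by t = diff / SE
diff_means <- mean_x - mean_y
std_err    <- diff_means / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95 percent confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df) * std_err

cat("difference in means:", diff_means, "\n")
cat("implied std. error: ", std_err, "\n")
cat("p-value:            ", p_value, "\n")
cat("95% CI:             ", ci[1], "to", ci[2], "\n")
```

Up to rounding in the printed t statistic, the rebuilt p-value and interval should match the R output, and the interval straddles 0, which is another way of saying that the data cannot distinguish the two means.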