As we reach the end of another year of the competition, I thought I would show you another way of looking at the accuracy of the predictions. The competition uses a very simple approach: take the absolute error of each monthly prediction compared to the actual final CET, then add up the monthly figures to get an annual total.
Another way of looking at it is to use a different statistical approach called the root mean square deviation (RMSD). This takes the average of the squared monthly errors and then the square root of that average. Squaring each error automatically gets rid of any negative figures (e.g. where a prediction was lower than the actual result in our CET competition). The other effect of squaring is to magnify the impact of large errors, i.e. a big miss contributes disproportionately to the overall error figure. As a result, a set of consistently smallish errors will produce a lower overall error value than a set of predictions that are mostly almost exactly correct but include a couple of large errors.
The basic formula for the root mean square deviation is as follows:
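RMSD = sqrt( (E1² + E2² + ... + E12²) / 12 )
where E1 to E12 are the differences between each monthly prediction and the final monthly CET (the sign of each difference does not matter, since it is squared).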
Applying that to the 2014 CET competition results (based on the current estimated final outcome) gives the following table. In the final column I have shown the comparison with the outcome expected under the absolute error approach that we use for the competition. For some people the RMSD approach gives a very different result. The reason is that the absolute error approach penalises errors in an entirely linear manner, no matter how small or large they are, whereas the RMSD approach assigns much greater significance to larger errors.
Taking a couple of examples from the table: Sussex snow magnet does very well under the RMSD approach, finishing 18 places higher than under the absolute error method. Why? The main reason is that the prediction error in August, while significant (1.35C), was far lower than the errors recorded by many other people, which were well in excess of 2C. Errors that big get heavily penalised in the RMSD method.
On the flip side, Surrey John does very badly under the RMSD approach. Why? Well, 7 of his predictions resulted in an error of less than 0.5C, which is why he has done well under the absolute error method: he had a couple of bad months, but the large number of very good months more than compensated for this. Under the RMSD approach, however, he is heavily penalised for the 2 bad predictions, which gave errors of 2.48C and 2.75C.
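To make the contrast concrete, here is a rough sketch in Python of the two calculations. The monthly error figures are invented for illustration (mostly small misses plus two large ones, similar in shape to the cases above); they are not taken from the competition table.

import math

# Invented monthly errors in degrees C: mostly small misses plus two large ones.
errors = [0.2, 0.3, 0.1, 0.4, 0.25, 0.35, 0.15, 2.5, 0.3, 2.7, 0.2, 0.4]

# Competition score: sum of the absolute monthly errors.
absolute_total = sum(abs(e) for e in errors)

# RMSD: square root of the mean of the squared monthly errors.
rmsd = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

print(round(absolute_total, 2))  # 7.85
print(round(rmsd, 2))            # 1.09

In this made-up example the two big months account for about two thirds of the absolute total but roughly 94% of the sum of squares, which is why a couple of howlers hurt so much more under RMSD.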
I am not suggesting we change the methodology used for the competition. The current method allows people to recover their positions even if they have one or two bad months, which I think makes the competition more interesting. But I do think that if you are looking to judge the predictive "skill" of an individual, then the RMSD method is probably more useful as a means of identifying who is consistently giving good predictions (rather than, say, someone who can give good predictions but is prone to making some real howlers).
One of the reasons I mention all of this is that I had a message from Devonian earlier in the year asking whether the data I now hold from running this competition over several years allows me to glean any insight into predictive skill and whether that skill is improving over time. That is a difficult question because there are so many potential variables, including, for example, what the models were showing at the time people made their predictions and how much the actual outcome differed from this. Looking over a whole year, I think the RMSD approach is a better guide to predictive skill than the more straightforward method we use for the competition.
I also looked at how the overall errors of the top 10 people in the competition have varied year on year. The table below has some data on this. The first column shows the total absolute difference between the actual CET mean and the 1971-2000 mean, totalled over the 12 months of the year. This shows that in 2014 the CET was (as we know) often very far from the mean. 2010 saw a similar result, but of course this was skewed by the December figure, which accounted for a disproportionate amount of the total deviation, whereas in 2014 the deviation from the mean has been consistent throughout the year. 2011 was similar to 2014. 2012, by contrast, saw temperatures generally close to the mean throughout the year.
The second column shows the cumulative prediction errors of the top 10 people in the final competition table. Dividing this by 10 gives an average prediction error. In 2012 the average error is quite small, but this is to be expected given that temperatures were generally close to the mean, and in the competition most people tend to make predictions that are not too far from the mean. In 2010 and 2011, when temperatures deviated more markedly from the mean, the prediction errors tended to be higher as a result.
In 2013 the CET deviations from the mean were between the low value recorded in 2012 and the much higher values in 2010 and 2011. However, the average prediction errors in 2013 were the highest recorded in the 6 years I have been running this competition. This suggests the skill level of the predictions in 2013 was really quite low across the board.
Conversely in 2014 the CET deviations from the mean were very high and only just below those in 2011. Yet the prediction errors in the competition this year have easily been the lowest I have ever seen. This suggests predictive skill this year has been very high indeed.
Looking at the final column of the table, we see the number of months in each year where the CET mean deviated by more than 1C from the 1971-2000 mean. 2014 has by far the highest number of any year and yet still has the lowest prediction errors. Of course 2014 has been consistently very warm throughout (August excepted), and this may have helped because we have not shifted from warm to cold or vice versa very much. But even so, given the significant CET anomalies, I think it is quite remarkable how well the CET has been predicted this year. It will be a tough job to improve on these figures in 2015.
It is worth noting that, up to now, the smallest annual prediction error in the competition was the 7.7C recorded by Saint Snow in 2009. This year at least the top 5 in the competition will all better that figure, which further highlights just how good the predictions have been.