As an Engineer it has long been a puzzle to me why the mathematics in Climate data observations is so poor, and why it goes unchallenged. It is just considered ‘par for the course’.
Jittering, sub-sampled single running means are in common use, with all the distortions and ‘noise’ leakage they imply. Any Engineer who used or relied on such poor instrumentation/data summaries in their day to day work would be fired.
Greg Goodman (with Vaughan Pratt’s valuable input) had a great thread at Judith Curry’s site on how bad all this is.
Data corruption by running mean ‘smoothers’
Posted on November 22, 2013 by Greg Goodman
or visit Greg’s own site for the same article.
So this new tool is an attempt to overcome those known deficiencies, produce a more accurate treatment of Climate data, and see what new perspectives, if any, it uncovers.
Firstly let’s deal with the Yearly signal and the need for ‘Normals’/‘Anomalies’. These ‘Normals’ are constructed by choosing some reference period, often 30 years long, accumulating the data over that period and then subtracting those figures from the Monthly data to create the ‘Anomaly’ set that plots just the Year to Year variations in the data.
This suffers from two problems: the reference periods can differ from data set to data set, and each Monthly value is itself a sub-sampled single running mean, with only 30 (say) values contributing to each Month of the ‘Normal’. This can and will leave residual ‘noise’ in the ‘Normal’ that then shows up in everything built from the ‘Anomaly’ data sets from then on.
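The ‘Normal’/‘Anomaly’ construction described above can be sketched in a few lines. This is illustrative only; the array shapes, reference period and toy numbers are mine, not from any particular data set:

```python
import numpy as np

def monthly_anomalies(temps, years, ref_start, ref_end):
    """temps: array of shape (n_years, 12), one row of Monthly means per year.
    The 'Normal' for each calendar Month is its mean over the reference
    period; subtracting it leaves the year-to-year 'Anomaly'."""
    in_ref = (years >= ref_start) & (years <= ref_end)
    normals = temps[in_ref].mean(axis=0)   # 12 Monthly 'Normals'
    return temps - normals                 # broadcast over every year

# toy data: a seasonal cycle riding on a slow warming trend
years = np.arange(1950, 2014)
seasonal = 10.0 * np.sin(2 * np.pi * np.arange(12) / 12)
temps = 14.0 + seasonal + 0.01 * (years - 1950)[:, None]
anoms = monthly_anomalies(temps, years, 1961, 1990)
```

Note that with only 30 values per Month in the reference window, any noise in those 30 values is frozen into the ‘Normals’, and hence into every ‘Anomaly’ built from them, which is exactly the residual-noise problem described above.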
There is another, simple, alternative though: a 12 month/365 day Cascaded Triple Running Mean (CTRM), which completely removes the annual ‘cycle’ because the CTRM is a near Gaussian low pass filter. In fact it is slightly better than Gaussian in that it removes the 12 month ‘cycle’ completely, whereas a true Gaussian leaves a small residual of it in the data.
This CTRM filter, used either at 365 days or 12 months in length, will completely remove the Annual cycle and still retain the underlying sampling frequency in the output set into the bargain.
(Never, ever, down sample a data set unless it is completely unavoidable. Keep the full sampling resolution whenever possible. We do have computers now, after all. 🙂 )
In fact it gets even better than that. It does not matter whether the data have been normalised already or not: because the filter is linear, the same CTRM produces the same output on raw or normalised data, apart from a constant offset set by whatever ‘Normal’ period was chosen. No added distortions of any sort.
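A minimal sketch of a CTRM in plain NumPy. The three stage lengths follow the stage ratio suggested in Goodman’s article (approximately 1.2067 between stages, rounded here to whole samples); the exact ratio and the rounding are implementation choices, not gospel:

```python
import numpy as np

def running_mean(x, n):
    """Centred single running mean; the ends are NaN ('full kernel')."""
    out = np.full(x.size, np.nan)
    core = np.convolve(x, np.ones(n) / n, mode='valid')
    start = (x.size - core.size) // 2
    out[start:start + core.size] = core
    return out

def ctrm(x, period=12, ratio=1.2067):
    """Cascaded Triple Running Mean: stages of period, period/ratio and
    period/ratio**2 samples. The first stage nulls the target period
    exactly; the shorter stages suppress the boxcar's leakage lobes,
    giving a near-Gaussian low pass overall."""
    for n in (period, period / ratio, period / ratio ** 2):
        x = running_mean(x, max(1, int(round(n))))
    return x

# demo: monthly data with an annual cycle riding on a slow trend
t = np.arange(600)                         # 50 years of months
series = 0.001 * t + np.sin(2 * np.pi * t / 12)
smoothed = ctrm(series, period=12)         # annual cycle gone, trend kept
```

Because every stage is a plain running mean the whole filter is linear, which is why raw and normalised inputs give the same output up to a constant offset.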
Similar treatments can be done for GISS and RSS.
There is a small problem in that these CTRMs are ‘full kernel’ filters; that is to say, although their outputs will not change when new data are added (except to extend the existing plot), they do not extend to the ends of the available data, as can be seen above. Overcoming that will require some additional work.
I am sure that if I left it there then no-one would have any complaints. Just a small correction to the figures already in use, and on with all the big, grown-up stuff we all need to really, really concentrate on.
But why stop at Annual? There is nothing except human perception/timescales that limits this work to Annual. The basic principles of filters work over all timescales. We are, after all, trying to determine how this complex load that is the Earth reacts to constantly varying surface input and surface reflection/absorption, with very long timescale storage and release systems involving phase change, mass transport and the like.
If this were some giant mechanical structure slowly vibrating away we would run low pass filters with much longer time constants to see what was down in the sub-harmonics. So let’s do just that for Climate.
If you sweep a standard low pass filter against the data across a range of time constants, you will notice a sweet spot around 12-20 years where the output changes very little. That looks like a good place for a binary stop band/pass band split, so let’s choose 15 years as the roll off point and see what happens. Remember, this is a standard low pass filter, similar to the one that splits telephone from broadband on an Internet connection: all components with periods longer than 15 years are preserved in the output, and everything shorter is removed.
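The pass/stop behaviour can be demonstrated on synthetic data. A plain Gaussian kernel stands in here for the near-Gaussian CTRM, and the ~60 year and 9 year components are invented purely for the demonstration:

```python
import numpy as np

def gaussian_lowpass(x, sigma):
    """Symmetric Gaussian smoothing kernel, edge-padded: a stand-in
    for a long CTRM, which is near-Gaussian in shape."""
    half = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(x, half, mode='edge')
    return np.convolve(padded, k, mode='valid')

t = np.arange(2400)                        # 200 years of monthly samples
slow = np.sin(2 * np.pi * t / 720)         # ~60-year component
fast = np.sin(2 * np.pi * t / 108)         # 9-year component
out = gaussian_lowpass(slow + fast, sigma=60)
# with sigma = 60 months, the ~60-year component passes nearly intact
# while the 9-year component is attenuated by orders of magnitude
```

The roll off of any real filter is gradual rather than a brick wall, so "15 years" marks the middle of a transition band, not a hard cliff.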
Now it starts to get interesting. I have been accused of all sorts of ‘cycle mania’ for that plot.
I did nothing. I just ran a filter. Out pops some wriggle in that plot which the data draws all on its own at around ~60 years and I get it in the neck. It’s the data what done it – not me!
If you see a ‘cycle’ then that’s your perception. What you can’t do is say it is not there. That’s what the DATA says is there.
The extra >75 year single running mean is there to remove the newly discovered ~60 year line, as one would normally do, to see whatever residual is left.
The UAH and RSS data series are too short to run a full >15 year pass on them but it is possible to do a >7.5 Year which I’ll leave as an exercise to the reader.
And that full kernel problem? Let’s add a Savitzky-Golay filter to the set. Why S-G? It is the Engineering equivalent of LOWESS in statistics, so it should not meet much resistance from statisticians (want to bet?).
We can verify that the parameters chosen are reasonable because the S-G line closely follows the full kernel filter when that is used as a training/verification guide. I have removed the early part of the line for one very good reason, and it is the same reason this should not be treated as an absolute guide to the future either: like LOWESS, S-G will ‘whip’ around on new data like a caterpillar searching for a new leaf. It will likely follow some similar trajectory, but this is an estimate, not a certainty.
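The S-G step can be sketched as follows. This is a hand-rolled version using only NumPy (scipy.signal.savgol_filter is the off-the-shelf equivalent); the window length, polynomial order and toy series are illustrative, and in practice you would tune window and order until the line tracks the full kernel filter:

```python
import numpy as np

def savgol(x, window, order):
    """Savitzky-Golay smoothing: least-squares fit a polynomial of the
    given order in each window and take its centre value. Edges are
    edge-padded here, so the line runs to both ends of the record."""
    half = window // 2
    A = np.vander(np.arange(-half, half + 1), order + 1, increasing=True)
    weights = np.linalg.pinv(A)[0]        # centre-point smoothing weights
    padded = np.pad(x, half, mode='edge')
    return np.convolve(padded, weights[::-1], mode='valid')

# demo: a slow ~60-year wiggle buried in monthly noise
rng = np.random.default_rng(2)
t = np.arange(1680)                        # 140 years of months
series = np.sin(2 * np.pi * t / 720) + 0.2 * rng.standard_normal(t.size)
fitted = savgol(series, window=361, order=2)   # ~30-year window, quadratic
# unlike the full-kernel CTRM, 'fitted' reaches both ends of the record,
# but those end sections will whip around as new data arrive
```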
Currently it shows that we are over a local peak and headed downwards! That may well change, so……”Caution, Will Robinson”.
Let’s add in all of the major data sets now, UAH, RSS, HadCrut and GISS. With Annual and >15 year CTRM filters added.
The first thing to note here is that, rather obviously, they are not on the same Anomaly baselines. Not surprising really, and again I find it odd that after all this time we still have not attempted to bring the Satellite and Thermometer sets together. So let’s do that now and align them over the whole of their overlap period, from 1979 to today.
A couple of points to note here. The data sets all align rather well 1979 – 2013 (inclusive). There is a significant divergence between HadCrut and GISS from 1979 back to the start of the record where they do seem to align once again. I am not sure why that is the case.
Please also note that these are just vertical offset changes. The values come from a standard OLS match over the period in question. Nothing unusual there. Cowtan & Way did this locally; this is just a Global version of the same step.
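That OLS matching step is simpler than it sounds: with only a vertical shift free, the least-squares answer is just the mean difference over the overlap. A toy sketch (the series and the 0.25 baseline offset below are invented for illustration):

```python
import numpy as np

def ols_offset(ref, other):
    """Least-squares vertical shift: fitting only an intercept, the OLS
    solution is the mean of (ref - other) over the common overlap."""
    ok = ~(np.isnan(ref) | np.isnan(other))
    return float(np.mean(ref[ok] - other[ok]))

# two series measuring the same signal on different Anomaly baselines
t = np.arange(420)                           # 1979-2013 in months, say
signal = 0.012 * t / 12 + 0.1 * np.sin(2 * np.pi * t / 77)
surface = signal                             # thermometer-style baseline
satellite = signal - 0.25                    # satellite-style baseline
shift = ols_offset(surface, satellite)
aligned = satellite + shift                  # now on the common baseline
```

Being a pure vertical shift, this changes no trends, no wiggles and no relative behaviour within either series.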
To give the whole thing context and to just stretch the plot so we have a further 50 years back into the past, let’s add in some well-known proxy data sets.
And there we have it: a simple data treatment of the various Temperature data sets we have, brought together in one place and on a common baseline.
Something to test the various claims made as to how this whole thing works. Want to compare it against CO2? Go for it. Want to check SO2? Again, fine. Volcanos? Be my guest. Want to add more Proxies? The picture changes slightly. Just don’t complain if the data doesn’t match your expectations. This is data, and summaries of the data: an Occam’s Razor of a temperature series. Very, very simple but incredibly revealing.
And now the bun fight starts