16/11/2025
Many important political debates and policy decisions depend on reliable government statistics, but as revealed in this report, some of these statistics are far less reliable than the general public may be led to believe.
Datageddon: Britain's stats have become dangerously unreliable
Britain is facing a quiet crisis: its data is breaking down, and the government’s numbers are increasingly unreliable. In this episode of Reality Check, econ...
28/10/2025
Congratulations to my friend Theo Acheampong, who has just published some research along similar lines to that of Pepeah Ateh, which I posted last year. The work, by Theo and co-author Emmanuel Cobbold, looks at the relationship between natural resource depletion and external debt.
They use GMM (generalized method of moments) to estimate linear relationships between financial & agricultural indicators across 53 countries from 1998 to 2021 (i.e. panel data). Instrumental & lagged variables are used to control for potential biases caused by omitted variables & endogeneity. Robust standard errors are used to fix potential problems with heteroskedasticity, cross-sectional dependence and autocorrelation.
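To give a feel for why instruments help with endogeneity, here is a minimal sketch of the simplest possible case: a single regressor and a single instrument, where the GMM estimator reduces to the classic instrumental variables formula. It is not the authors' model; the data are simulated, and the variable names and the true slope of 2.0 are assumptions chosen purely for illustration.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n, seed=0):
    """Simulated data: x is endogenous because it shares the unobserved u with y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument: drives x but is unrelated to u
        ui = rng.gauss(0, 1)  # omitted variable
        xi = zi + ui + rng.gauss(0, 1)
        yi = 2.0 * xi + 3.0 * ui + rng.gauss(0, 1)  # true slope is 2.0
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """Ordinary least squares slope, which ignores the endogeneity."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Solves the sample moment condition cov(z, y - beta * x) = 0 for beta."""
    return covariance(z, y) / covariance(z, x)


z, x, y = simulate(5000)
beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)
print(f"OLS slope: {beta_ols:.2f}  IV/GMM slope: {beta_iv:.2f}  (true value 2.0)")
```

In this simulation the OLS slope is pulled away from the true value by the omitted variable, while the instrument-based estimate is not. A real panel GMM analysis uses many moment conditions and a weighting matrix, which this sketch leaves out.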
Their findings illuminate the channels by which external government debt often leads to an increase in natural resource depletion, and they have some policy recommendations to reduce this problem.
Here's a link to the paper:
www.sciencedirect.com
26/01/2025
Back in 2020 I posted a link to rootclaim.com: a company that uses Bayesian probabilistic analysis to calculate probabilities of various current events.
Their analysis concluded that Covid-19 was more likely to have originated from a lab leak than from an infection from an animal. This contradicted the consensus among most government agencies and news outlets at that time, and the lab leak hypothesis was widely regarded as something of a conspiracy theory: https://www.rootclaim.com/analysis/What-is-the-source-of-COVID-19-SARS-CoV-2
However, rootclaim.com can feel somewhat vindicated: the CIA, which puts a lot of resources into investigating important current events such as this, recently announced that they also think the Covid-19 pandemic was more likely a result of a lab leak than an infection from an animal, although they are not very sure about it: https://www.nytimes.com/2025/01/25/us/politics/cia-covid-lab-leak.html
What is the source of COVID-19 (SARS-CoV-2)? | Rootclaim
Rootclaim helps people understand complex issues by combining the power of crowdsourced information with the mathematical validity of statistics.
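For readers unfamiliar with this kind of Bayesian analysis, the basic mechanics can be sketched in a few lines: start with prior odds for a hypothesis and multiply by a likelihood ratio for each piece of evidence. This is only a generic illustration, not Rootclaim's actual model. The function name and all the numbers are hypothetical, and it assumes the pieces of evidence are independent given each hypothesis.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability using one likelihood ratio per piece of evidence.

    Each ratio is P(evidence | hypothesis) / P(evidence | alternative).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical example: a 20% prior and three pieces of evidence, two of which
# favour the hypothesis (ratios above 1) and one of which counts against it.
p = posterior_probability(0.2, [3.0, 0.5, 4.0])
print(round(p, 3))  # 0.6
```

The final answer can be very sensitive to the likelihood ratios chosen, which is why such analyses usually publish every input so that readers can challenge them.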
19/06/2024
Statisticians like myself often complain about the way statistics are reported in the media, but I think this guy has done a good job here:
Inflation drop is 'watershed moment' - Ed Conway analysis
Sky's economics and data editor Ed Conway reflects on the drop in inflation, which has fallen to the Bank of England's target rate of 2% for the first time i...
22/08/2023
I recently found out about a new study of the effects of the ULEZ (Ultra Low Emission Zone) scheme on air pollution in London.
The findings are different from those of a previous study that I wrote about on this page in 2019.
The 2019 study reported a 29% drop in NO2 pollution levels which I estimated to be roughly equivalent to around 1 month of extra life expectancy per person working full-time in central London.
However, the new study estimates that only around 3% of the reduction in NO2 levels can actually be attributed to the introduction of the ULEZ scheme itself. The remaining 26% could be attributed to other causes such as the introduction of the T-charge (a charge levied on highly polluting vehicles which was introduced in October 2017 and replaced by ULEZ in 2019), or the general increase in uptake of low emissions vehicles that was already occurring.
This newer study is much more sophisticated than the previous one; they use a machine learning model (GBM) to predict pollution levels based on environmental variables such as temperature, air pressure, wind speed, day of week, and crucially, time. This model is then used to find the dependence of pollution levels on time alone by averaging out predictions over different configurations of the other variables. The motivation behind this approach is to remove the effects of these potentially confounding environmental variables (and other sources of unwanted variation) from the data more accurately than a linear model could. The new normalized data is then used in a breakpoint and RDD (regression discontinuity design) analysis to find the causal effect of the ULEZ scheme.
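The averaging step described above can be sketched as follows. The stand-in model here is a made-up formula rather than a trained GBM, and the weather values are hypothetical; the point is only to show how predictions are averaged over the observed weather to leave the dependence on time.

```python
def stand_in_model(day, temperature, wind_speed):
    """Hypothetical fitted model; a real analysis would use the trained GBM here."""
    return 50.0 - 0.01 * day + 0.5 * temperature - 2.0 * wind_speed


def time_dependence(model, days, weather_samples):
    """For each day, average the model's predictions over all observed weather."""
    normalised = []
    for day in days:
        predictions = [model(day, temp, wind) for temp, wind in weather_samples]
        normalised.append(sum(predictions) / len(predictions))
    return normalised


# Hypothetical (temperature, wind speed) observations.
weather = [(5.0, 3.0), (12.0, 1.5), (20.0, 4.5), (15.0, 3.0)]
trend = time_dependence(stand_in_model, [0, 100, 200], weather)
print(trend)
```

Because every day is evaluated against the same set of weather conditions, differences between days in the output reflect time alone, not the weather that happened to occur on those days.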
Since some drivers may alter their behaviour in anticipation of ULEZ or slightly after its introduction, a time window around the ULEZ start date is removed from the data so that the full effect can be estimated more reliably. Various robustness and model validation checks were performed to determine the appropriate location of this window.
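A rough sketch of a regression discontinuity with an excluded window (sometimes called a "donut" design) is given below. It fits a straight line to each side of the cutoff, ignoring the days inside the window, and measures the gap between the two lines at the cutoff. The data are simulated, with an assumed true effect of -3 units that phases in over 40 days; none of the figures come from the study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def donut_rdd(days, values, cutoff, window):
    """Jump at the cutoff, ignoring observations within `window` days of it."""
    # Days are centred on the cutoff, so each intercept is the fitted value there.
    before = [(d - cutoff, v) for d, v in zip(days, values) if d < cutoff - window]
    after = [(d - cutoff, v) for d, v in zip(days, values) if d >= cutoff + window]
    intercept_before, _ = fit_line(*zip(*before))
    intercept_after, _ = fit_line(*zip(*after))
    return intercept_after - intercept_before


# Simulated pollution series with a downward trend and a gradual policy effect.
rng = random.Random(1)
cutoff, true_jump = 200, -3.0
days = list(range(400))
values = []
for d in days:
    # The effect phases in linearly over the 20 days either side of the cutoff.
    phase_in = min(1.0, max(0.0, (d - (cutoff - 20)) / 40))
    values.append(40.0 - 0.02 * d + true_jump * phase_in + rng.gauss(0, 0.5))

jump = donut_rdd(days, values, cutoff, window=20)
print(f"Estimated jump at cutoff: {jump:.2f} (simulated true effect {true_jump})")
```

Setting `window=0` in this sketch would include the phase-in period in both fits, which tends to blur the estimated jump. That is the motivation for removing the window.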
Overall this study is much more sophisticated & nuanced than the previous one. The difference in findings is probably mostly due to the fact that the previous study included the effects of both the T-charge and ULEZ combined, whereas the newer study examines the effects of ULEZ in isolation (i.e. how much it adds to the effect already produced by the T-charge).
The newer study was also more careful to control for the general trend of decreasing vehicle emissions due to higher usage of low emissions vehicles that would have occurred even without ULEZ.
However, the study period only covers the first 9 months of ULEZ. Money raised by the scheme is to be reinvested into other emissions-reducing measures (such as purchasing more low emissions buses), and that may take more time to have an effect. It may be that the time period covered by the study is not long enough to give an accurate figure for the overall emissions reductions caused by ULEZ.
Has the ultra low emission zone in London improved air quality? - IOPscience
Published 16 November 2021 • © 2021 The Author(s). Published by IOP Publishing Ltd Environmental Research Letters, Volume 16, Number 12 Citation Liang Ma et al 2021 Environ. Res. Lett. 16 124001 DOI 10.1088/1748-9326/ac30c1
13/08/2022
The devastating fires seen this summer across Europe are warning signs of what's to come in years ahead due to global warming.
At the same time Europe is suffering from an energy crisis due to an embargo on Russian gas, and will probably need to implement rationing policies this coming winter.
Forecasting energy demand (load forecasting) is an important but difficult task that plays a vital role in mitigating both of these disasters.
Energy companies need to supply enough gas & electricity to meet the demand from their customers, but not too much.
Electricity cannot yet be stored in large quantities, and so any excess production goes to waste.
Most of the world's electricity is currently produced by coal- or gas-fired generators, which emit large amounts of CO2 and add to global warming, so it's important to minimize any excess production, hence the need for good forecasts.
Many years ago I did some work developing a load forecasting model for Matrica, an energy forecasting company: https://matrica.co.uk/
At that time they were using standard statistical time series models, but were interested in looking into new techniques.
I implemented an advanced machine learning model in C++, which was cutting edge at that time: Markov chain Monte Carlo sampling of neural network parameters in a Bayesian framework, based on the work of Radford Neal: https://www.cs.toronto.edu/~radford/
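To illustrate the general idea (not the original C++ model), here is a toy sketch in Python. It uses a simple random-walk Metropolis sampler rather than the more efficient samplers used in practice, and a deliberately tiny network with one hidden unit and two weights. The data, priors, noise level and step size are all assumptions made for the example.

```python
import math
import random


def predict(params, x):
    """Tiny network: one tanh hidden unit, no bias terms."""
    w_in, w_out = params
    return w_out * math.tanh(w_in * x)


def log_posterior(params, data, noise_sd=0.2, prior_sd=3.0):
    """Gaussian likelihood plus independent Gaussian priors on the weights."""
    log_prior = -sum(w * w for w in params) / (2 * prior_sd ** 2)
    log_lik = -sum((y - predict(params, x)) ** 2 for x, y in data) / (2 * noise_sd ** 2)
    return log_prior + log_lik


def metropolis(data, n_steps=6000, burn_in=2000, step_sd=0.05, seed=0):
    """Random-walk Metropolis; returns the weight samples kept after burn-in."""
    rng = random.Random(seed)
    current = [0.5, 0.5]
    current_lp = log_posterior(current, data)
    samples = []
    for step in range(n_steps):
        proposal = [w + rng.gauss(0, step_sd) for w in current]
        proposal_lp = log_posterior(proposal, data)
        delta = proposal_lp - current_lp
        # Always accept uphill moves; accept downhill moves with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_lp = proposal, proposal_lp
        if step >= burn_in:
            samples.append(current)
    return samples


# Synthetic training data from a known function plus noise.
data_rng = random.Random(42)
data = [(x / 10, 2.0 * math.tanh(x / 10) + data_rng.gauss(0, 0.2)) for x in range(-20, 21)]

samples = metropolis(data)
# The forecast at x = 1 is the average prediction over the sampled weights.
mean_prediction = sum(predict(s, 1.0) for s in samples) / len(samples)
print(f"Posterior mean prediction at x=1: {mean_prediction:.2f}")
```

The key difference from ordinary neural network training is that the output is a collection of plausible weight settings rather than a single best one, so the spread of the predictions gives a measure of forecast uncertainty.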
Neural network models have increased dramatically in size & complexity since then, typically using billions of parameters, whereas my model used just a few hundred. The range of tasks now possible using deep learning neural network models is very impressive: world-beating board game players, natural language processing, novel art generation, accurate protein folding prediction, etc.
DeepMind, a British AI company owned by Google, implemented a deep learning model that reduced the energy used for cooling Google's data centres by 40% by predicting how best to operate the cooling equipment: https://www.blog.google/outreach-initiatives/environment/deepmind-ai-reduces-energy-used-for/
However, these large models require massive datasets and a huge amount of computing power to train.
Electricity demand on the grid is dependent on many factors for which there may not be much data, such as political events (e.g. the Russian invasion of Ukraine).
For this reason there is still a place for statistical models in this important area.
The home of energy forecasting | Matrica Products
Matrica provide: energy consultancy, data management, energy demand management and general forecasting systems for the power, gas and renewable sectors.
06/02/2022
A new systematic review from Johns Hopkins University has looked at studies of the effects of Covid-19 government policies on mortality rates: https://sites.krieger.jhu.edu/iae/files/2022/01/A-Literature-Review-and-Meta-Analysis-of-the-Effects-of-Lockdowns-on-COVID-19-Mortality.pdf
The authors have been very careful to include only studies that they consider good quality, i.e. data-based empirical studies that use a difference-in-differences methodology.
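In its simplest form, a difference-in-differences estimate compares the change in an outcome in a region that introduced a policy with the change over the same period in a region that did not. A minimal sketch, using entirely hypothetical mortality figures:

```python
def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus the change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical deaths per 100,000 before and after a policy date.
effect = difference_in_differences(
    treated_before=10.0, treated_after=14.0,
    control_before=8.0, control_after=11.0,
)
print(effect)  # 1.0
```

The comparison region stands in for what would have happened without the policy, so the method relies on the assumption that the two regions would otherwise have followed parallel trends.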
They have excluded studies that use epidemiological models that try to predict how many people would have died under different policies. These epidemiological models can be inaccurate if the input parameters are not chosen correctly, or if there are some factors that the model has not properly taken into account (e.g. voluntary changes in behaviour of the public). For example the forecast published by Imperial College in March 2020 has been found to have overestimated deaths by more than 50%: https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(21)00029-X/fulltext
(this was due more to the inaccuracy of early estimates of infection fatality rates, rather than the model itself).
They are also careful to distinguish between effects due to government mandated policies that restrict freedoms of citizens, and effects due to changes in behaviour of the public in response to information about the pandemic.
Compared to policies based solely on recommendations (e.g. recommending that people stay at home, but not enforcing it), they found that the extra effect of lockdown policies was very small, reducing mortality by just 0.2% on average.
More specifically, in the case of SIPOs (shelter-in-place orders), they found that the average effect across all studies was again small: just a 2.9% reduction in deaths.
However, it should be stressed that 2.9% is an average figure, and there is a lot of variation in the results. One paper estimated a 40.8% reduction in deaths due to SIPOs, whereas another reported an increase of 13.1%.
The studies that reported the largest effects were also the ones which covered the shortest time periods, which suggests that perhaps SIPOs are effective at delaying deaths but not so effective at preventing them.
One important question about this meta-study is whether the authors were biased in their criteria for selecting studies, or in the weight given to each study.
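To see why the weighting matters, here is a sketch of one common way of combining studies: an inverse-variance weighted average, in which more precise studies count for more. This is a generic illustration, not necessarily the exact scheme used in the review, and the estimates and standard errors below are hypothetical.

```python
import math


def pooled_estimate(estimates, standard_errors):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical study results: percentage change in mortality and standard error.
estimates = [-10.0, 0.0, 5.0]
standard_errors = [1.0, 2.0, 2.0]

pooled, pooled_se = pooled_estimate(estimates, standard_errors)
simple_mean = sum(estimates) / len(estimates)
print(f"Weighted: {pooled:.2f} (SE {pooled_se:.2f})  Unweighted: {simple_mean:.2f}")
```

With these made-up numbers the weighted and unweighted averages differ considerably, which shows how much the choice of weights (and of which studies to include) can influence the headline figure.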
The question of lockdown effectiveness is still a subject of research and debate, and more research needs to be done not only on the health effects but also the economic effects.
A useful resource for anyone interested in Covid-19 government policy is OxCGRT, the Oxford Covid-19 Government Response Tracker: https://covidtracker.bsg.ox.ac.uk
They have created a "stringency" index to measure the strictness of government lockdown policies in different countries, and it is used in several of the studies mentioned above.
sites.krieger.jhu.edu
26/11/2021
Recent news reports from Ethiopia are worrying. The current conflict between the Ethiopian government and Tigray Defence Force is escalating and there is fear of genocide: https://www.theguardian.com/commentisfree/2021/nov/26/ethiopia-genocide-warning-signs-abiy-ahmed
Statistical analysis plays an important role in trying to prevent disasters of this kind. Many years ago I worked with Peter Brorsen (https://rusi.org/people/brorsen), analyzing conflict data and building statistical models to try and determine the political & economic factors which lead to such conflicts, how much effect they have, and the effects of interventions.
Peter later went on to help develop the early warning systems employed by the E.U. and African Union for predicting such conflicts. No doubt it is warnings from the African Union CEWS (Continental Early Warning System) that are currently ringing alarm bells about the situation in Ethiopia: https://www.peaceau.org/en/article/the-continental-early-warning-system
For our analysis we used data from a wide range of sources including:
The World Bank: https://data.worldbank.org
The Uppsala Conflict Data Program: https://www.ucdp.uu.se
The International Peace Research Institute: https://www.prio.org/
Pippa Norris Political Data: https://www.pippanorris.com/data
Freedom House: https://freedomhouse.org/
World Values Survey: https://www.worldvaluessurvey.org
among others.
One of the main problems was trying to sort through the data to find the relevant variables, and integrate it all in a consistent and coherent way.
Another problem that had to be dealt with was the non-normality of the data: several of the variables have fat-tailed distributions, which make extreme events more likely than a normal distribution would suggest and invalidate standard analyses that assume normality. To deal with this problem appropriate transformations need to be applied, although this also makes the results more difficult to interpret.
This issue was later highlighted by the eminent statistician Nassim Taleb in his critique of Steven Pinker's book "The Better Angels of Our Nature": https://www.vox.com/2015/5/21/8635369/pinker-taleb
This fascinating academic debate has huge implications for the future of world peace
Two of the world's most famous public intellectuals are fighting over a really important question: is war actually in decline?
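The fat-tail problem and the effect of a transformation can be illustrated with simulated data. The sketch below assumes a log-normal variable and a log transformation, which is just one convenient example; it is not the conflict data or the specific transformations used in our analysis.

```python
import math
import random


def skewness(values):
    """Sample skewness: roughly zero for symmetric data such as a normal sample."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


# Simulated heavy-tailed variable (log-normal, chosen purely for illustration).
rng = random.Random(7)
raw = [rng.lognormvariate(0.0, 1.5) for _ in range(5000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
# Share of the total accounted for by the largest 1% of observations.
top_share = sum(sorted(raw)[-50:]) / sum(raw)

print(f"Skewness before: {skew_raw:.1f}, after log transform: {skew_logged:.2f}")
print(f"Share of total from the top 1% of observations: {top_share:.0%}")
```

After the transformation the data look much closer to normal, but any model coefficients then describe effects on the log scale, which is where the difficulty of interpretation comes from.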
22/11/2021
If there are any medical researchers reading this page, you might be interested in this excellent guide to patient recruitment for clinical trials by my dear friend Dr Gillian Lakareber:
A Guide to Patient Recruitment & Retention in Clinical Research : The 6 Core Outcome Factors Which Affect Patient Recruitment
A Guide to Patient Recruitment & Retention in Clinical Research : The 6 Core Outcome Factors Which Affect Patient Recruitment eBook : Lakareber, Gillian : Amazon.co.uk: Kindle Store