Comments on the COVID-19 model source code from Imperial
At the end of March, a paper from the MRC Centre for Global Infectious Disease Analysis at Imperial College, modelling the impact of various scenarios on the spread of COVID-19 infections, appears to have influenced the policy of the powers that be. This group recently started publishing their modelling code on GitHub (good for them).
Most of my professional life has been spent analyzing other people’s code, for one reason or another (mostly Fortran, then Pascal, and then C). I had heard that the Imperial software was written in C, but the released code is written in R (as of six hours ago, there is the start of a Python version). Ok, I can work with R, but my comments will be general, since I don’t have lots of in-depth experience reading R code.
The code comes from a research context, and is evolving, i.e., some amount of messiness is to be expected.
There is not a lot of code to talk about (248 lines setting things up, 111 lines for a Stan model, 371 lines of plotting code, and 85 lines of utility code). The analysis is performed by creating a model using the Stan statistical inference language (in which the high-level structure of the problem is specified, compiled to a lower-level form, and then run; the Stan language is very similar to R). These days, lots of problems are coded using a relatively small number of lines that call fancy libraries to do the heavy lifting. It is becoming rare to have to write tens of thousands of lines of code to solve a problem.
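For readers unfamiliar with Stan, here is a toy sketch of what such a program looks like (a made-up example of mine, not the Imperial model): the structure of the problem is declared in blocks, the program is compiled, and it is then run from R (or Python) with the actual data.

```
// Toy sketch of a Stan program's block structure (not the Imperial model).
data {
  int<lower=1> N;            // number of observed days (supplied from R)
  int<lower=0> deaths[N];    // daily death counts (supplied from R)
}
parameters {
  real<lower=0> lambda;      // expected deaths per day (to be inferred)
}
model {
  lambda ~ normal(10, 10);   // weakly informative prior
  deaths ~ poisson(lambda);  // likelihood for the observed counts
}
```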
I have two points to make about the code, both designed to reduce the likelihood of mistakes being made by the person working on the source. These points mainly apply to the Stan code, because that is where the important stuff happens, but they are equally applicable to all code.
- Numeric literals are embedded in the code; values include: `2.4`, `1.0`, `0.5`, `0.03`, `1e-5`, and `1e-9`. These values obviously mean something to the person who wrote the code, and they can probably be interpreted by experts in the spread of virus infections. But why are they scattered about the code, rather than appearing together (as a sequence of assignments to variables with meaningful names; see the sketch after this list)? Having all the constants in one place makes it easier to spot when a mistake has been made, e.g., one value has been changed without a corresponding change in another value; it also makes it easier for people new to the code to figure out what is going on.
- When commenting out code, make it very obvious, e.g., have `/**********************` on its own line, and `*****************************/` on its own line. Using just `/*` and `*/` makes it easy to miss that code has been commented out.
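As a sketch of what I mean, something like the following Stan fragment would gather the literals in one place (the names and interpretations are my guesses, not the team’s):

```
// Sketch: constants collected together, with meaningful names.
// Names and interpretations are illustrative guesses only.
transformed data {
  real r0_prior_mean = 2.4;   // guess: prior mean for R_0
  real r0_prior_sd   = 0.5;   // guess: prior scale
  real rate_scale    = 0.03;  // guess: some rate or scale parameter
  real tiny          = 1e-9;  // guess: numerical guard value
}
```

Changing a value then means editing one obvious place, rather than hunting through the code.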
Why have they started a Python implementation? Perhaps somebody on the team is more comfortable working with Python (when deadlines loom, it is always best to go with what you know).
Having both an R and Python version is good, in that coding mistakes are likely to show up as inconsistencies in the results produced. It’s always good to have the output of two independently written programs to compare (apart from the fact it may cost twice as much).
The README mentions performance issues. I imagine that most of the execution time is spent in the Stan code, so R vs. Python is not a performance issue.
Any reader with expertise tuning Stan models for performance might like to check out the code. I’m sure the Imperial folk would be happy to hear about worthwhile speed-ups.
Update
The R source code of the EuroMOMO model is also available; this model aims to “… explain number of deaths by a baseline, influenza activity and extreme ambient temperature.”
Thanks for posting this. The world could do with more reviews of code.
It looks to me as if the model does not include what seems to be the single most important intervention: to manage sick people at home for as long as possible, and when hospitalisation is necessary, to effectively isolate COVID-19 patients from other patients and staff.
“Why have they started a Python implementation?”
– perhaps because they are trying to get R as small as possible?
😉
@Michael Power
Building a reliable model requires lots of data, and I imagine that detailed data on epidemics is hard to come by (data on most things is hard to come by, because people are not willing to invest in collecting it). I am guessing that the current model is what they have from work on previous epidemics.
The model struck me as very simplistic. There is no modeling of population density, or of interaction between subpopulations. But this all takes lots of data, which I guess is being gathered now and will be used for future models.
The combination of unit tests and code review can take a project from one level to the next in terms of professionalism. The biggest difference in my experience between good amateur programmers and experienced professional programmers is that the professionals know how to work on large code bases in large groups, which means a different set of standards for code quality, especially in terms of readability. It all starts with these kinds of reviews!
As a computer scientist, I find R a challenge to read. This is despite working with statisticians in R for the last ten years. Like Microsoft Office, R tries to guess what you want to do, making it very hard to control scope and data types (because of the language) and namespaces (because of the practices that have evolved in the language).
The numeric literals can be recoded as data in Stan. If they are declared as data, they need to be input from the outside. If they are declared as transformed data, they can be defined in the transformed data block.
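For example, a minimal sketch of the two alternatives (the variable name `tau` and the value 0.03 are just for illustration):

```
// Option 1: declared as data, so the value is passed in from outside
data {
  real tau;                  // e.g., supplied from R at run time
}
// Option 2: declared as transformed data, defined inside the program
transformed data {
  real tau_fixed = 0.03;     // fixed in one place in the code
}
```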
Commented-out code is almost always a bad idea. But if you are going to comment out code, then double forward slashes in front of all the lines is the way to go, just as in C++. That’s because you can see the shape of the comment that way and don’t have to do any begin/end matching.
The Stan program style guide appears as an appendix in the user’s guide. Just yesterday, I finished revising an outstanding pull request (https://github.com/stan-dev/docs/pull/143) for the Stan user’s guide with a bunch of new chapters on model testing, including simulation based calibration to evaluate the software, prior predictive checks to evaluate the prior, posterior predictive checks to evaluate model fit to data, and cross-validation to evaluate model fit to held out data.
One of the motivations for developing the Stan language standalone (like BUGS, JAGS, and ADMB, but unlike PyMC3, Pyro, and NIMBLE) was to allow portability of models among users of REPL-style analysis languages like R, Python, Julia, and MATLAB.
I thought this was the epidemiology model we already code reviewed and helped speed up, but apparently not. I’ll review their Stan code for efficiency directly as an issue on their GitHub repo.
P.S. The Stan C++ code could use a review. We try to follow good practices, but it’s hard in a fast-moving project built by a wide range of volunteers.
@Derek Jones
There is a huge literature on epidemics and lots of data available on previous outbreaks, both large and small scale. As in other complicated data environments, it’s not so easy to find data exactly like what you want.
Here’s an overview of the model they’re using: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/
And here’s an overview of the general class of SIR (susceptible/infected/recovered) models and how to code them in Stan by an overlapping set of authors: https://www.sciencedirect.com/science/article/pii/S1755436519300325
Most of these numbers are explained in the appendix to their report, I think, e.g., 2.4 is the mean of the prior for R_0, but the names in the code are perhaps not always connected to these. They also have different IFRs for different countries, based on contact between age groups. I made a fork (before they started the Python implementation) to make it easier to specify the number of iterations and run the code with updated ECDC data.
@Bob Carpenter
Testing, unit or otherwise, is always good. It’s amazing how many people fail to invest in testing. I’m a fan of high code coverage; code that has not been covered by executing a test is obviously untested.
Code reviews are an expensive way of finding coding mistakes, but good for finding high level problems and educating everybody about what everybody else is doing.
I’m a fan of static analysis tools. They allow developers to find mistakes in private (nobody likes their mistakes made public). GCC’s `-Wall` option picks up lots of issues (ok, sometimes it picks up trivial stuff, but it’s worth fixing these for the sake of a clean compile).

Domain specific languages, such as R, have a very specific view of the world. And yes, they push you along the path of this view of the world; I’m not sure what accent native R users have, perhaps that is what makes reading other people’s R a chore, there is no common style. General purpose languages, as the name suggests, have no view of the world (but they do have a view of how computers operate, and languages like C++ have a view about how algorithms should be glued together).
I’m not a big C++ expert, but I know people who are. I will suggest the Stan code as something they might like to look at to help the COVID effort.
@Michael Power
A later paper, “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand”, looks like it is modeling the issues you are interested in.
Thanks for the comments and link, Derek – I’d heard the model was open source but couldn’t find it.
Apparently Microsoft have devoted engineers to improving the model – mentioned in this Nature report https://www.nature.com/articles/d41586-020-01003-6 – and I’ve read elsewhere that the work includes refactoring.
Although I have no knowledge of R, I found the code interesting to look at – funny how some of it is devoted to handling command line arguments! Such a big, important piece of code, and it still needs to parse argc and argv!
While I’m a little bit worried about the size of some functions and the lack of unit testing, I’m also impressed that this little bit of “codified knowledge” is driving the world (or at least the UK) right now. Who would have thought 200-odd lines of code could change the world?
@Allan Kelly
Whoops, 200 -> “a few hundred lines of code”