Street cred has no place in guidelines for nuclear power stations
The UK Government recently gave the go-ahead for a new nuclear power station to be built. On Friday I spotted the document COMPUTER BASED SAFETY SYSTEMS, published by the UK’s Office for Nuclear Regulation.
This document does a good job of enumerating all of the important software engineering issues in short, numbered sentences, until sentence 54 of Appendix 1: “A1.54 The coding standards should prohibit the following practices:-”. Why-o-why did the committee of authors choose to stray from the approach of providing a high-level overview of all the major issues? I suspect they wanted to prove their street cred as real software developers. As usually happens in such cases, the end result looks foolish and dated (1970s-80s in this case).
The nuclear industry takes its procedures a lot more seriously than most other industries, which means some poor group of developers are going to have to convince a regulator with minimal programming language knowledge that they are following this rather nebulous list of prohibitions.
What does the following mean? “5 Multiple use of variables – variables should not be used for more than one function;”. It could be read to mean no use of global variables, but is probably intended to cover something like the roles of variables idea.
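A charitable reading is the kind of reuse sketched below (hypothetical C; the variable names and scenario are invented for illustration):

```c
#include <stdio.h>

int main(void)
{
    int i;
    int n = 5;

    /* role 1: 'i' is a loop counter */
    for (i = 0; i < n; i++)
        printf("sampling channel %d\n", i);

    /* role 2: the same 'i' is recycled as a pass/fail flag --
       presumably the "multiple use" the guideline wants to prohibit */
    i = (n <= 8) ? 1 : 0;
    if (i)
        printf("channel count within limits\n");

    return 0;
}
```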
How is ‘complicated’ calculated in the following? “9 Complicated calculation of indexes;”
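Presumably a plain a[i] passes and the subscript below does not; where the line is drawn, and how a developer demonstrates compliance, is left to the imagination (a hypothetical C sketch):

```c
#include <stdio.h>

#define ROWS 4
#define COLS 8

int main(void)
{
    int grid[ROWS * COLS] = {0};
    int r = 2, c = 5;

    /* plain index calculation: presumably acceptable */
    grid[r * COLS + c] = 1;

    /* 'complicated' index calculation: shifts, modulo and a
       conditional folded into the subscript -- prohibited? */
    grid[((r << 2) + c % 3) * (r > c ? 1 : 2) % (ROWS * COLS)] = 2;

    printf("%d\n", grid[r * COLS + c]);
    return 0;
}
```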
Here is my favorite: “15 Direct memory manipulation commands – for example, PEEK and POKE in BASIC;”. More than one committee member obviously had a BBC Micro or Sinclair Spectrum as a teenager.
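For what it is worth, the same trick is a couple of lines in C, and whether the prohibition also covers the memory-mapped I/O a protection system can hardly avoid is not said. A sketch (the buffer stands in for what would be a fixed hardware address on an embedded target):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t ram[16] = {0};

    /* Launder an ordinary address through an integer and access the
       byte through the resulting pointer -- the C equivalent of
       BASIC's PEEK and POKE. On an embedded target the integer would
       be a fixed hardware register address instead of &ram[3]. */
    uintptr_t addr = (uintptr_t)&ram[3];
    volatile uint8_t *p = (volatile uint8_t *)addr;

    *p = 0xAB;                             /* POKE */
    printf("peek: 0x%02X\n", (unsigned)*p); /* PEEK */

    return 0;
}
```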
What should A1.54 say? Something like: “A coding guideline document listing the known problematic areas of the language(s) used along with details of how to handle each area will be written. All staff will be given training on the use of these guidelines.”
The regulator needs to let the staff hired following A1.4 do their job: “A1.4 Only reputable companies should be used in all stages of the lifecycle of computer based protection systems. Each should have a demonstrably good track record in the appropriate field. Such companies should only use staff with the appropriate qualifications and training for the activities in which they are engaged. Evidence that this is the case should be provided.”
After A1.54 has been considerably simplified, A1.55 needs to be deleted: “A1.55 The coding standards should encourage the following:-”. Either require something or don’t. I suspect the author of “6 Explicit initialising of all variables;” had one of a small number of languages in mind, those that support implicit initialization with a defined value; many languages don’t, illustrating how language-specific coding guidelines need to be.
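C happens to illustrate the language dependence nicely: variables with static storage duration are implicitly zero-initialized, while automatic variables hold an indeterminate value until explicitly assigned (a minimal sketch):

```c
#include <stdio.h>

int file_scope_count;        /* static storage: implicitly zero */

int main(void)
{
    static int calls;        /* static storage: also implicitly zero */
    int local = 0;           /* automatic: indeterminate unless
                                explicitly initialized, as here */

    printf("%d %d %d\n", file_scope_count, calls, local);
    return 0;
}
```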
Following the links in the above document led to: Verification and Validation of Software Related to Nuclear Power Plant Instrumentation and Control, which contained some numbers about Sizewell B that I had not seen before in public documents: “The total size of the source code for the reactor protection functions, excluding comments, support software for the autotesters and communications to other systems, is around 100 000 unique lines. A typical processor contains between 10 000 and 40 000 lines of source code, of which about half are typically from common functions, and the remainder form application code. In addition to the executable code, the PPS incorporates around 100 000 lines of configuration and calibration data per guardline associated with the reactor protection functions.”
If you look at something like the Fukushima reactor failure a few years back, there isn’t much mystery about what went wrong. Yes, there was a tsunami, but Japan is known to be earthquake prone; that wasn’t the real problem. The reactor was old, past its design life, and should have been shut down. To make things worse, there was no onsite backup power that was guaranteed to be available if the main grid went down.
I think it’s great that people study software reliability, but nuclear power is mostly a trust problem. Look at the Chernobyl disaster: multiple safety systems were manually shut down; an experiment was performed, but not in the way originally intended; the reactor itself had multiple physical design flaws (such as the graphite tips on the control rods and a positive void coefficient, a form of positive feedback); and the crew performing the experiment were not adequately briefed on the procedure. In short, a cascade of human errors.
Fixing software would not have prevented either Chernobyl or Fukushima.
It may prevent some other disaster that hasn’t happened yet, but when reactor operators don’t even follow the appropriate design rules, software is the least of your problems.