The units of measurement for software reliability
How do people define software reliability? One answer can be found by analyzing defect report logs: one study found that 42.6% of fault reports were requests for an enhancement, a change to documentation, or a refactoring; a study of NASA spaceflight software found that 63% of reports in the defect tracking system were change requests.
Users can be thought of as broadly defining software reliability as the ability to support both current needs (i.e., the software works as intended) and future needs (i.e., functionality the user does not yet know they need; e.g., one reason I use R, rather than Python, for data analysis is that I believe that if a new-to-me technique is required, a package+documentation supporting this technique is more likely to be available in R).
Focusing on current needs, the definition of software reliability depends on who you ask; possibilities include:
- commercial management: software reliability is measured in terms of cost-risk, i.e., the likelihood of losing an amount of money (£/$) as a result of undesirable application behavior (either losses from internal use, or customer-related losses such as refunds, hot-line support, and goodwill),
- Open source: reliability has to be good enough, and at least as good as that of comparable projects. The unit of reliability might be fault experiences per use of the program, or the number of undesirable behaviors encountered when processing pre-existing material,
- user-centric: mean time between failures, with time measured in uses of the application, e.g., for a word processor, documents written/edited; for compilers, millions of lines of source translated,
- academic, and perhaps a generic development team: mean time between failures, with time measured in millions of lines/instructions executed by the application; this definition avoids having to deal with how the software is used,
- available data: numeric answers require measurement data to feed into a calculation. Data that is relatively easy to collect includes the CPU time consumed by tests that found some number of faults, or perhaps wall time, or, scraping the bottom of the barrel, the number of tests run (a small calculation sketch follows this list).
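To make the units concrete, here is a minimal sketch in R of calculating two of them: documents edited per fault experience (user-centric) and fault experiences per million lines executed (execution-centric). The data and column names are made up for illustration; real usage logs are unlikely to be this tidy.

# Hypothetical usage log: one row per session of, say, a word processor.
usage=data.frame(documents_edited=c(12, 7, 30, 5, 21),
                 lines_executed=c(2.1e6, 0.9e6, 5.6e6, 0.7e6, 3.3e6),
                 faults_experienced=c(0, 1, 2, 0, 1))

# User-centric: mean number of documents edited per fault experience.
docs_per_fault=sum(usage$documents_edited)/sum(usage$faults_experienced)

# Execution-centric: fault experiences per million lines executed.
faults_per_Mline=1e6*sum(usage$faults_experienced)/sum(usage$lines_executed)

cat("Documents edited per fault experience:", round(docs_per_fault, 1), "\n")
cat("Fault experiences per million lines executed:", round(faults_per_Mline, 2), "\n")

The arithmetic is trivial; the hard part is deciding which denominator (uses, lines executed, CPU time) is meaningful for the people asking the question, and then actually collecting it.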
If an organization wants to increase software reliability, they can pay for the changes needed to achieve it. Pointing this fact out to people can make them very annoyed.
In my experience, fulfillment of user needs is commonly used to define software quality.
@Joel Thurlby
Quality is a nebulous concept that invariably involves reliability. The ISO standard definition of quality is “fitness for purpose”, which sounds great, but actually just moves the discussion on to arguing over fitness and purpose.
I agree. However, for me this raises the question of whether software reliability should be treated like other software quality properties.
As with other software quality properties, attempting to measure it independently of the target system is futile, since these properties vary over time and context.
In my experience, software quality properties are typically seen as goals and not measurable properties. Software reliability is something that should be built into software, not measured.
Are you familiar with the research by the company SoftRel? The presentations by Ann Marie Neufelder provide some interesting views into the impact of the software development process on software defect density.
@Joel Thurlby
Building reliability into software is the magic pixie dust model of development. Measurements might show that the pixie dust [insert trendy technique] does not work as claimed, so measuring is discouraged and the measurements that do exist are bad-mouthed.
I had not heard of Ann Marie Neufelder (companies focusing on defense consulting tend to be invisible outside that domain). I found a copy of her 1992 book “Ensuring Software Reliability”, and it is a readable summary of the NASA/DoD work up to that date. The thing to remember about this early research is that the software was often embedded in a military system, where the cost of field updates is very high.