Learning useful stuff from the Projects chapter of my book
What useful, practical things might professional software developers learn from the Projects chapter in my evidence-based software engineering book?
This week I checked the projects chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?
There turned out to be around three to four times more data publicly available than I had first thought. This is good, but there is a trap for the unwary. For many topics there is only one dataset, and that one dataset may not be representative. What is needed is a selection of data from various sources, all relating to a given topic.
Some data is better than no data, provided small data sets are treated with caution.
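One way of treating a small dataset with the caution it deserves is to bootstrap whatever statistic is being quoted and report an interval rather than a point value. A minimal sketch, using invented duration values (not data from the book):

```python
import random

# Hypothetical sample of project durations (months); values invented
# purely to illustrate the technique, not taken from the book's data.
durations = [3, 5, 4, 9, 2, 6, 30, 4]

def bootstrap_mean_ci(sample, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for the sample mean."""
    means = sorted(
        sum(random.choices(sample, k=len(sample))) / len(sample)
        for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2))]

print(bootstrap_mean_ci(durations))
# With eight points the interval is wide; the width is the warning.
```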
Estimation is a popular research topic: how long will a project take, and how much will it cost?
After reading all the papers I learned that existing estimation models are even more unreliable than I had thought, and what is more, there are plenty of published benchmarks showing how unreliable the models really are (these papers never seem to get cited).
Models that include lines of code in the estimation process (i.e., the majority of models) need a good estimate of the likely number of lines in the final software system. One issue that nobody had considered was the impact of developer variability on the number of lines written to implement the same functionality, which turns out to be large. Oops.
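To make the problem concrete: Basic COCOMO in organic mode estimates effort as 2.4*KLOC^1.05 person-months, so any uncertainty in the lines-of-code input feeds almost linearly into the effort estimate. A sketch, where the 5-to-25 KLOC spread is an assumption chosen to illustrate developer variability:

```python
# Basic COCOMO, organic mode: effort (person-months) = 2.4 * KLOC**1.05.
def cocomo_effort(kloc, a=2.4, b=1.05):
    return a * kloc ** b

# The 5x spread in KLOC is an assumption for illustration: different
# developers implementing the same functionality in different amounts
# of code. The effort estimate inherits the whole spread.
for kloc in (5, 10, 25):
    print(f"{kloc:>3} KLOC -> {cocomo_effort(kloc):6.1f} person-months")
```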
Machine learning has infested effort estimation research. What the machine learning models actually do is estimate adjustment, i.e., they do not create their own estimate but adjust one passed in as input to the model. Most estimation data sets are tiny, and only contain a few different variables; unless the estimate is included in the training phase, the generated model produces laughable results. Oops.
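The adjustment behaviour is easy to see in a leave-one-out evaluation: drop the human estimate from the model's inputs and it has little to work with. A sketch using scikit-learn on a tiny invented dataset (the values are made up; the structure, not the numbers, is the point):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Tiny synthetic dataset, invented for illustration only: a handful of
# projects with a size metric, a human-made estimate, and actual effort.
size     = np.array([10, 25, 12, 40, 8, 30, 18, 22], dtype=float)
estimate = np.array([90, 200, 100, 350, 70, 260, 150, 180], dtype=float)
actual   = np.array([110, 230, 95, 400, 80, 300, 160, 210], dtype=float)

def loo_mae(X, y):
    """Leave-one-out mean absolute error for a linear model."""
    errs = []
    for train, test in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train], y[train])
        errs.append(abs(model.predict(X[test])[0] - y[test][0]))
    return np.mean(errs)

# Without the human estimate the model must predict effort from size
# alone; with it, the model is mostly learning a small adjustment.
print("size only:       ", loo_mae(size.reshape(-1, 1), actual))
print("size + estimate: ", loo_mae(np.column_stack([size, estimate]), actual))
```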
The good news is that there appear to be lots of recurring patterns in the project data. This is good news because recurring patterns are something to be explained by a theory of software project development (apparent randomness is bad news, from the perspective of coming up with a model of what is going on). I think we are still a long way from having workable theories, but seeing patterns is a good sign that one or more theories will be possible.
I think that the main takeaway from this chapter is that software often has a short lifetime. People in industry probably have a vague feeling that this is true, from experience with short-lived projects. It is not cost effective to approach commercial software development from the perspective that the code will live a long time; some code does live a long time, but most dies young. I see the implications of this reality being a major source of contention with those in academia who have spent too long babbling away in front of teenagers (teaching the creation of idealized software that lives on forever), and little or no time building software systems.
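The survival curves in the chapter are the natural way to quantify "dies young": a Kaplan-Meier estimator handles systems that are still alive at the end of the observation period (right-censored). A minimal sketch with invented lifetimes, using the lifelines package:

```python
from lifelines import KaplanMeierFitter

# Hypothetical lifetimes (years) for a handful of software products;
# observed=0 marks products still alive when observation ended
# (right-censored). Values are invented for illustration.
lifetimes = [1, 2, 2, 3, 5, 8, 12, 20]
observed  = [1, 1, 1, 1, 1, 0, 1, 0]

kmf = KaplanMeierFitter()
kmf.fit(lifetimes, event_observed=observed)
print(kmf.survival_function_)     # estimated P(still in use at time t)
print(kmf.median_survival_time_)  # half the products are gone by here
```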
A lot of software is written by teams of people; however, there is not a lot of data available on teams (software or otherwise). Given the difficulty of hiring developers, companies have to make do with what they have, so a theory of software teams might not be that useful in practice.
Readers might have a completely different learning experience from reading the projects chapter. What useful things did you learn from the projects chapter?
I’m still working through your “Most software has a very short lifespan” finding. It’s intriguing.
Looking back at your February post, and the killed-by-Google graph, and assuming (big assumption) that this is representative of the industry as a whole, would you go along with the statement “Most software has a very short lifespan, but a few pieces have very long lifespans”?
e.g., most of the banking software written in the 1970s is long gone, but a few examples survive; some systems may even survive with every line changed.
Assuming it is, there are two alternative lessons one might draw:
A) Stop worrying about the future because our system will probably die soon
or
B) We need to do a better job of increasing life expectancy.
If A is true then don’t worry about future maintainability, documentation, or regression tests. If the software lives, pay the price for these things when the time comes.
If B is true then the industry needs to do a better job at those same things so that software lives longer.
My *guess* is that most software dies not because it is inherently bad (although some is) but because the customer moves on. Therefore A is true.
@allan kelly
There are various survival curves in the projects chapter showing the lifespan of source code and packages, the Google/Japanese mainframe graph is for shipping products.
Why does software have to live forever?
Software does a job, the job goes away, the software stops being used; or in Google’s case there are not enough users to make it worth their while supporting the software.
Companies want their software to live forever, because then they don’t have to spend money writing new software.
The question that finite software lifetimes raise is: what is the minimum I need to invest in creating this software?
It’s cheaper to fix what survives than invest the same amount in everything.
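That trade-off can be put as a back-of-the-envelope expected-cost calculation (all numbers below are invented assumptions):

```python
# Back-of-the-envelope comparison; all numbers are invented assumptions.
p_survive  = 0.2   # fraction of systems still in use after a few years
invest_all = 30    # upfront maintainability cost, paid on every system
fix_later  = 100   # larger retrofit cost, paid only on survivors

cost_up_front  = invest_all             # paid regardless of survival
cost_fix_later = p_survive * fix_later  # expected retrofit cost per system

print(f"invest in everything: {cost_up_front}")
print(f"fix what survives:    {cost_fix_later}")
# Under these assumptions, retrofitting survivors is cheaper whenever
# p_survive * fix_later < invest_all, i.e. when survival is rare enough.
```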