Projects chapter added to “Empirical software engineering using R”
The Projects chapter of my Empirical software engineering book has been added to the draft pdf (download here).
This material turned out to be harder to bring together than I had expected.
Building software projects is a bit like making sausages, in that you don’t want to know the details; or, in this case, those involved are not overly keen to reveal the data.
There are lots of papers on requirements, but remarkably little data (Soo Ling Lim’s work being the main exception).
There are lots of papers on effort prediction, but they tend to rehash the same data and the quality of research is poor (i.e., tweaking equations to get a better fit; no explanation of why the tweaks might have any connection to reality). I had not realised that Norden did all the heavy lifting on what is sometimes called the Putnam model; Putnam was essentially an evangelist. The Parr curve is a better model (sorry, no pdf), but lacked an evangelist.
Accurate estimates are unrealistic: there is lots of variation between different people and development groups, the client keeps changing the requirements, and developer turnover is high.
I did turn up a few interesting data-sets and Rome came to the rescue in places.
I have been promised more data and am optimistic some will arrive.
As always, if you know of any interesting software engineering data, please tell me.
I’m looking to rerun the workshop on analyzing software engineering data. If anybody has a venue in central London that holds 30 or so people, plus a projector, and is willing to make it available at no charge for a series of free workshops over several Saturdays, please get in touch.
Reliability chapter next.
Hi
Have a look at my homepage for studies you could get data from. http://www.torkar.se/resources/ese-exp-test.pdf in particular might be of interest, and there are other fault prediction studies that might be useful too. In the future I’ll always post all data on https://osf.io/ for the sake of openness. Many companies have no problem with that, and in the worst case you can anonymize the data fairly well.
@Richard Torkar
Thanks for the link to the paper; I have several of your papers in my archive, but for some reason not that one. The first author will be getting an email asking for data 🙂
Can I suggest that you post to multiple repositories; you never know when one may disappear.
It’s always easier to get data by promising not to make it public, even in anonymous form. I hope that, like you, researchers will push harder to be allowed to make their data public.