The fuzzy line between reworking and enhancing
One trick academics use to increase their publication count is to publish very similar papers in different conferences/journals; they essentially plagiarize themselves. This practice is frowned upon, but unless referees spot the ‘duplication’, it is difficult to prevent such plagiarized versions being published. Sometimes the knock-off paper will include additional authors and may not include some of the original authors.
How do people feel about independent authors publishing a paper where all the interesting material was derived from someone else’s paper, i.e., no joint authors? I have just encountered such a case in empirical software engineering.
“Software Cost Estimation: Present and Future” by Siba N. Mohanty, from 1981 (I cannot find a non-paywalled pdf via Google, but one must exist because I have a copy), has been reworked to create “Cost Estimation: A Survey of Well-known Historic Cost Estimation Techniques” by Syed Ali Abbas, Xiaofeng Liao, Aqeel Ur Rehman, Afshan Azam and M. I. Abdullah (published in 2012; pdf here). They cite Mohanty as the source of their data, some thought has obviously gone into the reworked material, I found it useful, and there is a discussion of techniques created since 1981.
What makes the 2012 paper stand out as interesting is the depth of analysis of the 1970s models and the data, all derived from the 1981 paper. The analysis of later models is not as interesting and does not include any data.
The 2012 paper did ring a few alarm bells (which rang a lot more loudly after I read the 1981 paper):
- Why was such a well-researched and interesting paper published in such an obscure (at least to me) journal? I have encountered such cases before and had email conversations with the author(s). The well-known journals have not always been friendly towards empirical research, so an empirical paper appearing in a less than stellar publication is not unusual.
As regular readers will know, I am always on the look-out for software engineering data and am willing to look far and wide. I judge a paper by its content, not the journal it was published in.
- Why, in 2012, were researchers comparing effort estimation models proposed in the 1970s? Well, I am, so why not others? It did seem odd that I could not track down papers on some of the models cited, perhaps the pdfs had disappeared since 2012??? I think I just wanted to believe others were interested in what I was interested in.
What now? Retraction Watch offers some advice.
The Journal of Emerging Trends in Computing and Information Sciences has an ethics page, I will email them a link to this post and see what happens (the article in question is listed as their second most cited article last year, with 19 citations).
Microcomputers ‘killed’ Ada
In the mid-70s the US Department of Defense decided it could save lots of money by getting all its contractors to write code in the same programming language. In February 1980 a language was chosen, Ada, but by the end of the decade the DoD had snatched defeat from the jaws of victory; what happened?
I think microcomputers are what happened; these created whole new market ecosystems, some of which were much larger than the ecosystems that mainframes and minicomputers sold into.
These new ecosystems sucked up nearly all the available software developer mind-share; the DoD went from being a major employer of developers to a niche player. Developers did not want a job using Ada because they thought that being typecast as Ada programmers would overly restrict their future job opportunities; Ada was perceived as a DoD-only language (actually there was so little Ada code in the DoD that only by counting new project starts could it get any serious ranking).
Lots of people were blindsided by the rapid rise (to world domination) of microcomputers. Compilers could profitably be sold (in some cases) for thousands or hundreds of dollars/pounds because the markets were large enough for this to be economically viable. In the DoD ecosystems, compilers had to be sold for thousands or hundreds of thousands of dollars/pounds because the markets were small. Micros were everywhere and being programmed in languages other than Ada; cheap Ada compilers arrived after today’s popular languages had taken off. There is no guarantee that cheap compilers would have made Ada a success, but they would have ensured the language was a serious contender in the popularity stakes.
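A back-of-the-envelope calculation illustrates why market size dominated compiler pricing; the development cost and licence counts in the sketch below are invented purely for illustration, not taken from any vendor.

```python
# Toy break-even calculation: the lowest viable price per compiler licence
# for a given development cost and market size (all figures are invented).

def breakeven_price(development_cost, licences_sold):
    return development_cost / licences_sold

# Hypothetical mass microcomputer market vs. a hypothetical niche DoD market.
print(f"micro market: ${breakeven_price(2_000_000, 20_000):,.0f} per licence")
print(f"niche market: ${breakeven_price(2_000_000, 100):,.0f} per licence")
```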
By the start of the 1990s, Ada supporters were reduced to funding studies to produce glowing reports of the advantages of Ada compared to C/C++ and how Ada had many more compilers, tools and training than C++. Even the 1991 mandate “… where cost-effective, all Department of Defense software shall be written in the programming language Ada, in the absence of special exemption by an official designated by the Secretary of Defense.” failed to have an impact and was withdrawn in 1997.
The Ada mandate was cancelled as the rise of the Internet created even bigger markets, which attracted developer mind-share towards even newer languages, further reducing the comparative size of the Ada niche.
Astute readers will notice that I have not said anything about the technical merits of Ada, compared to other languages. Like all languages, Ada has its fanbois; these are essentially much older versions of the millennial fanbois of the latest web languages (e.g., Go and Rust). There is virtually no experimental evidence that any feature of any language is better/worse than any feature in any other language (a few experiments show weak support for stronger typing). To its credit, the DoD did fund a few studies, but these used small samples (there was not yet enough Ada usage to make larger samples possible) that were suspiciously glowing in their support of Ada.
We hereby retract the content of this paper
Yesterday I came across a paper in software engineering that had been retracted, the first time I had encountered such a paper (I had previously written about how software engineering is a great discipline for an academic fraudster).
Having an example of the wording used by the IEEE to describe a retracted paper (i.e., “this paper has been found to be in violation of IEEE’s Publication Principles”), I could search for more. I get 24,400 hits listed when “software” is included in the search, but clicking through the pages there are just 71 actual results.
A search of Retraction Watch using “software engineering” returns nine hits, none of which appear related to a software paper.
I was beginning to think that no software engineering papers had been retracted; now I know of one, and if I ever become really interested, the required search terms are known.
Two approaches to arguing the 1969 IBM antitrust case
My search for software engineering data has made me a regular customer of second-hand book sellers; a recent acquisition is: “Big Blue: IBM’s use and abuse of power” by Richard DeLamarter, which contains lots of interesting sales and configuration data for IBM mainframes from the first half of the 1960s.
DeLamarter’s case, that IBM systematically abused its dominant market position, looked very convincing to me, but I saw references to work by Franklin Fisher (and others) that, it was claimed, contained arguments for IBM’s position. Keen to find more data and hear alternative interpretations of the data, I bought “Folded, Spindled, and Mutilated” by Fisher, McGowan and Greenwood (by far the cheapest of the several books that have been written on the subject).
The title of the book, Folded, Spindled, and Mutilated, is an apt description of the arguments contained in the book (which is also almost completely devoid of data). Fisher et al obviously recognized the hopelessness of arguing IBM’s case and spent their time giving general introductions to various antitrust topics, arguing minor points, or throwing up various smoke-screens.
An example of the contrasting approaches is the calculation of market share. To calculate market share, the market has to be defined. DeLamarter uses figures from internal IBM memos (top management were obsessed with maintaining market share) and quotes IBM lawyers’ advice to management on phrases to use (e.g., ‘Use the term market leadership, … avoid using phrasing such as “containment of competitive threats” and substitute instead “maintain position of leadership.”’). Fisher et al arm-wave at length and conclude that the appropriate market is the entire US electronic data processing industry (the more inclusive the market used, the lower IBM’s overall share; using this definition, IBM’s market share drops from 93% in 1952 to 43% in 1972, and a full-page graph shows this decline); the existence of the IBM management memos is not mentioned.
Why do academics risk damaging their reputation by arguing these hopeless cases (I have seen it done in other contexts)? Part of the answer is a fat pay check, but also many academics consider consulting for industry akin to supping with the devil (so they get a free pass on any nonsense spouted when "just doing it for the money").
Books similar to my empirical software engineering book
I am sometimes asked which other books are similar to the Empirical Software Engineering book I am working on.
In spirit, the most similar book is “Software Project Dynamics” by Abdel-Hamid and Madnick, based on Abdel-Hamid’s PhD thesis. The thesis/book sets out to create an integrated model of software development projects, using system dynamics (the model can be ‘run’ to produce outputs from inputs, assuming the necessary software is available).
Building a model of the software development process requires figuring out the behavior of all the important factors, and Abdel-Hamid does a thorough job of enumerating the important factors and tracking down the available empirical work (in the 1980s). The system dynamics model, written in Dynamo, appears in an appendix (I have not been able to locate any current implementation).
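For readers who have not met system dynamics, the sketch below shows the general stock-and-flow style of model; it is not Abdel-Hamid’s model (which couples many interacting factors such as staffing, schedule pressure and error detection), and every stock, flow and number in it is invented for illustration.

```python
# Minimal stock-and-flow project model, sketching the general system
# dynamics approach (NOT Abdel-Hamid's model; all stocks, flows and
# numbers are invented for illustration).

def simulate(total_tasks=1000.0, staff=5.0, productivity=2.5,
             rework_fraction=0.15, dt=1.0, max_weeks=500):
    """Step the model forward one week at a time; return weeks to completion."""
    tasks_remaining = total_tasks            # stock: work still to be done
    weeks = 0.0
    while tasks_remaining > 0 and weeks < max_weeks:
        completion = staff * productivity         # flow: tasks completed per week
        rework = completion * rework_fraction     # flow: tasks sent back for rework
        tasks_remaining -= (completion - rework) * dt
        weeks += dt
    return weeks

if __name__ == "__main__":
    print(f"Estimated schedule: {simulate():.0f} weeks")
```

Running a real system dynamics model is the same loop, just with many more coupled stocks and flows.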
In the 1980s, I would have agreed with Abdel-Hamid that it was possible to build a reasonably accurate model of software development projects. Thirty years later, I have tracked down a lot more empirical work and know a lot more about how software projects work. All this has taught me is that I don’t know enough to be able to build a model of software development projects; but I still think it is possible, one day.
There have been other attempts to build models of major aspects of software development projects (all using system dynamics), including Madachy’s PhD and later book “Software Process Dynamics”, and Buettner’s PhD (no book, yet???).
There are other books that include some combination of the words empirical, software and engineering in their title. On the whole, these are collections of edited papers, whose chapters are written by researchers promoting their latest work; there is even one that aims to teach students how to do empirical work.
Dag Sjøberg has done some interesting empirical work and is currently working on an empirical book; this should be worth a look.
“R in Action” by Kabacoff is the closest to the statistical material, but at a more general level. “The R Book” by Crawley is the R book I would recommend, but it is not at all like the material I have written.