Joke: Student subjects in software engineering experiments
Most academic experiments in software engineering use the students available to the researcher as subjects, often classifying first year as novices and final year or postgrads as experts. If professional developer (i.e., non-student) subjects are used the paper will trumpet this fact; talk of comparing novices and experts is the give-away for an all undergraduate subject line-up. Most computing academics don’t write much software, so they are blissfully ignorant that they and their students are novices compared to a professional developer with a couple of years’ experience.
Results from well designed and executed experiments can reasonably be extended to cover people who share the skills used by subjects in the experiment. Becoming an expert programmer takes several years of continuous (i.e., several hours a day) practice. Using real experts in a programming experiment means that no measurable change in programming skill will occur during the experiment, while novices are likely to noticeably learn during the experiment and thus introduce unwanted sources of variation into the results. Of course novices will also take longer and are likely to have patterns of behavior that have not yet been selectively tuned to something that works in practice.
There is also an elephant in the room of student subjects in software engineering; some of the students are never going to get jobs in software engineering because they are completely useless at it. How does a student manage to get a degree in a software-related subject and be unemployable as a software engineer? Money. Students are attracted by the money and lifestyle they hear a job in software engineering will offer, and many Universities are happy to treat the computing department as a cash cow by offering courses that allow students to concentrate on “strategic” subjects and avoid having to get involved in nitty-gritty details like programming. The University is probably defrauding some students by accepting them for a software-related degree course.
My experience is that professional developers are happy to donate some time to taking part in a software engineering experiment. They just have to be asked; of course, I do have the advantage of actually knowing some professional software developers.
The POPL 2015 papers involving C
SIGPLAN (the ACM Special Interest Group on Programming LANguages) has just made available many of the papers that have been accepted for their 2015 POPL conference (Principles of Programming Languages). Good for them. I wish more conferences would do this.
There are three papers involving C, so obviously I have read those first. Two papers are heavy on the mathematics and one not quite so heavy:
- Sound Modular Verification of C Code Executing in an Unverified Context: Describes a tool that takes C source annotated with separation logic and translates it to C source containing runtime checks; it is intended that these runtime checks verify the conditions expressed in the separation logic. Why does the developer add the interface checks in separation logic and then translate, rather than adding them in C in the first place? This question was not addressed.
- Common compiler optimizations are invalid in the C11 memory model and what we can do about it: This sounds like bad news, but the introduction mentions specialist optimizations that are common in that specialist area. There follows 11 pages of mathematics + another five pages in an appendix. Page 12 tells us what it is all about: some requirements in C11 would muck up the nice mathematics should CompCert, which currently supports C90, be upgraded to C11. In other words, a compiler implementor is complaining that wording in the standard is making their life difficult (hey, join the queue) and has published a paper about it.
- Formal verification of a C static analyzer: An interesting piece of work spoiled by claims that a soap powder manufacturer would not be able to get away with. Verasco, the static analysis tool described, does its checking on an intermediate language that is two steps removed from the original C source. Using the authors’ logic, I could bolt on one of the existing Fortran-to-C translators and claim to have a formally-verified Fortran static analyzer, with C being just an intermediate language in the chain. The problem with analyzing an intermediate language is that the transformations that have occurred along the way have changed the semantics of the original code, so the results of any analysis could differ from those of an analysis applied to the original source. An example from the paper: the code
z = f(x) + 2 * g(y)
is transformed to:
t1 = f(x); t2 = g(y); z = t1 + 2 * t2;
The implementation thus selects one of the two possible evaluation orders for the functions f and g. It is possible that calling f before g will result in behavior that is different from calling g before f (no undefined behavior occurs because there is a sequence point before a function returns, using pre-C11 terminology). So Verasco is only checking one of the two possible execution paths in this code. Not a particularly sound proof.
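A minimal sketch (mine, not from the paper) of how this choice of evaluation order can change a program’s result; the hypothetical f and g share a global variable, so the value stored in z depends on which call the compiler performs first:

```c
#include <stdio.h>

static int counter = 0;

/* Both functions update the same global, so their relative
 * calling order is visible in the final result. */
static int f(int x) { counter += 1; return x + counter; }
static int g(int y) { counter *= 2; return y + counter; }

int main(void)
{
    int x = 1, y = 1;

    /* The order in which f and g are called here is unspecified;
     * a conforming compiler may pick either. */
    int z = f(x) + 2 * g(y);

    printf("z = %d\n", z);   /* 8 if f is called first, 4 if g is called first */
    return 0;
}
```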
C-semantics is the C formal methods tool that stands head and shoulders above anything else that currently exists (a fun Fibonacci example). It is actually based on the C source and does significantly more checking than Verasco, but is not mentioned in the “Related work” section of the paper.
Some of the other POPL papers look a lot more interesting and potentially useful.
Workshop on App Store Analysis
I was at the 36th CREST Open Workshop, on App Store Analysis, at the start of this week. The attendee list reads like a who’s who of academics researching App stores. What really stood out for me was the disconnect between my view of the software engineering aspects of developing mobile Apps and the view of many, but not all, academics in the room.
Divergent points of view on why App development is different included:
Academics: they are written by a small number (often one) of developers.
Me: This was true in the early days of microprocessors and the web. When something new comes out only a small number of people are involved in it and few companies are willing to invest in setting up large development teams. If the new thing succeeds (i.e., there is money to be made) the money to create large teams will follow.
Academics: third party libraries make a significant contribution to functionality.
Me: This is true of a lot of web software and it is becoming more common for Apps on all platforms. It was not true in the past because the libraries were not available; Open Source changed all that.
Academics: they are not structured/written according to software engineering principles (someone in the room thought that waterfall was still widely used).
Me: This is true of most software produced by individuals who are writing something out of interest in their spare time or because they are not gainfully employed in ‘real’ work. When microcomputers were new the internal quality of most software on the market was truly appalling; it was primarily written by people who knew a market niche very well and taught themselves programming. The software sold because it addressed the needs of its customers and code quality was irrelevant (of course the successful products eventually needed to be maintained, which is when code quality became important, but by then there was money to employ developers who knew about that kind of stuff).
Academics: the rapid rate of change (in tools and libraries etc) being experienced will continue into the foreseeable future.
Me: I was staggered that anyone could think this.
Academics: lots of money to be made for minimal investment:
Me: Those days are past.
Me: power drain may be a significant design issue.
Academics: Blank look.
Other things to report:
Various concerns were raised by people who had encountered the viewpoint that mobile Apps were not considered worthy of serious academic study within software engineering; this point of view seemed to be changing. I don’t recall there ever having been academic research groups targeting microcomputer software, but this certainly happened for web development.
I was a bit surprised at the rather rudimentary statistical techniques that were being used. But somebody is working on a book to change this.
Software engineering: A great discipline for an academic fraudster
I am a sporadic reader of In the Pipeline, a blog covering drug discovery and the pharma industry, subjects in which I have no real interest, but the author is a no-nonsense guy whose writing I enjoy reading. A topic that regularly crops up is retraction of a published paper (i.e., effectively saying “ignore that paper we published way back when”). Reasons for retraction include a serious mistake, plagiarism of somebody else’s work or outright fabrication of data.
Retraction of papers published in software engineering journals is rare; why is that? I don’t think software engineering researchers are more/less honest than researchers in other fields. I could not find any entries on Retraction Watch.
Plagiarism certainly occurs and every now and again a paper is retracted for this reason.
Corrections to previously published papers certainly occur on a regular basis, but I don’t recall seeing a retraction because of a serious error (but then I rarely get to gossip around the coffee table in university departments and am not that well up on such goings on).
Researchers are certainly not above using the subset of a benchmark that shines the most favorable light on their work, or simply performing misleading comparisons. Researchers who do such things are seen more as an embarrassment than a threat to academic integrity; they are certainly not in the same league as those who fabricate data.
Fabrication of data in software engineering? I’m sure it goes on, but unless the people responsible own up I think it is unlikely to be detected (unless the claims are truly over the top). There is no culture of replication in software engineering or of building on other people’s work (everybody is into doing their own thing); two very serious problems, but not the topic of this discussion.
In fact software engineering is the ideal discipline for an academic fraudster: replication is very rare, everyone doing their own thing, a culture of poor/nonexistent record keeping and experimental data is rarely kept past the replacement of the machine on which it sits (I am regularly told this when I email authors asking for a copy of their raw data for my book). Even in disciplines whose characteristics are at the other end of the culture scale, it can take a long time for fraud to be uncovered.
From time to time authors I contact tell me that the numbers appearing in the published paper are incorrect; often there is an offer of the correct numbers and sometimes a vague recollection of what they might be. Sometimes authors don’t reply to my email; is the data fake, or is talking to me not worth their time (I have received replies to this effect)?
Am I worried about fraud in software engineering research? No, incorrect data in published work is more likely to occur because of clerical mistakes, laziness or incompetence.
Cloning research needs a new mantra
The obvious answer to software engineering researchers who ask why their findings are not applied within industry is that their findings provide no benefits to industry. Anyone who digs into the published research finds that in fact there is lots of potentially useful stuff in there, the problem is that researchers often take too narrow a perspective.
A good example of a research area that is generally ignored by industry but has potential for widespread benefits is software cloning; that is, chunks of source code that are duplicated within the same application (a chunk may be as little as five lines or may be more, and the definition of duplicate varies from exactly the same character sequence, through semantic equivalence, to chilling out with a certain percentage of lines being the same {with various definitions for ‘same’}). (This is not about duplication of code in multiple versions of the same product; we all know how nasty that can be to maintain.)
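As a hypothetical illustration (my own, not taken from any of the studies) of the sort of near-duplicate these definitions cover, the two functions below are identical apart from renamed identifiers, the kind of chunk pair that copy-paste creates:

```c
#include <stdio.h>
#include <stddef.h>

/* Original chunk. */
double mean_height(const double heights[], size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += heights[i];
    return n > 0 ? total / n : 0.0;
}

/* Copy-pasted clone: same structure and logic, different identifier names,
 * so it matches under every definition except the exact-character-sequence one. */
double mean_weight(const double weights[], size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += weights[i];
    return count > 0 ? sum / count : 0.0;
}

int main(void)
{
    double h[] = {1.7, 1.8, 1.6};
    double w[] = {70.0, 85.0, 62.0};
    printf("%f %f\n", mean_height(h, 3), mean_weight(w, 3));
    return 0;
}
```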
Researchers regard cloning as bad, while I suspect many developers are neutral on the subject or even in favor of creating and using duplicate code.
Clone research will be ignored by industry while researchers continue to push the mantra “clones are bad”. It just does not gel with industry’s view.
Developers are under pressure to deliver working software; if they can save time by (legally) making use of existing code then there is an immediate benefit to them and their employer. The researchers’ argument is that clones increase maintenance costs (a fault being fixed in one of the duplicates but not the other(s) is often cited as the killer case for all clones being bad). What developers know is that most code is never maintained (e.g., it is rewritten, or never used again, or works fine and does not need to be changed).
Do companies that own software care about it containing clones? They are generally more interested in meeting deadlines and being first to market. If a product is a success it will be worth paying its maintenance costs; why risk spending extra time/money on creating a beautifully written product when most products don’t sell well enough to be worth maintaining? If the software is bespoke, for in-house use or by a client, then increased maintenance costs are good for those involved in writing the software (i.e., they get paid to maintain it).
The new clone research mantra should be that clones have benefits and costs, and the research results help increase benefits and decrease costs. How does this increase/decrease work? You’re the researchers, you tell me.
My own experience with clones is that they do sometimes multiply costs (i.e., work has to be done more than once), but overall their creation and use is very cost effective; as for ‘missed’ fault fixes, they are a small subset of this use case.
I have heard of projects where there has been rampant copying, plus minor modification, of code within the project. If such projects fail then the issue is one of project management and control, with cloning being one of the consequences.
The number of clones usually found in a large software system is surprisingly high. If you want to check out the clones in your own code, CCFinder is well worth a look. The most common use for such tools is plagiarism detection.
Wot, apply academic work in industry?
Academics often moan about industry not making use of their work (or at least they do within the code analysis niche I frequent, I have no real knowledge of other niches). There are three reasons for this state of affairs:
- The work that most academics do has no practical relevance to industry. This is the lion’s share of the reason and something that many academics will agree with if none of their colleagues are likely to overhear them. I suspect many academics are not too fussed that their work is not used by industry and are happy to continue working on things they find interesting (or that they can write papers about that disconnected souls are happy to see published).
- Very very few people in the software industry ever read academic papers. But hey, not reading manuals is regarded as a badge of honour. Some people do read manuals and are quickly elevated to expert status. Academic papers do have a very low signal to noise ratio and learning to speed read them to locate the gold nuggets takes practice.
- If an academic’s work is applied by some company the last thing those involved will do is say anything about it. Industry is a cut-throat place and what is to be gained by freely giving useful information to the competition?
The second product my company ever produced was a range of code generators for an intermediate code that was, at the time, interpreted; how best to match the patterns in the intermediate code and also reuse as much as possible for the different CPU targets? I found a solution in Mahadevan Ganapathi’s PhD thesis, and now, 33 years after publishing it, he gets some credit for a long-gone industrial application.
The fatal programming language research mistake
There is a fatal mistake often made by those involved in academic programming language research and a recent blog post (by an academic) asking if programming language research has a future has spurred me into writing about this mistake.
As an aside, I would agree with much of what the academic (Cristina (Crista) Videira Lopes) says about many popular modern programming languages being hacked together by kids who did not know much, if anything, about language design. However, this post is not a lament about the poor design quality of the languages commonly used in the commercial world; it is about the most common fatal mistake academics make when researching programming languages and a suggestion about how they can avoid making this mistake. What really endeared me to Crista was her critique of academic claims of language ‘betterness’ being completely unfounded (i.e., not being based on any empirical research).
The most common fatal mistake made by researchers in programming language design is to invent a new language. Creating an implementation for any language is a big undertaking and a new language has the added hurdles of convincing developers it is worth learning, providing the learning/reference materials and porting to multiple platforms. Researchers spend nearly all their time creating an implementation and a small percentage of their time actively researching the ‘new idea’.
The attraction of designing a new language is that it is regarded as a ‘sexy’ activity, and the first (and usually only) time around the work needed to create an implementation does not look that great.
If a researcher really does feel that their idea is so revolutionary it is worth creating a whole new language for and they want me, and others, to start using it, then they need to make sure they can answer yes to the following questions:
- Have you, or your students, created an implementation of the language that provides reasonable diagnostics, executes programs at an acceptable rate and is available free of charge on the operating systems I use for software development?
- Is sufficient documentation available for me to learn the language and act as a reference manual once I become more expert?
- For the next five years will you, or your students, be providing, free of charge, prompt bug fixes to errors in your implementation?
- Will you and your students spend the time necessary to build an active user community for your language?
- For the next five years will you, or your students knowledgeable in the language, provide prompt support (via an email group or bulletin board) to user queries?
Some new languages from academia have managed to answer yes to these questions (Haskell, R and OCaml spring to mind, but only R looks like it will have any significant industrial take-up).
In practice most new languages fail to get past fragile implementations only ever used by their designer, with minimal new research to show for all the effort that went into them.
What programming language researchers need to do, at least if they want people outside of their own department to pay any attention to their ideas, is to experiment by adding functionality to an existing language. Using an existing language as a base has the following advantages:
- modifying an existing implementation is significantly less work than creating a new one,
- having to address all of the features present in real world languages will help weed out poor designs that only look good on paper (I continue to be amazed that people can be involved in programming language research without knowing any language very well),
- documentation for most of the language already exists,
- more likely to attract early adopters; developers tend to treat an existing language+extensions as being a much smaller jump than a new language.
Programming language research is something of a fashion industry and I can well imagine people objecting to having to deal with a messy existing language. Well yes, the real world is a messy place and if a new design idea cannot handle that it deserves to be lost to posterity.
One cannot blame students for being idealistic and thinking they can create a language that will take over the world. It is the members of staff who should be ridiculed for being either naive or intellectually shallow.
Criteria for knowing a language
What does it mean for somebody to claim to know a computer language? In the commercial world it means the person is claiming to be capable of fluently (i.e., only using knowledge contained in their head and without having to unduly ponder) reading and writing code in some generally accepted style applicable to that language. The academic world generally sets a much lower standard of competence (perhaps because most of its inhabitants leave before any significant expertise is acquired). If I had a penny for every recent graduate who claimed to know a language and was incapable of writing a program that read in a list of integers and printed their sum (I know companies that set tougher problems, but they do not seem to have higher failure rates), I would be a rich man.
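For readers wondering what this weed-out problem amounts to, here is a minimal sketch in C (my choice of language; the problem is language-neutral) of reading whitespace-separated integers from standard input and printing their sum:

```c
#include <stdio.h>

int main(void)
{
    long sum = 0;
    long value;

    /* Keep reading integers until input is exhausted (or a non-number is seen). */
    while (scanf("%ld", &value) == 1)
        sum += value;

    printf("%ld\n", sum);
    return 0;
}
```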
One experiment asked 21 postgraduate and academic staff which of the following individuals they would regard as knowing Java:
The results were:
| Individual | No | Yes |
|---|---|---|
| A | 21 | 0 |
| B | 18 | 3 |
| C | 16 | 5 |
| D | 8 | 13 |
| E | 0 | 21 |
These answers reflect the environment from which the subjects were drawn. When I wrote compilers for a living, I did not consider that anybody knew a language unless they had written a compiler for it, a point of view echoed by other compiler writers I knew.
I’m not sure that commercial developers would be happy with answer (E), in fact they could probably expand (E) into five separate questions that tested the degree to which a person was able to combine various elements of the language to create a meaningful whole. In the commercial world, stage (E) is where people are expected to start.
The criteria used to decide whether somebody knows a language depend on which group of people you talk to; academics, professional developers and compiler writers each have their own in-group standards. In a sense the question is irrelevant: a small amount of language knowledge applied well can be used to do a reasonable job of creating a program for most applications.