2015: A new C semantics research group
A very new PhD student research group working on C semantics has just appeared on the horizon. You can tell they are very new to C semantics by the sloppy wording in their survey of C users (what is a ‘normal’ compiler, and how does it differ from the ‘current mainstream’ compiler referred to in some questions? I’m surprised the outcome appeared clear to the authors, given the jumble of multiple-choice options offered to respondents).
Over the years a number of these groups have appeared, existed until their members received their PhDs, and then disappeared. In some cases one of the group members does something that shows a lot of potential (e.g., the C-semantics work), but the nature of academic research means that the freshly minted PhD either moves to industry or moves on to another research area. Unfortunately, most groups are overwhelmed by the task and pivot to concentrating on meaningless subsets of the language, treated as mathematical organisms. Very, very occasionally, interesting work continues to be supported once the PhD is out of the way, Coccinelle being the stand-out example for C.
It takes implementing a full compiler (as part of a PhD or otherwise) to learn C semantics well enough to do meaningful research on it. The world seems to be stuck in a loop of using research to educate know-nothings until they know something, and then sending them off on another track. This is why C language researchers keep repeating themselves every 10 years or so.
Will anybody in this new group do any interesting work? Alan Mycroft set the bar very high for Cambridge by submitting a 100-page comment document on the draft C89 standard that identified almost as much ambiguous wording as everybody else put together (but he was implementing a compiler in his spare time and not doing it for a PhD, so perhaps he does not count).
One suggestion I would make to this new group: if they really are interested in actual usage, they should measure actual usage. Developer beliefs about compiler behavior are rarely very accurate and are always heavily tainted by experiences from when the developers first started out.
Someone pointed me to this blog post. I am a member of that research group, and would like to correct some possible misunderstandings:
– while we are not C compiler developers, we are not especially new to C semantics (for example, we contributed significantly to the ISO-standard C++11 and C11 concurrency model, and also developed the CompCertTSO verified compiler for a concurrent C-like language)
– this survey is only one of many ways to investigate C-as-it-is, and it was deliberately condensed, at the cost of some imprecision, to make it answerable in a reasonable time. We have also had more in-depth discussions with a range of people and done a certain amount of testing (though some of these things are very difficult to test experimentally). You might be interested in the experimental data from Chisnall et al. in Fig. 1 of http://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf
– I see you contributed to that survey, for which many thanks. If you can comment further, e.g. by identifying more precisely why you think the various idioms will and won’t work in practice, or by suggesting ways in which compiler behaviour and existing C usage can be reconciled, that would be very welcome.
@Peter Sewell
Thanks for the comment and the link to a surreal paper on memory safety in C (the first time I have heard a flat address space with stack/heap referred to as the PDP-11 model; there is no mention of Intel’s iAPX 432, which did a lot of this kind of checking, or of the many runtime C checkers; also, somebody should let the authors know that C language requirements rule out the use of garbage collection).
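One way to see why garbage collection sits awkwardly with C is this. The language allows a pointer to be converted to an integer of type uintptr_t, manipulated arbitrarily, and converted back, and the result must compare equal to the original pointer. The following minimal sketch keeps an allocated object reachable only through such a disguised integer. The names hide_pointer, recover_pointer and round_trip and the mask value are hypothetical, chosen for illustration. A conservative collector scanning memory for pointer-like values would find no reference to the block between the two steps, and could reclaim it while the program still intends to use it. The sketch assumes an implementation that provides the optional uintptr_t type, as mainstream ones do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mask used to scramble the pointer's bit pattern. */
#define HIDE_MASK ((uintptr_t)0x5A5A5A5Au)

/* Convert a pointer to an integer and scramble it; the result no
   longer looks like an address of the object. */
static uintptr_t hide_pointer(void *p)
{
    return (uintptr_t)p ^ HIDE_MASK;
}

/* Undo the scrambling. XOR is its own inverse, so this yields the
   integer produced by (uintptr_t)p, and converting that back to
   void * gives a pointer that compares equal to p. */
static void *recover_pointer(uintptr_t h)
{
    return (void *)(h ^ HIDE_MASK);
}

/* Allocate an int, keep it reachable (at the source level) only via
   the disguised integer, then recover the pointer and read it back. */
static int round_trip(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;                 /* allocation failure sentinel */
    *p = value;

    uintptr_t hidden = hide_pointer(p);
    p = NULL;                      /* only 'hidden' now refers to the block */

    int *q = recover_pointer(hidden);
    assert(hide_pointer(q) == hidden);  /* same integer value recovered */
    int result = *q;
    free(q);
    return result;
}
```

A call such as round_trip(42) returns 42 on a conforming implementation; the point is that nothing stored between hide_pointer and recover_pointer looks like a pointer to the allocated block.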
Please don’t get me going about the marketing claims of CompCert and C-like languages.
Lots of analysis of C idioms.
An analysis of what developers believe about C would be a great topic for an anthropology PhD: tracing how whole C belief systems evolved over the years, and how they led to the code we see today.
I was aware of the work on the C11 concurrency model. There is lots of work that targets specific aspects of C, but (if I read the group’s web page correctly) you are now targeting the complete language semantics, and such groups are not common.
I wish you luck in creating a C front end that is traceable to the requirements in the latest C Standard. The last such system targeted C89 and was a bit ahead of its time.
Please don’t waste time reinventing wheels; otherwise you will never get anything useful done. There are parsers available that could be used as a starting point.
The formalization of the C/C++11 memory model by this group, and their cppmem tool, are very valuable for people who want to write concurrent code in C/C++. IMO, using them is a better way to learn the details of the memory model than reading the standard itself. Derek, if you can’t see the value of this group’s work, I wonder how much you actually know about concurrency…
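For readers wondering what the memory model actually constrains, here is a minimal sketch of the classic message-passing litmus test, the kind of small program cppmem is used to explore, written with C11 &lt;stdatomic.h&gt;. It assumes a POSIX threads implementation (C11 &lt;threads.h&gt; would serve equally well). The names producer, consumer and run_message_passing are illustrative and are not taken from the group’s work. The release store to ready and the acquire load that observes it create a happens-before edge, so the reader must see data == 42. If both flag accesses used memory_order_relaxed instead, the plain accesses to data would form a data race and the behavior would be undefined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;            /* ordinary, non-atomic payload */
static atomic_int ready;    /* flag that publishes 'data' */

/* Writer: store the payload, then publish it with a release store. */
static void *producer(void *arg)
{
    (void)arg;
    data = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the acquire load sees the flag. The release/acquire
   pair makes the write to 'data' happen before this read, so there is
   no data race. */
static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                   /* busy-wait; acceptable for a litmus test */
    *out = data;
    return NULL;
}

/* Run one instance of the test; return the value the reader saw,
   or -1 if a thread could not be created. */
static int run_message_passing(void)
{
    pthread_t writer, reader;
    int seen = -1;

    data = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&writer, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&reader, NULL, consumer, &seen) != 0) {
        pthread_join(writer, NULL);
        return -1;
    }
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    assert(atomic_load(&ready) == 1);   /* writer has published */
    return seen;
}
```

Testing alone can easily miss ordering bugs of this kind, because whether a disallowed outcome shows up depends on the compiler and hardware; that is one reason tools which enumerate the executions a memory model permits are useful.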