Verified compilers and soap powder advertising
There’s a new paper out claiming to be about a formally verified C compiler; it even states a Theorem about its abilities! If this paper appeared as part of a soap powder advert, the Advertising Standards Authority would probably require clarification of the claims. What clarifications might appear in the small print tucked away at the bottom of the ad?
- C source code is not verified directly; it is first translated into the formal notation used by the verification system, and the software that performs this translation is assumed to be correct.
- The CompCert system may successfully translate programs containing undefined behavior. Any proof statements made about such programs may not be valid.
- The support tools are assumed to be correct; primarily the Coq proof assistant, which is written in OCaml.
- The CompCert system makes decisions about implementation-dependent behaviors, and any proofs only apply in the context of these decisions.
- The CompCert system makes decisions about unspecified behaviors, and any proofs only apply in the context of these decisions.
Some notes on the small print:
The C source translator used by CompCert rarely gets mentioned in any of the published papers; what was done to check its accuracy (I have previously discussed some options)? Presumably the developers who wrote it tried very hard to make sure they did a good job, just like the authors of f2c, a Fortran-to-C translator, did. Connecting f2c as a front-end to the CompCert system gives us a verified Fortran compiler! I think the f2c translator is much more likely to be correct than the CompCert C source translator: it has been used by a lot more people, has processed a lot more source, and has been maintained over a longer period.
When they encounter undefined behavior in source code, production C compilers sometimes generate code that has very unexpected behavior. Using the CompCert system will not avoid unexpected behavior in these situations; CompCert simply washes its hands of this kind of code and says all bets are off.
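As a minimal (hypothetical) illustration of the kind of code involved: signed integer overflow is undefined behavior, so an optimizing compiler is entitled to assume it never happens.

    #include <limits.h>
    #include <stdio.h>

    /* Signed integer overflow is undefined behavior, so a compiler may
     * assume x+1 > x always holds and fold this test to 0; optimizing
     * compilers such as gcc/llvm are known to do this at higher
     * optimization levels. */
    int will_wrap(int x)
    {
        return x + 1 < x;   /* undefined when x == INT_MAX */
    }

    int main(void)
    {
        printf("%d\n", will_wrap(INT_MAX));
        return 0;
    }

The correctness theorem is silent about programs like this one; whatever code gets generated for them is, by definition, acceptable.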
Proving the support tools correct would simply move the assumption of correctness to a different set of tools. I am not aware of any major effort to test whether the Coq system behaves as intended, but I have not read all the papers describing it (the list of reported faults does not appear to be publicly available); bugs have been found in the OCaml implementation.
Like all compilers that generate code, CompCert has to make implementation-dependent decisions and select one of the possible unspecified behaviors. The C-Semantics tool generates all unspecified behaviors, rather than just one.
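A small (hypothetical) example showing one decision of each kind:

    #include <stdio.h>

    static int counter;
    static int next(void) { return ++counter; }

    int main(void)
    {
        /* Unspecified behavior: the Standard allows the two arguments to
         * be evaluated in either order, so "1 2" and "2 1" are both
         * conforming outputs; CompCert commits to one order and its
         * proofs hold for that choice, while C-Semantics enumerates
         * both. */
        printf("%d %d\n", next(), next());

        /* Implementation-defined behavior: the result of right-shifting
         * a negative value is chosen by the implementation (an
         * arithmetic shift gives -1 here, but the Standard does not
         * require it). */
        printf("%d\n", -2 >> 1);
        return 0;
    }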
You seem quite skeptical or critical toward CompCert. While it is true that it has its limitations, the approach is interesting and it seems that it is also effective! I remember the Csmith fuzzer (see http://lambda-the-ultimate.org/node/4241 ; http://blog.regehr.org/archives/804 ; and the paper http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf ), which found bugs in every compiler but CompCert.
Also, your link “bugs have been found” doesn’t relate to a Coq bug but to an OCaml one, and it seems to me it has not been shown that it affects Coq.
@Mehdi Amini
I think CompCert is a very interesting piece of work, I just wish they would not oversell what they have done. They have addressed the important issue of code generation, but a compiler also needs to get the semantics of the language correct, and they have delegated this to a tool for which there are no proofs.
The Csmith fuzzer results are interesting (this is close to being purely a code generation issue). Were no bugs found in CompCert because it only performs a few optimizations compared to gcc/llvm/etc, was testing effort in gcc/llvm/etc tuned to the common cases at the expense of rare cases (a lot of the faults found were in code unlike that typically written by developers), or is it because of the formal techniques used to create CompCert? There is a lot to be said for limiting the optimizations performed when correctness of the generated code is an important issue.
Thanks for pointing out my Coq/OCaml mistake, I will fix it.
This particular horse might be long dead, but here are a few observations:
* Yes, the readily available stuff on the web site oversells their achievements, but I guess this is deliberate. They are most likely hoping to spin off their work at some later time and thus they’d like to attract potential investors. However, I think they take a fairly balanced stance in their published papers, pointing out pitfalls and limitations.
* They are, after all, academics, so they work on the “interesting” problems. The subset translation part is boring, full validation testing is boring (and costs money!), and the correctness of the OCaml/Coq implementations is “out of scope”, etc, etc. And these issues are not really relevant for the questions they try to answer, at least not in an academic setting.
* Turning what they have (and what they lack!) into a commercial product is not going to be plain sailing and will most likely take years if ever attempted, but that is definitely outside the scope of a typical research project.
* This research is very interesting, not only from a safety perspective, but also for its potential to change how compilers are constructed. There’s a lot of effort in the research world today to develop tools and techniques for the formal specification of things like optimizations, translation, etc. The CompCert effort fits right in.
/Anders, who works for IAR Systems
@Anders
The problem with the overselling is that there are a surprising number of people who actually believe that this compiler has been ‘proved correct’ in a much wider sense than it actually has.
Mathematicians are academics and they are not allowed to leave out what they consider to be the boring bits of a proof they are working on.
There is certainly a fair bit of research going on under the label “formal methods”. A lot seems to be aimed at systems that are ‘easy’ to prove stuff about and I’m sure there is plenty of overselling going on.
It seems that in “computer science” it’s perfectly OK to make the sort of assumptions made by the CompCert team, i.e. “this problem consists of subproblems A, B and C, so let’s work on C and leave the rest for someone else.” I think it makes sense in the CompCert case to make the division they have, because:
* They are fully aware that the problem of translating from the C language to the subset needs to be handled. But it looks like it can be handled with more or less the same methods and techniques as used for the translation problem. (Or by some sort of validation and verification activity.) But this part is only interesting if/when they ever decide to go commercial and does not really bring any new insights from a research perspective.
* From a pure research perspective on the translation problem, it’s not very interesting whether or not the implementation language and theorem prover are formally verified as correct. This is of course a rather interesting question when trying to make a product out of the research, but I’m constantly baffled by what CS researchers get away with in terms of assumptions… 🙂
But on the other hand the proofs from Coq can most likely be inspected if needed.
* Formally verified translation can have its merits even if the surrounding infrastructure does not live up to the same promise. Just the other day John Regehr and his team published a preliminary write-up on his blog on their latest project to formally verify peephole optimizations in LLVM, based on a DSL for optimizations. Their aim is to automatically generate the needed implementation code. Basically the same situation, but maybe not the same hype.
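To give a flavour of what such a tool has to establish, here is a hypothetical sketch (not taken from their write-up) of a classic rewrite a peephole optimizer performs and must justify:

    #include <stdio.h>

    /* A peephole pass may rewrite a multiply by 2 into a shift; for
     * unsigned operands the two are equivalent for every input, which is
     * exactly the sort of fact a verified optimizer proves rather than
     * assumes. */
    unsigned mul2(unsigned x) { return x * 2u; }
    unsigned shl1(unsigned x) { return x << 1; }

    int main(void)
    {
        /* Exhaustive testing over a small range gives some confidence,
         * but only a proof covers every input of a wider type. */
        for (unsigned x = 0; x < 256u; x++)
            if (mul2(x) != shl1(x))
                printf("mismatch at %u\n", x);
        return 0;
    }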
I guess that the compcert team is painfully aware that whatever they claim today has to be solidly backed if they decide to go commercial. Further, selling such a product is not going to be like stealing candy from kids, at least not if they want to make some serious money from it… In the end the only way to success is to prove in the market that a “proven” compiler is actually less buggy than the competition.
@Anders
The CompCert team have not made assumptions; they have made a wildly inaccurate claim, i.e., they have not written a verified compiler.
If a research group has verified some components of a compiler then that is what they should claim credit for.
What kind of reputation would mathematics have if mathematicians went around claiming they had proved some major result when in fact they had only proved some smaller subset?