Tools that help handle floating-point dragons
'There be dragons' is a common refrain in any discussion of code containing floating-point arithmetic. The dragons are not likely to disappear anytime soon, but there has been a lot of progress since my 2011 post, and practical tools for handling them are starting to become available to developers who are not numerical analysts.
All the techniques contain an element of brute force: very many possibilities are examined (cloud computing is starting to have a big impact on how problems are attacked). All the cloud computing on the planet would not make a noticeable dent in any problem unless some clever stuff were done to drastically prune the search space.
My current favorite tool is Herbie, if only because of the blog posts describing some of the techniques used (it’s currently limited to code without loops; if you need loop support check out Rosa).
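For readers who have not met this kind of problem, below is a minimal C sketch of the sort of rearrangement such tools aim for; the rewrite shown is a textbook cancellation example (I have not checked whether it is one Herbie actually produces).

#include <math.h>
#include <stdio.h>

/* Naive form: for large x, sqrt(x+1) and sqrt(x) are almost equal,
   so the subtraction cancels most of the significant digits. */
double naive(double x)
{
return sqrt(x + 1) - sqrt(x);
}

/* Algebraically equivalent form that avoids the cancellation. */
double rewritten(double x)
{
return 1 / (sqrt(x + 1) + sqrt(x));
}

int main(void)
{
double x = 1.0e15;

printf("naive:     %.17g\n", naive(x));
printf("rewritten: %.17g\n", rewritten(x));
return 0;
}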
It’s all very well having the performance of your floating-point code optimized, but who is to say nasty problems are not lurking in unexplored ranges of the underlying formula? Without an oracle capable of generating the correct answer (whatever that might be; Precimonious has to be provided with training inputs), the analysis can only flag what is considered to be suspicious behavior. Craft attempts to detect cancellation errors, S3FP searches for input values that produce results containing large relative error, and Rangelab simply provides bounds on the output values calculated from whatever input it is fed.
Being interested in getting very accurate results is a niche market. Surprisingly inaccurate results are good enough for many people and perhaps we should be using a language designed for this market.
Perhaps the problem of efficiently and accurately printing floating-point numbers has finally been solved.
Ada updated to support undefined behavior
The Ada language has now almost completely disappeared from developer consciousness and the Ada standards committee has decided this desperate situation calls for desperate actions.
Undefined behavior in programming languages has been getting a lot of publicity over the last few years. It might not be good publicity, but the Ada committee has taken to heart the old adage ‘The only thing worse than being talked about is not being talked about.’
While the Ada standard does not use the phrase undefined behavior, it does contain a very close equivalent. The Ada community has bitten the bullet and decreed that constructs previously denoted by the term bounded error shall henceforth be referred to as undefined behavior. Technically, constructs generating a bounded error have effectively always been undefined behavior, but the original positioning of Ada as a sophisticated language suitable for critical applications required something less in-your-face unsafe sounding.
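For readers unfamiliar with the terminology, here is a made-up C example (not taken from either standard): as I understand it, the closest Ada construct, reading an uninitialized scalar, was one of those classified as a bounded error, while the C equivalent is undefined behavior.

#include <stdio.h>

int main(void)
{
int i;   /* never given a value */

/* In C, using the value of this uninitialized variable is undefined
   behavior; in Ada, reading an uninitialized scalar was one of the
   constructs labeled a bounded error. */
if (i > 0)
   printf("positive\n");
else
   printf("not positive\n");

return 0;
}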
Will this minor change of wording (ISO rules are very strict on what changes can be made to published standards without going through long-winded voting procedures) make any difference? I guess it will enable Ada users to jump on the 'undefined behavior' bandwagon (their claim that Ada was better because it did not contain undefined behavior always had the effect of annoying people, rather than generating any interest in changing languages).
I think there is a bigger public perception problem than the terminology used to denote undefined behavior. Ada Lovelace was born 200 years ago and gets lots of publicity as the first programmer; somehow this association with Lovelace has transferred to the language named after her, generating a perception of an old, fuddy-duddy language (being strongly typed has not helped, but this feature does appear to be coming back into fashion).
The following are some numbers from a while ago (all of these standards have been revised since the measurements were made).
In the Ada 2005 Standard there are:
- 36 occurrences of bounded error.
- 53 occurrences of the subsection header Erroneous execution and 116 occurrences of erroneous.
- 343 occurrences of implementation-defined.
- 373 occurrences of may, some of which describe optional behavior.
- 22 occurrences of must, some of which read as-if shall was intended.
- 38 occurrences of optional.
- zero occurrences of processor dependent and processor-dependent.
- 1,018 occurrences of shall, of which 131 have the form shall not.
- 452 occurrences of should.
- 8 occurrences of undefined, one referencing an undefined range, three having the form undefined range and the rest occurring in annexes (also see bounded error).
- 89 occurrences of unspecified.
In the C99 Standard there are:
- zero occurrences of bounded error.
- 3 occurrences of erroneous.
- 163 occurrences of implementation-defined.
- 862 occurrences of may.
- 1 occurrence of must.
- 34 occurrences of optional.
- zero occurrences of processor dependent and processor-dependent.
- 596 occurrences of shall, of which 71 have the form shall not.
- 63 occurrences of should.
- 183 occurrences of undefined.
- 98 occurrences of unspecified.
In the C++ 2003 Standard (which now contains many more pages) there are:
- zero occurrences of bounded error.
- 5 occurrences of erroneous.
- 236 occurrences of implementation-defined.
- 371 occurrences of may.
- 111 occurrences of must.
- 30 occurrences of optional.
- zero occurrences of processor dependent and processor-dependent.
- 779 occurrences of shall, of which 211 have the form shall not.
- 38 occurrences of should.
- 195 occurrences of undefined.
- 113 occurrences of unspecified.
Habits are the peripheral vision of the mind
Achieving a basic proficiency in a new skill requires an investment of conscious cognitive effort, i.e., thinking a lot. Students are constantly in the process of achieving basic proficiency in new skills and conclude that thinking is required for all intellectual activities (an incorrect assumption also held by many teachers).
To get past the conscious thinking stage lots of time has to be spent performing the skill. Repetition provides the opportunity for performance via conscious thought to migrate to subconscious performance (driving being a common example).
Real-time performance requires fluency, that is, being able to handle technical details without having to think about them. Thinking (i.e., conscious thought) is slow and requires lots of effort. It is best held in reserve for the important stuff.
To paraphrase Alfred Whitehead: “Software development advances by extending the number of important operations which we can perform without thinking about them.”
Somebody who has spent 100 hours or so (an hour or two a week for a year) learning to code has the same level of fluency as I have in communicating in a foreign language using a phrase book, or Google translate.
After 1,000 hours of programming a person should be a very fluent coder.
It is said that becoming an expert requires 10,000 hours of practice. The kind of practice involved is deliberate practice, not unconscious use of what is already known. Becoming an expert requires learning lots of new things, not constantly applying what is already known. Old habits have to be broken and new ones acquired.
Programming is not Zen, although it contains elements that are. Why would a developer want to create a program without conscious thought (that is what scripts are for)?
I used to run ‘advanced’ programming courses for professional developers with 2+ years in industry. In many ways the material was a rerun of what they had learned at the start of their programming career. The difference was that this time around they could ignore the mechanics of writing code, now an ingrained habit, and concentrate on the higher level stuff. The course had to have advanced in its title because experienced developers would never sign up for an introductory course. Most of my one-on-one tutoring effort went on talking people out of bad habits they had picked up over time.
Perhaps live coding can be done with a Zen mind, which is probably why I don’t regard it as real programming (which I think requires some conscious thought).
Talking about details and high level material in the same breath is what beginners do because they have not yet learned to tell the two apart and be able to ignore one of them.
Like life, programs are mostly built from sequences of commonly occurring patterns. Our minds have evolved to subconsciously detect and take advantage of patterns. Programmers don’t know what the common source code patterns are any more than a native speaker can specify the syntax rules of the language they speak.
cpu+FPGA: applications can soon have bespoke instructions
Compiler writers are always frustrated that the cpu they are currently targeting does not contain the one instruction that would enable them to generate really efficient code. If only it were possible to add new instructions to the cpu. Well, it looks like this will soon be possible; Intel have added an on-chip FPGA to their Broadwell processor (available circa 2017).
Having custom instructions on an FPGA (they would be loaded at program startup) is not the same as having the instructions on the cpu itself; there will be communication overhead when the data operated on by the custom instruction gets transferred back and forth between cpu and FPGA (being on-chip means this will be low). To make the exercise worthwhile, the custom instruction has to do something that takes very many cycles on the cpu, either speeding it up or reducing the power consumed (the Catapult project at Microsoft has a rack of FPGA-enhanced machines speeding up/reducing the power of matching search engine queries to documents).
A CPU+FPGA is like CPU+GPU, except that FPGAs are programmed at a much lower level, i.e., there is little in the way of abstraction between what the hardware does and what the coder sees.
Does the world need an FPGA attached to its cpu? Most users don’t, but there are probably a few customers who do, e.g., data centers with systems performing dedicated tasks and anybody into serious bit twiddling. Other considerations include Intel needing to add new bells and whistles to its product so that customers who have been trained over the years to buy the very latest product (which has the largest margins) stay on the buying treadmill. The FPGA is also a differentiator, not that Intel would ever think of AMD as a serious competitor.
Initially the obvious use case is libraries implementing commonly occurring functionality. No, not matrix multiply and inverse; FPGAs are predominantly integer operation units (there are approaches using non-standard floating-point formats that can be used if your FPGA unit does not have floating-point support).
From the compiler perspective the use case is spotting cpu intensive loops, where all the data can be held on the FPGA until processing is complete. Will there be enough of these loops to make it a worthwhile implementation target? I suspect not. But then I can see many PhDs being written on this topic and one of them could produce a viable implementation that bootstraps itself into one of the popular open source compilers.
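The kind of loop I have in mind looks something like the following (a made-up example, not taken from any compiler project): the work is integer-only, the buffer could stay resident in FPGA memory for the duration, and each element burns many cpu cycles of bit twiddling.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Candidate for offloading: data stays resident, the operations are pure
   integer bit twiddling, and the per-element work dominates the cost of
   shipping the buffer to the FPGA once. */
uint32_t checksum_buffer(const uint32_t *buf, size_t len)
{
uint32_t acc = 0;

for (size_t i = 0; i < len; i++)
   {
   uint32_t v = buf[i];

   v ^= v >> 16;        /* cheap wiring on an FPGA, */
   v *= 0x7feb352dU;    /* repeated multiply/shift work on a cpu */
   v ^= v >> 15;
   v *= 0x846ca68bU;
   v ^= v >> 16;
   acc += v;
   }
return acc;
}

int main(void)
{
uint32_t data[] = {1, 2, 3, 4, 5, 6, 7, 8};

printf("%u\n", (unsigned)checksum_buffer(data, sizeof(data) / sizeof(data[0])));
return 0;
}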
Interpreters have to do a lot of housekeeping work. Perhaps programs written in Java or R could be executed on the FPGA, with the cpu acting as a slave processor. It is claimed that most R programs spend their time in library functions that have been implemented in C and Fortran, but I’m seeing more and more code that appears to be all R. For some programs an R-machine implemented in hardware could produce orders of magnitude speed improvements.
The next generation of cryptocurrency proof-of-work algorithms are being designed to be memory intensive, so that they cannot be efficiently implemented using ASICs, i.e., they are ASIC-proof (this prevents mining becoming concentrated in a few groups who have built bespoke mining operations). The analysis I have seen is based on ‘conventional’ cpu and ASIC designs. A cpu+FPGA is a very different kind of beast, and one that might require another round of cryptocurrency design.
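To make the memory-intensive idea concrete, here is a toy C sketch (not any real proof-of-work scheme): the result depends on a long chain of data-dependent reads across a buffer too large for cheap on-die memory, so the cost is dominated by memory bandwidth rather than by logic an ASIC could shrink.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define WORDS (1u << 22)   /* 32 MB working set: too big for cheap on-die ASIC memory */

/* Toy illustration only: fill a large buffer from the nonce, then hash it
   via data-dependent reads scattered across the whole buffer. */
static uint64_t toy_memory_hard_pow(uint64_t nonce)
{
uint64_t *mem = malloc(WORDS * sizeof(uint64_t));
uint64_t h = nonce;

if (mem == NULL)
   return 0;

for (uint32_t i = 0; i < WORDS; i++)          /* fill phase */
   {
   h = h * 6364136223846793005ULL + 1442695040888963407ULL;
   mem[i] = h;
   }

for (uint32_t i = 0; i < WORDS; i++)          /* data-dependent read phase */
   h ^= mem[(h ^ i) & (WORDS - 1)];

free(mem);
return h;
}

int main(void)
{
printf("%llu\n", (unsigned long long)toy_memory_hard_pow(42));
return 0;
}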
These cpu+FPGA processors have the potential to dramatically upend existing approaches to structuring programs. Very interesting times ahead!
A survey of opinions on the behavior of various C constructs
The Cerberus project, researching C semantics, has written up the results of their survey of ‘expert’ C users (short version and long detailed version). I took part in the original survey and at times found myself having to second guess what the questioner was asking; the people involved were/are still learning how C works. Anyway, many of the replies provide interesting insights into current developer interpretation of the behavior of various C constructs (while many of the respondents were compiler writers, it looks like some of them were not C compiler writers).
Some of those working on the Cerberus project are proposing changes to the C standard based on issues they encountered while writing a formal specification for parts of C and are bolstering their argument, in part, using the results of their survey. In many ways the content of the C Standard was derived from a survey of those attending WG14 meetings (or rather x3j11 meetings back in the day).
I think there is zero probability that any of these proposed changes will make it into a revised C standard; none of the reasons are technical and include:
- If it isn’t broken, don’t fix it. Lots of people have successfully implemented compilers based on the text of the standard, which is the purpose of the document. Where is the cost/benefit of changing the wording to enable a formal specification using one particular mathematical notation?
- WG14 receives lots of requests for changes to the C Standard and has an implicit filtering process. If the person making the request thinks the change is important, they will:
- put the effort into wording the proposal in the stylized form used for language change proposals (i.e., not intersperse changes in a long document discussing another matter),
- be regular attendees of WG14 meetings, working with committee members on committee business and helping to navigate their proposals through the process (turning up to part of a meeting will see your proposal disappear as soon as you leave the building; the next WG14 meeting is in London during April).
It could be argued that having to attend many meetings around the world favors those working for large companies. In practice only a few large companies see any benefit in sending an employee to a standards meeting for a week to work on something that may be of long-term benefit to them (sometimes it is a hardware company that wants to make sure C can be compiled efficiently to its processors).
The standard’s creation process is about stability (don’t break existing code; many years ago a company voted against a revision to the Cobol standard because they had lost the source code to one of their products and could not check whether the proposed updates would break this code) and broad appeal (not narrow interests).
Update: Herb Sutter’s C++ trip report gives an interesting overview of the process adopted by WG21.
p-values for programmers
Data analysis always contains a degree of uncertainty, in statistics this uncertainty is often expressed through the concept of p-value and possible interpretations of the data tested using the ideas behind null hypothesis testing. What is the best way of explaining the concept of p-value and null hypothesis testing to software developers?
For the empirical software engineering workshops I have been running, p-values get both a broad brush description and a use case in the form of null hypothesis testing (expressed as code). The broad brush approach to p-values worked and the null hypothesis as-code approach flopped miserably.
The broad brush approach defines p-values as a measure of uncertainty and leaves it at that, no mention of what exactly is uncertain. I quickly move on to discuss the more important point, from the real-world point of view, of selecting a p-value. Choice of p-value is one side of a cost/benefit business decision; would you sit in a lecture theater that had a 1 in 20 chance of collapsing? It all depends on what benefits accrued from sitting there.
Developers understand code, so expressing the operations around the use of null hypothesis testing as code would, I assumed, make everything clear. Perhaps my attempt to fit everything on one slide concentrated things a bit too much, or perhaps being firehosed with stuff to learn had left people yearning for the lunch that was fast approaching. The following function, accompanied by me babbling away, was met with many blank stares.
It is my stab at explaining the operation of the null hypothesis and the role played by p-values. I hope that things are a bit clearer for readers who have not had me teaching them R in 15 minutes, followed by a general ‘what you need to know about probability and statistics’ in 30 slides.
void null_hypothesis_test(void *result_data, float p_value)
{
// H set by reality, its value is only accessible by running the appropriate experiments
if ((probability_of_seeing_data_when_H_true(result_data) < p_value) ||
    !H)
   printf("Willing to assume that H is false\n");
else
   printf("H might be true\n");
}

null_hypothesis_test(run_experiment(), 0.05);
Workshop on survival and time series analysis in empirical SE
In January the material in my book on Empirical software engineering using R had its first exposure to professional software developers, at a one-day workshop (there was a rerun last week; slides here). The sessions were both fully booked, but as often happens only half turned up, around 15 at each workshop. A couple of people turned up expecting to be taught R and found themselves in a software engineering workshop that assumed the attendees could learn R in 10 minutes (because professional developers are experienced enough to learn the basics of programming in any language in that amount of time).
The main feedback was that people wanted to see more code, something to give them a starting point for the hands-on sessions (I hate seeing reams and reams of code on slides and had gone too far in the minimalist direction).
The approach of minimizing what developers have to learn/remember, even if it means using more computer resources (e.g., always using glm), went down very well.
The probability & statistics material, in the session before lunch, had a mixed reception; this is partly because I have not come up with a clear message and I’m trying to get over several disparate ideas, i.e., sampling issues, life is complicated and p-values. People were a bit unhappy about the life is complicated, data is messy and you have to think about what you are doing message.
I think the regression model building material was a hit. People saw the potential in fitting curves to data and found it easy to do once the parameters to plot, lines, glm and predict were sorted out. There was lots of interesting discussion around interpreting the fitted models, with me continually hammering on the point that p-value selection was a business risk factor decision and what constituted a good model depended on what it was going to be used for.
Those attending wanted to learn more, which is great, since the main aim was to show people what useful things could be done, to motivate them to go off and learn more (relatively little may be known about empirical software engineering, but there is more than can be fitted into a one-day workshop).
There is a part-2 workshop in March and the plan is to cover survival analysis, time series analysis and if there is time something else. It will be assumed that people have the skill level of those who attended the first workshop, e.g., can write basic R and fit a simple regression model.
Thanks to The Rise for sponsoring the venue for all three workshops.
pycparser: a serious entry in the not-written-in-C C-parser category
Sometimes it feels like everybody and his dog has had a go at writing a C parser. On the whole these parsers handle small subsets of the language, and the choice of subset seems to be driven by the developer’s taste, what they find easy to do and how much time they are willing to put into the project.
Some time ago a blog post discussing parsing C type declarations made me think that pycparser might be a cut above the usual learning experience projects and a quick look showed it was quite good. I recently tried it out again on the examples from my C book and it did a surprisingly good job of handling this rather weird set of edge cases (it failed to handle the code in 20 out of 957 files).
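To give a flavor of the kind of thing such edge-case files contain, here are a few standard C declaration forms that tend to separate toy parsers from serious ones (I have not checked whether any of these are among the cases pycparser failed on):

/* All of the following are standard C. */

int (*(*callbacks[4])(void))[10];   /* array of 4 pointers to functions (taking no
                                       arguments) returning a pointer to an array
                                       of 10 int */

typedef int T;
void shadow(T T);                   /* the second T is a parameter name hiding the
                                       typedef; getting this right needs feedback
                                       between the parser and the symbol table */

int strange(int (*)(int (*)(int))); /* unnamed parameter: pointer to a function
                                       whose parameter is itself a function pointer */

char *(*(*const weird)[3])(void);   /* const pointer to an array of 3 pointers to
                                       functions returning char * */

int main(void)
{
return 0;
}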
There is a type of person who insists that the C parser used by the C source analysis project they are working on be written in a ‘high-level’ language, i.e., they don’t want to use one of the perfectly adequate (and correct) parsers written in C/C++. I’m not sure whether this is because having to actually use C would expose the poor state of their knowledge of the language, because of language snobbery (it’s ok to analyse C source, but write it, heaven forbid), or because they are members of a One True Language guild.
Up until now the parser of choice, for people not wanting to use C/C++, was CIL (a slightly more up-to-date version is on Github); it is used by Coccinelle and many other tools.
If you really don’t want to parse C source using a parser written in C/C++, I think pycparser has now reached the stage where it is worth considering, along with CIL.
Human vs automatically generated source code: an arms race?
How well do I understand the process of writing source, as performed by human developers? If I could write a program that could generate source code that was indistinguishable from human written code, then I think I could claim to have a decent understanding of the human processes.
An important question is the skill of the entity attempting to distinguish automatically generated and human written code. As of today I think I could fool existing tools (mostly because such tools don’t really exist; n-gram analysis is really all there is), but a human developer with a smattering of coding experience would not be fooled.
I think I know enough to write a tool that could generate single function definitions that would fool most developers, but not a moderately sophisticated analysis tool (I don’t know enough to combine multiple functions into a plausible call graph).
At the CREST workshop today I met Martin Monperrus who shared my view that the only way to prove an understanding of source is to be able to generate it; other people at the workshop seemed more interested in the detection side of things.
Perhaps the time is ripe to kick off a generator/detector arms race; it would certainly increase our knowledge of the characteristics of human written code.
What counts as automatically generated code? Printing complete function definitions mined from Github is obviously not what most people would associate with the concept of automatic source generation. What about printing statements mined from Github (with each statement coming from a different function definition)? Well, if people could make that work, then good luck to them (it hardly seems worth the bother because most statements are very simple and easily generated on a token by token basis).
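To show how little machinery is needed at the statement level, here is a toy token-by-token generator (a throwaway illustration, not anybody’s research tool); the identifier and operator tables are invented, and a serious attempt would weight them using frequencies mined from real source.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Emit a handful of plausible-looking assignment statements by picking
   each token independently from small hand-written tables. */
static const char *ident[] = {"i", "len", "count", "p", "result", "flags"};
static const char *binop[] = {"+", "-", "&", ">>"};

static const char *pick(const char **tbl, int n)
{
return tbl[rand() % n];
}

int main(void)
{
srand((unsigned)time(NULL));

for (int s = 0; s < 5; s++)
   printf("%s = %s %s %s;\n",
          pick(ident, 6), pick(ident, 6), pick(binop, 4), pick(ident, 6));

return 0;
}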
Should functions be judged in isolation, or should a detector get to see multiple instances of each case (which would make the detector’s life easier)? It might be best to go with whatever rules make for an interesting arms race.
To start the ball rolling, here is my first entry to the generator side of the race (a rather minimal entry, but still the leader in its field; the recurrent neural network trained on the Linux source would easily beat it, but my interest is in understanding developer behavior and will leave it to others to go down the neural network path).
Maximizing profit selling C compilers
Upgrades are the lifeblood of established software companies. I recently came across the paper Information Goods Upgrades: Theory and Evidence and what caught my attention was one of the datasets the author had collected, first purchase and upgrade price of various PC C/C++ compilers between 1987 and 1997. What’s more the author still had the data and was willing to share it, yay!
By the early 1990s I was no longer actively involved in C compilers, but was involved in C static analysis on non-PC platforms. So my view of the 1990s C compiler market is a bit sketchy.
Compiler companies, like other companies, want to maximize their revenue, and THE decision that has to be made is the price to charge for a compiler (compiler writers are also developers and hate high prices for compilers; those that failed to charge enough for their product soon went bust). My recollection is that compiler pricing was based around the spending authority of a senior development engineer, and also what other companies were charging. Just under £500 was common, with a few companies failing to make a go of selling around the £100 mark. Zorland (later renamed to Zortech) gained huge market share in the mid/late 1980s selling a great C compiler for £29, but a few years later were selling a C++ compiler for a lot more.
To some extent each compiler vendor operates in a monopoly market; developers write code that depends on the features supported by the compiler used and it can be very expensive to port code to a different compiler. How much can vendors charge for a compiler upgrade? Selling the product at a high price provides a rationale for higher priced upgrades (the percentage discount will look good). I wonder how many vendors continued to advertise a high price product just to justify a high upgrade price.
Management always feel an affinity for the OS vendor and Microsoft sold a C compiler and later a C++ compiler. They were both awful and easy, product quality wise, to compete against. Microsoft had to have their own compiler for strategic internal use, with sales to developers being insignificant compared to sales of Word and Excel (Microsoft compiler people I talked to at the time said they had thought of giving the compiler away for free and later it was possible to essentially get the compiler for free by joining the various developer programs). Over time Microsoft improved and compiler companies found easier ways to make money, so the number of compiler vendors dropped to almost one (a company selling C compiler validation suites once told me in the late 1990s that they had sold over 150 copies; someone has to be serious about their compiler to shell out $5,000-$10,000 for software to test it).
By the late 1980s the C compiler market was quite saturated and vendors needed something else to sell. IDEs and debuggers were popular choices. Then along came C++. Yay! A new language meant a new compiler to sell. Compiler vendors’ need for a new compiler to sell is a significantly underestimated factor in C++ gaining traction in developer mind share.
A rarely talked about compiler revenue stream is being paid to port a compiler to a new platform (either because there is an important application that depends on it or because the platform does not yet have a C compiler). This is the market where gcc had its first successes. It’s hard to say whether gcc spread because these niche platforms spread, or because gcc cut off revenue to compiler vendors, making remaining in the compiler market unattractive to them.
I don’t have any sales figures for any ‘mass’ market C compilers, or for compilers for other languages. Can any readers help out? In fact any data on compiler sales would be most welcome.