Automatically improving code
Compared to 20 or 30 years ago we know a lot more about the properties of algorithms and better ways of doing things often exist (e.g., more accurate, faster, more reliable, etc). The problem with this knowledge is that it takes the form of lots and lots of small specific details, not the kind of thing that developers are likely to be interested in, or good at, remembering. Rather than involve developers in the decision-making process, perhaps the compiler could figure out when to substitute something better for what had actually been written.
While developers are likely to be very happy to see what they have written behaving as accurately and reliably as they had expected (ignorance is bliss), there is always the possibility that the ‘less better’ behavior of what they had actually written had really been intended. The following examples illustrate two relatively low level ‘improvement’ transformations:
- this case is probably a long-standing fault in many binary search and merge sort functions; the relevant block of developer written code goes something like the following:
while (low <= high) { int mid = (low + high) / 2; int midVal = data[mid]; if (midVal < key) low = mid + 1 else if (midVal > key) high = mid - 1; else return mid; }
The fault is in the expression
(low + high) / 2
which overflows to a negative value, and returns a negative value, if the number of items being sorted is large enough. Alternatives that don’t overflow, and that a compiler might transform the code to, include:low + ((high - low) / 2)
and(low + high) >>> 1
. - the second involves summing a sequence of floating-point numbers. The typical implementation is a simple loop such as the following:
sum=0.0; for i=1 to array_len sum += array_of_double[i];
which for large arrays can result in
sum
losing a great deal of accuracy. The Kahan summation algorithm tries to take account of accuracy lost in one iteration of the loop by compensating on the next iteration. If floating-point numbers were represented to infinite precision the following loop could be simplified to the one above:sum=0.0; c=0.0; for i = 1 to array_len { y = array_of_double[i] - c; // try to adjust for previous lost accuracy t = sum + y; c = (t - sum) - y; // try and gets some information on lost accuracy sum = t; }
In this case the additional accuracy is bought at the price of a decrease in performance.
Compiler maintainers are just like other workers in that they want to carry on working at what they are doing. This means they need to keep finding ways of improving their product, or at least improving it from the point of view of those willing to pay for their services.
Many low level transformations such as the above two examples would be not be that hard to implement, and some developers would regard them as useful. In some cases the behavior of the code as written would be required, and its transformed behavior would be surprising to the author, while in other cases the transformed behavior is what the developer would prefer if they were aware of it. Doesn’t it make sense to perform the transformations in those cases where the as-written behavior is least likely to be wanted?
Compilers already do things that are surprising to developers (often because the developer does not fully understand the language, many of which continue to grow in complexity). Creating the potential for more surprises is not that big a deal in the overall scheme of things.
I am a little surprised and disturbed by this proposal.
Why would it be good that C semantics be randomly broken by a compiler in either an undocumented way, or so with so many criterion that they would be extremely hard to understand, and with probably every compiler doing subtlety different things?
If you are dissatisfied by the semantics of the language, why not try to change it in the future Cxx or create an other one, or just use one that already exists. The alternative is no more having a specification and instead having just a little piece of paper with “THE COMPILER DOES SOMETHING GOOD FOR YOU” written on it. I doubt serious users would continue to use such a language…
@xilun
There are two ways of looking at the semantics of code, 1) the behavior specified by the language semantics and 2) the behavior expected by the developer. We now have enough experience to know that there are often differences between these two ways of looking at code.
It is no good saying that developers should be better educated, on the whole writing code is done by those just starting out on their careers and who, once they gain sufficient experience, are promoted to a level where they don’t write code (unlike medicine where those cutting up the bodies are the most well trained people).
Then the only way to help uneducated programmers would be to align the specified language semantics with the behavior expected by the developers. I can’t see another way to have a language that is both well specified and matches developers expectations.