July 24, 2011 Derek Jones 7 comments

I would like to propose a new way of measuring the quality of a compiler’s code generator: The highest quality compiler is one that generates identical code for all programs that produce the same output, e.g., a compiler might spot programs that calculate pi and always generate code that uses the most rapidly converging method known. This is a very different approach to the traditional methods based on using (mostly) execution time or size (usually code but sometimes data) as a measure of quality.

Why is a new measurement method needed, and why choose this one? It is relatively easy for compiler vendors to tune their products to the commonly used benchmark and they seem to have lost their role as drivers for new optimization techniques. Different developers have different writing habits and companies should not have to waste time and money changing developer habits just to get the best quality code out of a compiler; compilers should handle differences in developer coding habits and not let it affect the quality of generated code. There are major savings to be had by optimizing the effect that developers are trying to achieve rather than what they have actually written (these days new optimizations targeting at what developers have written show very low percentage improvements).

Deducing that a function calculates pi requires a level of sophistication in whole program analysis that is unlikely to be available in production compilers for some years to come (ok, detecting 4*atan(1.0) is possible today). What is needed is a collection of compilable files containing source code that aims to achieve an outcome in lots of different ways. To get the ball rolling the “3n times 2” problem is presented as the first of this new breed of benchmarks.

The “3n times 2” problem is a variant on the 3n+1 problem that has been tweaked to create more optimization opportunities. One implementation of the “3n times 2” problem is:

if (is_odd(n))
   n = 3*n+1;
else
   n = 2*n; // this is n = n / 2; in the 3n+1 problem

There are lots of ways of writing code that has the same effect, some of the statements I have seen for calculating n=3*n+1 include: n = n + n + n + 1, n = (n << 1) + n + 1 and n *= 3; n++, while some of the ways of checking if n is odd include: n & 1, (n / 2)*2 != n and n % 2.

I have created a list of different ways in which 3*n+1 might be calculated and is_odd(n) might be tested and written a script to generate a function containing all possible permutations (to reduce the number of combinations no variants were created for the least interesting case of n=2*n, which was always generated in this form). The following is a snippet of the generated code (download everything):

if (n & 1) n=(n << 2) - n +1; else n*=2;
if (n & 1) n=3*n+1; else n*=2;
if (n & 1) n += 2*n +1; else n*=2;
if ((n / 2)*2 != n) { t=(n << 1); n=t+n+1; } else n*=2;
if ((n / 2)*2 != n) { n*=3; n++; } else n*=2;

Benchmarks need a means of summarizing the results and here I make a stab at doing that for gcc 4.6.1 and llvm 2.9, when executed using the -O3 option (output here and here). Both compilers generated a total of four different sequences for the 27 'different' statements (I'm not sure what to do about the inline function tests and have ignored them here) with none of the sequences being shared between compilers. The following lists the number of occurrences of each sequence, e.g., gcc generated one sequence 16 times, another 8 times and so on:

gcc    16   8   2   1
llvm   12   6   6   3

How might we turn these counts into a single number that enables compiler performance to be compared? One possibility is to award 1 point for each of the most common sequence, 1/2 point for each of the second most common, 1/4 for the third and so on. Using this scheme, gcc gets 20.625, and llvm gets 16.875. So gcc has greater consistency (I am loathed to use the much overused phrase higher quality).

Now for a closer look at the code generated.

Both compilers always generated code to test the least significant bit for the conditional expressions n & 1 and n % 2. For the test (n / 2)*2 != n gcc generated the not very clever right-shift/left-shift/compare while llvm and'ed out the bottom bit and then compared; so both compilers failed to handle what is a surprisingly common check for a number being odd.

The optimal code for n=3*n+1 on a modern x86 processor is (lots of register combinations are possible, let's assume rdx contains n) leal 1(%rdx,%rdx,2), %edx and this is what both compilers generated a lot of the time. This locally optimal code is not always generated because:

gcc fails to detect that (n << 2)-n+1 is equivalent to (n << 1)+n+1 and generates the sequence leal 0(,%rax,4), %edx ; subl %eax, %edx ; addl $1, %edx (I pointed this out to a gcc maintainer sometime ago, and he suggested reporting it as a bug). This 'bug' occurs three times in total.
For some forms of the calculation llvm generates globally better code by taking the else arm into consideration. For instance, when the calculation is written as n += (n << 1) +1 llvm deduces that (n << 1) and the 2*n in the else are equivalent, evaluates this value into a register before performing the conditional test thus removing the need for an unconditional jump around the 'else' code:
leal (%rax,%rax), %ecx testb $1, %al je .LBB0_8 # BB#7: orl $1, %ecx # deduced ecx is even, arithmetic unit not needed! addl %eax, %ecx .LBB0_8:
leal (%rax,%rax), %ecx testb $1, %al je .LBB0_8 # BB#7: orl $1, %ecx # deduced ecx is even, arithmetic unit not needed! addl %eax, %ecx .LBB0_8:
This more efficient sequence occurs nine times in total.

The most optimal sequence was generated by gcc:

	testb	$1, %dl
	leal	(%rdx,%rdx), %eax
	je	.L6
	leal	1(%rdx,%rdx,2), %eax
.L6:

with llvm and pre 4.6 versions of gcc generating the more traditional form (above, gcc 4.6.1 assumes that the 'then' arm is the most likely to be executed and trades off a leal against a very slow jmp):

	testb	$1, %al
	je	.LBB0_5
# BB#4:
	leal	1(%rax,%rax,2), %eax
	jmp	.LBB0_6
.LBB0_5:
	addl	%eax, %eax
.LBB0_6:

There is still room for improvement, perhaps by using the conditional move instruction (which gcc actually generates within the not-very-clever code sequence for (n / 2)*2 != n) or by using the fact that eax already holds 2*n (the potential saving would come through a reduction in complexity of the internal resources needed to execute the instruction).

llvm insists on storing the calculated value back into n at the end of every statement. I'm not sure if this is a bug or a feature designed to make runtime debugging easier (if so it ought to be switched off by default).

Missed optimization opportunities (not intended to be part of this benchmark and if encountered would require a restructuring of the test source) include noticing that if is odd then 3n+1 is always even, creating the opportunity to perform the following multiply by 2 without an if test.

Perhaps one day, compilers will figure out when a program is calculating pi and generate code that uses the best known algorithm. In the meantime, I am interested in hearing suggestions for additional different-algorithm-same-code benchmarks.

Categories: Uncategorized Tags: 3n+1, benchmark, code generation, compiler, gcc, llvm, optimizing, Pi, quality, testing

Secret instruction sets about to make a come-back

January 28, 2010 Derek Jones 4 comments

Now it has happened Apple’s launch of a new processor, the A4, seems such an obvious thing to do. As the manufacturer of propriety products Apple wants to have complete control over the software that customers run on those products. Using a processor whose instruction set and electrical signals are not publicly available goes a very long way to ensuring that somebody else’s BIOS/OS/drivers/etc do not replace Apple’s or that the distributed software is not ‘usefully’ patched (Apple have yet to reveal their intentions on publishing instruction set details, this is an off-the-cuff prediction on my part).

Why are Apple funding the development of the LLVM C/C++ compiler? Because it enables them to write a back-end for the A4 without revealing the instruction set (using gcc for this purpose would require that the source be distributed, revealing the instruction set). So much for my prediction that Apple will stop funding LLVM.

The landscape of compute intensive cpus used to be populated with a wide variety of offerings from many manufacturers. The very high price of riding the crest of Moore’s law is one of the reasons that Intel’s x86 acquired such a huge chunk of this market; one by one processor companies decided not to make the investment needed to keep their products competitive. Now that the applicability of Moore’s law is drawing to an end the treadmill of continued processor performance upgrades is starting to fading away. Apple are not looking at a processor upgrade cycle, that they need to keep up with to be competitive, that stretches very far into the future.

Isn’t keeping an instruction set secret commercial suicide? The way to sell hardware is to make sure that lots of software runs on it and software developers want instruction set details (well ok, a small number do, but I’m sure lots of others get a warm fuzzy feeling knowing that the information is available should they need it). This is very much a last century view. The world is awash with an enormous variety of software, including lot of it very useful open source, and there is a lot more information available about the set of applications many customers want to use most of the time. Other than existing practice there is little reason why a manufacturer of a proprietary product, that is not a processor, needs to release instruction set details.

In the early 1980s I was at the technical launch of the Inmos Transputer and was told that the instruction set would not be published because the company did not want to be tied down to backwards compatibility. Perish the thought that customers would write assembler because the Inmos supplied compilers produced poor quality code or that a third party compiler would good enough to gain a significant market share. In marketing ability Inmos was the polar opposite of Apple. For a while HP were not very forthcoming on the instruction set behind their PA-RISC processor.

Will the A4 instruction set remain secret? On the whole, perhaps it can. The software based approaches to reverse engineering an instruction set require access to a compiler that generates it. For instance, changing a plus to a minus in one expression and looking for the small change in the generated executable; figuring out which status flags are set under which conditions is harder, but given time it can be done. Why would Apple release executable compilers if it wants to control the software that runs on the A4?

If the instruction set were cracked where would a developer go to obtain a compiler targeting the A4?

Given the FSF‘s history with Apple I would not be surprised if there was a fatwa against support for a proprietary instruction set in gcc. I suspect Apple would frown very heavily on such support ever being in the standard llvm distribution. I could see Apple’s lawyers sending letters to anybody making available a compiler that generated A4 code.

In the past manufacturers have tried to keep processor instruction sets secret and found that commercial reality required them to make this information freely available. Perhaps in the long term, like Moore’s law, the publication of detailed instruction set information may just be a passing phase.

Categories: Uncategorized Tags: Apple's A4, instruction set, llvm, Moore's law, processor, Transputer

Assessing my predictions for 2009

January 24, 2010 Derek Jones No comments

I have been rather remiss in revisiting the predictions I made for 2009 to see how they fared. Only two out of the six predictions were sufficiently precise to enable an assessment one year later; the other four talking more about trends. What of these two predictions?

The LLVM project will die. Ok, the project is still going and at the end of December the compiler could build itself but the build is not yet in a state to self host (i.e., llvm compiler creates an executable of itself from its own source and this executable can build an executable from source that is identical to itself {modulo date/time stamps etc}). Self hosting is a major event and on some of the projects I have worked on it has been a contractual payment milestone.

Is llvm competition for gcc? While there might not be much commercial competition as such (would Apple be providing funding to gcc if it were not involved in llvm?), I’m sure that developers working on both projects want their respective compiler to be the better one. According to some llvm benchmarks they compile code twice as fast as gcc. If this performance difference holds up across lots of source how might the gcc folk respond? Is gcc compiler time performance an issue that developers care about or is quality of generated code more important? For me the latter is more important and I have not been able to find a reliable, recent, performance comparison. I understand that almost all gcc funding is targeted at code generation related development, so if compile time is an issue gcc may be in a hole.

I still don’t see the logic behind Apple funding this project and continue to think that this funding will be withdrawn sometime, perhaps I was just a year off. I do wish the llvm developers well and look forward to them celebrating a self hosted compiler.

Static analysis will go mainstream. Ok, I was overly optimistic here. There do not seem to have been any casualties in 2009 among the commercial vendors selling static analysis tools and the growth of PHP has resulted in a number of companies selling tools that scan for source security vulnerabilities (.e.g, SQL injection attacks).

I was hoping that the availability of Treehydra would spark the development of static analysis plugins for gcc. Perhaps the neatness of the idea blinded me to what I already knew; many developers dislike the warnings generated by the -Wall option and therefore might be expected to dislike any related kind of warning. One important usability issue is that Treehydra operates on the abstract syntax tree representation used internally by gcc, GIMPLE. Learning about this representation requires a big investment before a plugin developer becomes productive.

One tool that did experience a lot of growth in 2009 was Coccinelle, at least judged by the traffic on its mailing list. The Coccinelle developers continue to improve it and are responsive the questions. Don’t be put off by the low version number (currently 0.2), it is much better than most tools with larger version numbers.

Categories: Uncategorized Tags: Coccinelle, compiler, gcc performance, Gimple, llvm, predictions, static analysis, Treehydra

Compiler writing: The career path to World domination

November 7, 2009 Derek Jones 3 comments

Compiler writing is not usually thought of as a career path that leads to becoming Ruler of the World. Perhaps this is because compiler writing is a relatively new profession and us compiler writers are still toiling in obscurity awaiting the new dawn.

What might be a workable plan for a compiler writer to become Ruler of the World? One possibility is to write a compiler for the language in which most of the World’s critical software is written (i.e., C) and for that compiler to become the one that the vendors of this critical software all use (i.e., gcc). This compiler needs to do more that just compile the source code it is feed, it also needs to generate code that creates a backdoor in important programs (e.g., the login program).

But, you say, this cannot happen with gcc because its source is available for everybody to read (and spot any backdoor generator). In his 1984 Turing acceptance lecture Ken Thompson showed how a compiler could contain a backdoor that was not visible in its source. The idea is for the compiler writer to modify a compiler to detect when it is being used to compile itself and to insert the backdoor generating code into its own executable. This modified compiler is then used to compile itself and the resulting executable made the default compiler; the backdoor modifications are then removed from the compiler source, they are no longer needed because the previously compiled compiler will spot when it is being used to compile its own source and generate the code necessary to propagate the backdoor code into the executable it creates.

How would the world counter the appearance of such a modified gcc? Obviously critical programs would need to be recompiled by a version of gcc that did not contain the backdoor. Today there are several companies and many amateur groups that distribute their own Linux distributions which they build from source. It should be relatively easy to obtain a usable executable of gcc from 10 years ago; remember what is needed is a version capable of compiling the latest gcc sources.

The ideal time to create a backdoor’ed version of gcc is while its development was under the control of one person, so early in the development history that all versions available anywhere are very likely to be derived from it. How can we prove that the original author of gcc did not do just this?

It could be argued that the very substantial changes to the gcc sources (most of the source has probably been rewritten several times) mean that the coding patterns searched for by the executable to detect that it is compiling itself have long gone and at some point the backdoor failed to propagate itself to the next executable.

Compilers other than gcc might also include backdoors that propagate themselves. However, the method of propagation is likely to be different. Compiling the gcc sources with a non-gcc compiler creates an executable that should exhibit the same behavior as a gcc-compiled executable. Differences in the behavior of these independently built executables is a cause for concern (one difference might be caused by differences in the conversion of floating-point literals, a recent PhD thesis provides more detail).

The problem with compiling the gcc sources is that they make use of language extensions that few, if any, other compilers support. I know IBM added modified one of their C compilers to support those gcc extensions needed to compile the Linux kernel, but I don’t know if this compiler is capable of compiling the gcc sources. The LLVM project intended to support many gcc extensions but I don’t know if they aim to be able to compile the gcc sources.

Another option is to compare the assembler generated when gcc compiles itself against the corresponding source code. A very expensive task for source code measured in hundreds of thousands of lines. Adding the necessary language extension support to another compiler would probably be cheaper and also create a tool that could be used to check future releases of gcc.

Categories: Uncategorized Tags: backdoor, critical software, gcc, language extensions, llvm, source code, world domination

Newer Entries

The Shape of Code

Archive

Compiler benchmarking for the 21st century

Secret instruction sets about to make a come-back

Assessing my predictions for 2009

Compiler writing: The career path to World domination

Recent Posts

Recent Comments

Archives

Meta