C code is 90% unspecified behavior: more uninformed scare mongering
Another C coding guidelines document, another clueless blanket ban on use of code containing unspecified behavior (no link so its visibility is not increased; the 90% is a back of the envelope calculation, knock yourself out here).
The C Standard defines unspecified behavior as “… provides two or more possibilities and imposes no further requirements on which is chosen in any instance.” Given this one item of information, a ban on using constructs that contain unspecified behavior appears to be a good idea (writing code where the compiler gets to choose among several possible behaviors does not sound like a recipe for consistent program behavior).
What most people lack when thinking about unspecified behavior is an understanding of the design aims behind the C Standard; one of those aims was conciseness. An example of this conciseness is the wording covering the order of evaluation of subexpressions: “… the order in which side effects take place are both unspecified.”
Consider the subexpression x+y; should the compiler evaluate x first (putting its value in a register) and then y (putting its value in another register), or should it evaluate y followed by x? In most situations the final result does not depend on the choice of evaluation order, and the Standard gives the compiler the freedom to choose the order that produces the best quality code.
A coding guideline that bans the use of code containing unspecified behavior bans the use of any binary operator (assignment is a binary operator in C, ruling out use of the statement z=0;). The only executable statements that could be written, following this guideline, would be calls of functions containing zero or one argument (the order of evaluation of arguments is unspecified, which rules out calls containing two or more arguments) or global variables appearing on their own in an expression statement.
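To make the reach of a literal ban concrete, here is a minimal sketch (the functions f, g and h are invented for illustration); both statements in main involve an unspecified evaluation order, even though the program's output does not depend on the order the compiler picks:

#include <stdio.h>

static int g(void) { return 1; }
static int h(void) { return 2; }
static void f(int a, int b) { printf("%d %d\n", a, b); }

int main(void)
{
    int z;

    z = 0;        /* assignment is a binary operator; a literal ban on
                     unspecified behavior rules out even this statement */

    f(g(), h());  /* the order in which the arguments g() and h() are
                     evaluated is unspecified, so a two-argument call is
                     also ruled out (the output is the same either way) */

    return z;
}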
One case where operand evaluation order matters is printf("Hello")+printf("World"), which can result in either HelloWorld or WorldHello being printed (printf returns the number of characters written). This is an example of the kind of usage that the authors of coding guidelines want to ban.
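For readers who want to try it, the usage can be wrapped in a complete program (which output appears depends on the order chosen by the compiler used):

#include <stdio.h>

int main(void)
{
    /* either HelloWorld or WorldHello may be printed, depending on which
       operand of + the compiler evaluates first; the value of total is
       10 (5+5) whichever order is chosen */
    int total = printf("Hello") + printf("World");

    printf("\n%d\n", total);
    return 0;
}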
Coming up with guideline wording that delineates the undesirable unspecified behaviors from the harmless ones is hard. Requiring that the external behavior of code does not depend on the compiler’s choice of unspecified behavior is one possibility (now that power consumption can be an external behavior of note, this framing could be too narrow). The wording used by MISRA C is “No reliance shall be placed on … unspecified behavior”; this raises the flag that it is possible to rely on unspecified behavior and leaves it up to others to fill in the details.
Using identifier prefixes results in more developer errors
Human speech communication has to be processed in real time using a CPU with a very low clock rate (i.e., the human brain, whose neurons fire at rates between 10-100 Hz). Biological evolution has mitigated the clock rate problem by producing a brain with parallel processing capabilities, and cultural evolution has chipped in by organizing the information content of languages to take account of the brain's strengths and weaknesses. Words provide a good example of the way information content can be structured to be handled by a very slow processor/memory system, e.g., 85% of English words start with a strong syllable (for more details search for "initial" in this detailed analysis of human word processing).
Given that the start of a word plays an important role as an information retrieval key, we would expect the code reading performance of software developers to be affected by whether the identifiers they see all start with the same letter sequence or all start with different letter sequences. For instance, developers would be expected to make fewer errors, or work more quickly, when reading the visually contiguous sequence consoleStr, startStr, memoryStr and lineStr, compared to say strConsole, strStart, strMemory and strLine.
An experiment I ran at the 2011 ACCU conference provided the first empirical evidence of the letter prefix effect that I am aware of. Subjects were asked to remember a list of four assignment statements, each having the form id=constant;, perform an unrelated task for a short period of time, and then recall information about the previously seen constants (e.g., their value and which variable they were assigned to).
During recall subjects saw a list of five identifiers and one of the questions asked was: which identifier was not in the previously seen list? When the list of identifiers started with different letters (e.g., cat, mat, hat, pat and bat) the error rate was 2.6%, and when the identifiers all started with the same letter (e.g., pin, pat, pod, peg and pen) the error rate was 5.9% (the standard deviations were 4.5% and 6.8% respectively, but the ANOVA p-value was 0.038). Having identifiers share the same initial letter appears to double the error rate.
This looks like great news; empirical evidence of software developer behavior following the predictions of a model of human speech/reading processing. A similar experiment was run in 2006; this asked subjects to remember a list of three assignment statements, and they had to select the 'not seen' identifier from a list of four possibilities. An analysis of the results did not find any statistically significant difference in performance for the same/different first letter manipulation.
The 2011/2006 experiments throw up lots of questions, including: does sharing a prefix only make a difference to performance when there are four or more identifiers; how does the error rate change as the number of identifiers increases; how does the error rate change as the number of letters in the identifiers changes; would the effect be seen for a list of three identifiers if there was a longer period between seeing the information and having to recall it; would the effect be greater if the shared prefix contained more than one letter?
Don’t expect answers to appear quickly. Experimenting using people as subjects is a slow, labour-intensive process, and software developers don't always answer the question that you think they are answering. If anybody is interested in replicating the 2011 experiment, the tools needed to generate the question sheets are available for download.
For many years I have strongly recommended that developers don't prefix a set of identifiers sharing some attribute with a common letter sequence (it's great to finally have some experimental backup, however small). If it is considered important that an attribute be visible in an identifier's spelling, put it at the end of the identifier.
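A hypothetical illustration of the recommendation, using the identifiers from the earlier example:

/* prefix form: the shared "str" prefix delays the point at which the
   identifiers become distinguishable */
char *strConsole, *strStart, *strMemory, *strLine;

/* suffix form: the distinguishing information comes first and the shared
   attribute is pushed to the end of the identifier */
char *consoleStr, *startStr, *memoryStr, *lineStr;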
See you all at the ACCU conference tomorrow and don’t forget to bring a pen/pencil. I have only printed 40 experiment booklets, first come first served.
sizeof i++
It is quite common for coding guideline documents to contain at least one guideline recommending against the use of a construct that developers very rarely use, for instance: “The operand of the sizeof operator shall not contain side-effects.”
... sizeof i++; // Is the author expecting i to be incremented?
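A quick check program (mine, not from any guideline document) confirms that the increment never happens; the operand of sizeof is not evaluated here because its type is not a variably modified type:

#include <stdio.h>

int main(void)
{
    int i = 0;
    size_t s = sizeof i++; /* the operand is not evaluated (its type is not
                              a variable length array), so i stays at 0 */

    printf("sizeof: %zu, i: %d\n", s, i); /* prints i: 0 */
    return 0;
}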
Why do such recommendations get incorporated into guideline documents? The obvious answer is that their author(s) are unaware of actual developer usage and believe the recommendation has value.
I have heard people claim that such recommendations are harmless; after all, the adherence cost is minimal. Besides, the fact that code very likely already adheres to it will increase the “pass” rate for the set of guidelines the first time developers check their code. However, such guidelines unjustifiably increase people's confidence in software (as measured by the number of guidelines adhered to). They not only fail to add value to a set of coding guidelines, but their presence could result in the probity of the other guidelines being questioned.
I continue to be surprised by the amount of resistance encountered by my attempts to have the “sizeof” guideline removed from, or not included in, a set of coding guidelines. In the case of an existing set of guidelines there is obviously a resistance to change, but I have not yet managed to extract a single promise to consider removing the guideline in a future revision.
People seem unimpressed by the amount of source code I have searched in a vain attempt to find a violation of the “sizeof” guideline, but they often have some vague memory of having seen an instance of this elusive usage. My questions asking after the name of the source file, the name of the program, the name of the project, or simply the name of the company they were working for at the time are greeted with uncertainty. Perhaps the only instance they have seen is in the example underneath the text of the recommendation? Growls and pointed looks.
Another factor is existing practice: if it appears in other guideline documents, it must have some benefit. People don't want to go out on a limb. Besides, basing decisions on measurement of source code, who does that these days?
Whatever else might be said about the “sizeof” guideline, it does make a great example that developers can use to regale management.
Superior tone: Less experienced developers
Earnest voice: don’t always have a complete understanding of C/C++
Shock: They make the mistake of thinking
Talking very fast: that the code sizeof i++
Incredulity: will cause i to be incremented!
Emphasis: This guideline
Relieved voice: ensures that this mistake is warned about.
This article originally appeared in an earlier blog of mine which I did not keep up.