Exercises in Programming Style: the python way
Exercises in Programming Style by Cristina Lopes is an interesting little book.
The books I have previously read on programming style pick a language, and then write various programs in that language using different styles, idioms, or just following quirky rules, e.g., no explicit loops, must use sets, etc. “Algorithms in Snobol 4” by James F. Gimpel is a fascinating read, but something of an acquired taste.
EPS does pick a language, Python, but the bulk of the book is really a series of example programs illustrating a language feature/concept that is central to a particular kind of language, e.g., continuation-passing style, publish-subscribe architecture, and reflection. All the programs implement the same problem: counting the number of occurrences of each word in a text file (Jane Austin’s Pride and Prejudice is used).
The 33 chapters are each about six or seven pages long, and contain a page or two or code. Everything is very succinct, and does a good job of illustrating one main idea.
While the first example does not ring true, things quickly pick up and there are lots of interesting insights to be had. The first example is based on limited storage (1,024 bytes), and just does not make efficient use of the available bits (e.g., upper case letters can be represented using 5-bits, leaving three unused bits or 37% of available storage; a developer limited to 1K would not waste such a large amount of storage).
Solving the same problem in each example removes the overhead of having to learn what is essentially housekeeping material. It also makes it easy to compare the solutions created using different ideas. The downside is that there is not always a good fit between the idea being illustrated and the problem being solved.
There is one major omission. Unstructured programming; back in the day it was just called programming, but then structured programming came along, and want went before was called unstructured. Structured programming allowed a conditional statement to apply to multiple statements, an obviously simple idea once somebody tells you.
When an if-statement can only be followed by a single statement, that statement has to be a goto
; an if
/else
is implemented as (using Fortran, I wrote lots of code like this during my first few years of programming):
IF (I .EQ. J) GOTO 100 Z=1 GOTO 200 100 Z=2 200 |
Based on the EPS code in chapter 3, Monolithic, an unstructured Python example might look like (if Python supported goto
):
for line in open(sys.argv[1]): start_char = None i = 0 for c in line: if start_char != None: goto L0100 if not c.isalnum(): goto L0300 # We found the start of a word start_char = i goto L0300 L0100: if c.isalnum(): goto L0300 # We found the end of a word. Process it found = False word = line[start_char:i].lower() # Ignore stop words if word in stop_words: goto L0280 pair_index = 0 # Let's see if it already exists for pair in word_freqs: if word != pair[0]: goto L0210 pair[1] += 1 found = True goto L0220 L0210: pair_index += 1 L0220: if found: goto L0230 word_freqs.append([word, 1]) goto L0300 L0230: if len(word_freqs) <= 1: goto L0300: # We may need to reorder for n in reversed(range(pair_index)): if word_freqs[pair_index][1] <= word_freqs[n][1]: goto L0240 # swap word_freqs[n], word_freqs[pair_index] = word_freqs[pair_index], word_freqs[n] pair_index = n L0240: goto L0300 L0280: # Let's reset start_char = None L0300: i += 1 |
If you do feel a yearning for the good ol days, a goto package is available, enabling developers to write code such as:
from goto import with_goto @with_goto def range(start, stop): i = start result = [] label .begin if i == stop: goto .end result.append(i) i += 1 goto .begin label .end return result |
Fortran 2008 Standard has been updated
An updated version of ISO/IEC 1539-1 Information technology — Programming languages — Fortran — Part 1: Base language has just been published. So what has JTC1/SC22/WG5 been up to?
This latest document is bug a release of the 2010 standard, known as Fortran 2008 (because the ANSI Standard from which the ISO Standard was derived, sed -e "s/ANSI/ISO/g" -e "s/National/International/g"
, was published in 2008) and incorporates all the published corrigenda. I must have been busy in 2008, because I did not look to see what had changed.
Actually the document I am looking at is the British Standard. BSI don’t bother with sed
, they just glue a BSI Standards Publication page on the front and add BS to the name, i.e., BS ISO/IEC 1539-1:2010.
The interesting stuff is in Annex B, “Deleted and obsolescent features” (the new features are Fortranized versions of languages features you have probable seen elsewhere).
Programming language committees are known for issuing dire warnings that various language features are obsolescent and likely to be removed in a future revision of the standard, but actually removing anything is another matter.
Well, the Fortran committee have gone and deleted six features! Why wasn’t this on the news? Did the committee foresee the 2008 financial crisis and decide to sneak out the deletions while people were looking elsewhere?
What constructs cannot now appear in conforming Fortran programs?
- “Real and double precision DO variables. .. A similar result can be achieved by using a DO construct with no loop control and the appropriate exit test.”
What other languages call a for-loop, Fortran calls a DO loop. So loop control variables can no longer have a floating-point type.
- “Branching to an END IF statement from outside its block.”
An if-statement is terminated by the token sequence
END IF
, which may have an optional label. It is no longer possible toGOTO
that label from outside the block of the if-statement. You are going to have to label the statement after it. - “PAUSE statement.”
This statement dates from the days when a computer (singular, not plural) had its own air-conditioned room and a team of operators to tend its every need. A
PAUSE
statement would cause a message to appear on the operators’ console and somebody would be dispatched to check the printer was switched on and had paper, or some such thing, and they would then resume execution of the paused program.I think WG5 has not seen the future here. Isn’t the
PAUSE
statement needed again for cloud computing? I’m sure that Amazon would be happy to quote a price for having an operator respond to aPAUSE
statement. - “ASSIGN and assigned GO TO statements and assigned format specifiers.”
No more assigning labels to variables and GOTOing them, as a means of leaping around 1,000 line functions. This modern programming practice stuff is a real killjoy.
- “H edit descriptor.”
First programmers stopped using punched cards and now the H edit descriptor have been removed from Fortran; Herman Hollerith no longer touches the life of working programmers.
In the good old days real programmers wrote
11HHello World
. Using quote delimiters for string literals is for pansies. - “Vertical format control. … There was no standard way to detect whether output to a unit resulted in this vertical format control, and no way to specify that it should be applied; this has been deleted. The effect can be achieved by post-processing a formatted file.”
Don’t panic, C still supports the
\v
escape sequence.
Undefined behavior is a design decision
Every few years or so some group of people in the C/C++ community start writing about the constructs specified as having undefined behavior in those languages. A topic that always seems to be skipped is why a language committee would choose to specify that the behavior in a particular case was undefined.
A quick refresher for readers on the definition of Undefined behavior, from the C Standard: “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements”. The two key features are that the behavior applies when an error has occurred and any behavior whatsoever is permitted after one of these errors occurs. Examples of constructs that have undefined behavior are divide by zero, the result of an arithmetic operation on a signed value not being representable in its type (i.e., overflowing) and indexing an array outside of its defined bounds.
The point to note about all undefined behaviors is that the C/C++ language committee could have chosen to specify the behavior that a conforming implementation is required to support. Some language specifications do attempt to explicitly define the behavior for all constructs, e.g., Java, while other languages have a smaller set of undefined behaviors (e.g., Ada, which uses the term Bounded error instead of undefined behavior; there are 35 of them in Ada 2005). To understand why languages take these different approaches we need to look at the language design aims.
The design aims of C included it being implementable for any processor and for the generated code to be efficient (I’m not sure to what extent these might still be major design aims for C++). Computing systems come in all shapes and sizes, some with hundreds of bytes of memory and others with gigabytes, some raise exceptions when certain operations occur while others set processor status flags and others don’t do anything special.
A willingness to accept whatever behavior happens to occur, in an error situation, is the price that has to be paid for efficient execution on a wide range of disparate processors. The C/C++ designers were willing to pay this price while while the Java designers were not, with the Ada designers willing to tolerate less variability than C/C++.
Undefined behavior need not be nasty behavior, an implementation could chose to generate a helpful message or try to recover from it.
There are C tools and compilers (when certain options are specified) that check, at runtime, for various kinds of undefined behavior. I am in the minority in having Boundschecker installed as my default C compiler (as the name suggests it checks that array and pointer accesses to an object are within the defined bounds); for reasons I don’t understand few C/C++ developers are willing to use tools like this. For production code I use a non-boundschecking compiler; I don’t know whether Ada programs are tested with the mandated bounds checking switched on and then have it switched off for the production version (this is what Pascal developers do, in my experience). Of course Java developers have no choice but to permanently live with checking turned on.
The number of companies that make a living selling runtime checking tools is a small percentage of the number of companies based on selling static analysis tools. There continues to be a steady stream of runtime checking tools appearing and quickly disappearing, but until a developers start being sent to jail for faults in their code I don’t foresee the market growing.
Optimizations to figure out when code need not be generated to perform a bounds check because the access is known to be within bounds is an active research area. These days the performance penalty is not so much executing the checking instructions but the disruption to the instruction pipeline caused by the branches that might be taken (if the bounds check fails).
The cost of all the checking required by Java is that the minimal permitted configuration requires at least 256K of memory (Oracle’s K virtual machine, used by the Java Micro Edition which is intended for embedded systems, also makes floating-point optional and allows implementations some freedom in how some constructs are handled). So the Java motto really out to be “Write once, run anywhere with at least 256K and don’t depend on floating-point”.
I have heard stories of Ada code being liberally scattered with various forms of unchecked (how the developer tells the compiler not to do any runtime checking) but have not seen any empirical analysis (a study of goto usage in Ada did not have any trouble finding plenty of uses to analyze).
Recent Comments