March 15, 2020 Derek Jones 3 comments

Exercises in Programming Style by Cristina Lopes is an interesting little book.

The books I have previously read on programming style pick a language, and then write various programs in that language using different styles, idioms, or just following quirky rules, e.g., no explicit loops, must use sets, etc. “Algorithms in Snobol 4” by James F. Gimpel is a fascinating read, but something of an acquired taste.

EPS does pick a language, Python, but the bulk of the book is really a series of example programs illustrating a language feature/concept that is central to a particular kind of language, e.g., continuation-passing style, publish-subscribe architecture, and reflection. All the programs implement the same problem: counting the number of occurrences of each word in a text file (Jane Austen’s Pride and Prejudice is used).
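
For readers who have not met the task, here is a minimal sketch of it in everyday Python. This is not code from the book; the function name and the tiny stop-word set are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word set, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "it", "that", "be"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count occurrences of each lower-cased word, skipping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")
freqs = word_frequencies(sample)
# freqs.most_common(25) would list the most frequent words first.
```

Every chapter is then a different way of arriving at the same counts.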

The 33 chapters are each about six or seven pages long, and contain a page or two of code. Everything is very succinct, and does a good job of illustrating one main idea.

While the first example does not ring true, things quickly pick up and there are lots of interesting insights to be had. The first example is based on limited storage (1,024 bytes), and just does not make efficient use of the available bits (e.g., upper case letters can be represented using 5 bits, leaving three unused bits in every byte, or 37.5% of the available storage; a developer limited to 1K would not waste such a large amount of storage).
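
The arithmetic behind that complaint can be sketched as follows. The helper names `pack5` and `unpack5` are hypothetical, and this is only one of many ways a storage-starved developer might pack letters; it is not taken from the book.

```python
def pack5(word):
    """Pack upper-case letters A-Z into one integer, 5 bits per letter."""
    bits = 0
    for ch in word:
        code = ord(ch) - ord("A")  # 0..25, which fits in 5 bits (2**5 == 32)
        if not 0 <= code < 26:
            raise ValueError("only A-Z can be packed: %r" % ch)
        bits = (bits << 5) | code
    return bits, 5 * len(word)

def unpack5(bits, nbits):
    """Recover the letters from an integer produced by pack5."""
    letters = []
    for shift in range(nbits - 5, -1, -5):
        letters.append(chr(((bits >> shift) & 0b11111) + ord("A")))
    return "".join(letters)

packed, nbits = pack5("PREJUDICE")  # 9 letters -> 45 bits, versus 72 bits as bytes
unused_fraction = (8 - 5) / 8       # share of each byte left idle by one-letter-per-byte
```

Nine letters occupy 45 bits rather than 72, which is the kind of saving that matters when the whole program has 1,024 bytes to play with.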

Solving the same problem in each example removes the overhead of having to learn what is essentially housekeeping material. It also makes it easy to compare the solutions created using different ideas. The downside is that there is not always a good fit between the idea being illustrated and the problem being solved.

There is one major omission: unstructured programming. Back in the day it was just called programming, but then structured programming came along, and what went before was called unstructured. Structured programming allowed a conditional statement to apply to multiple statements, an obviously simple idea once somebody tells you.

When an if-statement can only be followed by a single statement, that statement has to be a goto; an if/else is implemented as (using Fortran, I wrote lots of code like this during my first few years of programming):

      IF (I .EQ. J) GOTO 100
      Z=1
      GOTO 200
100   Z=2
200   CONTINUE
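
For comparison, a sketch of the same decision in structured form, written in Python to match the rest of this post. The function name `choose_z` is made up for illustration.

```python
def choose_z(i, j):
    """Structured equivalent of the Fortran fragment: no labels, no jumps."""
    if i == j:
        z = 2  # the statement at label 100
    else:
        z = 1  # the fall-through statement, followed by the jump to 200
    return z
```

The condition now governs both branches directly, so the reader no longer has to match up label numbers to see which assignment goes with which outcome.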

Based on the EPS code in chapter 3, Monolithic, an unstructured Python example might look like (if Python supported goto):

for line in open(sys.argv[1]):
    start_char = None
    i = 0
    for c in line:
        if start_char != None:
           goto L0100
        if not c.isalnum():
           goto L0300
        # We found the start of a word
        start_char = i
        goto L0300
        L0100:
        if c.isalnum():
           goto L0300
        # We found the end of a word. Process it
        found = False
        word = line[start_char:i].lower()
        # Ignore stop words
        if word in stop_words:
           goto L0280
        pair_index = 0
        # Let's see if it already exists
        for pair in word_freqs:
            if word != pair[0]:
               goto L0210
            pair[1] += 1
            found = True
            goto L0220
            L0210:
            pair_index += 1
        L0220:
        if found:
           goto L0230
        word_freqs.append([word, 1])
        goto L0280
        L0230:
        if len(word_freqs) <= 1:
           goto L0280
        # We may need to reorder
        for n in reversed(range(pair_index)):
            if word_freqs[pair_index][1] <= word_freqs[n][1]:
               goto L0240
            # swap
            word_freqs[n], word_freqs[pair_index] = word_freqs[pair_index], word_freqs[n]
            pair_index = n
            L0240:
        L0280:
        # Let's reset
        start_char = None
        L0300:
        i += 1
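
To see what the labels are encoding, here is a structured sketch of the same logic, wrapped in a function so it can be run. The function name and parameters are my own, and it assumes start_char is reset after every completed word (stop word or not), which is what the scan needs in order to find the next word.

```python
def count_words(lines, stop_words):
    """Return [[word, count], ...] kept in decreasing order of count."""
    word_freqs = []
    for line in lines:
        start_char = None
        i = 0
        for c in line:
            if start_char is None:
                if c.isalnum():
                    # We found the start of a word
                    start_char = i
            elif not c.isalnum():
                # We found the end of a word. Process it
                found = False
                word = line[start_char:i].lower()
                if word not in stop_words:
                    pair_index = 0
                    # Let's see if it already exists
                    for pair in word_freqs:
                        if word == pair[0]:
                            pair[1] += 1
                            found = True
                            break
                        pair_index += 1
                    if not found:
                        word_freqs.append([word, 1])
                    elif len(word_freqs) > 1:
                        # We may need to reorder
                        for n in reversed(range(pair_index)):
                            if word_freqs[pair_index][1] > word_freqs[n][1]:
                                word_freqs[n], word_freqs[pair_index] = (
                                    word_freqs[pair_index], word_freqs[n])
                                pair_index = n
                # Let's reset
                start_char = None
            i += 1
    return word_freqs

result = count_words(["the cat and the dog\n", "a dog barks\n"],
                     {"the", "and", "a"})
```

Each line must end with a newline (as lines read from a file do), otherwise the last word on the line is never terminated and so never counted.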

If you do feel a yearning for the good ol’ days, a goto package is available, enabling developers to write code such as:

from goto import with_goto
 
@with_goto
def range(start, stop):
    i = start
    result = []
 
    label .begin
    if i == stop:
        goto .end
 
    result.append(i)
    i += 1
    goto .begin
 
    label .end
    return result

Categories: Uncategorized Tags: books, goto, Python, style

The most worthwhile R coding guidelines I know

March 2, 2013 Derek Jones 2 comments

Since my post questioning whether native R usage exists (e.g., a common set of R coding patterns), several people have asked about coding/style guidelines for R. My approach to style/coding guidelines is economic: adhering to a guideline involves paying a cost now for some future benefit. Obviously, to be worthwhile the benefit must be greater than the cost; there is also the issue of who pays the cost and who reaps the benefit (why would anybody pay the cost if somebody else reaps the benefit?). The following three topics are probably where the biggest benefits are to be had, and only the third is specific to R (and, given the state of my R knowledge, may be wrong).

Comment your code. Investing 5-10 seconds per few lines of code now could save substantially more time at some future date. Effective commenting is a skill that has to be learned; start learning now. Think of commenting as sending a text message or tweet to the person you will be in 6 months’ time (i.e., the person who can hum the tune but has forgotten the details).

Consistently use variable names that mean something to you. This should be a sub-2-second decision that is probably going to save you no more than 5-10 seconds, but in many cases you reap the benefit soon after the investment, without having to wait many months. Names evoke associations in your mind; take advantage of this associative lookup to reduce the cognitive load of working with your code. Effective naming is a skill that has to be learned; start learning now. There are people who ignore the evidence that different people’s linguistic preferences and associations can be very different and insist that everybody adhere to one particular naming convention; ignore them.

Code organization and structure. Experience shows that there are ways of organizing and structuring 1,000+ line programs that have a significant impact on the effort needed to actively work on the code; the more code there is, the greater the impact. R programs tend to be short, say around 100 lines (I dare say much longer ones exist). Apart from recommending that code be broken up into separate functions, I cannot think of any organizational/structural issue that is worth recommending for 100 lines of code (if you don’t appreciate the advantage of using separate functions you need some hands-on training, not words in a blog post).

Is that it, are there no other worthwhile recommendations? There might be, I just don’t have enough experience using R to know. Does anybody else have enough experience to know? I suspect not; where would they have gotten the information needed to do the cost/benefit analysis? Even in the rare case where a detailed analysis is made for a language the results are rather thin on the ground and somewhat inconclusive.

What is the reason behind those R style guides/coding guideline documents that have been written? The following are some possibilities:

reducing maintenance costs (the official reason touted by purveyors of received wisdom): this is a very good reason that is let down by the complete lack of any empirical evidence that following any guidelines makes the slightest difference to maintenance costs. You R users are likely to have a lot more experience than me dealing with people claiming stuff for which there is no evidence, and I will not presume to suggest how you might handle such claims (if somebody does show you some good data, do please send me a copy).
marketing (sometimes openly given as a reason): managers like to tell, and customers like to hear, about the existence of such a document and its role in ensuring delivery of a quality product. If you are being shown around a company and are told that they follow some style guideline, it’s always interesting to see what happens when you ask to see a copy of this guideline document, e.g., not being able to find a copy is a surprisingly common occurrence.
fashion (rarely admitted to): behaving like a herd and following trend setters is a common human trait; not only are there lots of ways of designing clothes, but there are lots of ways in which code can be written. What kind of manager wants to have unfashionable developers working for them, and who wouldn’t like to take a few days off to attend a boutique conference or chat to a friendly uncle (these guys can be messianic speakers, and questioning them about lack of evidence can draw a negative response from the crowd)?

And no, I don’t have any empirical data to back up my guidelines 🙁

Categories: Uncategorized Tags: coding guidelines, comments, cost/benefit, fashion, naming, R, style

The Shape of Code

Archive

Exercises in Programming Style: the python way

The most worthwhile R coding guidelines I know
