Eiffel, English and Erlang
E is for Eiffel, English and Erlang.
Eiffel was a contender to be THE OO language of choice for developers. Bertrand Meyer’s book on how to write OO software using Eiffel blew me away; I was convinced that this was the way forward. But to be widely used, a language needs popular compilers on the mainstream platforms, and Meyer’s Eiffel compiler was a commercial product of a company he had started around the language. Eiffel may have been a much better language than C++ back in 1986 (or even today), but Cfront was available for free for non-commercial use, and the Zortech C++ compiler came out at a very low price point in 1988 (developers hate paying for the tools of their trade). Meyer and his research group are still plugging away at Eiffel today, and it probably has supporters all around the world, but I have never met one.
English is effortlessly spoken by hundreds of millions of people; how much easier programming would be if it could be done in English. The fact that little effort is required to see through this idea on so many levels (not least of which is that most people are terrible at writing English) has not prevented a few misguided souls from implementing the idea in some form or another. Grace Hopper can be forgiven for thinking that using English keywords in Cobol would make it easier to use; computer languages were brand new in the 1950s.
The Osmosian Order have created an English-like language, plus an implementation and IDE written in that language, which is the best of its kind I have seen (I have only read through the compiler source and not written any non-trivial code). The Attempto project would be a good starting point for anybody looking to create an even more ambitious ‘English’ compiler.
Erlang is one of those languages whose usage continues to quietly grow and spread. Having a widely available, usable compiler is a necessary but not sufficient condition for a language to grow and spread; the language also has to be very good at solving an important and commonly occurring problem. Erlang supports language-level features (i.e., not library calls) that make it relatively easy to write programs that create and manage processes.
Things to read
Object-Oriented Software Construction by Bertrand Meyer (get the shorter, more readable, 1988 edition).
Representation and Inference for Natural Language: A First Course in Computational Semantics by Patrick Blackburn and Johan Bos (an early draft).
Longman Grammar of Spoken and Written English by Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan. For people who need to move up from reading dictionaries.
The Semantics of English Prepositions: Spatial Scenes, Embodied Meaning and Cognition by Andrea Tyler and Vyvyan Evans. Full of delightful examples, targeting a tiny fraction of the language, that are ideal for illustrating that English is neither simple nor unambiguous.
Plain English Programming by The Osmosian Order of Plain English Programmers (I found the compiler source more readable).
The 30% of source that is ignored
Approximately 30% of source code is not checked for correct syntax (developers can make up any rules they like for its internal syntax), semantic accuracy or consistency; people are content to shrug their shoulders at this state of affairs and let it pass. I am, of course, talking about comments; the 30% figure comes from my own measurements, with other published measurements falling within a similar ballpark.
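To give a feel for what such a measurement involves (this is not necessarily how the figures above were obtained), here is a minimal Python sketch that computes the fraction of non-whitespace characters in C-like source that lie inside `//` or `/* */` comments. Counting characters rather than lines, and ignoring whitespace, are assumptions; the scanner tracks string and character literals so that `//` inside a string is not mistaken for a comment, but it ignores preprocessor subtleties such as backslash line splicing.

```python
# Scanner states for a simple C-like lexer.
NORMAL, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR = range(5)


def comment_fraction(source: str) -> float:
    """Fraction of non-whitespace characters in C-like source that lie inside comments.

    Comment delimiters (//, /*, */) count as comment characters.
    """
    state = NORMAL
    in_comment = 0  # non-whitespace characters inside comments
    total = 0       # all non-whitespace characters
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        step = 1           # how many characters this iteration consumes
        commented = False  # whether those characters belong to a comment
        if state == NORMAL:
            if c == "/" and nxt == "/":
                state, step, commented = LINE_COMMENT, 2, True
            elif c == "/" and nxt == "*":
                state, step, commented = BLOCK_COMMENT, 2, True
            elif c == '"':
                state = STRING
            elif c == "'":
                state = CHAR
        elif state == LINE_COMMENT:
            commented = True
            if c == "\n":
                state = NORMAL
        elif state == BLOCK_COMMENT:
            commented = True
            if c == "*" and nxt == "/":
                state, step = NORMAL, 2
        else:  # inside a string or character literal
            if c == "\\":
                step = 2  # skip the escaped character, e.g. \" or \'
            elif (state == STRING and c == '"') or (state == CHAR and c == "'"):
                state = NORMAL
        visible = sum(1 for ch in source[i:i + step] if not ch.isspace())
        total += visible
        if commented:
            in_comment += visible
        i += step
    return in_comment / total if total else 0.0


if __name__ == "__main__":
    example = 'int x = 1; // set x\n/* block */ char *s = "not // a comment";\n'
    print(f"{comment_fraction(example):.2f}")  # prints 0.33 (15 of 45 characters)
```

Running this over a whole code base (summing the two counters across files rather than averaging per-file fractions) would give a figure comparable in spirit to the 30% quoted above, though the exact number depends on whether lines, characters or tokens are counted.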
Part of the problem is that comments often contain a lot of natural language (i.e., human rather than computer language), which is known to be very difficult to parse and is thought to be unusable without all sorts of semantic knowledge that is not currently available in machine-processable form.
People are good at spotting patterns in ambiguous human communication and deducing possible meanings from it, and this has helped keep comment usage alive. Other factors are that the information comments provide is not usually available elsewhere, that they are right there in front of the person reading the code, and, of course, that management loves them as a measurable attribute that is cheap to produce and not easily checked (and what difference does it make if they don’t stay in sync with the code?).
One study that did attempt to parse English sentences in comments found that 75% of sentence-style comments were in the past tense, with 55% being some kind of operational description (e.g., “This routine reads the data.”) and 44% having the style of a definition (e.g., “General matrix”).
There is a growing collection of tools for processing natural language (well, at least English). However, given the traditionally poor punctuation used in comments, the use of variable names and very domain-specific terminology, full-blown English parsing is likely to be very difficult. Some recent research has found that useful information can be extracted using something only a little more linguistically sophisticated than word sense disambiguation.
The designers of the iComment system sensibly limited the analysis domain (to memory/file lock related activities), simplified the parsing requirements (to looking for a limited set of requirement wordings) and kept developers in the loop for some of the processing (e.g., listing lock-related function names). The aim was to find inconsistencies between the requirements expressed in comments and what the code actually did. Within the Linux, Mozilla, Wine and Apache sources they found 33 faults in the code and 27 in the comments, reporting a 38.8% false positive rate.
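The three design choices described above (a narrow domain, a restricted set of requirement wordings, developer-supplied lock function names) can be mimicked in a toy Python sketch. Everything here is an assumption for illustration, not iComment's actual rules or analysis: a single hypothetical rule template ("must be called with X held"), hypothetical lock primitives (`spin_lock`, `mutex_lock` and their unlock counterparts), and kernel-style C in which function definitions start in column 0 and bodies are indented. The check is purely textual and line-ordered, with no control-flow analysis.

```python
import re

# Hypothetical developer-supplied lock primitives (the "developer in the loop" step).
ACQUIRE = ("spin_lock", "mutex_lock")
RELEASE = ("spin_unlock", "mutex_unlock")

# One hypothetical requirement template; a real tool would support many more.
RULE_RE = re.compile(r"must be called with (\w+) held")
# Column-0 function definition line (no trailing ';', so prototypes are excluded).
DEF_RE = re.compile(r"^[A-Za-z_][\w\s\*]*?\b(\w+)\s*\([^;]*$")
LOCK_CALL_RE = re.compile(r"\b(\w+)\s*\(\s*&?\s*(\w+)\s*\)")
CALL_RE = re.compile(r"\b(\w+)\s*\(")


def extract_rules(lines):
    """Map function name -> lock that its preceding comment says must be held."""
    rules, pending = {}, None
    for line in lines:
        m = RULE_RE.search(line)
        if m:
            pending = m.group(1)
            continue
        d = DEF_RE.match(line)
        if d and pending:
            rules[d.group(1)] = pending
            pending = None
    return rules


def check_callers(lines, rules):
    """Report (line number, callee, lock) for calls made while the lock is not held."""
    reports, held = [], set()
    for lineno, line in enumerate(lines, start=1):
        if line.lstrip().startswith(("/*", "*", "//")):
            continue  # ignore comment lines
        if DEF_RE.match(line):
            held = set()  # crude: each new function starts with no locks held
            continue
        for m in LOCK_CALL_RE.finditer(line):
            func, arg = m.groups()
            if func in ACQUIRE:
                held.add(arg)
            elif func in RELEASE:
                held.discard(arg)
        for m in CALL_RE.finditer(line):
            callee = m.group(1)
            if callee in rules and rules[callee] not in held:
                reports.append((lineno, callee, rules[callee]))
    return reports


def find_lock_violations(source):
    lines = source.splitlines()
    return check_callers(lines, extract_rules(lines))


SAMPLE = """\
/*
 * update_counter: must be called with dev_lock held.
 */
static void update_counter(struct dev *d)
{
    d->count++;
}

void good_caller(struct dev *d)
{
    spin_lock(&dev_lock);
    update_counter(d);
    spin_unlock(&dev_lock);
}

void bad_caller(struct dev *d)
{
    update_counter(d);
}
"""

if __name__ == "__main__":
    for lineno, callee, lock in find_lock_violations(SAMPLE):
        # prints: line 18: update_counter() called without dev_lock held
        print(f"line {lineno}: {callee}() called without {lock} held")
```

A mismatch like the one reported could mean either a fault in the code (the caller forgot the lock) or a stale comment (the requirement no longer holds); deciding which still needs a human, which is one source of the false positives mentioned above.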
If these impressive figures can be replicated for other kinds of coding constructs, then comment contents will start to leave the dark ages.