R now has its own shelf in Dillons
I was in Dillons, the one opposite University College London, at the start of the week and what did I spy there?

There is now a bookshelf devoted to R (right, second from top) in the programming languages section. The shelf would be a lot fuller if O’Reilly did not have a complete section devoted to their books.
A trolley of C/C++ books was waiting to refill the shelves near the door.

Being adjacent to a university means that programming language books make up a much larger percentage of software books.

And there is O’Reilly in the corner with two stacks of shelves.

And yes, this is a big bookshop, the front is a complete block; computing/mathematics/physics/chemistry/engineering/medicine are in the basement. You can buy skeletons and stethoscopes in the medical section a few rooms down from computing; a stethoscope is useful for locating strange noises in computer cases without having to open them.

Readers a bit younger than me probably know this shop as Waterstones.
Writing language standards is a cottage industry
In the beginning programming language standards were written by one country’s National Standards body (e.g., ANSI did C/Cobol/Fortran for the USA and BSI did Pascal for the UK) and other countries were free to write their own version, adopt the existing work or do nothing (I don’t know of any country writing their own version, a few countries sometimes stuck their own front page on an existing document and the majority did nothing; update 4 Dec 2012, thanks to David Muxworthy for pointing out that around 1974 the UK, US, Japan and ECMA were all independently developing a standard for BASIC, by 1982 this had evolved to just ANSI and ECMA).
The UK people who created the Pascal Standard wanted the rest of the world (i.e., the US) to adopt it, and the way to do this was to have it adopted as an ISO Standard. The experience of making this happen convinced the folk at BSI that in future, language standards should be produced as an international effort within ISO (those pesky Americans wanted changes made to the document before they would vote for it).
During the creation of the first C Standard various people from Europe joined the ANSI committee, X3J11, so they could take part. Initially the US members were not receptive to the European request for a mechanism to handle keyboards that did not contain certain characters (e.g., left/right square brackets) but responded promptly on hearing that those (pesky) Europeans planned to publish an ISO C Standard that would contain those changes to the ANSI Standard needed to support trigraphs; the published ANSI Standard included support for trigraphs. The C ANSI committee were very receptive to the idea of future work being done at the ISO level; Bill Plauger/Tom Plum did a lot of good work to ensure it happened.
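For readers who have never met trigraphs, here is a minimal Python sketch of the replacement a C translator performs on them. The nine three-character sequences are the ones defined by the C Standard; the function name `replace_trigraphs` is made up for this illustration, and a real compiler does this as part of its first translation phase rather than with a regular expression.

```python
import re

# The nine trigraph sequences defined by the C Standard, keyed by the
# character that follows the leading "??".
TRIGRAPHS = {
    "=": "#", "(": "[", ")": "]", "<": "{", ">": "}",
    "/": "\\", "'": "^", "!": "|", "-": "~",
}

# "??" followed by one of the nine recognised characters.
_PATTERN = re.compile(r"\?\?([=()<>/'!\-])")

def replace_trigraphs(source: str) -> str:
    """Replace every trigraph in source with the character it stands for."""
    return _PATTERN.sub(lambda m: TRIGRAPHS[m.group(1)], source)

# Source typed on a keyboard lacking square brackets and braces.
print(replace_trigraphs("int a??(3??) = ??< 1, 2, 3 ??>;"))
```

The call above prints `int a[3] = { 1, 2, 3 };`, which is what the rest of the compiler sees.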
The C++ language came along and, long story short, an ISO committee was set up to create an ISO Standard for it. Then Java came along and the Java Study Group failed to become an ISO committee, and after that various committees not tied to a specific language were set up.
A look at the SC22 website shows that ISO Standards exist for Forth and ECMAscript (it has not yet been updated to include Ruby) with no corresponding ISO committees. What is going on?
One could be cynical and say that special interests are getting a document of their choosing accepted by ECMA and then abusing the ISO fast track procedure to sidestep the need to set up an international committee that has the authority to create a document of its choosing. The reality is that unless a language is very widely used by lots of people (e.g., in the top five or so most commonly used languages) there are unlikely to be enough people (or employers) willing and able to commit the time and money needed to be actively involved in an ISO Standard committee.
Once a document has been fast tracked to become an ISO Standard, any updates to it are supposed to be carried out under ISO rules (i.e., by an ISO committee). In practice this is not happening with ECMAscript, whose ECMA committee continues to be very active, holding bi-monthly meetings (I don’t know what is happening with Forth or how the Ruby people plan to handle any updates); over the years they have fast tracked two revisions to the original fast tracked document (the UK did raise the issue during balloting but nothing came of it, I don’t think anybody really cares).
Would moving the ECMAscript development work from ECMA to ISO make a worthwhile difference? There might be a few people out there who would attend an ISO meeting who are not currently attending ECMA meetings (to join ECMA, companies with five or fewer employees pay an annual fee of 3,500 Swiss francs {about the same number of US dollars} and larger companies pay a lot more) but I suspect the number would not be large enough to make up for the extra hassle of running an ISO committee (e.g., longer ISO balloting timescales).
Production of programming language standards is really a cottage industry that relies on friends in high places (e.g., companies with an existing membership of ECMA or connections into the local country standards’ body) for them to appear on the international stage.
Learning R as a language
Books written to teach a general purpose programming language are usually organized according to the features of the language and examples often show how a particular language feature is interpreted by a compiler. Books about domain specific languages are usually organized in a way that makes sense in the corresponding application domain and examples usually illustrate how a particular domain problem can be solved using the language.
I have spent a lot of time using R over the last year and by dint of reading lots of R code and various introductions to the language I have managed to piece together a model of the language. I rarely have any trouble learning a general purpose language from its reference manual, but users of domain specific languages are rarely interested in language details and so these reference manuals are usually only intended to be read by people who know the language well (another learning problem is that domain specific languages often contain quirky features rarely seen in other languages; in the case of R I was not lucky enough to know enough other languages to cover all its quirky features).
I managed to find one introduction to R written from the perspective of the programming language (and not the application domain): the original The Art of R Programming by Norman Matloff has been expanded and is now available as a book.
Summary. If you know another language and want to quickly learn about the language features of R I recommend this book. I have not taught raw beginners for over 30 years and have no idea if this book would be of any use to them.
This book does not attempt to teach you to think ‘R’, it is not about the art of R programming. The value of this book is as a single source for a broad coverage of lots of language features explained using lots of examples. Yes, more time could have been spent on the organization and fixing inconsistencies in the layout; these are not show stoppers.
Some people might tell you to buy “Software for Data Analysis” by John Chambers. Don’t; if you are a fan of Finnegans Wake and are nostalgic for the mainframe world of the 1970s you might like to give it a go. (I think Bertrand Meyer’s “Object-oriented Software Construction” is still the best book about the design of a language).
Meanderings. What books are good examples of “The Art of …” writing for domain specific languages? Two that spring to mind are: “Algorithms in Snobol 4” by James Gimpel (still spotted from time to time on second hand book sites) and more recently “SQL For Smarties: Advanced SQL Programming” by Joe Celko.
Yes, I know that R is not really a domain specific language but a language that is primarily used in one domain. Frink is an example of a language containing a major behavior feature that is specific to its intended application domain. I cannot think of any major language feature of R that is specific to statistics.
Will language choice converge to a few?
Will the number of commonly used programming languages converge to a few that remain commonly used for ever, will there be many relatively common languages in use, or will the (relatively) commonly used languages change over time?
There are plenty of advantages to having one programming language that everybody uses for ever. English+local dialects seems to be heading towards becoming the World’s one native language, but the programming language world seems to be moving in the direction of diversification and perhaps even experiencing changing popularity of those in common use.
What are the forces that drive programming language usage?
Existing code. If a company wants to maintain and update its software products it needs to hire people to use the language they are written in. This is a force that maintains the status quo.
Existing programmer skills. When given the task of writing new software where language usage is not specified, developers are likely to pick a language they already know. In the case of group development the choice is made by group leaders. This is a force that maintains the status quo.
Fashion. Every field has fashions and programming language usage is no exception. Using a particular language can be seen as sexy, leading-edge, innovative, the next big thing, etc. Given the opportunity some developers will choose to learn and write code in this language.
Desire to learn a new language. Some developers like to learn new things and this includes programming languages. Given the opportunity such developers will sometimes choose to learn and write code in a language they find interesting.
The cost of creating and implementing a new language continues to be within the reach of one individual who is willing to invest the considerable effort required. Hundreds, if not thousands, of new languages have been created every year almost since computers were first invented. The only change here over the last 40 years is probably an increase in the number of new languages.
What has changed in the last 15 years is ease of transmission (e.g., a ubiquitous computing platform and the Internet) and the growth of the fashion industry (e.g., book publishers).
Computing is bathed in newness. New products, new chips, new gadgets, new software, new features, new and improved, the latest. What self respecting developer would want to be caught dead using a language invented before they were born?
Publishers need a continuous stream of new subjects that will drive customers to buy books. What better subject than a hot new programming language?
At the moment we seem to be living through a period of programming language usage divergence. Will this evolutionary trend continue or are we currently in the Cambrian explosion period of software engineering evolution?
What are the forces acting against the use of new languages? At the moment the only significant forces acting against the use of new languages are existing source and existing developer expertise. There are weaker forces, for instance, the worry that in the years to come it will be difficult to find developers to maintain existing software written in what has become an obscure language, but most software has a short lifetime and in many application domains this is not an issue. Whether the fashion for newness will eventually diminish enough to significantly slow the take-up of new languages remains to be seen.
How widely used is a language?
How widely used is a language? Nobody really knows and since there is nothing anybody can do to control usage (both IBM and the US DOD have tried) the question is probably of only academic interest.
Languages are used in a variety of ways and contexts, and it is possible that while one language currently occupies the greatest number of programmer hours, a different one has the greatest number of lines of code ever written in it, another the greatest number of lines of code currently in existence, and a fourth utilizes the most CPU time.
Some languages are very popular for particular kinds of applications or within industries. For example, COBOL is still strong in corporate data centers; Fortran in engineering applications; C in embedded applications and operating systems; C++ and Java for writing desktop applications; Perl, PHP, etc. for web based applications.
There are various methods of measuring language popularity, each subject to a different bias over what is measured, that might be used, including:
- counting the number of job advertisements that mention the language. Money counts, so this may be a solid indicator. However, beware, companies like to paint a rosy picture and sometimes mention languages they don’t use just to attract people to apply for the job.
- measuring the financial value of companies making a living through selling tools for a particular language. As an ex-compiler writer I know that compared to applications software, compilers make very little money (compiler companies invariably go bust or get bought by a hardware vendor looking for leverage). Microfocus is the only (primarily) compiler based company to have grown to a significant size and exist independently for a long period of time.
- the number of books sold that teach or describe the language. This will be biased towards the more recently popular languages.
- estimates of the number of existing lines of code written in the language, which will underestimate languages not often found in public searches, e.g., there is very little Cobol source available on the web (is this because the kind of people who write Cobol are not the kind to make it publicly available or are the applications so specialized that web distribution is not considered worthwhile?).
- counts of language references (i.e., to the name of the language) found using a web search engine.
Some of the above material exists in a section of a Wikipedia article I wrote some time ago.
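To make the measurement problem concrete, here is a rough Python sketch of the first and last methods in the list: counting how many pieces of text (job adverts, web pages) mention a language by name. Everything in it is an assumption made for illustration: the function name `count_mentions`, the matching rule and the five sample adverts are invented, not real data.

```python
import re

def count_mentions(ads, languages):
    """Count how many ads mention each language name at least once."""
    counts = {}
    for lang in languages:
        # Lookarounds rather than \b, so that "C" does not match inside
        # "C++" or "Cobol", and "C++" itself can be matched.
        pattern = re.compile(
            r"(?<![\w+#])" + re.escape(lang) + r"(?![\w+#])",
            re.IGNORECASE,
        )
        counts[lang] = sum(1 for ad in ads if pattern.search(ad))
    return counts

# Hypothetical adverts, made up for this example.
ads = [
    "Senior developer: C++ and Java, some Perl.",
    "Embedded C engineer wanted.",
    "COBOL maintenance programmer for corporate data centre.",
    "Statistician with R experience; Java a plus.",
    "R&D manager, no programming required.",
]

print(count_mentions(ads, ["C", "C++", "Java", "Perl", "Cobol", "R"]))
```

On these made-up adverts R is counted twice, once for the statistician job and once for "R&D", which is exactly the kind of bias a name-based count is subject to; single-letter language names are particularly exposed to it.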