Archive

Archive for August, 2015

Eye-tracking of developers reading code is now in start-up mode

August 26, 2015 No comments

Readability has always been the meaningless go to attribute for designers of new languages and code restructuring techniques that needed a worthy sounding benefit to tout.

Market researchers, being more interested in empirical data than arm-waving, have been long time users of eye-tracking technology; gaze direction providing a direct link to where cognitive attention is being invested.

Over the last few years, a small number of researchers have started measuring where software developers look when they read code. Analysing and interpreting data on eye movement while reading code is still in start-up mode. One group has started collecting data that others can use, the obligatory R packages (saccade, gazepath and itsadug) and Python library now exist, and the eye-movement in programming conference has its third meeting in November.

Apart from one tantalizing image (see below, code+data here, original research paper+data) my book should arrive too soon to say anything useful about code readability based on eye-tracking data.

Heatmap of eye movement around code

It has taken several decades for researchers to create reasonably reliable models of attention and eye-movement for reading text. Code reading adds vertical eye-movement to the horizontal movements that occur when reading text; the models are probably going to be a lot more complicated. I discussed a few of the issues in my first book (the E-Z Reader model is still one of the top performers).

Accurately tracking eye motion during software development is technically difficult. Until recently obtaining the necessary accuracy required keeping the subject’s head fixed (achieved by having subjects clamp their teeth on a bite bar); somewhat impractical for developers wanting to view a large screen. Accurately tracking what developers are looking at requires tracking both head and eye motion. The necessary hardware is coming down in price, but still contains one too many zeroes for me to buy one to play with (I was given an Intel RealSense at a hackathon, now I just need the software…).

Next time somebody claims that so-and-so is more readable, ask them what eye-tracking research has to say on the subject.

2015 in the programming language standard’s world

August 18, 2015 No comments

Last Tuesday I was at the British Standards Institute for a meeting of IST/5, the programming languages committee.

My suggestion that the Cobol 2014 standard may be the last revision of that language appears to be coming true; there has been a steep decline in membership of the US Cobol committee (this is where all the work is done, with the rest of the world joining this committee or rubber stamping what comes out of it), and nobody has expressed interest in being involved in new work items.

Fortran appears to be going strong, with a revised standard planned for 2017.

In October C++ are rectifying the fact that they have not meet in Hawaii for three years. In fairness I ought to point out that the Fortran committee, when hosted by INCITS/PL22.3, regularly hold meetings in Las Vegas (I’m told its because the hotel rooms are cheap; Nevada is where US underground atom bomb tests were located and lots of super-computers executing programs written in Fortran were involved, or perhaps readers can think of an alternative explanation that does not invoke secret government organizations).

I found out that PL/1 is still an ISO Standard.

Work on the C and Ada standards continues.

Prolog has a new convenor, Ulrich Neumerkel. There was a meeting during April in Dresden, Germany but no minutes have been published. Did anybody attend?

ISO/IEC 23360-1:2006, the ISO version of the Linux Base Standard is almost 10 years behind the specification published by the organization who actually does the work. Some voices have expressed an interest in updating the ISO document. What does ISO’s version of the Linux Standard Base have to do with the committee responsible for programming languages?

Well, a long time ago in a galaxy far away, or the late 1980s in London, some people decided to set up a committee that specified O/S related library functions callable from C programs. SC22, programming languages, was the existing ISO committee having the closest fit with this new working group; initially it produced a specification that went under the name POSIX. Jump forward 15 years and Linux was the big POSIX success story (ok, the Linux people might see things differently) and dare I suggest that one of the motivations for creating ISO/IEC 23360-1:2006 might have been to bask in the reflected success of Linux. I understand the motivation of people involved in the standard’s process for wanting to published an update that reflects the current state of play (seriously out of date standards degrade the brand), but I don’t see why the Linux Foundation would be interested in going through the hassle of making this happen (unless they are having a mid-life crisis and are seeking approval of their work from an authority figure). Watch this blog for a 2016 status update.

Categories: Uncategorized Tags: , , , ,

Adding house numbers to Open Street Map

August 12, 2015 No comments

Team OSM-house-numbers (Pavel and yours truly) was at the Open Street Map London hack weekend a few days ago.

When Phyllis Pearsall was out walking the streets of London in the 1930s gathering information for her Geographer’s A-Z London street map she recorded house numbers and included this information on her maps. House number information is included in OSM data when people have added it. Is there a way of automatically adding this information in bulk?

The UK Land Registry maintains a database of house sales; the information includes postcode, street and house number. The database is available under the Open Government License, which is compatible with the OSM license.

It is straight-forward to match all house sales having the same postcode/street to obtain a min/max house number for a postcode/street.

The first half of a UK postcode specifies a large area or district (e.g., GU14 is my district code), while the second half has a granularity of around a quarter of a mile or less (depending on housing density).

It was decided that house numbers on a map become useful when streets are long enough, where long enough is defined as containing houses having different postcodes. Assuming that street names are unique within a given postcode district, filtering out not-long enough streets was trivial.

The Land registry started recording sales in 1995 and it is possible that some streets are not considered to be long enough because they contain houses that have not been sold within the last 20 years; this problem will also affect the min/max value of some house number ranges.

To tie this postcode/street information to Open Street Map data we need latitude/longitude information.

Information on house number locations is very useful to governments and in the UK is collected by our national mapping body, the Ordinance Survey, who like all UK government bodies have a long history of being loath to make information available for general public use. The current situation, according to Wikipedia, is that the Ordinance Survey mapping from postcode to latitude/longitude is available as open data.

Adding information for postcode lat/long and feeding everything into a webpage produces a map such as the one below (opposite sides of the street having different postcodes is plainly visible):

House numbers at the low end of

We cannot guarantee that the house number data we have created is 100% accurate; there may be mistakes in our code or the Land registry/Ordinance Survey data we processed. Experienced OSM hackers at the event told us about minor mistakes in automatically generated data, that had occurred in the past and had a disproportionate impact on user confidence in OSM accuracy. So we did not upload our data to OSM; you can find it on github (saved in compressed form to reduce download time), along with the code used to create it.

The hackathon finished at five, with people decamping to a local pub. We were more or less done by three. What next (perhaps for another hackathon or a dedicated OSM hacker)?

What is needed is a simple way to overlay house number range information on an OSM image which can be easily used by people with local knowledge to confirm whether it is correct or not, with the data being added to OSM if it is correct.

Other possible OSM uses for the Land registry data include estimating density of houses along a street, e.g., number-of-unique-house-numbers divided by distance-between-adjacent-postcodes and perhaps even a house price heatmap (ok, that’s a bit specialized).

What about trees? This is my tree hugging side showing itself. If you plan to go out mapping house numbers, don’t forget to map the trees!

How will C code in 2045 look different from today?

August 5, 2015 No comments

What constructs will be in such common use in C source code written by developers in 2045 that people looking at C written in 2015 will know it comes from a much earlier era (a previous post looked back at C written in 1986)?

C is a high level language that allows developers to get close to the hardware, so to get some idea of what everyday C might be like in 2045 we have to ask what everyday hardware will be like 10-20 years from now (the C standard committee waits for hardware feature to become established before adding features to support them).

I think the following hardware trends will have a big impact on the future appearance of C source code:

Power consumption: Runtime performance is an integral part of the design of C. In the past performance has been about program execution time and/or memory usage; the spread of mobile computing has created a third strand: electrical power consumption. A variety of techniques have been proposed for reducing program power consumption, including: type specifiers that enable developers to tell the compiler accuracy can be traded off against power in calculations involving a given variable and scaling cpu voltage/frequency in non-time critical code (researchers are currently trying to do this without developer involvement, but a storage/type specifier like register or inline would provide useful information to the compiler),

Unreliable hardware: running hardware at lower voltages (to reduce power consumption) increases the probability of noise having an effect on program output, as does use of smaller line widths in cpu fabrication (more chips per die increases manufacturer profits). Proposed solutions include adding type specifiers to variables that can tolerate holding approximate values or more making probabilistic assertions.

Non-volatile memory: Like most languages C has an implicit model of programs sitting on a slow storage device, e.g., hard disk, and being loaded into very fast storage for execution. Non-volatile storage could have a very dramatic impact on this view of the world. For years gaming consoles have stored code+data as a memory image in ROM for rapid loading, but being able to write to storage that is only an order of magnitude slower than main memory opens up all sorts of interesting opportunities. The concept of named address spaces defined in Programming languages – C – Extensions to support embedded processors is waiting to expand out of its current niche of C on embedded processors.

There is at least one language construct that is likely to be rarely seen by developers working in 2045: inline. The reason that today’s developers have been given the ability to define functions inline is that compilers are not yet good enough to reliably make good function inlining decisions, rather like they were not good enough to reliability make good register allocation decisions 30 years ago (ok, register can still be useful for developers using weird and wonderful processor architectures or brain dead compilers).

I have not yet said anything about parallel processing or multiprocessor hardware. The C11 Standard updated C99 to provide generic support (i.e., _Atomic plus associated sequence point wording updates and the threads library) for this kind of hardware. Support for a specific parallel/multiprocessor model will happen if a specific model becomes the industry standard (rather like IEEE floating-point not being anointed by C90 because it was not yet what every hardware vendor used; other formats were on their last legs and by C99 could be treated as dead).