EU rules that computer languages cannot be copyrighted

May 2, 2012 Derek Jones

The European Court of Justice has published its decision in SAS v WPL; the title of the press release says it all: “The functionality of a computer program and the programming language cannot be protected by copyright”. To summarise the background, World Programming Ltd developed a system capable of emulating the input/output behavior of programs written in what SAS Institute Inc. claimed was its copyrighted scripting language, along with various file formats.

According to the Court of Justice, “the Court holds that neither the functionality of a computer program nor the programming language and the format of data files used in a computer program in order to exploit certain of its functions constitute a form of expression. Accordingly, they do not enjoy copyright protection.”

This EU ruling is not quite what it seems. The SAS v WPL case is before the High Court in London, and the EU Court of Justice has been asked for advice based on European law. So the UK dispute has not yet been decided, but given that the UK is signed up to adhere to EU law, people who know about the legal stuff seem to think the High Court in London will follow the EU ruling. Assuming this, then…

This ruling is not just bad news for SAS; it is also bad news for its competitors. Competition is likely to lead to better/cheaper products for users of the SAS language, giving them less incentive to move to an alternative (the R language included; incidentally, what exactly is The R Foundation for Statistical Computing claiming copyright over in the notice that pops up when R is started?).

The Oracle vs. Google Java API lawsuit involves similar territory. There are plenty of details over at Groklaw and I’m not going to go there.

This ruling makes it much more likely that behave-alike implementations of more ‘corporate languages’ will be created, at least in Europe. Previously the threat of a lawsuit would have been enough to deter most people, irrespective of whether what they wanted to do was legal or not.

What languages might we see implemented any time soon? The one that immediately springs to mind for me is Mathematica, the leader in its field; a fork of Maxima that supported the Mathematica language would move Maxima out of the ghetto. Octave and Matlab are already very close, so no change there.

I imagine there are corporate languages scattered over every conceivable application domain. Many of these domains will be sufficiently specialized that there is a very low probability of somebody creating an open source implementation; however, if it looks like there is money to be made, it has become more likely that an alternative commercial implementation will be created.

It looks like being a compiler writer is back as flavor of the month again 🙂

Categories: Uncategorized Tags: compiler writer, court case, Mathematica, Maxima, R, SAS

Variations in the literal representation of Pi

March 12, 2010 Derek Jones

The numbers system I am developing attempts to match numeric literals contained in a file against a database of interesting numbers. One of the things I did to quickly build a reasonably sized database of reliable values was to extract numeric literals from a few well known programs that I thought I could trust.

R is a widely used statistical package, and Maxima is a computer algebra system with a long history. Both contain a great deal of functionality and are actively maintained.
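The matching step is not described in detail here, so the following is only a minimal sketch under stated assumptions: the regular expression, the two-entry `INTERESTING` table, `MIN_DECIMALS` and the helper names are all hypothetical. A literal written with d digits after the decimal point is treated as matching a value x when it lies within 10^-d of x, which accepts both correctly rounded and truncated digits (so 3.141593 and 3.141592 both match π). Exact decimal arithmetic keeps binary floating-point out of the comparison.

```python
import re
from decimal import Decimal

# Hypothetical "database" of interesting numbers, held as exact decimal values
INTERESTING = {
    "pi": Decimal("3.14159265358979323846264338327950288"),
    "e": Decimal("2.71828182845904523536028747135266250"),
}

# Decimal literals such as 3.14159, 3.141592f or 3.14159265358979D0 (Fortran).
# The lookarounds stop matches inside identifiers or strings like 1.2.3.
LITERAL_RE = re.compile(r"(?<![\w.])(\d+\.\d+)(?:[eEdD]([+-]?\d+))?(?![\d.])")

MIN_DECIMALS = 4  # assumption: shorter literals (e.g. 3.14) are too ambiguous

def literal_value(mantissa, exponent):
    """Exact value of a literal; its Decimal exponent records the digits written."""
    value = Decimal(mantissa)
    return value.scaleb(int(exponent)) if exponent else value

def match_literal(value):
    """Return the name of the interesting number the literal approximates, or None."""
    decimals = -value.as_tuple().exponent
    if decimals < MIN_DECIMALS:
        return None
    tolerance = Decimal(1).scaleb(-decimals)  # 10**-decimals
    for name, exact in INTERESTING.items():
        if abs(value - exact) < tolerance:
            return name
    return None

def find_interesting(source_text):
    """Yield (literal text, matched name) for every literal that matches."""
    for m in LITERAL_RE.finditer(source_text):
        name = match_literal(literal_value(m.group(1), m.group(2)))
        if name is not None:
            yield m.group(0), name

sample = "x = 3.141592; y = 3.14159265358979D0; z = 2.71828f; v = 1.2.3; w = 3.14;"
for text, name in find_interesting(sample):
    print(text, name)
```

The 10^-d tolerance is a deliberately loose design choice: it finds truncated as well as rounded literals, at the cost of occasionally accepting a literal whose last digit is slightly off.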

To my surprise, the source code of both packages contains a large variety of different literal values for π; or, to be exact, the number of digits contained in the literals varied by more than I expected. The following table gives the number of occurrences of each representation in each package, with literals listed in increasing order:

   Literal                  Maxima    R
   3.14159                       2    -
   3.141592                      -   14
   3.1415926                     1    -
   3.14159265                    1    2
   3.1415926535                  3    -
   3.14159265358979              4    -
   3.141592653589793            14    -
   3.1415926535897932385         3    3
   3.14159265358979324           9    -
   3.14159265359                 -    1
   3.1415927                     -    1
   3.141593                      -    1

The comments in the Maxima source led me to believe that some thought had gone into ensuring that the numerical routines were robust. Over 3/4 of the literal representations of π in Maxima have a precision comparable to at least that of 64-bit floating-point (I’m assuming an IEEE 754 representation in this post).

In the R source, approximately 2/3 of the literal representations of π have a precision comparable to that of 32-bit floating-point.
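One way to make "comparable precision" concrete is sketched below; the helper names are hypothetical and this is not how the counts above were produced. For each literal it reports an equivalent number of significand bits, -log2 of the relative error against π (binary32 has a 24-bit significand, binary64 a 53-bit one), and whether the literal converts to the same binary32/binary64 value as π itself.

```python
import math
import struct
from decimal import Decimal

# pi to 35 decimal places, far beyond binary64, so comparisons are not limited by it
PI = Decimal("3.14159265358979323846264338327950288")

def to_float32(x):
    """Round a binary64 float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def precision_report(literal):
    """Summarise how closely a decimal literal (given as a string) approximates pi."""
    rel_error = abs(Decimal(literal) - PI) / PI
    return {
        # Equivalent significand bits: binary32 has 24, binary64 has 53.
        "bits": -math.log2(float(rel_error)),
        # Same binary64 value as math.pi, the closest double to pi?
        "same_double": float(literal) == math.pi,
        # Same binary32 value as the closest float to pi? Converting via binary64
        # first could in principle double-round, but that does not affect the
        # literals listed below.
        "same_single": to_float32(float(literal)) == to_float32(math.pi),
    }

for lit in ["3.141592", "3.141593", "3.1415927", "3.14159265358979", "3.141592653589793"]:
    r = precision_report(lit)
    print(f"{lit:>18}  {r['bits']:5.1f} bits  single={r['same_single']}  double={r['same_double']}")
```

A point the sketch brings out: 3.141593, although π correctly rounded to 7 significant digits, does not convert to the binary32 value nearest π; that takes at least 8 digits (3.1415927).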

Closer examination of the source suggests one reason for this difference. Both packages make heavy use of existing code (translated from Fortran to Lisp for Maxima and from Fortran to C for R). Reusing existing code makes good sense, and because Fortran has long been used for scientific and engineering applications, many numerical libraries are written in it. Maxima has adapted the SLATEC library, whereas the R developers have used a variety of different libraries (e.g., specfun).

How important is variation in the representation of Pi?

A calculation based on a literal that is only accurate to 32-bit precision is likely to be limited to that level of accuracy (unless errors cancel out somewhere).
Inconsistencies in the value used to represent π are a source of error. These inconsistencies may be implicit; for instance, literals used to denote a value derived from π often seem to be based on more precise values of π than appear in the code.
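A small illustration of both points, with constant names invented for this sketch. Since sin(π) is exactly 0, the value returned by sin for a π literal is approximately that literal's error; the last two lines show an implicit inconsistency, where a half-π literal written with full precision does not agree with a 7-digit π halved.

```python
import math

PI_SINGLE = 3.141592            # 7 significant digits, like many of the R literals
PI_DOUBLE = 3.141592653589793   # enough digits to give the closest binary64 value
HALF_PI = 1.5707963267948966    # a derived literal, written with full precision

def sin_residual(pi_literal):
    """|sin(literal)| is approximately |pi - literal| for literals near pi."""
    return abs(math.sin(pi_literal))

print(sin_residual(PI_SINGLE))   # roughly 6.5e-07
print(sin_residual(PI_DOUBLE))   # roughly 1.2e-16

# Implicit inconsistency: halving the short literal does not give the value
# that the separately written HALF_PI literal denotes.
print(PI_SINGLE / 2 - HALF_PI)   # roughly -3.3e-07
print(PI_DOUBLE / 2 == HALF_PI)  # True
```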

The obvious solution to this representation issue, creating a file containing definitions of all the frequently used literal values, has possible drawbacks. For instance, numerical accuracy is a strange beast, and increasing the precision of one literal without doing the same for the other literals appearing in a calculation can sometimes reduce the accuracy of the final result.
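A contrived sketch of that drawback, with hypothetical names throughout: one routine converts a radius to a circumference and another converts it back. While both use the same 7-digit literal, its error cancels; switching only the second routine over to a shared, more precise definition makes the round trip worse.

```python
import math

# The "obvious solution": one shared definition, with derived values computed
# from it rather than typed in separately.
PI = math.pi
TWO_PI = 2 * PI

# A legacy 7-digit literal, as might survive in older reused code.
PI_OLD = 3.141592

def circumference(radius, two_pi):
    return two_pi * radius

def radius_from(circ, two_pi):
    return circ / two_pi

# Both steps still use the old literal: its error cancels.
legacy = radius_from(circumference(10.0, 2 * PI_OLD), 2 * PI_OLD)

# Only the second step was upgraded to the shared, more precise value.
mixed = radius_from(circumference(10.0, 2 * PI_OLD), TWO_PI)

# Both steps upgraded: consistent again.
updated = radius_from(circumference(10.0, TWO_PI), TWO_PI)

print(legacy, mixed, updated)
```

Here `legacy` and `updated` come back as (very nearly) 10.0, while `mixed` is off by about 2 parts in 10^7, the relative error of the old literal that no longer cancels.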

Pulling together existing libraries to build a package is often very cost-effective, but numerical accuracy is a slippery beast, and this inconsistent usage of literals suggests that developers from these two communities have not addressed the system level consequences of software reuse.

Update 6 April: After further rummaging around in the R source distribution, I found that things are not as bad as they first appeared. Only two of the single-precision instances of π listed above occur in the C or Fortran source code; the rest appear in support files (e.g., m4 scripts and R examples).
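The kind of check described in the update can be sketched as a per-extension tally. This is an assumption-laden illustration, not the procedure actually used: the helper names are hypothetical, "single precision" is taken to mean at most 8 significant digits, and literals with an exponent marker are skipped.

```python
import os
import re
from collections import Counter
from decimal import Decimal

PI = Decimal("3.14159265358979323846264338327950288")

# Plain decimal literals; anything followed by an exponent marker is skipped
# (a simplification: 3.141592e0 or Fortran's 3.141592D0 are not counted).
LITERAL_RE = re.compile(r"(?<![\w.])\d+\.\d+(?![\d.eEdD])")

MIN_DECIMALS = 4      # assumption: shorter literals such as 3.14 are ignored
MAX_SHORT_DIGITS = 8  # assumption: 8 or fewer significant digits is "single precision"

def is_short_pi(literal):
    """True if the literal approximates pi but has only single-precision-like digits."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    decimals = -exponent
    if decimals < MIN_DECIMALS or len(digits) > MAX_SHORT_DIGITS:
        return False
    return abs(Decimal(literal) - PI) < Decimal(1).scaleb(-decimals)

def count_by_extension(files):
    """Count short pi literals per file extension; files yields (path, text) pairs."""
    counts = Counter()
    for path, text in files:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        for m in LITERAL_RE.finditer(text):
            if is_short_pi(m.group(0)):
                counts[ext] += 1
    return counts

def read_tree(root):
    """Yield (path, text) for every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                yield path, f.read()
```

Calling `count_by_extension(read_tree(root))` on an unpacked source tree gives counts keyed by extension, which separates hits in `.c`/`.f` files from those in `.m4` or `.R` support files.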

Categories: Uncategorized Tags: floating-point, Fortran, Maxima, Pi, R, testing

The Shape of Code
