A trip down memory lane with Microsoft Word 1.1
Microsoft has donated the source code of Word for Windows 1.1a to the Computer History Museum and I have been rummaging around in this C code from the late 1980s.
What immediately struck me is how un-Microsoft-Windows-like the code looks; in some ways it looks more like Unix code. There are:
- very few near/far/huge pointer declarations (a sketch of these appears after this list). The Intel x86 architecture is based on segmented addressing of memory, a fact that most developers are blissfully unaware of because 32-bit segments are big enough for it to be ignored; back in the day we had 16-bit segments, and there were near pointers that could only point within a segment, far pointers that could point to the contents of other segments (there were alignment issues associated with these) and huge pointers, which are essentially 32-bit pointers. Many developers obsessed over saving time/space and did their utmost to only use near pointers (who is ever going to need a data structure larger than 64K?), and Windows code was often littered with different kinds of pointer usage.
- very few #if/#endif. Why do developers use #if/#endif? They want to port their code to different versions of MS-DOS, use different vendor compilers and use different third-party libraries. Microsoft were well known for not being backwards compatible with themselves (they have improved somewhat on that score) and obviously had no interest in using other vendor compilers or third-party libraries. Unix code, of that day and of today, is of course packed with #if/#endif.
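For readers who never encountered them, here is a sketch of what those declarations looked like; near/far/huge were vendor extensions in 16-bit x86 compilers (Microsoft, Borland), not standard C:

char near *np;  /* 16-bit offset within the current segment; 64K reach */
char far  *fp;  /* 32-bit segment:offset pair; can point into any segment,
                   but arithmetic only changes the offset part */
char huge *hp;  /* like far, but normalized so that pointer arithmetic
                   works correctly across 64K segment boundaries */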
For any developer who has only used C since 2000, the most obvious code ‘quirk’ is probably the use of K&R style function definitions. This used to be the only way of defining functions until the first C Standard, in 1989, introduced function prototypes, and the mass migration of the 1990s occurred. The K&R style function definition looks like no big deal:
f(a, b) int a; struct T b; { /* blah blah */ }
instead of:
f(int a, struct T b) { /* blah blah */ }
until you find out that, in the first case, when f is called there is no checking of the arguments against the parameters appearing in the definition. This lack of checking was/is a constant source of annoying faults that could/can take hours to track down; once a developer has experienced the benefits of automatic argument/parameter checking they soon convert to the prototype form.
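A minimal sketch of the failure mode (the function name is invented); a pre-C89 compiler accepts this without a murmur, and it was only C23 that finally removed the K&R definition style:

#include <stdio.h>

/* K&R style definition: provides no prototype, so calls
   are not checked against the parameter types */
double halve(x)
double x;
{
    return x / 2;
}

int main(void)
{
    /* an int is passed where a double is expected; with no
       prototype in scope nothing is diagnosed, the call is
       undefined behavior and halve() typically sees garbage */
    printf("%f\n", halve(2));
    return 0;
}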
The makefile included with the source contains compiler options that won’t be familiar to Windows developers. This is because a special purpose C compiler was used, one that generated P-code (the P here is for Packed, not Portable). In the late 1980s, when this code was written, 256K was considered a lot of memory for your average person’s computer, and at 180K+ lines of code Word 1.1 was enormous. Word processors are on the whole not CPU-intensive tasks, and trading off a factor of 10 in speed (I’m guessing, probably more) for a memory footprint saving of 2.5-4 (not so much of a guess) was obviously considered worthwhile (but then Microsoft apps have never been known for being nimble).
The extensive use of bit-fields is the other clue that memory was tight. Do these ever get used outside of comms software and embedded systems these days?
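For anybody who has not seen them, a sketch (the field names are invented, not taken from the Word source) of how bit-fields pack several small values into a single word:

/* five properties squeezed into (typically) one 16-bit storage
   unit, instead of five separate ints */
struct char_props {
    unsigned int bold       : 1;  /* on/off flags need one bit each */
    unsigned int italic     : 1;
    unsigned int underline  : 1;
    unsigned int font_id    : 7;  /* values 0..127 */
    unsigned int point_size : 6;  /* values 0..63  */
};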
I was surprised at how clean and presentable this source code is. I should not have judged Microsoft developers by the horrors perpetrated by many developers targeting their platform.
Hack, a template for improving code reliability
My sole prediction for 2014 has come true, Facebook have announced the Hack language (if you don’t know that HHVM is the Hip Hop Virtual Machine you are obviously not a trendy developer).
This language does not follow the usual trend in that it looks useful, rather than being fashion fluff for corporate developers to brag about. Hack extends an existing language (don’t the Facebook developers know about not-invented-here?) by adding features to improve code reliability (how uncool is that) and stuff that will sometimes enable faster code to be generated (which has always been cool).
Well done Facebook. I hope this is the start of a trend of adding features to a language that help developers improve code reliability.
Hack extends PHP to allow programmers to express intent, e.g., this variable only ever holds integer values. Compilers can then check that the code follows the intent and flag when it doesn’t, e.g., a string is assigned to the variable intended to only hold integers. This sounds so trivial to be hardly worth bothering about, but in practice it catches lots of minor mistakes very quickly and saves huge amounts of time that would otherwise be spent debugging code at runtime.
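Hack’s own syntax is not reproduced here, but C has always provided the same flavour of check; declaring the intent is what allows the compiler to complain at compile time, rather than leaving the mistake to surface at runtime:

int main(void)
{
    int count;        /* intent: count only ever holds integer values */
    count = "hello";  /* the compiler must diagnose this: a string
                         (char *) cannot be assigned to an int */
    return count;
}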
Yes, Hack has added static typing into a dynamically typed language. There is a generally held view that static typing prevents programmers doing what needs to be done and that dynamic typing is all about freedom of expression (who could object to that?). Static typing got a bad name because early languages using it were too disciplinarian in a few places, and like the very small stone in a runner’s shoe these edge cases came to dominate thinking. Dynamic languages are great for small programs and for showing off to spotty teenage students, but are expensive to maintain and a nightmare to work with on 10K+ line systems.
The term gradual typing is a good description of Hack’s type system. Developers can take existing PHP code and give types to existing variables in a piecemeal fashion, or add new code that uses types into code that does not. The type checker figures out what it can and does not get too uppity about complaining. If a developer can be talked into giving such a system a try, they quickly learn that they can save a lot of debugging time by using it.
I would like to see gradual typing introduced into R, but perhaps the language does not cause its users enough grief to make this happen (it is R’s libraries that cause the grief):
- Compared to PHP’s quirks, R’s quirks are pedestrian. In the interest of balance I should point out that Javascript can at times be as quirky as PHP, and C++ error messages can be totally incomprehensible to everybody (including the people who wrote the compiler).
- R programs are often small, i.e., 100 lines or so. It is only when programs written in dynamically typed languages start to exceed 10K or so lines that they start to fall in on themselves, unless that one person who has everything in his head is there to hold it all up.
However, there is a sort of precedent: Perl programs tend to be short (although I don’t think they are as short as R programs) and Perl gradually introduced the option of stronger typing. But Perl did/does have one person who was the recognized language designer and could lead the process; R has a committee.
By now I ought to feel more knowledgeable about R
I was surprised to find recently that there are now over 15,000 lines of R code in the book I am working on. If I had written that much code in another ‘newly’ acquired language I would probably feel a lot more knowledgeable about it than I currently feel about R. Why don’t I feel more knowledgeable about R?
Those 15,000 lines are not all real lines; lots of cut-and-paste has been going on. Yes, R is a cut-and-paste language, just like Cobol and ‘web’ languages. ‘Real’ programmers often look down their noses at such languages, but that is just a failure on their part to understand what they are really all about. Perhaps I have written 5,000 actual lines of R, still a decent amount and halfway to the 10,000-line minimum I ask newbies whether they have reached.
An expert in a language should be able to pick up a random sample of code and have been there, done that and got the t-shirt. I still regularly learn new stuff when reading other people’s code, so I’m still a long way from being an R expert. But then R is in the mold of a functional language, and one characteristic of languages in this mold is that they provide umpteen different ways of doing the same thing. The combination of this language characteristic and the lack of a common culture in R usage (when such a culture exists it significantly reduces the variety of code patterns commonly encountered) could mean that I am on the treadmill of forever learning new R coding techniques (which is a great source of blog articles but gets tedious after a while); Perl is a lot like this.
As a compiler guy I’m used to learning a language by reading the language definition. Reading this document gives me a warm fuzzy feeling of knowing the language; this has nothing to do with being able to program in it, and there is no way of knowing whether I understood what the words meant. I was going to say that the R language definition was little more than some brief notes jotted down by somebody, to be written up later, but checking the link to the page I discovered that somebody had been spending time significantly improving on what existed a few years ago; there is still a way to go, but the R language definition is starting to look respectable. Hopefully my feeling of R knowledgeability will improve after I have read through this updated definition a few times.
Use of R is usually intimately bound up with the data being manipulated; on a per-line-of-code basis, much more so than in other languages (in this regard it is like Cobol). Perhaps the need to learn lots more about the data than I normally have to adds to my feeling of not knowing. Would my feeling of knowledgeability increase if I worked with the same kind of data all the time?