Changing development culture and practices: LLM edition
The popular perception of creating software systems is that it mainly involves writing code. In the 1950s, management treated writing code as a clerical task that just mapped the detailed requirements, specified by someone with knowledge of the problem, to something a computer could execute. Job titles reflected this division of labour, e.g., coder/programmer and systems analyst (the Wikipedia entry lists implementation as part of the job; this eventually became true in theory, and for many was probably true in practice from the early days).
Using Large Language Models to write code based on the requirements contained in a prompt appears to take software development back to the process mandated by the managers of early software projects.
A major economic incentive for the creation of software systems is enabling more efficient work processes, with the collateral damage of decimated employment in some work functions. This happened to clerical workers and non-software engineering workers. Now it’s happening to software developers.
Hardware designers did not cease to exist once computer-aided design became available. Technical drawing skills (larger schools once had a room full of drawing boards for teaching young teenagers) have ceased to be a job requirement.
Software developer will remain as a job category, perhaps with reduced numbers or with reduced average pay. But the use of LLMs will change the culture and practices of software development.
The shift from using assembly language to high level languages suggests a few ideas about the kinds of changes. Using assembly language requires being reasonably familiar with the cpu architecture, e.g., register names/widths/instruction-restrictions and instruction timings. General developer chat about cpu architectures was still a thing in the 1980s, less so in the 1990s, and very rarely today (people do blog about it). Several decades from now, what will no longer be a general topic of developer conversation? Data types, perhaps; like registers, bit pattern representation is a low level detail. Since most developers don’t know much about the languages they use, it may be difficult to measure the impact of LLM usage on language knowledge.
High-level languages increase developer productivity by reducing the number of details that need to be thought about, at the cost of less efficient code. But for many applications, machine time is cheaper than human time.
LLMs increase developer productivity by reducing the need to look up details (e.g., spelling of method names and their parameters). As confidence grows in the accuracy of LLM-suggested code, developers will start accepting whatever is suggested. What counts is whether the code works, not whether the average developer would have written something faster, smaller, or more idiomatic.
The early languages have a straightforward mapping from statements/declarations to machine code. Over time, languages were created that allowed developers to think less and less about implementation details, at the cost of supporting constructs that could introduce lots of hidden overhead. I expect that customer demand will incentivize LLM functionality that reduces what developers need to think about.
A real danger of LLM usage is that it will, eventually, result in programs a lot more bloated than humans have managed to achieve. There are physical constraints restricting what hardware designers can do, and these constraints show up in patterns of behavior, e.g., Rent’s rule relating the number of external connections in a logic block to the number of logic gates in the block. There are common usage patterns in existing code, but no theory suggesting they are desirable, or not, in any sense. I await having enough LLM generated production code to make statistically significant measurements.
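For reference, Rent's rule is usually written as a power law. The symbols below follow the common textbook convention and are not taken from this post:

```latex
T = t \, g^{p}
```

Here T is the number of external connections (terminals) of a logic block, g is the number of gates in the block, t is the average number of terminals per gate, and p is an empirically fitted exponent, generally below 1. Nothing comparable is known for software, which is the point being made above.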
I suspect that these days most developers are writing glue code, or short programs, and in the near term I expect that most LLM code will fill this need. Unfortunately, there is very little research/measurement on glue code and short programs, so there are no known developer usage patterns to compare LLMs against.
Evidence for 28 possible compilers in 1957
In two earlier posts I discussed the early compilers for languages that are still widely used today, and a report from 1963 showing how nothing has changed in programming languages.
The Handbook of Automation, Computation, and Control, Volume 2, published in 1959, contains some interesting information. In particular, Table 23 (below) is a list of “Automatic Coding Systems” (containing over 110 systems from 1957, of which 54 have a cross in the compiler column):
Columns, in order (blank cells are not shown): Computer; System Name or Acronym; Developed by; Code; M.L.; Assem.; Inter.; Comp.; Oper-Date; Indexing; Fl-Pt; Symb.; Algeb.

IBM 704:
  AFAC Allison G.M. C X Sep 57 M2 M 2 X
  CAGE General Electric X X Nov 55 M2 M 2
  FORC Redstone Arsenal X Jun 57 M2 M 2 X
  FORTRAN IBM R X Jan 57 M2 M 2 X
  NYAP IBM X Jan 56 M2 M 2
  PACT IA Pact Group X Jan 57 M2 M 1
  REG-SYMBOLIC Los Alamos X Nov 55 M2 M 1
  SAP United Aircraft R X Apr 56 M2 M 2
  NYDPP Servo Bur. Corp. X Sep 57 M2 M 2
  KOMPILER3 UCRL Livermore X Mar 58 M2 M 2 X

IBM 701:
  ACOM Allison G.M. C X Dec 54 S1 S 0
  BACAIC Boeing Seattle A X X Jul 55 S 1 X
  BAP UC Berkeley X X May 57 2
  DOUGLAS Douglas SM X May 53 S 1
  DUAL Los Alamos X X Mar 53 S 1
  607 Los Alamos X Sep 53 1
  FLOP Lockheed Calif. X X X Mar 53 S 1
  JCS 13 Rand Corp. X Dec 53 1
  KOMPILER 2 UCRL Livermore X Oct 55 S2 1 X
  NAA ASSEMBLY N. Am. Aviation X
  PACT I Pact Group R X Jun 55 S2 1
  QUEASY NOTS Inyokern X Jan 55 S
  QUICK Douglas ES X Jun 53 S 0
  SHACO Los Alamos X Apr 53 S 1
  SO 2 IBM X Apr 53 1
  SPEEDCODING IBM R X X Apr 53 S1 S 1

IBM 705-1, 2:
  ACOM Allison G.M. C X Apr 57 S1 0
  AUTOCODER IBM R X X X Dec 56 S 2
  ELI Equitable Life C X May 57 S1 0
  FAIR Eastman Kodak X Jan 57 S 0
  PRINT I IBM R X X X Oct 56 S2 S 2
  SYMB. ASSEM. IBM X Jan 56 S 1
  SOHIO Std. Oil of Ohio X X X May 56 S1 S 1
  FORTRAN IBM-Guide A X Nov 58 S2 S 2 X
  IT Std. Oil of Ohio C X S2 S 1 X
  AFAC Allison G.M. C X S2 S 2 X

IBM 705-3:
  FORTRAN IBM-Guide A X Dec 58 M2 M 2 X
  AUTOCODER IBM A X X Sep 58 S 2

IBM 702:
  AUTOCODER IBM X X X Apr 55 S 1
  ASSEMBLY IBM X Jun 54 1
  SCRIPT G. E. Hanford R X X X X Jul 55 S1 S 1

IBM 709:
  FORTRAN IBM A X Jan 59 M2 M 2 X
  SCAT IBM-Share R X X Nov 58 M2 M 2

IBM 650:
  ADES II Naval Ord. Lab X Feb 56 S2 S 1 X
  BACAIC Boeing Seattle C X X X Aug 56 S 1 X
  BALITAC M.I.T. X X X Jan 56 S1 2
  BELL L1 Bell Tel. Labs X X Aug 55 S1 S 0
  BELL L2,L3 Bell Tel. Labs X X Sep 55 S1 S 0
  DRUCO I IBM X Sep 54 S 0
  EASE II Allison G.M. X X Sep 56 S2 S 2
  ELI Equitable Life C X May 57 S1 0
  ESCAPE Curtiss-Wright X X X Jan 57 S1 S 2
  FLAIR Lockheed MSD, Ga. X X Feb 55 S1 S 0
  FOR TRANSIT IBM-Carnegie Tech. A X Oct 57 S2 S 2 X
  IT Carnegie Tech. C X Feb 57 S2 S 1 X
  MITILAC M.I.T. X X Jul 55 S1 S 2
  OMNICODE G. E. Hanford X X Dec 56 S1 S 2
  RELATIVE Allison G.M. X Aug 55 S1 S 1
  SIR IBM X May 56 S 2
  SOAP I IBM X Nov 55 2
  SOAP II IBM R X Nov 56 M M 2
  SPEED CODING Redstone Arsenal X X Sep 55 S1 S 0
  SPUR Boeing Wichita X X X Aug 56 M S 1
  FORTRAN (650T) IBM A X Jan 59 M2 M 2

Sperry Rand 1103A:
  COMPILER I Boeing Seattle X X May 57 S 1 X
  FAP Lockheed MSD X X Oct 56 S1 S 0
  MISHAP Lockheed MSD X Oct 56 M1 S 1
  RAWOOP-SNAP Ramo-Wooldridge X X Jun 57 M1 M 1
  TRANS-USE Holloman A.F.B. X Nov 56 M1 S 2
  USE Ramo-Wooldridge R X X Feb 57 M1 M 2
  IT Carn. Tech.-R-W C X Dec 57 S2 S 1 X
  UNICODE R Rand St. Paul R X Jan 59 S2 M 2 X

Sperry Rand 1103:
  CHIP Wright A.D.C. X X Feb 56 S1 S 0
  FLIP/SPUR Convair San Diego X X Jun 55 S1 S 0
  RAWOOP Ramo-Wooldridge R X Mar 55 S1 1
  SNAP Ramo-Wooldridge R X X Aug 55 S1 S 1

Sperry Rand Univac I and II:
  A0 Remington Rand X X X May 52 S1 S 1
  A1 Remington Rand X X X Jan 53 S1 S 1
  A2 Remington Rand X X X Aug 53 S1 S 1
  A3,ARITHMATIC Remington Rand C X X X Apr 56 S1 S 1
  AT3,MATHMATIC Remington Rand C X X Jun 56 S1 S 2 X
  B0,FLOWMATIC Remington Rand A X X X Dec 56 S2 S 2
  BIOR Remington Rand X X X Apr 55 1
  GP Remington Rand R X X X Jan 57 S2 S 1
  MJS (UNIVAC I) UCRL Livermore X X Jun 56 1
  NYU,OMNIFAX New York Univ. X Feb 54 S 1
  RELCODE Remington Rand X X Apr 56 1
  SHORT CODE Remington Rand X X Feb 51 S 1
  X-I Remington Rand C X X Jan 56 1
  IT Case Institute C X S2 S 1 X
  MATRIX MATH Franklin Inst. X Jan 58

Sperry Rand File Compo:
  ABC R Rand St. Paul Jun 58

Sperry Rand Larc:
  K5 UCRL Livermore X X M2 M 2 X
  SAIL UCRL Livermore X M2 M 2

Burroughs Datatron 201, 205:
  DATACODEI Burroughs X Aug 57 MS1 S 1
  DUMBO Babcock and Wilcox X X
  IT Purdue Univ. A X Jul 57 S2 S 1 X
  SAC Electrodata X X Aug 56 M 1
  UGLIAC United Gas Corp. X Dec 56 S 0
  Dow Chemical X
  STAR Electrodata X

Burroughs UDEC III:
  UDECIN-I Burroughs X 57 M/S S 1
  UDECOM-3 Burroughs X 57 M S 1

M.I.T. Whirlwind:
  ALGEBRAIC M.I.T. R X S2 S 1 X
  COMPREHENSIVE M.I.T. X X X Nov 52 S1 S 1
  SUMMER SESSION M.I.T. X Jun 53 S1 S 1

Midac:
  EASIAC Univ. of Michigan X X Aug 54 S1 S
  MAGIC Univ. of Michigan X X X Jan 54 S1 S

Datamatic:
  ABC I Datamatic Corp. X

Ferranti:
  TRANSCODE Univ. of Toronto R X X X Aug 54 M1 S

Illiac:
  DEC INPUT Univ. of Illinois R X Sep 52 S1 S

Johnniac:
  EASY FOX Rand Corp. R X Oct 55 S

Norc:
  NORC COMPILER Naval Ord. Lab X X Aug 55 M2 M

Seac:
  BASE 00 Natl. Bur. Stds. X X
  UNIV. CODE Moore School X Apr 55
Chart Symbols used:
Code
R = Recommended for this computer, sometimes only for heavy usage.
C = Common language for more than one computer.
A = System is both recommended and has common language.

Indexing
M = Actual Index registers or B boxes in machine hardware.
S = Index registers simulated in synthetic language of system.
1 = Limited form of indexing, either stepped unidirectionally or by one word only, or having certain registers applicable to only certain variables, or not compound (by combination of contents of registers).
2 = General form, any variable may be indexed by any one or combination of registers which may be freely incremented or decremented by any amount.

Floating point
M = Inherent in machine hardware.
S = Simulated in language.

Symbolism
0 = None.
1 = Limited, either regional, relative or exactly computable.
2 = Fully descriptive English word or symbol combination which is descriptive of the variable or the assigned storage.

Algebraic
A single continuous algebraic formula statement may be made. Processor has mechanisms for applying associative and commutative laws to form operative program.

M.L. = Machine language. Assem. = Assemblers. Inter. = Interpreters. Compl. = Compilers.
Are the compilers really compilers as we know them today, or is this terminology that has not yet settled down? The computer terminology chapter refers readers interested in Assembler, Compiler and Interpreter to the entry for Routine:
“Routine. A set of instructions arranged in proper sequence to cause a computer to perform a desired operation or series of operations, such as the solution of a mathematical problem.
…
Compiler (compiling routine), an executive routine which, before the desired computation is started, translates a program expressed in pseudo-code into machine code (or into another pseudo-code for further translation by an interpreter).
…
Assemble, to integrate the subroutines (supplied, selected, or generated) into the main routine, i.e., to adapt, to specialize to the task at hand by means of preset parameters; to orient, to change relative and symbolic addresses to absolute form; to incorporate, to place in storage.
…
Interpreter (interpretive routine), an executive routine which, as the computation progresses, translates a stored program expressed in some machine-like pseudo-code into machine code and performs the indicated operations, by means of subroutines, as they are translated. …”
The definition of “Assemble” sounds more like a linker/loader than an assembler.
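The handbook's definitions of compiling and interpretive routines differ mainly in when translation happens: before the computation starts, or as it progresses. A minimal sketch of that distinction follows; the three-instruction pseudo-code, the accumulator model, and all function names are assumptions made up for illustration, not anything from the handbook.

```python
# Toy pseudo-code (hypothetical): each instruction is (operation, operand)
# and acts on a single accumulator.
PROGRAM = [("LOAD", 2), ("ADD", 3), ("MUL", 4)]


def translate(instr):
    """Translate one pseudo-code instruction into a 'machine' step,
    modelled here as a Python function from accumulator to accumulator."""
    op, arg = instr
    if op == "LOAD":
        return lambda acc: arg
    if op == "ADD":
        return lambda acc: acc + arg
    if op == "MUL":
        return lambda acc: acc * arg
    raise ValueError(f"unknown operation: {op}")


def compile_program(program):
    """Compiling routine: translate the whole program before computation starts."""
    return [translate(instr) for instr in program]


def run_compiled(steps):
    """Execute already-translated steps; no translation happens here."""
    acc = 0
    for step in steps:
        acc = step(acc)
    return acc


def interpret(program):
    """Interpretive routine: translate and perform each instruction
    as the computation progresses."""
    acc = 0
    for instr in program:
        acc = translate(instr)(acc)
    return acc


print(run_compiled(compile_program(PROGRAM)))  # prints 20
print(interpret(PROGRAM))                      # prints 20
```

Both routes give the same answer; the compiled form pays the translation cost once, while the interpreted form pays it every time an instruction is executed.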
When the coding system has a cross in both the assembler and compiler column, I suspect we are dealing with what would be called an assembler today. There are 28 crosses in the Compiler column that do not have a corresponding entry in the assembler column; does this mean there were 28 compilers in existence in 1957? I can imagine many of the languages being very simple (the fashionability of creating programming languages was already being called out in 1963), so producing a compiler for them would be feasible.
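The counting rule used here (a cross in the Compiler column with no cross in the Assembler column) can be sketched as a simple filter. The rows below are placeholders and are NOT transcribed from Table 23; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingSystem:
    name: str
    assembler: bool  # cross in the Assem. column
    compiler: bool   # cross in the Comp. column


def compiler_only(systems):
    """Return the systems with a Comp. cross and no Assem. cross."""
    return [s for s in systems if s.compiler and not s.assembler]


# Placeholder rows for illustration only, not data from the handbook.
sample = [
    CodingSystem("SYSTEM-A", assembler=False, compiler=True),
    CodingSystem("SYSTEM-B", assembler=True, compiler=True),
    CodingSystem("SYSTEM-C", assembler=True, compiler=False),
    CodingSystem("SYSTEM-D", assembler=False, compiler=True),
]

candidates = compiler_only(sample)
print(len(candidates), [s.name for s in candidates])
```

Applied to a faithful transcription of the table, the same filter would reproduce the count of candidate compilers.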
The citation given for Table 23 contains a few typos. I think the correct reference is: Bemer, Robert W. “The Status of Automatic Programming for Scientific Problems.” Proceedings of the Fourth Annual Computer Applications Symposium, 107-117. Armour Research Foundation, Illinois Institute of Technology, Oct. 24-25, 1957.