Specification based programming
The use of LLM to write software has focused on integrating them within existing practices, i.e., using LLMs as very fancy auto-completers for chunks of code or functionality. This use is programming by conversation, or less politely, programming by stream of thought. The term vibe-coding creates an illusion of trendiness; after all, software engineering is a hedonistic activity.
With vibe-code on top of vibe-code on top of vibe-code, refactoring becomes a complete rewrite, at least in theory. A rewrite assumes that it’s possible to extract a specification that is complete and accurate enough to recreate the software. A lot of software has a short lifetime, so a major rewrite may never be needed. However, for software that is expected to have a long life, management are going to want a more controlled/structured/repeatable approach.
LLMs’ ability to write software is now good enough to support a more controlled/structured/repeatable approach: Programming by specification. That is a specification of the desired behavior is given to one or more LLMs, which use it to generate the appropriate software.
The human input to the program creation process is via the specification.
Features are changed/added/removed by updating the specification. Bugs are fixed by updating the specification. If there are mistakes in the generated code, the specification has to work around them, in the same way that compiler bugs have to be worked around.
Business logic can be expressed as a specification, which is how application domain experts, who are not programmers, are able to create minimal viable products using LLMs.
How might a specification be created?
Agile has taught the lesson that software creation is an iterative process. Requiring a complete specification before coding starts is the stuff of armchair project managers.
One possible specification iteration process starts with a basic outline specification of what is required, and is followed by the following cycle:
- Using the current specification, developer+LLM produces code. Perhaps particular functionality is implemented, or the work continues for some amount of time, or etc,
- the transcript of the LLM conversation is used to create an updated specification of the code that exists when work stopped. Conversations involving code that came and went is not part of the updated specification, although logging it for future reference costs little,
- a new version of all the software covered by the updated specification is generated. This can be tested using existing tests and also by differential testing using multiple implementations created from the same specification (a recent paper generated five implementations in different languages),
- if more functionality is needed, go to step 1.
Specifications share many characteristics with source code. They can be split up and organized into modules/packages/components/phases, as was done for this LLM generated C compiler.
LLM generated code is more verbose than human generated code, just like the machine code generated by early compilers.
Open source projects could soon just be making the specification available. Why ship the source code generated from a specification, projects don’t ship the assembler code generated by compilers, they ship the original source code. However, given the current reliability of LLM source code generation, they are benefits to making the generated source of at least one implementation available (as a kind of checksum).
Reduced implementation costs, using LLMs, make it possible to create programs containing more functionality (Jevrons paradox in action). This in turned leads to specifications becoming larger, complicated and poorly organized, just like source code.
English usage is full of ambiguities. This ambiguity can be reduced by using a controlled language. If specification programming becomes popular, it’s easy to imagine the invention of controlled languages becoming as popular as the invention of programming languages. In 1957, there were compilers for at least 28 programming languages.
Specification based programming is a continuation of the trend of computers handling more of the details involved in program creation, with the program creation process requiring less and less knowledge about computers. Increasing amounts of computer time are spent to reduce or eliminate developer time.
Programming has evolved from physically connecting subsystems by cables to specify the flow of bits in a punch card computer, to a sequence of machine code instructions executed by a stored-program computer, then high-level programming languages reducing the need to know lots of details about the underlying cpu (details that remain include: number of bits in the integer types and type compatibility rules).
Specification based programming requires discipline, and I don’t expect it to be popular. I expect multiple LLM-derived project disasters need to occur before there are any significant changes to the current LLM approaches to software development.
Recent Comments