Predicted impact of LLM use on developer ecosystems

LLMs are not going to replace developers; next-token prediction is not the path to human intelligence. LLMs provide companies that are not hiring, or are laying off developers, with a convenient excuse: they can claim the decision is driven by LLMs, rather than admit that their business is not doing so well.

Once the hype has evaporated, what impact will LLMs have on software ecosystems?

The size and complexity of software systems is limited by the human cognitive resources available for their production. LLMs provide a means to reduce the human cognitive effort needed to produce a given amount of software.

Using LLMs enables more software to be created within a given budget, or the same amount of software to be created with a smaller budget (either by using cheaper, and presumably less capable, developers, or by consuming less of the time of more capable developers).

Given the extent to which companies compete by adding more features to their applications, I expect the common case to be that applications contain more software and budgets remain unchanged. In a Red Queen market, companies want to be perceived as supporting the latest thing, and the marketing department needs something to talk about.

Reducing the effort needed to create new features shortens the delay between a company introducing a new feature that becomes popular, and the competition copying it.

LLMs will enable software systems to be created that would not have been created without them, because of timescales, funding, or lack of developer expertise.

I think that LLMs will have a large impact on the use of programming languages.

The quantity of training data (e.g., source code) has an impact on the quality of LLM output. The less widely used languages will have less training data. The table below lists the gigabytes of source code in 30 languages contained in various LLM training datasets (for details see The Stack: 3 TB of permissively licensed source code by Kocetkov et al.):

Language   TheStack  CodeParrot  AlphaCode  CodeGen  PolyCoder
HTML        746.33     118.12
JavaScript  486.2       87.82       88        24.7     22
Java        271.43     107.7       113.8     120.3     41
C           222.88     183.83                 48.9     55
C++         192.84      87.73       290.5     69.9     52
Python      190.73      52.03       54.3      55.9     16
PHP         183.19      61.41       64                 13
Markdown    164.61      23.09
CSS         145.33      22.67
TypeScript  131.46      24.59       24.9                9.2
C#          128.37      36.83       38.4               21
GO          118.37      19.28       19.8      21.4     15
Rust         40.35       2.68        2.8                3.5
Ruby         23.82      10.95       11.6                4.1
SQL          18.15       5.67
Scala        14.87       3.87        4.1                1.8
Shell         8.69       3.01
Haskell       6.95       1.85
Lua           6.58       2.81        2.9
Perl          5.5        4.7
Makefile      5.09       2.92
TeX           4.65       2.15
PowerShell    3.37       0.69
FORTRAN       3.1        1.62
Julia         3.09       0.29
VisualBasic   2.73       1.91
Assembly      2.36       0.78
CMake         1.96       0.54
Dockerfile    1.95       0.71
Batchfile     1          0.7
Total      3135.95     872.95      715.1     314.1    253.6

The major companies building LLMs probably have access to a lot more source code (as of July 2023, the Software Heritage archive held over 1.6×10^10 unique source code files); this table gives some idea of the relative quantities available for different languages, subject to recency bias. At the moment, companies appear to be training on everything they can get their hands on. Would LLM performance on the widely used languages improve if source code for most of the 682 languages listed on Wikipedia was not included in their training data?
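To make the skew in the table concrete, the figures can be turned into per-language shares of The Stack. This is a minimal sketch using a handful of values copied from the table above; the threshold at which too little data visibly degrades LLM output quality is unknown.

```python
# Per-language share of The Stack, using GB figures from the table above.
stack_gb = {
    "JavaScript": 486.2,
    "Python": 190.73,
    "Rust": 40.35,
    "Haskell": 6.95,
}
total_gb = 3135.95  # total across all 30 languages listed

for lang, gb in stack_gb.items():
    print(f"{lang:<10} {gb / total_gb:6.1%}")

# JavaScript alone has roughly 12 times as much data as Rust,
# and roughly 70 times as much as Haskell.
print(f"JS/Rust ratio: {stack_gb['JavaScript'] / stack_gb['Rust']:.0f}x")
```

JavaScript accounts for about 15.5% of the corpus, while Rust is about 1.3% and Haskell about 0.2%, which is the kind of imbalance the surrounding discussion is pointing at.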

Traditionally, developers have had to spend a lot of time learning the technical details about how language constructs interact. For the first few languages, acquiring fluency usually takes several years.

It’s possible that LLMs will remove the need for developers to know much about the details of the language they are using, e.g., they will define variables to have the appropriate type and suggest possible options when type mismatches occur.

Removing the fluff of software development (i.e., writing the code) means that developers can invest more cognitive resources in understanding what functionality is required, and making sure that all the details are handled.

Removing a lot of the sunk cost of language learning removes the only moat that some developers have. Job adverts could stop requiring skills with particular programming languages.

Little is currently known about developer career progression, which means it’s not possible to say anything about how it might change.

Since they were first created, programming languages have fascinated developers. They are the fashion icon of software development, with youngsters wanting to program in the latest language, or at least not use the languages used by their parents. If developers don't invest in learning language details, they have nothing language-related to discuss with other developers. Programming languages will cease to be a fashion icon (CPUs used to be a fashion icon, until developers no longer needed to know details about them, such as available registers and unique instructions). Zig could be the last language to become fashionable.

I don’t expect the usage of existing language features to change. LLMs mimic the characteristics of the code they were trained on.

When new constructs are added to a popular language, it can take years before they start to be widely used by developers. LLMs will not use language constructs that don’t appear in their training data, and if developers are relying on LLMs to select the appropriate language construct, then new language constructs will never get used.

By 2035 things should have had time to settle down and for the new patterns of developer behavior to be apparent.

  1. August 14, 2025 17:07 | #1

    Thanks Derek, your thinking aligns with mine but given the AI hype at the moment I have been shy about saying much!

    While I can see how an AI can predict the next line of code you want to write, and it can even kind-of copy systems which are out there, how is it going to design and write a system which is unique and innovative?

    (I read recently that Bill Gates had made a similar point about programming not going away even though it will change.)

    A friend of mine wrote a conference submission system in Perl a few years back. He made it open source and it continues to be used by the niche conference he wrote it for. He recently tried to update it with help from an LLM. After a few prompts he realised it was giving him his own code back. Nobody except him had written a system in Perl that did what he needed it to do, so he got his own code back.

    A couple of our friends, ACCU’ers, have been working at a hedge fund recently. They told me that they can’t use LLMs because the fund uses its own proprietary language, therefore there is no LLM that supports it (and probably not enough code to train one on).

    Continuing your point about language use further: if LLM coding becomes the norm then we may never see another computer language, because any new language would lack the code base to train an LLM on. Whether that is a good or a bad thing is an open question, but it also demonstrates that LLMs will kill off one source of innovation. Presumably there are other cases where they will kill innovation too.

  2. August 15, 2025 00:56 | #2

    @Allan Kelly
    The spread of automobiles severely curtailed innovation in horse-drawn carriages, but do we care?
    Inventing languages is a fun thing to do, and almost all are based on recycled ideas which the person involved thinks are new. I’m sure that people will continue to invent ‘new’ languages, but industry will have fewer in-production languages to worry about.
    The discussion around LLM usage has ignored the issue of the need for these companies to eventually make money. The profitability of AI companies, and the likelihood of future profitability, is starting to be questioned. This might change if these companies started focusing on producing more efficient models, rather than improving their benchmark rankings.