December 11, 2022 Derek Jones No comments

This year’s list of books for Christmas, or Isaac Newton’s birthday (in the Julian calendar in use when he was born), returns to its former length, and even includes a book published this year. My book Evidence-based Software Engineering also became available in paperback form this year, and would look great on somebodies’ desk.

The Mars Project by Wernher von Braun, first published in 1953, is a 91-page high-level technical specification for an expedition to Mars (calculated by one man and his slide-rule). The subjects include the orbital mechanics of travelling between Earth and Mars, the complications of using a planet’s atmosphere to slow down the landing craft without burning up, and the design of the spaceships and rockets (the bulk of the material). The one subject not covered is cost; von Braun’s estimated 950 launches of heavy-lift launch vehicles, to send a fleet of ten spacecraft with 70 crew, will not be cheap. I’ve no idea what today’s numbers might be.

The Fabric of Civilization: How textiles made the world by Virginia Postrel is a popular book full of interesting facts about the economic and cultural significance of something we take for granted today (or at least I did). For instance, Viking sails took longer to make than the ships they powered, and spinning the wool for the sails on King Canute‘s North Sea fleet required around 10,000 work years.

Wyclif’s Dust: Western Cultures from the Printing Press to the Present by David High-Jones is covered in an earlier post.

The Second World Wars: How the First Global Conflict Was Fought and Won by Victor Davis Hanson approaches the subject from a systems perspective. How did the subsystems work together (e.g., arms manufacturers and their customers, the various arms of the military/politicians/citizens), the evolution of manufacturing and fighting equipment (the allies did a great job here, Germany not very good, and Japan/Italy terrible) to increase production/lethality, and the prioritizing of activities to achieve aims. The 2011 Christmas books listed “Europe at War” by Norman Davies, which approaches the war from a data perspective.

Through the Language Glass: Why the world looks different in other languages by Guy Deutscher is a science driven discussion (written in a popular style) of the impact of language on the way its speakers interpret their world. While I have read many accounts of the Sapir–Whorf hypothesis, this book was the first to tell me that 70 years earlier, both William Gladstone (yes, that UK prime minister and Homeric scholar) and Lazarus Geiger had proposed theories of color perception based on the color words commonly used by the speakers of a language.

Categories: Uncategorized Tags: books, Christmas, fabricate, human language, Mars, world domination

C++ deprecates some operations on volatile objects

December 4, 2022 Derek Jones 4 comments

Programming do-gooders sometimes fall into the trap of thinking that banning the use of a problematic language construct removes the possibility of the problems associated with that construct’s usage construct from occurring. The do-gooders overlook the fact that developers use language constructs because they solve a coding need, and that banning usage does not make the coding need go away. If a particular usage is banned, then developers have to come up with an alternative to handle their coding need. The alternative selected may have just as many, or more, problems associated with its use as the original usage.

The C++ committee has fallen into this do-gooder trap by deprecating the use of some unary operators (i.e., ++ and --) and compound assignment operators (e.g., += and &=) on objects declared with the volatile type-specifier. The new wording appears in the 2020 version of the C++ Standard; see sections 7.6.1.5, 7.6.2.2, 7.6.19, and 9.6.

Listing a construct as being deprecated gives notice that it might be removed in a future revision of the standard (languages committees tend to accumulate deprecated constructs and rarely actually remove a construct; breaking existing code is very unpopular).

What might be problematic about objects declared with the volatile type-specifier?

By declaring an object with the volatile type-specifier a developer is giving notice that its value can change through unknown mechanisms at any time. For instance, an array may be mapped to the memory location where the incoming bytes from a communications port are stored, or the members of a struct may represent the various status and data information relating to some connected hardware device.

The presence of volatile in an object’s declaration requires that the compiler not optimise away assignments or accesses to said object (because such assignments or accesses can have effects unknown to the compiler).

   volatile int k = 0;
   int i = k, // value of k not guaranteed to be 0
       j = k; // value of k may have changed from that assigned to i
 
   if (i != j)
      printf("The value of k changed from %d to %d\n", i, j);

If, at some point in the future, developers cannot rely on code such as k+=3; being supported by the compiler, what are they to do?

Both the C and C++ Standards state:
“The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.”

So the code k=k+3; cannot be relied upon to have the same effect as k+=3;.

One solution, which does not make use of any deprecated language constructs, is:

   volatile int k;
   int temp;
   /* ... */
   temp=k;
   temp+=3;
   k=temp;

In what world is the above code less problematic than writing k+=3;?

I understand that in the C++ world there are templates, operator overloading, and various other constructs that can make it difficult to predict how many times an object might be accessed. The solution is to specify the appropriate behavior for volatile objects in these situations. Simply deprecating them for some operators is all cost for no benefit.

We can all agree that the use of volatile has costs and benefits. What is WG21’s (the ISO C++ Committee) cost/benefit analysis for deprecating this usage?

The WG21 proposal P1152, “Deprecating volatile”, claims that it “… preserves the useful parts of volatile, and removes the dubious / already broken ones.”

The proposal is essentially a hatchet job, with initial sections written in the style of the heroic fantasy novel The Name of the Wind, where “…kinds of magic are taught in the university as academic disciplines and have daily-life applications…”; cut-and-pasting of text from WG14 (ISO C committee) documents and C++17 adds bulk. Various issues unrelated to the deprecated constructs are discussed, and it looks like more thought is needed in some of these areas.

Section 3.3, “When is volatile useful?”, sets the tone. The first four paragraphs enumerate what volatile is not useful, before the fifth paragraph admits that “volatile is nonetheless a useful concept to have …” (without listing any reasons for this claim).

How did this deprecation get accepted into the 2020 C++ Standard?

The proposal appeared in October 2018, rather late in the development timeline of a standard published in 2020; were committee members punch drunk by this stage, and willing to wave through what appears to be a minor issue? The document contains 1,662 pages of close text, and deprecation is only giving notice of something that might happen in the future.

Soon after the 2020 Standard was published, the pushback started. Proposal P2327, “De-deprecating volatile compound operations”, noted: “deprecation was not received too well in the embedded community as volatile is commonly used”. However, the authors don’t think that ditching the entire proposal is the solution, instead they propose to just de-deprecate the bitwise compound assignments (i.e., |=, &=, and ^=).

The P2327 proposal contains some construct usage numbers, obtained by grep’ing the headers of three embedded SDKs. Unsurprisingly, there were lots of bitwise compound assignments (all in macros setting various flags).

I used Coccinelle to detect actual operations on volatile objects in the Silabs Gecko SDK C source (one of the SDKs measured in the proposal; semgrep handles C and C++, but does not yet fully handle volatile). The following table shows the number of occurrences of each kind of language construct on a volatile object (code and data):

 Construct    Occurrences
    V++            83
    V--             5
    ++V             9
    --V             2
    bit assign    174
    arith assign   27

Will the deprecated volatile usage appear in C++23? ~~Probably, purely because the deadline for change has passed. Given WG21’s stated objective of a 3-year iteration, the debate will have to wait for work on to start on C++26.~~

The increment and decrement deprecation remains. In response to National Body comments on balloting of the draft C++23 document, it was “resolved to un-deprecate all volatile compound assignments.” Thanks to Jonathan Wakely for the correction.

Categories: Uncategorized Tags: C, coding guideline, embedded systems, language, Standard, volatile

Unneeded requirements implemented in Waterfall & Agile

November 27, 2022 Derek Jones No comments

Software does not wear out, but the world in which it runs evolves. Time and money is lost when, after implementing a feature in software, customer feedback is that the feature is not needed.

How do Waterfall and Agile implementation processes compare in the number of unneeded feature/requirements that they implement?

In a Waterfall process, a list of requirements is created and then implemented. The identity of ‘dead’ requirements is not known until customers start using the software, which is not until it is released at the end of development.

In an Agile process, a list of requirements is used to create a Minimal Viable Product, which is released to customers. An iterative development processes, driven by customer feedback, implements requirements, and makes frequent releases to customers, which reduces the likelihood of implementing known to be ‘dead’ requirements. Previously implemented requirements may be discovered to have become ‘dead’.

An analysis of the number of ‘dead’ requirements implemented by the two approaches appears at the end of this post.

The plot below shows the number of ‘dead’ requirements implemented in a project lasting a given number of working days (blue/red) and the difference between them (green), assuming that one requirement is implemented per working day, with the discovery after 100 working days that a given fraction of implemented requirements are not needed, and the number of requirements in the MVP is assumed to be small (fractions 0.5, 0.1, and 0.05 shown; code):

Dead requirements for Waterfall and Agile projects running for a given number of days, along with difference between them.

The values calculated using one requirement implemented per day scales linearly with requirements implemented per day.

By implementing fewer ‘dead’ requirements, an Agile project will finish earlier (assuming it only implements all the needed requirements of a Waterfall approach, and some subset of the ‘dead’ requirements). However, unless a project is long-running, or has a high requirements’ ‘death’ rate, the difference may not be compelling.

I’m not aware of any data on rate of discovery of ‘dead’ implemented requirements (there is some on rate of discovery of new requirements); as always, pointers to data most welcome.

The Waterfall projects I am familiar with, plus those where data is available, include some amount of requirement discovery during implementation. This has the potential to reduce the number of ‘dead’ implemented requirements, but who knows by how much.

As the size of Minimal Viable Product increases to become a significant fraction of the final software system, the number of fraction of ‘dead’ requirements will approach that of the Waterfall approach.

There are other factors that favor either Waterfall or Agile, which are left to be discussed in future posts.

The following is an analysis of Waterfall/Agile requirements’ implementation.

Define:

$F_{live}$ is the fraction of requirements per day that remain relevant to customers. This value is likely to be very close to one, e.g., 0.999 .
$R_{done}$ requirements implemented per working day.

Waterfall

The implementation of $R_{total}$ requirements takes $I_{days}=R_{total}/R_{done}$ days, and the number of implemented ‘dead’ requirements is (assuming that the no ‘dead’ requirements were present at the end of the requirements gathering phase):

$R_{Wdead}=R_{total}*(1-{F_{live}}^{I_{days}})$

As $I_{days} right infty$ effectively all implemented requirements are ‘dead’.

Agile

The number of implemented ‘live’ requirements on day is given by:

$R_n=F_{live}*R_{n-1}+R_{done}$

with the initial condition that the number of implemented requirements at the start of the first day of iterative development is the number of requirements implemented in the Minimum Viable Product, i.e., $R_0=R_{mvp}$ .

Solving this difference equation gives the number of ‘live’ requirements on day :

$R_n=R_{mvp}*{F_{live}}^n+{n*R_{done}}/{n(1-F_{live})+F_{live}}$

as n right infty , R_n approaches to its maximum value of ${R_{done}}/{1-F_{live}}$

Subtracting the number of ‘live’ requirements from the total number of requirements implemented gives:

$R_{Adead}=R_{mvp}+n*R_{done}-R_n$

or

$R_{Adead}=R_{mvp}(1-{F_{live}}^n)+n*R_{done}(1-1/{n(1-F_{live})+F_{live}})$
or
$R_{Adead}=R_{mvp}(1-{F_{live}}^n)+n*R_{done}{n-1}/{n+F_{live}/(1-F_{live})}$

as n right infty effectively all implemented requirements are ‘dead’, because the number of ‘live’ requirements cannot exceed a known maximum.

Update

The paper A software evolution experiment found that in a waterfall project, 40% of modules in the delivered system were not required.

Categories: Uncategorized Tags: Agile, economics, requirements, waterfall

Stochastic rounding reemerges

November 20, 2022 Derek Jones No comments

Just like integer types, floating-point types are capable of representing a finite number of numeric values. An important difference between integer and floating types is that the result of arithmetic and relational operations using integer types is exactly representable in an integer type (provided they don’t overflow), while the result of arithmetic operations using floating types may not be exactly representable in the corresponding floating type.

When the result of a floating-point operation cannot be exactly represented, it is rounded to a value that can be represented. Rounding modes include: round to nearest (the default for IEEE-754), round towards zero (i.e., truncated), round up (i.e., towards infty ), round down (i.e., towards -infty ), and round to even. The following is an example of round to nearest:

      123456.7    = 1.234567    × 10^5
         101.7654 = 0.001017654 × 10^5
Adding
                  = 1.235584654 × 10^5
Round to nearest
                  = 1.235585    × 10^5

There is another round mode, one implemented in the 1950s, which faded away but could now be making a comeback: Stochastic rounding. As the name suggests, every round up/down decision is randomly chosen; a Google patent makes some claims about where the entropy needed for randomness can be obtained, and Nvidia also make some patent claims).

From the developer perspective, stochastic rounding has a very surprising behavior, which is not present in the other IEEE rounding modes; stochastic rounding is not monotonic. For instance: z < x+y does not imply that 0<(x+y)-z, because x+y may be close enough to z to have a 50% chance of being rounded to one of z or the next representable value greater than z, and in the comparison against zero the rounded value of (x+y) has an uncorrelated probability of being equal to z (rather than the next representable greater value).

For some problems, stochastic rounding avoids undesirable behaviors that can occur when round to nearest is used. For instance, round to nearest can produce correlated rounding errors that cause systematic error growth (by definition, stochastic rounding is uncorrelated); a behavior that has long been known to occur when numerically solving differential equations. The benefits of stochastic rounding are obtained for calculations involving long chains of calculations; the rounding error of the result of operations is guaranteed to be proportional to sqrt{n} , i.e., just like a 1-D random walk, which is not guaranteed for round to nearest.

While stochastic rounding has been supported by some software packages for a while, commercial hardware support is still rare, with Graphcore's Intelligence Processing Unit being one. There are some research chips supporting stochastic rounding, e.g., Intel's Loihi.

What applications, other than solving differential equations, involve many long chain calculations?

Training of machine learning models can consume many cpu hours/days; the calculation chains just go on and on.

Machine learning is considered to be a big enough market for hardware vendors to support half-precision floating-point. The performance advantages of half-precision floating-point are large enough to attract developers to reworking code to make use of them.

Is the accuracy advantage of stochastic rounding a big enough selling point that hardware vendors will provide the support needed to attract a critical mass of developers willing to rework their code to take advantage of improved accuracy?

It's possible that the intrinsically fuzzy nature of many machine learning applications swamps the accuracy advantage that stochastic rounding can have over round to nearest, out-weighing the costs of supporting it.

The ecosystem of machine learning based applications is still evolving rapidly, and we will have to wait and see whether stochastic rounding becomes widely used.

Categories: Uncategorized Tags: floating-point, rounding error, stochastic

Some human biases in conditional reasoning

November 13, 2022 Derek Jones No comments

Tracking down coding mistakes is a common developer activity (for which training is rarely provided).

Debugging code involves reasoning about differences between the actual and expected output produced by particular program input. The goal is to figure out the coding mistake, or at least narrow down the portion of code likely to contain the mistake.

Interest in human reasoning dates back to at least ancient Greece, e.g., Aristotle and his syllogisms. The study of the psychology of reasoning is very recent; the field was essentially kick-started in 1966 by the surprising results of the Wason selection task.

Debugging involves a form of deductive reasoning known as conditional reasoning. The simplest form of conditional reasoning involves an input that can take one of two states, along with an output that can take one of two states. Using coding notation, this might be written as:

    if (p) then q       if (p) then !q
    if (!p) then q      if (!p) then !q

The notation used by the researchers who run these studies is a 2×2 contingency table (or conditional matrix):

where: A, B, C, and D are the number of occurrences of each case; in code notation, p is the input and q the output.

The fertilizer-plant problem is an example of the kind of scenario subjects answer questions about in studies. Subjects are told that a horticultural laboratory is testing the effectiveness of 31 fertilizers on the flowering of plants; they are told the number of plants that flowered when given fertilizer (A), the number that did not flower when given fertilizer (B), the number that flowered when not given fertilizer (C), and the number that did not flower when not given any fertilizer (D). They are then asked to evaluate the effectiveness of the fertilizer on plant flowering. After the experiment, subjects are asked about any strategies they used to make judgments.

Needless to say, subjects do not make use of the available information in a way that researchers consider to be optimal, e.g., Allan’s Delta p index Delta p=P(A vert C)-P(B vert D)=A/{A+B}-C/{C+D} (sorry about the double, vert , rather than single, vertical lines).

What do we know after 40+ years of active research into this basic form of conditional reasoning?

The results consistently find, for this and other problems, that the information A is given more weight than B, which is given by weight than C, which is given more weight than D.

That information provided by A and B is given more weight than C and D is an example of a positive test strategy, a well-known human characteristic.

Various models have been proposed to ‘explain’ the relative ordering of information weighting: w(A)>w(B) > w(C) > w(D)” title=”w(A)>w(B) > w(C) > w(D)”/><a href= , e.g., that subjects have a bias towards sufficiency information compared to necessary information.

Subjects do not always analyse separate contingency tables in isolation. The term blocking is given to the situation where the predictive strength of one input is influenced by the predictive strength of another input (this process is sometimes known as the cue competition effect). Debugging is an evolutionary process, often involving multiple test inputs. I’m sure readers will be familiar with the situation where the output behavior from one input motivates a misinterpretation of the behaviour produced by a different input.

The use of logical inference is a commonly used approach to the debugging process (my suggestions that a statistical approach may at times be more effective tend to attract odd looks). Early studies of contingency reasoning were dominated by statistical models, with inferential models appearing later.

Debugging also involves causal reasoning, i.e., searching for the coding mistake that is causing the current output to be different from that expected. False beliefs about causal relationships can be a huge waste of developer time, and research on the illusion of causality investigates, among other things, how human interpretation of the information contained in contingency tables can be ‘de-biased’.

The apparently simple problem of human conditional reasoning over two variables, each having two states, has proven to be a surprisingly difficult to model. It is tempting to think that the performance of professional software developers would be closer to the ideal, compared to the typical experimental subject (e.g., psychology undergraduates or Mturk workers), but I’m not sure whether I would put money on it.

Categories: Uncategorized Tags: causal, conditionals, experiment, human, mistake, reasoning

Evidence-based Software Engineering book: two years later

November 6, 2022 Derek Jones 2 comments

Two years ago, my book Evidence-based Software Engineering: based on the publicly available data was released. The first two weeks saw 0.25 million downloads, and 0.5 million after six months. The paperback version on Amazon has sold perhaps 20 copies.

How have the book contents fared, and how well has my claim to have discussed all the publicly available software engineering data stood up?

The contents have survived almost completely unscathed. This is primarily because reader feedback has been almost non-existent, and I have hardly spent any time rereading it.

In the last two years I have discovered maybe a dozen software engineering datasets that would have been included, had I known about them, and maybe another dozen non-software related datasets that could have been included in the Human behavior/Cognitive capitalism/Ecosystems/Reliability chapters. About half of these have been the subject of blog posts (links below), with the others waiting to be covered.

Each dataset provides a sliver of insight into the much larger picture that is software engineering; joining the appropriate dots, by analyzing multiple datasets, can provide a larger sliver of insight into the bigger picture. I have not spent much time attempting to join dots, but have joined a few tiny ones, and a few that are not so small, e.g., Estimating using a granular sequence of values and Task backlog waiting times are power laws.

I spent the first year, after the book came out, working through the backlog of tasks that had built up during the 10-years of writing. The second year was mostly dedicated to trying to find software project data (including joining Twitter), and reading papers at a much reduced rate.

The plot below shows the number of monthly downloads of the A4 and mobile friendly pdfs, along with the average kbytes per download (code+data):

Downloads of A4 and mobile pdf over 2-years.

The monthly averages for 2022 are around 6K A4 and 700 mobile friendly pdfs.

I have been averaging one in-person meetup per week in London. Nearly everybody I tell about the book has not previously heard of it.

The following is a list of blog posts either analyzing existing data or discussing/analyzing new data.

Introduction
analysis: Software effort estimation is mostly fake research
analysis: Moore’s law was a socially constructed project

Human behavior
data (reasoning): The impact of believability on reasoning performance
data: The Approximate Number System and software estimating
data (social conformance): How large an impact does social conformity have on estimates?
data (anchoring): Estimating quantities from several hundred to several thousand
data: Cognitive effort, whatever it might be

Ecosystems
data: Growth in number of packages for widely used languages
data: Analysis of a subset of the Linux Counter data
data: Overview of broad US data on IT job hiring/firing and quitting

Projects
analysis: Delphi and group estimation
analysis: The CESAW dataset: a brief introduction
analysis: Parkinson’s law, striving to meet a deadline, or happenstance?
analysis: Evaluating estimation performance
analysis: Complex software makes economic sense
analysis: Cost-effectiveness decision for fixing a known coding mistake
analysis: Optimal sizing of a product backlog
analysis: Evolution of the DORA metrics
analysis: Two failed software development projects in the High Court

data: Pomodoros worked during a day: an analysis of Alex’s data
data: Multi-state survival modeling of a Jira issues snapshot
data: Over/under estimation factor for ‘most estimates’
data: Estimation accuracy in the (building|road) construction industry
data: Rounding and heaping in non-software estimates
data: Patterns in the LSST:DM Sprint/Story-point/Story ‘done’ issues
data: Shopper estimates of the total value of items in their basket

Reliability
analysis: Most percentages are more than half

Statistical techniques
Fitting discontinuous data from disparate sources
Testing rounded data for a circular uniform distribution

Post 2020 data
Pomodoros worked during a day: an analysis of Alex’s data
Impact of number of files on number of review comments
Finding patterns in construction project drawing creation dates

Categories: Uncategorized Tags: book, feedback

A study of deceit when reporting information in a known context

October 30, 2022 Derek Jones No comments

A variety of conflicting factors intrude when attempting to form an impartial estimate of the resources needed to perform a task. The customer/manager, asking for the estimate, wants to hear a low value, creating business/social pressure to underestimate; overestimating increases the likelihood of completing the task within budget.

A study by Oey, Schachner and Vul investigated the strategic reasoning for deception/lying in a two-person game.

A game involved a Sender and Receiver, with the two players alternating between the roles. The game started with both subjects seeing a picture of a box containing red and blue marbles (the percentage of red marbles was either 20%, 50%, or 80%). Ten marbles were randomly selected from this ‘box’, and shown to the Sender. The Sender was asked to report to the Receiver the number of red marbles appearing in the random selection, $K_{report}$ (there was an incentive to report higher/lower, and punishment for being caught being inaccurate). The Receiver could accept or reject the number of red balls reported by the Sender. In the actual experiment, unknown to the human subjects, one of every game’s subject pair was always played by a computer. Every subject played 100 games.

In the inflate condition: If the Receiver accepted the report, the Sender gained $K_{report}$ points, and the Receiver gained $10-K_{report}$ points.

If the Receiver rejected the report, then:

if the Sender’s report was accurate (i.e., $K_{report}$ == $K_{actual}$ ), the Sender gained $K_{report}$ points, and the Receiver gained $10-K_{report}-5$ points (i.e., a -5 point penalty),
if the Sender’s report was not accurate, the Receiver gained 5 points, and the Sender lost 5 points.

In the deflate condition: The points awarded to the Sender was based on the number of blue balls in the sample, and the points awarded to the Received was based on the number of red balls in the sample (i.e., the Sender had in incentive to report fewer red balls).

The plot below shows the mean rate of deceit (i.e., the fraction of a subject’s reports where $K_{actual} < K_{report}$ , averaged over all 116 subject’s mean) for a given number of red marbles actually seen by the Sender; vertical lines show one standard deviation, calculated over the mean of all subjects (code+data):

Subjects have some idea of the percentage of red/blue balls, and are aware that their opponent has a similar idea.

The wide variation in the fraction of reports where a subject reported a value greater than the number of marbles seen, is likely caused by variation in subject level of risk aversion. Some subjects may have decided to reduce effort by always accurately reporting, while others may have tried to see how much they could get away with.

The wide variation is particularly noticeable in the case of a box containing 80% red. If a Sender’s random selection contains few reds, then the Sender can feel confident reporting to have seen more.

The general pattern shows subjects being more willing to increase the reported number when they are supplied with few.

There is a distinct change of behavior when half of the sample contains more than five red marbles. In this situation, subjects may be happy to have been dealt a good hand, and are less inclined to risk losing 5-points for less gain.

Estimating involves considering more factors than the actual resources likely to be needed to implement the task; the use of round numbers is one example. This study is one of few experimental investigations of numeric related deception. The use of students having unknown motivation is far from ideal, but they are better than nothing.

When estimating in a team context, there is an opportunity to learn about the expectations of others and the consequences of over/under estimating. An issue for another study 🙂

Categories: Uncategorized Tags: deception, experiment, human, risk profile

Studying the lifetime of Open source

October 23, 2022 Derek Jones No comments

A software system can be said to be dead when the information needed to run it ceases to be available.

Provided the necessary information is available, plus time/money, no software ever has to remain dead, hardware emulators can be created, support libraries can be created, and other necessary files cobbled together.

In the case of software as a service, the vendor may simply stop supplying the service; after which, in my experience, critical components of the internal service ecosystem soon disperse and are forgotten about.

Users like the software they use to be actively maintained (i.e., there are one or more developers currently working on the code). This preference is culturally driven, in that we are living through a period in which most in-use software systems are actively maintained.

Active maintenance is perceived as a signal that the software has some amount of popularity (i.e., used by other people), and is up-to-date (whatever that means, but might include supporting the latest features, or problem reports are being processed; neither of which need be true). Commercial users like actively maintained software because it enables the option of paying for any modifications they need to be made.

Software can be a zombie, i.e., neither dead or alive. Zombie software will continue to work for as long as the behavior of its external dependencies (e.g., libraries) remains sufficiently the same.

Active maintenance requires time/money. If active maintenance is required, then invest the time/money.

Open source software has become widely used. Is Open source software frequently maintained, or do projects inhabit some form of zombie state?

Researchers have investigated various aspects of the life cycle of open source projects, including: maintenance activity, pull acceptance/merging or abandoned, and turnover of core developers; also, projects in niche ecosystems have been investigated.

The commits/pull requests/issues, of circa 1K project repos with lots of stars, is data that can be automatically extracted and analysed in bulk. What is missing from the analysis is the context around the creation, development and apparent abandonment of these projects.

Application areas and development tools (e.g., editor, database, gui framework, communications, scientific, engineering) tend to have a few widely used programs, which continue to be actively worked on. Some people enjoy creating programs/apps, and will start development in an area where there are existing widely used programs, purely for the enjoyment or to scratch an itch; rarely with the intent of long term maintenance, even when their project attracts many other developers.

I suspect that much of the existing research is simply measuring the background fizz of look-alike programs coming and going.

A more realistic model of the lifecycle of Open source projects requires human information; the intent of the core developers, e.g., whether the project is intended to be long-term, primarily supported by commercial interests, abandoned for a successor project, or whether events got in the way of the great things planned.

Categories: Uncategorized Tags: ecosystems, evolution, projects, software development, survival

Clustering source code within functions

October 16, 2022 Derek Jones 1 comment

The question of how best to cluster source code into functions is a perennial debate that has been ongoing since functions were first created.

Beginner programmers are told that clustering code into functions is good, for a variety of reasons (none of the claims are backed up by experimental evidence). Structuring code based on clustering the implementation of a single feature is a common recommendation; this rationale can be applied at both the function/method and file/class level.

The idea of an optimal function length (measured in statements) continues to appeal to developers/researchers, but lacks supporting evidence (despite a cottage industry of research papers). The observation that most reported fault appear in short functions is a consequence of most of a program’s code appearing in short functions.

I have had to deal with code that has not been clustered into functions. When microcomputers took off, some businessmen taught themselves to code, wrote software for their line of work and started selling it. If the software was a success, more functionality was needed, and the businessman (never encountered a woman doing this) struggled to keep on top of things. A common theme was a few thousand lines of unstructured code in one function in a single file (keeping everything in one file is also a trait of highly focus developers).

Adding structural bureaucracy (e.g., functions and multiple files) reduced the effort needed to maintain and enhance the code.

The problem with ‘born flat’ source is that the code for unrelated functionality is often intermixed, and global variables are freely used to communicate state. I have seen the same problems in structured function code, but instances are nowhere near as pervasive.

When implementing the same program, do different developers create functions implementing essentially the same functionality?

I am aware of two datasets relating to this question: 1) when implementing the same small specification (average length program 46.3 lines), a surprising number of variants (6,301) are created, 2) an experiment that asked developers to reintroduce functions into ‘flattened’ code.

The experiment (Alexey Braver’s MSc thesis) took an existing Python program, ‘flattened’ it by inlining functions (parameters were replaced by the corresponding call arguments), and asked subjects to “… partition it into functions in order to achieve what you consider to be a good design.”

The 23 rows in the plot below show the start/end (green/brown delimited by blue lines) of each function created by the 23 subjects; red shows code not within a function, and right axis is percentage of each subjects’ code contained in functions. Blue line shows original (currently plotted incorrectly; patched original code+data):

3n+1 programs containing various lines of code.

There are many possible reasons for the high level of agreement between subjects, including: 1) the particular example chosen, 2) the code was already well-structured, 3) subjects were explicitly asked to create functions, 4) the iterative process of discovering code that needs to be written did not occur, 5) no incentive to leave existing working code as-is.

Given that most source has a short and lonely existence, is too much time being spent bike-shedding function contents?

Given how often lower level design time happens at code implementation time, perhaps discussion of function contents ought to be viewed as more about thinking how things fit together and interact, than about each function in isolation.

Analyzing each function in isolation can create perverse incentives.

Categories: Uncategorized Tags: bureaucracy, experiment, function size

Printing press+widespread religious behavior: A theory

October 9, 2022 Derek Jones No comments

The book The Weirdest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous provides an explanation of the processes which weakened the existing social ties of family and tribe; however, the emergence of WEIRD people (Western, Educated, Industrialized, Rich and Democratic) required new social norms to spread and be accepted throughout society. A major technical innovation, in the form of the printing press, provided the means for mass communication of ideas and practices.

David High-Jones’ book Wyclif’s Dust: Western Cultures from the Printing Press to the Present describes the social consequences of what he calls book religion; a combination of deeply religious western societies and the ability of individuals to write and sell affordable books (made possible by the printing press). Religion+printing press created the conditions for what High-Jones calls a hothouse culture, a period from the 1600s to the end of the 1800s.

Around 1440 the printing press is invented and quickly spreads; around 5 million books were handwritten in the 1400s, about 80 million books were produced in the first 50 years of printing, and around a billion in the 1700s. During the 1500s the Protestant reformation happens; Protestant encouraged its followers to read the Bible, which creates a demand for printed Bibles and the need to be able to read (which increases literacy rates). In England, between 1480-1640, 40% of published books were religious.

The changes to society’s existing norms are wrought by cultural transmission, initially via middle class parents making use of edifying books to teach their children moral values and social skills, later Sunday schools took on this role, but also had to offer reading lessons to attract members. In the adult world, accepted norms were maintained by social enforcement. The impact on western societies was widespread because observant religious behavior was widespread.

The original intent, of those writing the religious books, was the creation of a god fearing society. In practice, a trust based society was created, where workers might be relied upon not to shirk their duties and businessmen to not renege on agreements.

In the beginning science, in the form of printed technical books, rarely made an appearance. In the 1700s the Enlightenment happens, and scientific books are discussed by small collections of disparate individuals. The industrial revolution happens, but the bulk of the demand is for trustworthy workers; technical and scientific know how remains a minority interest.

In Part I of the book, High-Jones weaves a reading and convincing narrative. Part II, 1900 to today, is a tale of the crumbling and breakdown of the social forces and incentives that creates the trust based society; while example are enumerated, no overarching theory is proposed (I skimmed this part).

Categories: Uncategorized Tags: books, evolution, human behavior, social learning, social pressure

Newer Entries Older Entries

The Shape of Code

Christmas books for 2022

C++ deprecates some operations on volatile objects

Unneeded requirements implemented in Waterfall & Agile

Stochastic rounding reemerges

Some human biases in conditional reasoning

Evidence-based Software Engineering book: two years later

A study of deceit when reporting information in a known context

Studying the lifetime of Open source

Clustering source code within functions

Recent Posts

Recent Comments

Archives

Meta