Home > Uncategorized > My 2022 search for software engineering data

My 2022 search for software engineering data

At the start of this year, 2022, I announced a crowdsourced search for software engineering data, in May, as part of this search I created the evidenceSE account on Twitter, once a week, on average, I attended an in-person Meetup somewhere in London, I gave one talk and a handful of lightening talks.

What software engineering data did all this effort uncover?

Thanks to Diomidis Spinellis the crowdsource search did not have a zero outcome (the company who provided some data has been rather busy, so progress on iterating on the data analysis has been glacial).

My time spent of Twitter did not even come close to finding a decent sized dataset (a couple of tiny ones were found). When I encountered a tweet claiming to involve evidence in software engineering, I replied asking for a reference to the evidence. Sometimes the original tweet was deleted, sometimes the user blocked me, and sometimes an exchange on the difficulty of obtaining data ensued.

I am a member of 87 meetup groups; essentially any software related group holding an in-person event in London in 2022, plus pre-COVID memberships. Event cadence was erratic, dramatically picking up before Christmas, and I’m expecting it to pick up again in the New Year. I learned some interesting stuff, and spoke to many interesting people, mostly working at large companies (i.e., they have lawyers, so little chance of obtaining data). The idea of an evidence-based approach to software engineering was new to a surprising number of people; the non-recent graduates all agreed that software engineering was driven by fashion/opinions/folklore. I spoke to several people who planned to spend time researching software development in 2023, and one person who ticked all the boxes as somebody who has data and might be willing to release it.

My ‘tradition’ method of finding data (i.e., reading papers and blogs) has continued to uncover new data, but at a slower rate than previous years. Is this a case of diminishing returns (my 2020 book does claim to discuss all the publicly available data), my not reading as many papers as in previous years, or the collateral damage from COVID?

Interesting sources of general data that popped-up in 2022.

  • After years away, Carlos returned with his weekly digest Data Machina (now on substack),
  • I discovered Data Is Plural, a weekly newsletter of useful/curious datasets.
  1. No comments yet.
  1. No trackbacks yet.