Archive

Posts Tagged ‘software engineering’

Workshop on App Store Analysis

October 29, 2014 No comments

I was at the 36th CREST Open Workshop, on App Store Analysis, at the start of this week. The attendee list reads like a who’s who of academics researching App stores. What really stood out for me was the disconnect between my view of the software engineering aspects of developing mobile Apps and the view of many, but not all, academics in the room.

Divergent points of view on App development being different because… included:

Academics: they are written by a small number (often one) of developers.
Me: This was true in the early days of microprocessors and the web. When something new comes out only a small number of people are involved in it and few companies are willing to invest in setting up large development teams. If the new thing succeeds (i.e., there is money to be made) the money to create large teams will follow.

Academics: third party libraries make a significant contribution to functionality.
Me: This is true of a lot of web software and it is becoming more common for Apps on all platforms. It was not true in the past because the libraries were not available; Open Source changed all that.

Academics: they are not structured/written according to software engineering principles (someone in the room thought that waterfall was still widely used).
Me: This is true of most software produced by individuals who are writing something out of interest in their spare time or because they are not gainfully employed in ‘real’ work. When microcomputers were new the internal quality of most software on the market was truly appalling; it was primarily written by people who knew a market niche very well and taught themselves programming, the software sold because it addressed the needs to its customers and code quality was irrelevant (of course the successful products eventually needed to be maintained, which in when code quality became important, but they now had money to employ developers who knew about that kind of stuff).

Academics: the rapid rate of change (in tools and libraries etc) being experienced will continue into the foreseeable future.
Me: I was staggered that anyone could think this.

Academics: lots of money to be made for minimal investment:
Me: Those days are past.

Me: power drain issues (may) be a significant design issues.
Academics: Blank look.

Other things to report:

Various concerns raised by people who had encountered the viewpoint that mobile Apps were not considered worthy of serious academic study within software engineering; this point of view seemed to be changing. I don’t recall there every having been academic research groups targeting microcomputer software, but this certainly happened for web development.

I was a bit surprised at the rather rudimentary statistical techniques that were being used. But somebody is working on a book to change this.

Success: Software engineering data is starting to become very dull

July 31, 2014 No comments

A few years ago it was unusual for the author(s) of a paper in software engineering to make their data public (and on top of that it was rare to encounter a paper that actually made use of empirical data). The situation now is that I am having trouble keeping up with all the papers that include a link to downloadable data. Part of the problem is that I will pay a lot more attention to papers that come with data, having lived through a long famine I have not yet adjusted to the greater abundance. I’m sure that journal editors and referees are in the same boat and are being lured by accompanying data to accept paper for publication that they would otherwise have rejected.

This growing quantity of empirical software engineering data means we can now start thinking about what data is useful to have and what data is not so useful. Data is useful if it highlights a pattern of behavior that can be used to help reduce the resources needed to create/maintain software.

To get a handle on estimating data usefulness we need a model of research in software engineering. While many have used Physics as the model for software engineering research (i.e., a few simple universal laws that apply everywhere), I think Biology is a much better fit.

Software is written in different habitats environments (e.g., small teams, large teams) and targets different habitats environments (e.g., embedded, desktop, mobile, supercomputer) using different techniques and driven by different predators/prey market forces (e.g., release first/quickly, be reliable). Yes there are common drivers, just as the living things studied by biologists share a common need to eat, sleep and reproduce.

Like biology, the bulk of software engineering research is about the study of niche topics, with some small percentage of researchers trying to build theories that tie everything together at one level or another to create bigger pictures.

This model of software engineering research means estimating the usefulness of data probably requires some knowledge of the niche to which it applies. It also means that a particular data set might not be useful yet because it needs to be combined with other data, that does not yet exist (perhaps it was collected first because it was easier to do).

So in a space of a few years most software engineering data has gone from being very interesting (because it is rare) to being very dull (because it is harder to stand out in a crowd).

Background to my book project “Empirical Software Engineering with R”

June 22, 2012 7 comments

This post provides background information that can be referenced by future posts.

For the last 18 months I have been working in fits and starts on a book that has the working title “Empirical Software Engineering with R”. The idea is to provide broad coverage of software engineering issues from an empirical perspective (i.e., the discussion is driven by the analysis of measurements obtained from experiments); R was chosen for the statistical analysis because it is becoming the de-facto language of choice in a wide range of disciplines and lots of existing books provide example analysis using R, so I am going with the crowd.

While my last book took five years to write I had a fixed target, a template to work to and a reasonably firm grasp of the subject. Empirical software engineering has only really just started, the time interval between new and interesting results appearing is quiet short and nobody really knows what statistical techniques are broadly applicable to software engineering problems (while the normal distribution is the mainstay of the social sciences a quick scan of software engineering data finds few occurrences of this distribution).

The book is being driven by the empirical software engineering rather than the statistics, that is I take a topic in software engineering and analyse the results of an experiment investigating that topic, providing pointers to where readers can find out more about the statistical techniques used (once I know which techniques crop up a lot I will write my own general introduction to them).

I’m assuming that readers have a reasonable degree of numeric literacy, are happy dealing with probabilities and have a rough idea about statistical ideas. I’m hoping to come up with a workable check-list that readers can use to figure out what statistical techniques are applicable to their problem; we will see how well this pans out after I have analysed lots of diverse data sets.

Rather than wait a few more years before I can make a complete draft available for review I have decided to switch to making available individual parts as they are written, i.e., after writing a draft discussion and analysis of each experiment I will published it on this blog (along with the raw data and R code used in the analyse). My reasons for doing this are:

  • Reader feedback (I hope I get some) will help me get a better understanding of what people are after from a book covering empirical software engineering from a statistical analysis of data perspective.
  • Suggestions for topics to cover. I am being very strict and only covering topics for which I have empirical data and can make that data available to readers. So if you want me to cover a topic please point me to some data. I will publish a list of important topics for which I currently don’t have any data, hopefully somebody will point me at the data that can be used.
  • Posting here will help me stay focused on getting this thing done.

Links to book related posts

Distribution of uptimes for high-performance computing systems

Break even ratios for development investment decisions

Agreement between code readability ratings given by students

Changes in optimization performance of gcc over time

Descriptive statistics of some Agile feature characteristics

Impact of hardware characteristics on detectable fault behavior

Prioritizing project stakeholders using social network metrics

Preferential attachment applied to frequency of accessing a variable

Amount of end-user usage of code in Firefox

How many ways of programming the same specification?

Ways of obtaining empirical data in software engineering

What is the error rate for published mathematical proofs?

Changes in the API/non-API method call ratio with program size

Honking the horn for go faster memory (over go faster cpus)

How to avoid being a victim of Brooks’ law

Evidence for the benefits of strong typing, where is it?

Hardware variability may be greater than algorithmic improvement

Extracting the original data from a heatmap image

Entropy: Software researchers go to topic when they have no idea what else to talk about

Debian has cast iron rules for package growth & death

Joke: Student subjects in software engineering experiments