Empirical SE groups doing interesting work, 2013 version
Various people have asked me about who is currently doing interesting work in empirical software engineering and the following is an attempt to help answer this question. Interestingness is very subjective; in my case it is based on whether I think the work can contribute something towards my book on empirical software engineering.
To keep this list manageable I am restricting myself to groups of researchers (a group is two or more people) and giving priority to those who make their data publicly available.
Some background for those with no experience of academic research. Over a period of 4-5 years a group can go from having published nothing on a research topic to publishing some very interesting stuff to not publishing anything on the topic. Reasons for this include funding appearing/disappearing, the arrival/departure of very productive people (departure may be to other jobs or moving from research into management), or the researcher losing interest and moving on to other things. A year from now any of the following groups may have disbanded or moved on to other research areas.
The conferences to check out are: Mining Software Repositories, Source Code Analysis and Manipulation, perhaps 1 in 2.5 of CREST Open Workshop and International Conference on Software Maintenance.
General sources of raw data include promisedata and FLOSSmole; the latter is a firehose of bytes.
Who is the biggest group of researchers? In my mind it is the Canadians (to be exact, the groups at Queen’s and Waterloo and the Ptidej project). The empirical group at Microsoft would probably point out that they are not separated by several hundred miles and all work for the same company. This may be true, but looking from Europe the Canadians look real close to each other on a map and all share a domain name ending in ca. In practice members of all three groups write papers together and spend time visiting/interning with each other. Given how rapidly things change I am not going to bother calculating an accurate number 1 for today.
Around the world (where there is no group page to link to I have used an individual’s page):
UK (theory in groups, practice by individuals; Brunel would warrant a link if they put some effort into maintaining a web presence and made their data available for download; come on guys)
USA (Devanbu, Grechanik, Kemerer, Menzies, SEMERU + TODO; Binkley for identifier semantics)
Some researchers leave a group to set up their own group and I know that some people in the above lists have done this. I wish them luck. If their group starts publishing interesting stuff they will be on any future version of this list.
Sitting here typing away I have probably missed out some obvious candidates. Pointers to obvious omissions welcome (remember this is about groups not individuals).
Perhaps a little off topic but I’ve been following your blog for some time. Recently I’ve been doing PHP programming and it is very high productivity in as much as you can build sites very quickly, with a fast turnaround for testing and development.
However PHP is not a particularly disciplined language, and that’s part of the rapid development paradigm I suppose. Since PHP syntax is pretty close to C and C++ I would expect there is a lot of money to be made in some sort of code analyzer for PHP. In particular, searching for security loopholes, SQL injection, and other blunders. The PHP “taint” mode is considered by many to be a bit weak, and many common distributions don’t even bother compiling it into packages.
Quite a lot of eCommerce still goes through PHP sites, not to mention blogs getting hacked all over the place. How about some research into this angle?
@Telford Tendys
There are a number of PHP tools available, such as PHP_CodeSniffer and RIPS; there are also researchers doing all sorts of stuff with PHP, such as analysing strings and analysing general usage.