Experimental method for measuring benefits of identifier naming
I recently came across a very interesting experiment in Eran Avidan’s Master’s thesis. Regular readers will know of my interest in identifiers; while everybody agrees that identifier names have a significant impact on the effort needed to understand code, reliably measuring this impact has proven to be very difficult.
The experimental method looked like it would have some impact on subject performance, but I was not expecting a huge impact. Avidan’s advisor was Dror Feitelson, who kindly provided the experimental data, answered my questions and provided useful background information (Dror is also very interested in empirical work and provides a pdf of his book+data on workload modeling).
Avidan asked subjects to figure out what a particular method did, timing how long it took them to work this out. In the control condition a subject saw the original method, and in the experimental condition the local and parameter names were replaced by single letter identifiers; in all cases the method name was replaced by xxx. The hypothesis was that subjects would take longer for methods modified to use ‘random’ identifier names.
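To make the manipulation concrete, here is a minimal sketch in R of the kind of rewriting involved (my own illustration, not from the thesis; the study used Java methods, and anonymize, input_str and chars are names I made up):

# Replace the method name with xxx and each local/parameter with a single
# letter, using word boundaries so substrings of other names are untouched.
anonymize <- function(src, method_name, identifiers) {
  src <- gsub(paste0("\\b", method_name, "\\b"), "xxx", src, perl = TRUE)
  for (i in seq_along(identifiers))
    src <- gsub(paste0("\\b", identifiers[i], "\\b"), letters[i], src, perl = TRUE)
  src
}

orig <- 'reverse <- function(input_str) {
  chars <- strsplit(input_str, "")[[1]]
  paste(rev(chars), collapse = "")
}'
cat(anonymize(orig, "reverse", c("input_str", "chars")))
# xxx <- function(a) {
#   b <- strsplit(a, "")[[1]]
#   paste(rev(b), collapse = "")
# }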
A wonderfully simple idea that does not involve a lot of experimental overhead and ought to be runnable under a wide variety of conditions, plus the difference in performance is very noticeable.
The think-aloud protocol was used, i.e., subjects were asked to speak their thoughts as they processed the code. Having to do this slows people down, but it has the advantage of helping to ensure that a subject really does understand the code. An overall slower response time does not matter, because we are interested in differences in performance.
Each of the nine subjects sequentially processed six methods, with the methods randomly assigned as controls or experimental treatments (of which there were two, locals first and parameters first).
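A sketch of what such an assignment might look like (my reconstruction; assigning exactly two methods per condition to each subject is an assumption, not something stated in the write-up):

set.seed(42)  # for a reproducible example
conditions <- c("control", "locals first", "parameters first")
# method varies fastest within subject, so per-subject blocks line up below
design <- expand.grid(method = 1:6, subject = 1:9)
# one random permutation of two-methods-per-condition for each subject
design$treatment <- as.vector(replicate(9, sample(rep(conditions, 2))))
head(design, 6)  # the six methods seen by subject 1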
The procedure when a subject saw a modified method was as follows: the subject was asked to explain the method’s purpose; once an answer was given (or 10 minutes had elapsed), either the local or the parameter names were revealed and the subject again explained the method’s purpose; when an answer was given, the names of both locals and parameters were revealed and a final answer was recorded. The time taken for the subject to give a correct answer was recorded.
The summary output of a mixed-effects model fitted to the data is at the end of this post (code+data; original experimental materials). There are only enough measurements to have subject as a random effect on the treatment; no order-of-presentation data is available to look for learning effects.
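For reference, the model whose summary appears below can be fitted along the following lines (the data frame name idxx and the formula are taken from the summary output; the column contents are assumed):

library(lme4)

# response time modelled with method (func) and treatment as fixed effects,
# and a per-subject random effect on the treatment
mod <- lmer(response ~ func + treatment + (treatment | subject), data = idxx)
summary(mod)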
Subjects took longer for modified methods. When parameters were revealed first, subjects were 268 seconds slower (on average), and when locals were revealed first, 342 seconds slower (the standard deviations of the between-subject differences were 187 and 253 seconds, respectively, i.e., smaller than the treatment effects; this is surprising, and perhaps a consequence of the progressive reveal of information helping the slower performers).
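These numbers can be read off the fitted model; a sketch, assuming the mod object from the lmer call above:

# treatment effects: 342.5 (locals first) and 267.8 (parameters first) seconds
fixef(mod)[c("treatmentlocals first", "treatmentparameters first")]
# between-subject standard deviations of those effects: 253.3 and 186.6 seconds
VarCorr(mod)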
Why is the slowdown smaller when parameter names are revealed first? My thoughts: parameter names (if well chosen) provide clues about what the incoming values represent, useful information for figuring out what a method does. Locals are somewhat self-referential, in that they hold local information, often derived from parameters as initial values.
What other factors could impact subject performance?
The number of occurrences of each name in the body of the method provides an opportunity to deduce information; so I think the time needed to figure out what the method does should be less when there are many uses of locals/parameters, compared to when there are few.
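A crude, regex-based way of obtaining such occurrence counts (a sketch; count_uses is my own helper, and a real analysis would want a parser rather than regexes):

count_uses <- function(src, identifier) {
  # positions of whole-word matches of identifier in the method source
  hits <- gregexpr(paste0("\\b", identifier, "\\b"), src, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}
count_uses('total <- total + values[i]', "total")  # 2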
The ability of subjects to recognize what the code does is also important, i.e., subject code reading experience.
There are lots of interesting possibilities that can be investigated using this low cost technique.
Linear mixed model fit by REML ['lmerMod']
Formula: response ~ func + treatment + (treatment | subject)
   Data: idxx

REML criterion at convergence: 537.8

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.34985 -0.56113 -0.05058  0.60747  2.15960

Random effects:
 Groups   Name                      Variance Std.Dev. Corr
 subject  (Intercept)               38748    196.8
          treatmentlocals first     64163    253.3    -0.96
          treatmentparameters first 34810    186.6    -1.00  0.95
 Residual                           43187    207.8
Number of obs: 46, groups:  subject, 9

Fixed effects:
                          Estimate Std. Error t value
(Intercept)                  799.0      110.2   7.248
funcindexOfAny              -254.9      126.7  -2.011
funcrepeat                  -560.1      135.6  -4.132
funcreplaceChars            -397.6      126.6  -3.140
funcreverse                 -466.7      123.5  -3.779
funcsubstringBetween        -145.8      125.8  -1.159
treatmentlocals first        342.5      124.8   2.745
treatmentparameters first    267.8      106.0   2.525

Correlation of Fixed Effects:
            (Intr) fncnOA fncrpt fncrpC fncrvr fncsbB trtmntlf
fncndxOfAny -0.524
funcrepeat  -0.490  0.613
fncrplcChrs -0.526  0.657  0.620
funcreverse -0.510  0.651  0.638  0.656
fncsbstrngB -0.523  0.655  0.607  0.655  0.648
trtmntlclsf -0.505 -0.167 -0.182 -0.160 -0.212 -0.128
trtmntprmtf -0.495 -0.184 -0.162 -0.184 -0.228 -0.213  0.673