McCabe’s cyclomatic complexity and accounting fraud
The paper in which McCabe proposed what has become known as McCabe’s cyclomatic complexity did not contain any references to source code measurements, it was a pure ego and bluster paper.
Fast forward 10 years and cyclomatic complexity, complexity metric, McCabe’s complexity…permutations of the three words+metrics… has become one of the two major magical omens of code quality/safety/reliability (Halstead’s is the other).
It’s not hard to show that McCabe’s complexity is a rather weak measure of program complexity (it’s about as useful as counting lines of code).
Just as it is possible to reduce the number of lines of code in a function (by putting all the code on one line), it’s possible to restructure existing code to reduce the value of McCabe’s complexity (which is measured for individual functions).
The value of McCabe’s complexity for the following function is 5 16, i.e., and there are 16 possible paths through the function:
int main(void) { if (W) a(); else b(); if (X) c(); else d(); if (Y) e(); else f(); if (Z) g(); else h(); } |
each if
…else
contains two paths and there are four in series, giving paths.
Restructuring the code, as below, removes the multiplication of paths caused by the sequence of if
…else
:
void a_b(void) {if (W) a(); else b();} void c_d(void) {if (X) c(); else d();} void e_f(void) {if (Y) e(); else f();} void g_h(void) {if (Z) g(); else h();} int main(void) { a_b(); c_d(); e_f(); g_h(); } |
reducing main
‘s McCabe complexity to 1 and the four new functions each have a McCabe complexity of two.
Where has the ‘missing’ complexity gone? It now ‘exists’ in the relationship between the functions, a relationship that is not included in the McCabe complexity calculation.
The number of paths that can be traversed, by a call to main
, has not changed (but the McCabe method for counting them now produces a different answer)
Various recommended practice documents suggest McCabe’s complexity as one of the metrics to consider (but don’t suggest any upper limit), while others go as far as to claim that it’s bad practice for functions to have a McCabe’s complexity above some value (e.g., 10) or that “Cyclomatic complexity may be considered a broad measure of soundness and confidence for a program“.
Consultants in the code quality/safety/security business need something to complain about, that is not too hard or expensive for the client to fix.
If a consultant suggested that you reduced the number of lines in a function by joining existing lines, to bring the count under some recommended limit, would you take them seriously?
What about, if a consultant highlighted a function that had an allegedly high McCabe’s complexity? Should what they say be taken seriously, or are they essentially encouraging developers to commit the software equivalent of accounting fraud?
Cyclomatic complexity is not the distinct number of paths through a function. For the example given, I compute 5 (not 16).
I would argue that you have increased complexity by splitting up the original function into multiple smaller functions, and one measure that shows that is the sum over all functions of cyclomatic complexity for each function. Whether that is good or bad depends on your perspective. The split increases overall unit test effort (bad), but may make each individual function easier to comprehend (good).
Switch statements aside, high cyclomatic complexity is a fair indicator of code that is hard to understand. Such code can sometimes be made more understandable by extracting portions into separate functions. That reduces cyclomatic complexity. Extracting portions into separate functions can be done well, or it can be done poorly, and cyclomatic complexity won’t tell you which is which.
I have successfully used a weighted combination of SLOC and cyclomatic complexity as a measure of how complex a function is. It was successful in the sense that it pointed out real problems whilst having a much lower false-positive rate than either SLOC or cyclomatic complexity alone.
Michael
@Michael Tempest
Oops, thanks. Fixed.
I would agree that the code has been made more complicated (but I’m not sure how to measure complicated).
Cyclomatic complexity strongly correlates with lines of code and all the things said about functions containing lots of lines can also be said about cyclomatic complexity. What code characteristics are unique to cyclomatic complexity? There might be sone, but I don’t know what they are.
Branches represent decision points, so one would think they have higher information content than other kinds of lines. Higher information content implies greater cognitive effort to understand. But turning this into a reproducible formula remains elusive.