Vigilant Software Blog
Defect Spotlight: Conflicting assumptions
Are you interested in code quality? This is part of a series of blogs called Defect Spotlight. In it, we'll be bringing you a new example of a real-world defect found by static analysis. Learn from the mistakes shown here!
Looking at source code can be pretty fascinating: there is a lot of information packed into a very small space. This information comes in different flavors and is woven in layers throughout the program.
In the strictest sense, source code is a sequence of constructs that the compiler uses to generate executable code. Initially, it might seem like this is the only layer there is (especially if there are no comments). But since we're not the compiler, we're free to look at source code from other angles. For example, look at the names of objects and functions in the program -- they convey semantic information from the author to the reader. This information helps humans understand the meaning of code, even if those names are arbitrary and ultimately meaningless to the compiler.
But wait, there's more! It turns out there is a lot of information in these layers. The call graph of a program describes the structure of the program at a high level. Comments explain the rationale for particular choices in the code, or tell us how to interact with it. The formatting and modularity of the code can tell you about the level of experience of the author. The complexity of a function can tell you about the likelihood of problems therein. The list goes on and on.
Today I want to look at another bit of information embedded in the code: assumptions. In particular, the assumptions made by programmers about the state of the program. It might seem like assumptions are hidden inside the mind of the author, but it turns out that they can often be inferred by the actions performed (or not performed) in the code itself.
In this example, found in the squid caching proxy server (in file urn.cc), Sentry saw something interesting is going on:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | urls = urnParseReply(s, urnState->request->method); for (i = 0; NULL != urls[i].url; i++) urlcnt++; debugs(53, 3, "urnFindMinRtt: Counted " << i << " URLs"); if (urls == NULL) { /* unkown URN error */ debugs(52, 3, "urnTranslateDone: unknown URN " << e->url() ); err = errorCon(ERR_URN_RESOLVE, HTTP_NOT_FOUND, urnState->request); err->url = xstrdup(e->url()); errorAppendEntry(e, err); urnHandleReplyError(urnState, urlres_e); return; } |
There are many assumptions implicit in the code above. Here are a couple I want to discuss:
- The
forloop on line 2 makes the assumption thaturnParseReplydoes not returnNULL. This is implied by the fact that the loop condition dereferences theurlsvariable without checking if it isNULLfirst. - The
ifstatement on line 7 assumes thaturnParseReplymay returnNULL, because its result is checked forNULLin theifcondition.
The attentive reader will notice that these assumptions are in conflict --
they cannot both be true. Either urnParseReply can return NULL, in which
case the for loop will cause a crash, or urnParseReply never returns
NULL, in which case the entire if statement spanning lines 7 through 14 is
unnecessary (i.e. dead) code.
To resolve the conflict, we need to take a look at the urnParseReply
function. A quick look informs us that urnParseReply always allocates and
returns a list of URLs (even if the list has zero elements, it's never
NULL). Seeing this, we now know the second assumption was incorrect, and we
can safely identify lines 7-14 as dead code.
(How does this happen? When someone modifies a function that has been around for a long time, he or she may make different assumptions about the code than the original author did, simply because it has been so long since it was first written. This can even happen if the person modifying the code is the original author, months or years later. Since most projects are maintained by many people over many years, looking for conflicting assumptions turns out to be a great way to identify software defects.)
We're starting to see that even a small snippet of code conveys a lot of information (far more than I've discussed here!). Fortunately, some of this information can be deduced automatically with static analysis, bringing situations like the above to your attention, and enabling you to identify and fix the problem.