Vigilant Software Blog

Defect Spotlight: Conflicting assumptions

Are you interested in code quality? This is part of a series of blogs called Defect Spotlight. In it, we'll be bringing you a new example of a real-world defect found by static analysis. Learn from the mistakes shown here!

Looking at source code can be pretty fascinating: there is a lot of information packed into a very small space. This information comes in different flavors and is woven in layers throughout the program.

In the strictest sense, source code is a sequence of constructs that the compiler uses to generate executable code. Initially, it might seem like this is the only layer there is (especially if there are no comments). But since we're not the compiler, we're free to look at source code from other angles. For example, look at the names of objects and functions in the program -- they convey semantic information from the author to the reader. This information helps humans understand the meaning of code, even if those names are arbitrary and ultimately meaningless to the compiler.

But wait, there's more! It turns out there is a lot of information in these layers. The call graph of a program describes the structure of the program at a high level. Comments explain the rationale for particular choices in the code, or tell us how to interact with it. The formatting and modularity of the code can tell you about the level of experience of the author. The complexity of a function can tell you about the likelihood of problems therein. The list goes on and on.

Today I want to look at another bit of information embedded in the code: assumptions. In particular, the assumptions made by programmers about the state of the program. It might seem like assumptions are hidden inside the mind of the author, but it turns out that they can often be inferred by the actions performed (or not performed) in the code itself.

In this example, found in the squid caching proxy server (in file urn.cc), Sentry saw something interesting is going on:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
urls = urnParseReply(s, urnState->request->method);
for (i = 0; NULL != urls[i].url; i++)
    urlcnt++;

debugs(53, 3, "urnFindMinRtt: Counted " << i << " URLs");

if (urls == NULL) {   /* unkown URN error */
    debugs(52, 3, "urnTranslateDone: unknown URN " << e->url()  );
    err = errorCon(ERR_URN_RESOLVE, HTTP_NOT_FOUND, urnState->request);
    err->url = xstrdup(e->url());
    errorAppendEntry(e, err);
    urnHandleReplyError(urnState, urlres_e);
    return;
}

There are many assumptions implicit in the code above. Here are a couple I want to discuss:

  1. The for loop on line 2 makes the assumption that urnParseReply does not return NULL. This is implied by the fact that the loop condition dereferences the urls variable without checking if it is NULL first.
  2. The if statement on line 7 assumes that urnParseReply may return NULL, because its result is checked for NULL in the if condition.

The attentive reader will notice that these assumptions are in conflict -- they cannot both be true. Either urnParseReply can return NULL, in which case the for loop will cause a crash, or urnParseReply never returns NULL, in which case the entire if statement spanning lines 7 through 14 is unnecessary (i.e. dead) code.

To resolve the conflict, we need to take a look at the urnParseReply function. A quick look informs us that urnParseReply always allocates and returns a list of URLs (even if the list has zero elements, it's never NULL). Seeing this, we now know the second assumption was incorrect, and we can safely identify lines 7-14 as dead code.

(How does this happen? When someone modifies a function that has been around for a long time, he or she may make different assumptions about the code than the original author did, simply because it has been so long since it was first written. This can even happen if the person modifying the code is the original author, months or years later. Since most projects are maintained by many people over many years, looking for conflicting assumptions turns out to be a great way to identify software defects.)

We're starting to see that even a small snippet of code conveys a lot of information (far more than I've discussed here!). Fortunately, some of this information can be deduced automatically with static analysis, bringing situations like the above to your attention, and enabling you to identify and fix the problem.

posted by mike at 11:17AM