Vigilant Software Blog

Defect Spotlight: Useless checks

Are you interested in code quality? This post is part of a series called Defect Spotlight. In it, we'll be bringing you instructive examples of real-world defects found by static analysis. Learn from the mistakes shown here!

One of the advantages of a statically-compiled language like C or C++ is that a good deal of information is available at compile time. While robust software can be written in more dynamic languages, it requires a certain level of attention to detail and a very thorough suite of dynamic tests.

This extra compile-time information is often a blessing for catching bugs early. First of all, your compiler catches a number of stupid mistakes before you even get a chance to run any tests, from misspelled names to using incorrect data types.

We added a couple checks recently for simple mistakes like checking an unsigned` value for being less than zero, which is clearly impossible. You'd never do that, right? Of course, experience has taught us that there is no mistake that's too simple to make. Even the best engineers make simple mistakes on a fairly regular basis.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
len = g_socket_receive_message (nl->priv->sock, NULL, &iv, 1,
                                NULL, NULL, &flags, NULL, &error);
if (len < 0)
  {
    g_warning ("Error on netlink socket: %s", error->message);
    g_error_free (error);
    if (nl->priv->dump_networks)
      finish_dump (nl);
    return FALSE;
  }

With the new scanner, Sentry found this perfectly innocent-looking code in GLib. If g_socket_receive_message returns a value less than zero, the error-handling logic will clean up and return, right? Unforunately, this code is unreachable because len was declared to be an unsigned type (gsize), and can never have a value less than zero. The check is useless because the condition is provably false at compile time. (Sentry also checks for checks that are useless because they will always be true — the else path could never be reached.)

When this code encountered a network I/O problem, it most likely would have caused undefined behavior in the program. The structure would not have been properly initialized and the code would've continued to execute as if it had been. Not only that, but it's the insidious kind of dead (unreachable) code that appears to be reachable. Debugging the problem could have taken hours trying to determine why the structure is (only sometimes) full of garbage. (We reported this bug to the GLib developers, and it was quickly fixed.)

It's a good reminder that automated checks for even very simple mistakes can produce valuable feedback. Taking advantage of the additional type information we have at compile time can save programmer hours and prevent entire classes of defects from impacting you in production.

posted by mike at 11:43AM

Defect Spotlight: Hiding in plain sight

Are you interested in code quality? This post is part of a series called Defect Spotlight. In it, we'll be bringing you instructive examples of real-world defects found by static analysis. Learn from the mistakes shown here!

Hiding in plain sight

Most of what we do revolves around finding software defects. They're everywhere, so it's not usually hard to find them. One thing we have learned is that Sentry spots not only the hard-to-find defects, but also the ones hiding in plain sight.

We've been analyzing many open source C and C++ programs on a nightly basis. As we continue growing the list of projects we inspect, we hope to help improve the quality and security of open source software.

When we add a new project, we tend to ignore all of the existing defects Sentry reports. This isn't to say that Sentry didn't find a defect worth reporting to the project, it most likely did. The issue is, developers care much less about old defects than they do about new ones. Especially brand new ones, like the ones committed last night.

In fact, in our experience, this is so important that reporting bugs immediately is the key factor in motivating developers to fix them. When someone is actively working on the code in question, it's generally easier (and safer) to fix a bug than it would be several months later.

Recently, we added Audacity to the list of C++ projects that we analyze nightly. Sentry found 305 potential defects. I casually reviewed some of the memory leaks reported by Sentry and found one that was pretty blatant. I decided in this instance to report this old issue.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
double AudacityProject::NearestZeroCrossing(double t0)
{
   int windowSize = (int)(GetRate() / 100);
   /* [Variable dist allocated but never returned or deleted.] */
   float *dist = new float[windowSize];
   int i, j;

   for(i=0; i<windowSize; i++)
      dist[i] = 0.0;

   TrackListIterator iter(mTracks);
   Track *track = iter.First();
   while (track) {
      /* [Code that doesn't touch dist snipped for brevity] */
      for(i=0; i<windowSize; i++) {
         if (windowSize != oneWindowSize)
            j = i * (oneWindowSize-1) / (windowSize-1);
         else
            j = i;

         dist[i] += oneDist[j];
      }

      track = iter.Next();
   }

   int argmin = windowSize/2; // Start at default pos in center
   float min = dist[argmin];
   for(i=0; i<windowSize; i++) {
      if (dist[i] < min) {
         argmin = i;
         min = dist[i];
      }
   }

   return t0 + (argmin - windowSize/2)/GetRate();
}

The memory leak that Sentry found was committed here. Take a look at the function NearestZeroCrossing above. It allocates the variable dist with the new[] operator, but never returns or deletes it, so the associated memory will always be leaked. In this case, the Audacity team was happy to accept a patch to fix it.

The interesting thing to note here is not that Sentry found a memory leak, it's that we found and fixed a memory leak that was eight years old.

Software does not fix itself. It takes good programmers to improve software. Even then, software defects are not always easy to spot, especially when the program appears to run well. Unit, integration and system testing are great ways to shake out defects and ensure software behaves correctly. Static analysis makes a great addition, helping you spot those hard to find bugs that don't occur on the happy path. Or, as in this example, it can find those eight-year-old bugs hiding in plain sight.

posted by chris at 10:12AM

Defect Spotlight: Conflicting assumptions

Are you interested in code quality? This is part of a series of blogs called Defect Spotlight. In it, we'll be bringing you a new example of a real-world defect found by static analysis. Learn from the mistakes shown here!

Looking at source code can be pretty fascinating: there is a lot of information packed into a very small space. This information comes in different flavors and is woven in layers throughout the program.

In the strictest sense, source code is a sequence of constructs that the compiler uses to generate executable code. Initially, it might seem like this is the only layer there is (especially if there are no comments). But since we're not the compiler, we're free to look at source code from other angles. For example, look at the names of objects and functions in the program -- they convey semantic information from the author to the reader. This information helps humans understand the meaning of code, even if those names are arbitrary and ultimately meaningless to the compiler.

But wait, there's more! It turns out there is a lot of information in these layers. The call graph of a program describes the structure of the program at a high level. Comments explain the rationale for particular choices in the code, or tell us how to interact with it. The formatting and modularity of the code can tell you about the level of experience of the author. The complexity of a function can tell you about the likelihood of problems therein. The list goes on and on.

Today I want to look at another bit of information embedded in the code: assumptions. In particular, the assumptions made by programmers about the state of the program. It might seem like assumptions are hidden inside the mind of the author, but it turns out that they can often be inferred by the actions performed (or not performed) in the code itself.

In this example, found in the squid caching proxy server (in file urn.cc), Sentry saw something interesting is going on:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
urls = urnParseReply(s, urnState->request->method);
for (i = 0; NULL != urls[i].url; i++)
    urlcnt++;

debugs(53, 3, "urnFindMinRtt: Counted " << i << " URLs");

if (urls == NULL) {   /* unkown URN error */
    debugs(52, 3, "urnTranslateDone: unknown URN " << e->url()  );
    err = errorCon(ERR_URN_RESOLVE, HTTP_NOT_FOUND, urnState->request);
    err->url = xstrdup(e->url());
    errorAppendEntry(e, err);
    urnHandleReplyError(urnState, urlres_e);
    return;
}

There are many assumptions implicit in the code above. Here are a couple I want to discuss:

  1. The for loop on line 2 makes the assumption that urnParseReply does not return NULL. This is implied by the fact that the loop condition dereferences the urls variable without checking if it is NULL first.
  2. The if statement on line 7 assumes that urnParseReply may return NULL, because its result is checked for NULL in the if condition.

The attentive reader will notice that these assumptions are in conflict -- they cannot both be true. Either urnParseReply can return NULL, in which case the for loop will cause a crash, or urnParseReply never returns NULL, in which case the entire if statement spanning lines 7 through 14 is unnecessary (i.e. dead) code.

To resolve the conflict, we need to take a look at the urnParseReply function. A quick look informs us that urnParseReply always allocates and returns a list of URLs (even if the list has zero elements, it's never NULL). Seeing this, we now know the second assumption was incorrect, and we can safely identify lines 7-14 as dead code.

(How does this happen? When someone modifies a function that has been around for a long time, he or she may make different assumptions about the code than the original author did, simply because it has been so long since it was first written. This can even happen if the person modifying the code is the original author, months or years later. Since most projects are maintained by many people over many years, looking for conflicting assumptions turns out to be a great way to identify software defects.)

We're starting to see that even a small snippet of code conveys a lot of information (far more than I've discussed here!). Fortunately, some of this information can be deduced automatically with static analysis, bringing situations like the above to your attention, and enabling you to identify and fix the problem.

posted by mike at 11:17AM

Defect Spotlight: Error handling

Are you interested in code quality? We're starting a new series of blogs called Defect Spotlight. In it, we'll be bringing you a new example of a real-world defect found by static analysis. Learn from the mistakes shown here!

I think it's safe to say we love software testing. We test our projects with a variety of approaches, both static and dynamic. We use a combination of testing methods because they're not all created equal; they each have their strengths and weaknesses. As a software engineer or tester, it pays to be aware of these. One well-known problem with dynamic testing is the difficulty of reaching every path through a program. Often, these are the error handling paths, or otherwise infrequently visited paths in your code.

Since most programs work reasonably well under normal conditions, static analysis usually reports problems in the corner cases, error-handling paths, and other areas that haven't been adequately tested through other approaches.

This week's example is from git, the popular distributed revision control system. One thing to note in the snippet below is that the error() function does not call exit(3) or similar, so it is not an exit point for the function. Given that hint, see if you recognize the error in this example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
f = fopen(file, "w");
if (!f)
    error("Unable to open marks file %s for writing.", file);

for (i = 0; i < idnums.size; i++) {
    if (deco->base && deco->base->type == 1) {
        mark = ptr_to_mark(deco->decoration);
        if (fprintf(f, ":%"PRIu32" %s\n", mark,
                sha1_to_hex(deco->base->sha1)) < 0) {
            e = 1;
            break;
        }
    }
    deco++;
}

e |= ferror(f);
e |= fclose(f);
if (e)
    error("Unable to write marks file %s.", file);

This snippet is from git's export_marks function. The file handle f is opened on line 1 and tested for NULL, and in the case that it is NULL, an error message is printed to the console. However, the function does not exit at this point, it keeps running. Depending on the path taken, control will reach either the fprintf on line 8 or the ferror call on line 17 using the invalid file handle f, causing a crash.

In this case, the crash is easy to reproduce: attempt to export marks into a directory where you don't have write permission:

[~/some_git_repos]$  git fast-export --export-marks=/foo
error: Unable to open marks file /foo for writing
Segmentation fault (core dumped)

This isn't unique to git; these types of untested error-handling paths exist in virtually every program we've analyzed. It just shows that there's a lot to be gained from having a well-rounded approach to testing, using all the tools that are available.

posted by mike at 2:54PM

Defect Spotlight: The Root Cause

Are you interested in code quality? We're starting a new series of blogs called Defect Spotlight. In it, we'll be bringing you a new example of a real-world defect found by static analysis. Learn from the mistakes shown here!

When your organization adopts a static analysis tool, your first instinct is to ask your developers to go find and fix as many defects as possible. This can be a daunting task, considering that on a large code base, hundreds or even thousands of defects may be produced. Once the developers get warmed up, it becomes natural for them to want to fix the bugs as fast as possible. Get that defect count down and make your boss happy, right?

There's one potential pitfall to watch out for during this phase, and that is introducing subtle new problems while fixing the defects reported by the analysis. When moving quickly and fixing bugs, it's easy to make a change that superficially fixes the defect without stopping and asking why the defect exists.

I recently examined a check after dereference defect reported by Sentry in GDB (The GNU Debugger). A check after dereference means the author of the code dereferenced a pointer, and then checked if the pointer could be NULL. This means the developer(s) of the code probably had inconsistent assumptions:

  • The pointer cannot be NULL (safe to dereference)
  • The pointer might be NULL (requires a check)

This code should be reviewed to determine which assumption is correct, and, when possible, cleaned up to be consistent. In GDB, the defect is in the file stabsread.c, function scan_file_globals, and looks something like this,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
#define SYMBOL_LINKAGE_NAME(symbol)     (symbol)->ginfo.name
#define SYMBOL_VALUE_CHAIN(symbol)      (symbol)->ginfo.value.chain

if (strcmp (SYMBOL_LINKAGE_NAME (msymbol), 
                 SYMBOL_LINKAGE_NAME (sym)) == 0)
{
  /* Splice this symbol out of the hash chain and
     assign the value we have to it. */
  if (prev)
    {
      SYMBOL_VALUE_CHAIN (prev) = SYMBOL_VALUE_CHAIN (sym);
    }
  else
    {
      global_sym_chain[hash] = SYMBOL_VALUE_CHAIN (sym);
    }
  if (sym)
    {
      /* Do something (also dereferences sym) */
    }
  /* ... */
}

On line 5, 11, 15, sym is dereferenced. On line 17, sym is compared to NULL. In this scenario, one of two things is true: sym can never be NULL and the check is superfluous, or sym can be NULL and GDB has an undiscovered crash lurking inside of it.

As a developer reviews this defect, it's tempting to quickly scan the code and decide one that one of the two cases above is true, and make the change. That is, either wrap the the entire chunk of code in an if (sym) condition, or remove the inner 'if (sym)' assuming it is never NULL. Both of those approaches would silence the analyzer in the future, but neither of them is the correct thing to do blindly. The ideal solution is to start tracking down how and why this code looks the way it does.

Fortunately for us, we can view GDB's revision control logs through CVS (wow, people still use CVS?). In the latest version of stabsread.c, CVS version 1.128, in function scan_file_globals, at line 4622, the defect still exists. I did a binary search of the revisions to find that the defect was introduced in commit 1.66 in November, 2003. The relevant change is at the bottom of the diff.

Well, now we are in a little bit of trouble. We've spent all this time trying to understand this particular defect. We've figured out who introduced the defect and when. The problem is, it was seven years ago! You probably don't remember what you ate for lunch a week ago. Now, go ask your buddy Todd (if he still works with you) why he made this change seven years ago.

In the end, what would have really helped here is automatically running the static analysis tool nightly, catching the bug as soon as it was committed. This way, the developer reviewing the static analysis defect could ask the developer that made the change while it's still fresh in his mind. Determining the correct fix at this time is usually completely straightforward.

In this example, without further investigation and/or assistance from the original developer, I would be hesitant to commit any kind of fix. It is probably safe to just remove the check and move on, but it's not 100% clear to me. In the interest of being conservative, I'd probably just as soon leave this seven year old code alone. There is a lot of value in removing an unnecessary check during development time -- it simplifies the code, making it easier to understand and maintain. In this case, it's not a strong enough reason to justify modifying something that has been working for years. After all, every time you change the code, you risk introducing a new bug in the process.

posted by chris at 11:15AM