Why Static Analysis?
What static analysis is, how it works, and why it's crucial for building high-quality software.
Introduction
Getting software right has proven to be a challenging task, despite decades of research and practice. All software contains defects, even when developed under the most rigorous methods and by the most talented engineers.
To address the issue, a variety of tools and testing practices have evolved. Common practices include coding standards, automated dynamic tests (unit, integration, and system tests), and manual testing by QA teams. Over the years we've also seen the arrival of process models and philosophies like the Spiral model and Agile, along with the methods they've produced, such as test-driven development (TDD), Scrum, and more.
While the methods have progressed and matured, none of them are immune to human error. Software producers still need help to create programs with as few defects as possible, and this is where static analysis has become a crucial tool in any engineer's arsenal.
How It Works
Static analysis works by examining a program's source code and applying a variety of heuristics to identify potential defects, such as memory errors and undefined behavior. Unlike dynamic testing methods, static analysis does not involve executing the program: the analyzer inspects the source code in place.
For C and C++, static analysis started with simple tools like Lint, which mainly enforced stylistic rules to reduce the probability of inadvertent mistakes. Tools like this are fast, but produce a large number of messages which can be tedious and time-consuming to review.
In the last few years, static analysis has evolved significantly, in a shift toward in-depth, flow-sensitive analysis. In these tools, the analyzer treats the program as a whole, modeling the states of variables and expressions as they propagate through the executable paths in the program. Every path through the program is explored, allowing a very high signal-to-noise ratio in the results.
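To make the idea of modeling state along a path concrete, here is a toy sketch in C. It is not how Sentry or any production analyzer is implemented; the names `ptr_state`, `ptr_op`, and `check_path_sketch` are hypothetical, and the model tracks only one pointer along one already-chosen path. A real analyzer would enumerate the paths itself and track many variables at once.

```c
#include <stddef.h>

/* Abstract state of one heap pointer along a single path (hypothetical model). */
enum ptr_state { PTR_UNALLOCATED, PTR_ALLOCATED, PTR_FREED };

/* The only operations this toy model understands. */
enum ptr_op { OP_ALLOC, OP_USE, OP_FREE };

/*
 * Walks the operations on one path in order, updating the modeled state.
 * Returns the index of the first operation that is invalid in the current
 * state (a use or free of a pointer that is not currently allocated),
 * or -1 if no such operation is found. Leaks are not modeled.
 */
int check_path_sketch(const enum ptr_op *ops, size_t count)
{
    enum ptr_state state = PTR_UNALLOCATED;

    for (size_t i = 0; i < count; i++) {
        switch (ops[i]) {
        case OP_ALLOC:
            state = PTR_ALLOCATED;
            break;
        case OP_USE:
            if (state != PTR_ALLOCATED) {
                return (int)i;          /* use before alloc or after free */
            }
            break;
        case OP_FREE:
            if (state != PTR_ALLOCATED) {
                return (int)i;          /* free of an unallocated or freed pointer */
            }
            state = PTR_FREED;
            break;
        }
    }
    return -1;
}
```

Under this model, a path that performs OP_ALLOC, OP_FREE, OP_FREE is reported at index 2 (the second free), while OP_ALLOC, OP_USE, OP_FREE is reported as clean.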
This generation of tools takes more time to run than its predecessors, but the improvement in accuracy means less developer time is wasted on false positives. Instead of producing thousands of warnings, a typical 500,000-line program will produce around 100 results, and 80% of these will represent real defects, many of them severe (crashes and potential exploits).
An Example
The snippet in Figure 1 is from a popular small HTTP server called
lighttpd. It shows an example of a defect found "in the wild" by
Sentry, our static analyzer. Note that some irrelevant code has been
elided with /* ... */ to simplify the example.
 1  if (s->is_ssl) {
 2  #ifdef USE_OPENSSL
 3      /* ... */
 4  #else
 5      free(srv_socket);
 6      /* ... */
 7      goto error_free_socket;
 8  #endif
 9  }
10
11  /* ... */
12  error_free_socket:
13  /* ... */
14  free(srv_socket);
15  return -1;
The code in Figure 1 exists to handle SSL sockets, and, in the case
where SSL support is not compiled in, free up some resources and return
an error code. On line 5, we see that srv_socket is freed,
and then execution jumps to the error_free_socket label.
The defect occurs on line 14, where srv_socket is freed
again. This double free is undefined behavior in C, and it will
normally result in a program crash or a corrupted heap.
In all other paths that result in goto error_free_socket,
there is no problem (srv_socket has not been freed yet).
It is only on this one particular path that the defect occurs. This is
why a flow-sensitive analysis is so crucial; defects of this kind can
only be identified this way.
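One common way to rule out this class of defect is to release the resource in exactly one place, the shared cleanup label. The following is a minimal sketch of that pattern in C. It is not lighttpd's actual code or its actual fix: `struct socket_sketch`, `open_socket_sketch`, and the `ssl_compiled_in` parameter are hypothetical names introduced only for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the server socket structure. */
struct socket_sketch {
    int fd;
    int is_ssl;
};

/*
 * Returns 0 on success and -1 on failure. Every failure path jumps to the
 * cleanup label, and on those paths the socket is freed there and nowhere
 * else, so no path can free it twice.
 */
int open_socket_sketch(int want_ssl, int ssl_compiled_in)
{
    struct socket_sketch *srv_socket = calloc(1, sizeof(*srv_socket));
    if (srv_socket == NULL) {
        return -1;
    }
    srv_socket->is_ssl = want_ssl;

    if (srv_socket->is_ssl && !ssl_compiled_in) {
        /* No free() here: the cleanup label owns the release. */
        goto error_free_socket;
    }

    /* A real server would keep using the socket; the sketch releases it. */
    free(srv_socket);
    return 0;

error_free_socket:
    free(srv_socket);
    return -1;
}
```

Requesting SSL when it is not compiled in, as in `open_socket_sketch(1, 0)`, takes the error path and returns -1 with a single call to free.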
Static vs Dynamic
Advantages
Static code analysis has several advantages when compared to dynamic testing. First, there is no need to construct test case scenarios, harnesses and supporting code. Since the analysis engine traverses every path through the program automatically, it doesn't require input or guidance from the engineer. In short, it's extremely easy to perform.
A problem frequently encountered when dynamically testing a program is that not all paths are easy to explore at run time. Certain error conditions may manifest only when a memory allocation fails at just the right time, or when two threads compete for a resource in the right way. Testing the error-handling paths (as opposed to the "happy paths") is extremely difficult. However, static analyzers consider all paths equally, so they will often uncover the defects hiding in those infrequently taken paths.
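The sketch below illustrates how much scaffolding a dynamic test needs just to reach one such error path. It is an assumption-laden example, not taken from any real project: `alloc_fn`, `failing_alloc`, and `make_buffers_sketch` are hypothetical names. The allocator is passed in as a function pointer purely so that a test can force the second allocation to fail.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so a test can force a failure. */
typedef void *(*alloc_fn)(size_t);

/* Always fails, simulating an out-of-memory condition. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Allocates two buffers. Returns 0 on success, -1 on failure.
 * The path where the second allocation fails is rarely taken at run time,
 * which is where a leak of the first buffer tends to go unnoticed.
 */
int make_buffers_sketch(alloc_fn alloc_second, char **first, char **second)
{
    *first = malloc(64);
    if (*first == NULL) {
        return -1;
    }

    *second = alloc_second(64);
    if (*second == NULL) {
        /* Forgetting this free() is the kind of leak a path-exploring
           analyzer can report without any test harness. */
        free(*first);
        *first = NULL;
        return -1;
    }
    return 0;
}
```

A dynamic test has to pass `failing_alloc` to exercise the cleanup branch, whereas an analyzer that walks every path reaches the same branch without the hook.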
Finally, a significant advantage of static analysis is the cost-benefit ratio it offers. Static analysis requires minimal effort and engineering time, and allows teams to shake defects out of their software much earlier than other testing approaches. As a result, it saves organizations the significant expense of dealing with those defects later, when they are more costly to fix.
Limitations
Of course, static analysis is not a silver bullet. Static analyzers cannot know exactly which values are possible at run time (since the values may arrive from user input, network I/O, etc.), and so analyzers must make assumptions. To keep the level of noise low, analyzers tend to choose assumptions that avoid false positives, which means that some defects are missed by static analysis.
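As a small illustration of this limitation, consider a table lookup whose index arrives as text from outside the program. The C sketch below is hypothetical (`table`, `lookup_unchecked_sketch`, and `lookup_checked_sketch` are names made up for this example). Whether the unchecked version is a defect depends entirely on a value that exists only at run time, so an analyzer has to decide what to assume about it.

```c
#include <stdlib.h>

#define TABLE_SIZE 8

static const int table[TABLE_SIZE] = { 1, 2, 3, 5, 8, 13, 21, 34 };

/*
 * Unchecked lookup. The index comes from external text, so the analyzer
 * cannot know its value. An analyzer tuned for low noise may assume the
 * index is in range and stay silent about this line.
 */
int lookup_unchecked_sketch(const char *text)
{
    long index = strtol(text, NULL, 10);
    return table[index];    /* out of bounds for indexes outside 0..7 */
}

/*
 * Checked lookup. Returns 0 and stores the value in *out on success,
 * or -1 if the index is outside the table.
 */
int lookup_checked_sketch(const char *text, int *out)
{
    long index = strtol(text, NULL, 10);
    if (index < 0 || index >= TABLE_SIZE) {
        return -1;
    }
    *out = table[index];
    return 0;
}
```

The checked version removes the guesswork: the range test makes every path safe regardless of what the input turns out to be.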
Although it finds a large number of programming errors early in the development cycle, static analysis is not a replacement for dynamic testing. Dynamic testing can find defects that are impossible to detect statically, and is still a critical part of the development process.
Conclusion
In recent years, static code analysis has become a vital link in the software development chain. Organizations that produce safety-critical and security-critical applications cannot afford to overlook the power of these tools. When producing solid, reliable software is a business's lifeblood, static analysis is a powerful way to keep its customers safe and its reputation secure.
Want to know more?
Contact us to schedule a web demo of Vigilant Sentry, our static analysis application, and see how we can improve your code safety and security, or request a free trial of Sentry and analyze your code right away!