Years ago, I found myself staring at about 10 lines of source code. This code had me stymied, and I could feel the onset of a headache as I read and re-read it. What did this brain-bending code do, you might wonder? It sorted elements in an array.
Now you probably assume that I was learning to code at the time. Interestingly, no. I had worked gainfully as a software developer for years and was also completing a master’s degree in computer science. In fact, the pursuit of this degree had brought me to this moment of frustration, rubbing my temples and staring tiredly at a simple bubble sort in C.
Neither inexperience nor the difficulty of the code had brought me to that point. Rather, I struggled to formally prove the correctness of this tiny program, in the mathematical sense. I had enrolled in a class called “Formal Methods of Software Development” that taught the math underpinning our lives as programmers.
This innocuous, simple bubble sort had seven execution paths. Here was number five, from a piece of homework I kept in my digital files.
Hopefully I now seem less like an incompetent programmer and more like a student grappling with some weighty concepts. But why should a simple bubble sort prove so difficult? Well, the short answer is that actually proving things about programs with any complexity is really, surprisingly hard. The longer answer lies in the history of static code analysis, so let’s take a look at that.
Code Analysis Before Code
Consider something strange. The earliest computers weren’t actually computers, at least not in the way we think of them. As described in this story, they were humans. Before transistors and chips, human “computers” handled computations based on large amounts of data.
Similarly to their silicon counterparts, these human computers followed algorithmic sets of instructions in order to perform their tasks. Humans could thus reason about algorithms before the existence of machine computers. So they could also reason conceptually about source code, after a fashion.
In fact, Alan Turing did just that in 1936, when he described the halting problem. He proved me entirely justified in my struggles in CS 477 — he proved that upfront reasoning about source code is really, really hard.
Okay, he actually proved something a bit more precise than that. Turing talked about whether or not a given computer program would halt for a given set of inputs. In other words, would the program eventually stop, or would it get hung up in an endless loop, as happens whenever you forget to increment a variable in a while loop? Turing showed that, while you can prove whether or not a given program would halt, you cannot possibly come up with a general algorithm that you can apply to all programs.
Digest that for a moment. We cannot even write something that can tell whether or not an arbitrary program will hang, let alone whether or not it has critical defects. Before humans invented the transistor, we invented static code analysis.
The First Commercial Static Analysis
Of course, we didn’t have to wait much longer before machines replaced humans as computers. At first these machines simply performed computation. But then they also began to store information and generally resemble what we consider computers today. Their applications migrated beyond academia and military research and into the commercial world. And understandably so, given their powerful applications.
The idea of formal correctness proofs, however, did stay firmly in the academic sphere. How could it not? Turing demonstrated the immensity of this problem space back when humans executed algorithms and did the computing. Proving that mainframe source code behaved correctly wouldn’t even have risen to the level of pipe dream.
So the next incarnation of static analysis looked substantially different. It came about in the late 70s. And in another nod to things being much older than you probably thought, its author called it Lint.
During that time, many people used the C programming language. In that language, you have immense power to shoot yourself in the foot, metaphorically speaking. Misplace a semicolon or accidentally make an implicit cast and your code will still happily compile. It will just do things that seem strange and crazy to you until you track down the source of the defect. Lint addressed this by flagging common potential mistakes in a way that resembles today’s compiler warnings.
Going Deeper with Second Generation Static Analysis
While the name has persisted and proliferated to all sorts of languages, Lint struggled to gain significant adoption for the next couple of decades. It had excellent underlying design, but like most similar tools, it created a lot of noise.
In fact, the low signal-to-noise ratio proved the main barrier. It did catch some mistakes, but it also produced many false positives. This in turn forced people to spend time manually reviewing its warnings — an outcome not any better than just manually inspecting the code. So the early promise of static analysis lay dormant.
It came out of hiding, however, in the late 90s and early 2000s. As processing power grew substantially, people and organizations pursued what Coverity calls second generation static analysis tools. This meant a shift from spot-analysis of source files to significantly more complex “path analysis” that contained deeper reasoning about runtime behavior. This generation of tools also introduced improvements like targeted assessment (e.g., security flaws) and the separation of the knowledge base of issues from the analysis engine itself.
But even with significant improvements came familiar struggles. Tools that scaled well had similar noise problems to Lint. And tools that eliminated the noise with more sophisticated analysis tended to scale poorly.
Third Generation Static Analysis and Code as Data
As the 2000s wore on, you started to see a third generation of static analysis tool, as described here by OWASP. These tools started to cover more and more programming languages and to operate in more semantically sophisticated ways.
The introduction of abstract syntax tree modeling gave these tools a significant boost. This approach creates an abstract structure to represent the logic of the program in question. Doing this sheds non-logic language details and captures the essence of the codebases’s logic. It makes for an extremely efficient way to process the codebase’s behavior.
But it does something else important as well. It introduces the idea of treating the code itself as data. As we moved into the second decade of the 21st century, this became a powerful norm. It allowed a proliferation of static analysis approaches. These included standard defect detections but also allowed IDEs to incorporate significant automated refactorings and even gave rise to interesting ideas like Microsoft’s Rosyln “Compiler as a Service.”
The Future of Static Code Analysis
For the entirety of my professional career and well beyond that, people have viewed static analysis as the Cold Fusion of software. Imagine it! We’ll write the ultimate killer application — one that analyzes other applications and finds all of their defects. We’ll automate correctness!
But inevitably it proves easier said than done. And I mean it proves way easier said than done. Trying to prove a 10 line bubble sort correct gave me as much of a taste of that as I could ever want. Short of combinatorial jumps in processing and reasoning power, the satisfying absolute of mathematical proof fails at about the 10th line of code you write in your green field project. But we’ve learned from that, regrouped, and taken a new approach.
I don’t make my living as a fortune teller, so I try to avoid gratuitous speculation. But I’ve worked with static code analysis for a long time, and I have high hopes for it. And, specifics notwithstanding, I can tell you that expanding on the “code as data” theme holds the key to the future of static analysis.