How to Debug

There is a process that I’ve used to debug everything from 10-line toy programs to 100,000-line systems. It works as follows:

  1. Identify that something is wrong, and what is wrong about it.

    Identifying bugs is a complicated and interesting process that includes testing, validation, user studies, and much more. But I expect that if you came to this page, it is because you think your code has a bug, so it’s probably already identified.

  2. Understand the design and process of the part of the code that is involved in the buggy behavior. If you can’t explain what steps the code should be going through to create the desired behavior you also can’t fix the bug.

  3. Locate the first mistake the code made. With rare exceptions, this is done by iterating the following steps:

    1. Pick some point on the code path to the symptom you identified.

    2. Describe what the code is supposed to be doing there, often by identifying what values the in-scope variables should have.

    3. Check to see whether it’s doing what was expected at that point.

      For small programs, print statements might be enough to do this. For larger programs, a visual debugger is much more efficient and useful. With increased experience, you may eventually check the code by looking at it without running it, but that’s an unreliable technique at best.

      If it isn’t doing what you expect, double-check the expectation: the program could be wrong, but so could your expectations.

    4. Based on the result, narrow the region of code that could have led to the bug and repeat. If it’s right there, look closer to the symptom; if it’s wrong there, look earlier to see where it started being wrong.

    In my experience, when students fail to debug their code, the most common reason is that they didn’t iterate through these four steps enough times to find the first manifestation of the bug. There may be some better bug-location system than this loop, but in two decades of debugging my own code and helping students debug theirs, I’ve never found it.

  4. Change the code to remove the bug.

    Sometimes this is a change to the code at the location found in step 3. But there are two common classes of bugs that require changes in other locations.

    The first nonlocal case is the missing component bug, where the location identified in the previous step needs to do something that depends on data not available at that location. Fixing this involves figuring out what data is missing, finding when and where it was available, and designing and implementing a data path to get it to the place where it is needed. Data paths can be as simple as a single variable or as complicated as multiple new data structures and function parameters. In a few cases the changes might be so significant that it’s easier to start over with a better design.

    The second nonlocal case is the aliased meaning bug, often caused by poor variable naming, where the same variable is used to store different content at different points in the program. This case has most of the characteristics of a missing component bug: you’ll need a new data path (often a new variable) to carry both pieces of content simultaneously. It additionally requires revisiting each use of the old overloaded variable and deciding which meaning was intended there.

    When the change is more than a few trivial steps, take it slowly with multiple rounds of intermediate testing. If one part of a multi-part change doesn’t work, it can be hard to tell which part failed unless each was added and tested individually. Additionally, debugging generally involves modifying code that is no longer fresh in your mind, so it will take more thought and time to fix than it did to write the first time. That is another reason why small, careful steps are the best approach.
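
The search loop in step 3 can be sketched in code. This is only an illustration under stated assumptions: the three-step pipeline, the deliberately buggy `keep_positive`, and the helper `first_wrong_step` are all hypothetical names invented for this example, and the expected values were worked out by hand. It also assumes that once a value is wrong, every later checkpoint is wrong too, which is what makes the narrowing valid.

```python
def parse(text):
    """Step 0: turn "3,-2,1,5" into a list of ints."""
    return [int(tok) for tok in text.split(",")]

def keep_positive(nums):
    """Step 1: keep positive numbers (deliberately buggy: also drops 1)."""
    return [n for n in nums if n > 1]

def total(nums):
    """Step 2: add up what is left."""
    return sum(nums)

STEPS = [parse, keep_positive, total]

def run_with_snapshots(value):
    """Run every step, recording the value after each one."""
    snapshots = []
    for step in STEPS:
        value = step(value)
        snapshots.append(value)
    return snapshots

def first_wrong_step(actual, expected):
    """Narrow down to the first checkpoint where actual != expected.

    Assumes the last checkpoint is known to be wrong (the symptom).
    """
    lo, hi = 0, len(actual) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if actual[mid] == expected[mid]:
            lo = mid + 1  # right here: look closer to the symptom
        else:
            hi = mid      # wrong here: look earlier
    return lo

snapshots = run_with_snapshots("3,-2,1,5")
# Expectations written down by hand before looking at the program's values.
expected = [[3, -2, 1, 5], [3, 1, 5], 9]
bad = first_wrong_step(snapshots, expected)
print(STEPS[bad].__name__)  # prints "keep_positive"
```

In real code the checkpoints are rarely this tidy, and each comparison is usually a print statement or a debugger breakpoint rather than a list lookup, but the narrowing logic is the same.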

Decreasing the rate of bug introduction

Debugging is generally less enjoyable than other parts of software development. While it is not usually something you can fully avoid, there are practices that can significantly reduce the amount of debugging you need to do.

Design before you code

If you can’t do the task by hand the way the computer should, you can’t tell the computer how to do it.

If you don’t know what data you want to store or how it moves between different parts of the program, you can’t tell the computer how to store it and move it.

If you haven’t drawn a picture or made an outline or written some pseudocode, your job will be much harder, both in creating the code and in knowing where to look when debugging it.

Documentation

Well-documented code generally has fewer bugs and is easier to debug than more cryptic code. Name your variables, functions, and classes with descriptive names that all follow the same naming pattern. Add comments on every function and anywhere else a programmer unfamiliar with your code might reasonably ask “what does this part of your code do?”

Build-test-integrate-test

If you need to write a nontrivial bit of logic somewhere in your code,

  1. build it first on its own;
  2. then test it on its own until you know it works by itself;
  3. then integrate it into the rest of the code;
  4. then test the integrated code.

Trying to build and test code in-place invites confusion as to whether it is what you wrote or how you used it that is causing errors.
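
As a small illustration of these four steps, here is a sketch in Python. The `median` helper and the `summarize` function that uses it are hypothetical examples chosen for this page, not part of any particular project.

```python
# Step 1: build the new logic on its own.
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Step 2: test it on its own until you trust it.
assert median([3, 1, 2]) == 2
assert median([4, 1, 3, 2]) == 2.5
assert median([7]) == 7

# Step 3: integrate it into the rest of the code.
def summarize(scores_by_student):
    """Map each student's name to the median of their scores."""
    return {name: median(scores) for name, scores in scores_by_student.items()}

# Step 4: test the integrated code.
report = summarize({"ana": [90, 70, 80], "bo": [60, 100]})
assert report == {"ana": 80, "bo": 80.0}
```

If the step 4 assertion fails after the step 2 assertions passed, the problem is much more likely to be in how `summarize` uses `median` than in `median` itself.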

Test-driven development

Debugging large code is frustrating. To avoid ever debugging large code, build it in small steps:

  1. Start with code that does nothing (and does so correctly).
  2. Pick the smallest part of the task you can.
  3. Write tests for that part and verify that they fail, because your code is not doing that part yet.
  4. Implement just that part until those tests pass.
  5. Pick a small additional part of the task to add next.
  6. Return to step 3.

The larger your task is, the more important this process for writing it becomes.
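
One way to picture a couple of rounds of this loop is the sketch below. The word-counting task, the function names `word_count_v0` and `word_count_v1`, and the tiny `run_tests` helper are all assumptions made up for illustration; a real project would more likely use a testing framework and edit one function in place rather than keep numbered versions.

```python
def run_tests(word_count):
    """Return the names of the tests that the given implementation fails."""
    cases = {
        "empty text has no words": ("", {}),
        "single word counted once": ("bug", {"bug": 1}),
        "repeated words accumulate": ("fix bug fix", {"fix": 2, "bug": 1}),
    }
    return [name for name, (text, want) in cases.items()
            if word_count(text) != want]

def word_count_v0(text):
    """Starting point: does nothing, and does so correctly for empty text."""
    return {}

def word_count_v1(text):
    """After implementing just enough to make the new tests pass."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# The new tests fail against the do-nothing version, as they should.
failing_before = run_tests(word_count_v0)
# After implementing the small part, every test passes.
failing_after = run_tests(word_count_v1)
print(failing_before)
print(failing_after)
```

Seeing the new tests fail first is what tells you they are actually checking the part you are about to write.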

Plan to throw one away

One of the most time-honored, well-tested, and revered principles of software development was labeled by Fred Brooks as “plan to throw one away.” Write your code, then delete it and start over. Your new code will be faster to write, better designed, and pass more tests than the first version. It will be faster and easier to re-write it without bugs than to find and fix the bugs in the first version.

This may seem counter-intuitive. How could starting over be easier? Because you learned by trial and error while writing the first version, but on your second version you already know what you should do. No trial and error = no error = better code.

This only works if you continue on the first version until you fully understand the task. If you get stuck with “I don’t know how to begin to add feature X,” then starting over is unlikely to help.