If you're a professional software engineer, it pays to think about the process of reading code. You're going to be doing it for the rest of your career, so you might as well get good at it.
But working on multimillion-line legacy code bases is hard. Code rot, duplicated functionality, general cruft, obsolete packages, dead code, "prototype" code that became production code--the fun just never stops.
I've read a lot of blog entries advising "you should learn to read code", but not many of them explain how to read code. Everybody has their own approach, but I wanted to write down some techniques that have worked for me. Remember, the end goal is to get up to speed on a new code base quickly. My memory isn't what it once was, so when I start to get lost reading things, I take a lot of notes. Here are some of the ways I do that:
As you start to tackle a large code base, go through important directories, and for each subdirectory, annotate what it's for. You will be completely clueless at first, but do a bit of code research, and just give it your best guess. I tend to use question marks to indicate where I'm unsure, and come back and refine my answers over time.
For example, for the top-level Linux directories, my notes would look something like this:
    Documentation
        ABI       the ABI between the Linux kernel and userspace
        PCI       PCI bus
        RCU       Read/Copy/Update
        ...
    Licenses
    arch          processor-specific code
    blk           block device code?
    certs         TPM certificates??? I dunno.
    ...
I take these notes just to confirm my own understanding. Pro-tip: some day, you might want to refine this guide and internally publish it for others.
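If typing out the directory names by hand feels tedious, a small script can produce the empty outline so that only the annotations are left to write. This is a minimal sketch, not a polished tool: the function name `notes_skeleton`, the `max_depth` parameter, and the trailing `?` placeholder are all made up for illustration, and the demo tree is a throwaway with invented directory names.

```python
import os
import tempfile


def notes_skeleton(root, max_depth=2):
    """Return an indented outline of the directories under root.

    Each entry ends with '?' as a placeholder for a note that has not
    been written yet. Hidden directories (such as .git) are skipped.
    """
    lines = []

    def walk(path, depth):
        if depth > max_depth:
            return
        names = sorted(
            entry.name
            for entry in os.scandir(path)
            if entry.is_dir(follow_symlinks=False)
            and not entry.name.startswith(".")
        )
        for name in names:
            lines.append("    " * (depth - 1) + name + "  ?")
            walk(os.path.join(path, name), depth + 1)

    walk(root, 1)
    return "\n".join(lines)


if __name__ == "__main__":
    # Demo on a temporary tree; these directory names are hypothetical.
    with tempfile.TemporaryDirectory() as root:
        for sub in ("arch/x86", "arch/arm", "certs", ".git/objects"):
            os.makedirs(os.path.join(root, sub))
        print(notes_skeleton(root))
```

The script only lists names. Replacing each `?` with a best guess is still the part that teaches you the code base.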
I find it helpful to create a one-line summary of key functions and data structures. You're not trying to document everything, and your notes should be terse and to the point. If I'm lucky, I'll be able to pull this from comments. Either way, writing the summary forces me to read and understand each function well enough to produce a cogent entry.
This is kind of a bad example because the functions are fairly self-describing, but here's what I'd do, using a random file in the Linux source tree:
    btintel_check_btaddr      // check device for corrupt/buggy address
    btintel_enter_mfg         // enter manufacturing mode
    btintel_exit_mfg          // exit manufacturing mode, with/without reset
    btintel_set_bdaddr        // set bluetooth device address
    btintel_load_ddc_config   // load intel "Device Data Control?" parameters
This list can quickly jog your memory about what each function does, especially when function names are cryptic and unfamiliar.
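To seed a list like this for a C file, something along these lines can pull out the function names and leave the summaries blank. It is a rough sketch built on a deliberately crude heuristic (a definition starts in column 0, contains `name(`, and does not end in a semicolon), so expect misses and false positives on real code. The names `function_names` and `summary_skeleton`, and the `widget_*` sample, are hypothetical.

```python
import re

# Heuristic only: the first "identifier(" on a line is taken as the
# function name. Function-pointer return types and attribute macros
# will confuse it.
NAME_BEFORE_PAREN = re.compile(r"([A-Za-z_]\w*)\s*\(")


def function_names(c_source):
    """Return names of likely function definitions, in file order."""
    names = []
    for line in c_source.splitlines():
        if not line or line[0] in " \t#/*{}":
            continue  # indented code, preprocessor lines, comments, braces
        if line.rstrip().endswith(";"):
            continue  # prototypes, globals, macro invocations
        match = NAME_BEFORE_PAREN.search(line)
        if match:
            names.append(match.group(1))
    return names


def summary_skeleton(c_source):
    """One line per function, with a '?' comment to fill in by hand."""
    names = function_names(c_source)
    width = max((len(name) for name in names), default=0)
    return "\n".join(f"{name.ljust(width)}  // ?" for name in names)


# Hypothetical C source used only to exercise the heuristic.
SAMPLE = """\
#include <stdio.h>

static int widget_reset(struct widget *w);

int widget_open(struct widget *w, int flags)
{
    if (flags)
        return widget_reset(w);
    return 0;
}

static int widget_reset(struct widget *w)
{
    return 0;
}
"""

if __name__ == "__main__":
    print(summary_skeleton(SAMPLE))
```

The point of the exercise is still the reading: the script hands you the names, and each `?` gets replaced only after you have looked at the function body.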
Another technique is to trace one important operation from end to end. For example, I have worked on many hardware platforms that perform read/write IO calls, and in each case I've written a document titled "The Life Of A Hardware Read". It starts at the uppermost user API and recursively describes each step until the read reaches hardware, then unwinds the stack to show how the data gets back to the user.
This can be a very large task--depending on the system, it could take days to do the topic justice. Usually my first pass is pretty sloppy. But later, I've turned these documents into full presentations to familiarize new hires with how things work.
Look for the thread_create function, and document each thread that is started, and its responsibilities.
If you're lucky, each thread has a clearly defined responsibility. If you're not, well, it's better to know that you've got a big mess on your hands.
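Finding every place a thread gets started is mostly a search problem. Here is a small sketch that walks a source tree and reports each line that appears to call the thread-creation function. The default name `thread_create` comes from the text above; substitute whatever your code base actually uses. The helper name `find_calls` and the file extensions are assumptions for illustration.

```python
import os
import re


def find_calls(root, func_name="thread_create", extensions=(".c", ".h")):
    """Yield (path, line_number, text) for each line that calls func_name.

    This is a plain text match, so it also reports the function's own
    declaration and definition, and any call mentioned in a comment.
    """
    pattern = re.compile(r"\b" + re.escape(func_name) + r"\s*\(")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.strip()


if __name__ == "__main__":
    # Print one note stub per hit; the responsibility is yours to fill in.
    for path, number, text in find_calls("."):
        print(f"{path}:{number}: {text}")
        print("    responsibility: ?")
```

Each hit becomes one entry in your notes: which function the thread runs, and what that thread is responsible for.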
Sometimes the most useful notes are just pseudocode for the functions. The idea is to pull out the important bits and remove the boilerplate code. When you forget what "redo_foobar_froz()" does, your pseudocode should tell you at a glance.
Just because you're reading code doesn't mean you can't also run the code. So go ahead, run it under the debugger to be able to inspect runtime values and control flow. Add printf statements to elucidate the tricky bits. Delete a function body and see what unit tests fail, and how.
If you are trying to read a state machine in code, create a very simple Graphviz document. This can be done so quickly that it's almost always worth doing.
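As a sketch of how little work this takes: jot down the transitions as (source, event, target) triples while reading, then turn them into Graphviz DOT text. The states and events below are invented for illustration, and `to_dot` is a hypothetical helper, not part of any Graphviz library.

```python
def quote(text):
    """Wrap text in double quotes, escaping embedded quotes for DOT."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(transitions, name="state_machine"):
    """Render (source, event, target) triples as a Graphviz digraph."""
    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    for source, event, target in transitions:
        lines.append(
            f"    {quote(source)} -> {quote(target)} [label={quote(event)}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical transitions, as you might note them while reading a
# switch statement. None of these names come from a real code base.
TRANSITIONS = [
    ("IDLE", "connect()", "CONNECTING"),
    ("CONNECTING", "ack received", "CONNECTED"),
    ("CONNECTING", "timeout", "IDLE"),
    ("CONNECTED", "close()", "IDLE"),
]

if __name__ == "__main__":
    print(to_dot(TRANSITIONS))
```

Save the output to a file such as `fsm.dot` and render it with `dot -Tpng fsm.dot -o fsm.png`. For a handful of states, writing the DOT file by hand is just as quick.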
If you're looking at a feature, sometimes it's worth looking at the particular commit so that you see all the related code at once. Use your source control to take a look. Of course, if the code was checked in haphazardly, or the function has been through a ton of bug fixes, this may be impossible. But it's worth checking out.
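Assuming the project uses git, three commands cover most of this: `git log -S` finds commits that added or removed a given string, `git log -L` follows the history of a single function, and `git show --stat` summarizes which files a commit touched. The sketch below only builds the command lines, so you can see them or run them; the Python helper names and the path `src/foo.c` are hypothetical.

```python
import shlex
import subprocess


def commits_touching_string(text, path="."):
    """argv listing commits that added or removed occurrences of text."""
    return ["git", "log", "--oneline", f"-S{text}", "--", path]


def function_history(func_name, path):
    """argv showing every change to one function in one file."""
    return ["git", "log", f"-L:{func_name}:{path}"]


def commit_overview(commit):
    """argv showing a commit's message and the files it touched."""
    return ["git", "show", "--stat", commit]


def run(argv, repo="."):
    """Run a command inside repo and return its standard output."""
    result = subprocess.run(
        argv, cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so the demo works
    # outside a git repository.
    for argv in (
        commits_touching_string("redo_foobar_froz"),
        function_history("redo_foobar_froz", "src/foo.c"),
        commit_overview("HEAD"),
    ):
        print(shlex.join(argv))
```

You can of course type the same commands straight into a shell. Once `git log -S` points you at the commit that introduced a feature, `git show` on that commit lays out all the related code at once.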
This list is by no means exhaustive, and I'll keep expanding it as I come up with new tricks. In the meantime, connect with me on Twitter to let me know what tricks I missed.