Reference Graphs as Tools for Refactoring
Some of the code we work with at New Relic has been around for quite a while. Unfortunately, not all of it has stayed clean and well factored over time. If you’re not careful, even the cleanest code can succumb to entropy. And without an active effort to counteract this, you can end up with code that seems unmaintainable.
Luckily, there are some tools we can use to help us make clear, organized, testable changes to even the most complicated codebases. One of my favorites is the ‘Reference Graph’, a slightly expanded version of the ‘Feature Sketch’ described in Working Effectively with Legacy Code. The general idea is that you have a set of changes you want to make to your code, so you graph everything that holds any references to the changes you want to make.
Let’s take a look at the following sample code, in which all the classes and even the methods on those classes are a confusing mess of crossed references:
class A
  def report
    B.new.get_all_the_state
  end
end

class B
  # Gathers state from B's own methods and instance variables, plus D.
  def get_all_the_state
    [ munge_state, @state1, D.new.get_state ]
  end

  def munge_state
    get_state
  end

  def get_state
    @state0
  end
end

class C
  def report
    D.new.get_state
  end
end

class D
  def get_state
    @state2
  end
end

module E
  # C#report is an instance method, so it needs an instance of C.
  def self.report
    [ A.new.report, C.new.report ]
  end
end
Let’s say that we’ve decided that we want to get rid of the B#get_all_the_state method and split its responsibilities among the ‘A’ and ‘B’ classes. In this contrived example, it’s easy to prop up a canned solution that we should refactor toward. But in a more complicated project, it can be challenging to know all the ramifications that one simple change can have on your source code. By graphing the references between the methods and instance variables around the code we want to change, we can get a good idea of the scope of the changes we want to make. A visual representation of those references can show you where your tightest knots of code are hiding and guide your work to untangle them.
I use Graphviz to create the graphs. It renders images from dot files, which you use to define nodes, style them, and define their associations. For the code above, this would be the dot file. Once you’ve installed Graphviz, you can create the graph below with this command:
dot before.dot -T png -o before.png
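A dot file is plain text, so it can be written by hand or printed by a small script. As a rough sketch of what before.dot could contain for the sample code, the Ruby below prints a dot graph from a hand-maintained list of references. The REFERENCES hash, the to_dot helper, and the node naming (Class#method for methods, Class@ivar for instance variables) are illustrative assumptions rather than the file used for this post, and no colors are applied.

```ruby
# Hand-maintained reference list for the sample code (an assumption made for
# illustration). Each key refers to every node listed in its value.
REFERENCES = {
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "B@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["B@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["D@state2"],
  "E.report"            => ["A#report", "C#report"]
}.freeze

# Build the text of a dot file with one directed edge per reference.
def to_dot(references)
  lines = ["digraph references {", "  node [shape=box];"]
  references.each do |from, targets|
    targets.each { |to| lines << %(  "#{from}" -> "#{to}";) }
  end
  lines << "}"
  lines.join("\n") + "\n"
end

puts to_dot(REFERENCES)
```

Redirecting the script's output to before.dot gives a file that the dot command above can render.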
All the parts we want to change are in red and all the parts that refer to parts we want to change are in orange. This gives us an idea of the scope of the changes we want, as well as a roadmap to getting those changes done incrementally. We see that we can’t make changes to B#get_state without also making changes to B#munge_state and B#get_all_the_state as well. The tests for these methods will also have to change or be removed entirely. The existing tests for the orange boxes should not have to change, but new ones will likely need to be written to cover the functionality of the old red boxes. Neither code nor test changes for the black boxes should be necessary, since the orange boxes should isolate all the changes within their interfaces and keep them hidden from unrelated code.
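The coloring rule itself is mechanical: red for the nodes being changed, orange for anything that refers directly to a red node, black for the rest. Here is a minimal sketch of that rule, assuming the red set is just the three B methods named above and using a hypothetical color_nodes helper. The reference list and node names are the same illustrative assumptions as before, so the result may not match the original graph node for node.

```ruby
# Hand-maintained references from the sample code (names are illustrative).
references = {
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "B@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["B@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["D@state2"],
  "E.report"            => ["A#report", "C#report"]
}

# Assumed red set: the three B methods discussed above.
to_change = ["B#get_all_the_state", "B#munge_state", "B#get_state"]

# Map each node to a color: red if we plan to change it, orange if it refers
# directly to a red node, black otherwise.
def color_nodes(references, to_change)
  nodes = (references.keys + references.values.flatten).uniq
  nodes.each_with_object({}) do |node, colors|
    colors[node] =
      if to_change.include?(node)
        "red"
      elsif references.fetch(node, []).any? { |target| to_change.include?(target) }
        "orange"
      else
        "black"
      end
  end
end

# Print one dot node statement per node, ready to paste into the dot file.
color_nodes(references, to_change).each do |node, color|
  puts %(  "#{node}" [color=#{color}];)
end
```

Under these assumptions A#report is the only orange node, which matches the idea that its interface should hide the change from E.report.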
As I work my way through the refactor, I go back to the dot file and keep it up to date with my changes. This is a great way to keep track of my progress and it keeps me from getting lost, which is a very real risk in large refactors of complex code. I like to check the Graphviz dot file into source control. That way if I get stuck too badly, I can always go back to my graph and see if it’s still current. It’s also nice to be able to go back and look at your progress. The feedback you get from seeing changes in the graph can help you find areas for further development. (This is similar to the feedback loop of TDD.) If the graph starts to get more twisted, you may need to rethink your code change.
Once we’re all done, our graph should look like this:
The newly added green boxes are either new entries or old entries that are completed. At this point, our code should look like this:
class A
  def report
    [ B.new.zeros, ones ]
  end

  def ones
    B.new.get_state
  end
end

class B
  def zeros
    @state0
  end

  def get_state
    @state1
  end
end

class C
  def report
    @state2
  end
end

module E
  # C#report is an instance method, so it needs an instance of C.
  def self.report
    [ A.new.report, C.new.report ]
  end
end
While contrived examples are nice for demonstrating a concept, they don’t do a good job showing how well this technique works in practice. In my experience, it takes about two to three hours to construct the graph for a non-trivial tract of code. I consider it time well spent and I find keeping it up to date as I work is surprisingly unobtrusive. In fact, I actually look forward to making my incremental changes then seeing the graph clean up and straighten out as I work.
Recently, I overhauled the configuration system in the Ruby agent. Here’s a teaser:
If you’re curious, the dot file for these graphs can be found in the git history for the Ruby agent. You can browse through the history and recreate the graph at each point.
If you’d like to find out more about this subject, I encourage you to read Michael Feathers’s book, which covers this and other techniques for working with code that’s suffered the ravages of time and entropy.
Either I missed it in the article or it isn’t mentioned, but what tool do you use to generate the dot files in the first place?
Posted: 9 November 2012 at 12:28 pm by Arik Fraimovich
Yeah, sorry I didn’t make it clear in the post: the dot file is generated by hand. This limits the size of the problem you can apply the technique to, but I found that even a few hundred classes and methods can be graphed in this way without too much effort. Your mileage may vary, of course.
Posted: 9 November 2012 at 2:07 pm by Jon Guymon
Hi,
In the refactored code you are using B.new.ones method, but have not defined it.
Jon Guymon Reply:
November 15th, 2012 at 11:29 am
Hi, you’re absolutely right. Mea Culpa. I’ll see about getting that fixed. Thanks!
Posted: 15 November 2012 at 10:31 am by Rohit Sharma
I’ve been tasked to maintain a badly mangled legacy application and this technique will be really helpful in working through some of the challenges. And, as you said, I suspect that seeing the graph clean up will give me a great deal of pleasure.
My only question is, how do you find all the places where a particular piece of code is referenced? This application is a bit of a spaghetti soup and it’s hard to even work out the locations of a lot of the references.
Jon Guymon Reply:
November 15th, 2012 at 2:47 pm
Excellent question. It helps to be able to lean on your editor to help track down all the relevant references. I use emacs with ctags to traverse the code. I know vi has similar support, and more heavyweight tools like SublimeText and RubyMine should have similar functionality as well.
Outside of editor support, liberal use of “git grep” or “ack” can answer a lot of the same questions, and it’s often worthwhile to use both editor and console tools to make sure you caught everything.
Maybe some day we’ll have a nice static analysis tool that can do all this automatically, but for now I find the manual labor involved in setting up the initial graph to be well worth the time.
Richard Reply:
November 15th, 2012 at 4:44 pm
I’ve been using tools like that to try and help me track the references, but I often find that I get back many spurious results from other unrelated code (especially when the naming is not very unique, e.g. width) because Ruby is dynamically typed and the tools simply can’t work out whether the reference belongs to this class or another.
Jon Guymon Reply:
November 16th, 2012 at 10:42 am
Yeah, there are trade-offs on both sides of the great static vs dynamic debate. For all the benefits we get from the dynamism of Ruby, one of the things we give up is the ability to concretely trace references.
Sorry I don’t have any better suggestions. If you come across something that works well for you, please let me know!
(email: jon at this domain)
Posted: 15 November 2012 at 2:14 pm by Richard