Reference Graphs as Tools for Refactoring

Posted 8 November 2012 in Nerd Culture, New Relic News, Tech Topics, Top Post

Some of the code we work with at New Relic has been around for quite a while. Unfortunately, not all of it has stayed clean and well factored over time. If you’re not careful, even the cleanest code can succumb to entropy, and without an active effort to counteract it, you can end up with code that seems unmaintainable.

Luckily, there are tools that can help us make clear, organized, testable changes to even the most complicated codebases. One of my favorites is the ‘Reference Graph’, a slightly expanded version of the ‘Feature Sketch’ described in Michael Feathers’ Working Effectively with Legacy Code. The general idea is that once you know which parts of your code you want to change, you graph those parts together with everything that references them.

Let’s take a look at the following sample code, in which all the classes and even the methods on those classes are a confusing mess of crossed references:

class A
  def report
    B.new.get_all_the_state
  end
end

class B
  def get_all_the_state
    [ munge_state, @state1, D.new.get_state ]
  end

  def munge_state
    get_state
  end

  def get_state
    @state0
  end
end

class C
  def report
    D.new.get_state
  end
end

class D
  def get_state
    @state2
  end
end

module E
  def self.report
    [ A.new.report, C.new.report ]
  end
end

Let’s say we’ve decided to get rid of the B#get_all_the_state method and split its responsibilities between the ‘A’ and ‘B’ classes. In this contrived example, it’s easy to come up with a canned solution to refactor toward. But in a more complicated project, it can be hard to know all the ramifications that one simple change can have on your source code. By graphing the references between the methods and instance variables around the code we want to change, we get a good idea of the scope of the changes we want to make. A visual representation of those references shows you where your tightest knots of code are hiding and guides your work to untangle them.

I use Graphviz to create the graphs. It renders images from dot files, in which you define nodes, style them, and define the edges between them. For the code above, the dot file is called before.dot. Once you’ve installed Graphviz, you can create the graph below with this command:

dot -Tpng before.dot -o before.png
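If you’d rather not write the dot file entirely by hand, a few lines of Ruby can generate it. The script below is a minimal sketch, not the dot file used for the graphs in this post: the hypothetical to_dot helper takes a hash mapping each method or instance variable to the nodes it references, plus an optional hash of Graphviz color names, and prints dot source. The red and orange assignments are my own reading of the example, so adjust them to match the change you actually plan.

```ruby
# A minimal sketch (not the original before.dot): describe the sample code's
# references as a hash and print them as Graphviz dot source.

# Each key is a method or instance variable; each value lists what it references.
REFERENCES = {
  "E.report"            => ["A#report", "C#report"],
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["@state2"]
}.freeze

# Assumed coloring: red for parts we plan to change, orange for nodes that
# reference them directly. Any node not listed here is drawn in black.
COLORS = {
  "B#get_all_the_state" => "red",
  "B#munge_state"       => "red",
  "B#get_state"         => "red",
  "A#report"            => "orange"
}.freeze

# Returns dot source with one box per node and one arrow per reference.
def to_dot(references, colors = {})
  nodes = (references.keys + references.values.flatten).uniq
  lines = ["digraph references {", "  node [shape=box];"]
  nodes.each do |node|
    lines << %(  "#{node}" [color=#{colors.fetch(node, "black")}];)
  end
  references.each do |from, targets|
    targets.each { |to| lines << %(  "#{from}" -> "#{to}";) }
  end
  lines << "}"
  lines.join("\n") + "\n"
end

puts to_dot(REFERENCES, COLORS)
```

Saving this as, say, reference_graph.rb (a hypothetical name) and running ruby reference_graph.rb > before.dot produces a file you can render with the dot command above. Keeping the graph as a Ruby hash also makes it easy to query, at the cost of one more file to keep in sync.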

Sample Graph 1

All the parts we want to change are in red, and all the parts that refer to them are in orange. This gives us an idea of the scope of the changes, as well as a roadmap for making those changes incrementally. We can see that we can’t change B#get_state without also changing B#munge_state and B#get_all_the_state. The tests for these methods will also have to change or be removed entirely. The existing tests for the orange boxes should not have to change, but new tests will likely be needed to cover the functionality of the old red boxes. Neither code nor test changes should be necessary for the black boxes, since the orange boxes should contain all the changes behind their interfaces and keep them hidden from unrelated code.
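The red, orange and black split can also be computed from the same kind of hash instead of worked out by eye. In this minimal sketch, the hypothetical classify helper marks as orange every node that directly references a red node, which matches the color scheme described above; the choice of red nodes is an assumption based on the change we’re planning.

```ruby
require "set"

# Reference graph for the sample code: node => nodes it references.
REFERENCES = {
  "E.report"            => ["A#report", "C#report"],
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["@state2"]
}.freeze

# Splits nodes into red (planned changes), orange (nodes that directly
# reference a red node) and black (everything else), each sorted.
def classify(references, red_nodes)
  red = red_nodes.to_set
  orange = Set.new
  references.each do |from, targets|
    next if red.include?(from)
    orange << from if targets.any? { |to| red.include?(to) }
  end
  all = (references.keys + references.values.flatten).uniq
  black = all.reject { |node| red.include?(node) || orange.include?(node) }
  { red: red.to_a.sort, orange: orange.to_a.sort, black: black.sort }
end

# Hypothetical choice of red nodes, based on removing B#get_all_the_state.
groups = classify(REFERENCES, ["B#get_all_the_state", "B#munge_state", "B#get_state"])
p groups[:orange] # prints ["A#report"]
```

Anything that only reaches a red node through an orange one, such as E.report here, stays black, because the orange boxes are expected to absorb the change behind their interfaces.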

As I work my way through the refactor, I go back to the dot file and keep it up to date with my changes. This is a great way to track my progress, and it keeps me from getting lost, which is a very real risk in large refactors of complex code. I like to check the Graphviz dot file into source control. That way, if I get badly stuck, I can always go back to my graph and check whether it’s still current. It’s also nice to be able to look back at your progress. The feedback you get from watching the graph change can help you find areas that need further work, much like the feedback loop of TDD. If the graph starts to get more twisted, you may need to rethink your code change.
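Since the dot file lives in source control, comparing two versions of the graph is a cheap way to see what a step actually changed. The sketch below assumes each snapshot has already been turned into a hash of node to referenced nodes (parsing dot is left out), and both the graph_diff helper and the intermediate step it compares are hypothetical. The change in edge count is only a rough proxy for how twisted the graph is.

```ruby
# Flattens a node => referenced-nodes hash into [from, to] pairs.
def edges(references)
  references.flat_map { |from, targets| targets.map { |to| [from, to] } }
end

# Compares two snapshots of the reference graph.
def graph_diff(before, after)
  old_edges = edges(before)
  new_edges = edges(after)
  {
    removed:    old_edges - new_edges,
    added:      new_edges - old_edges,
    edge_delta: new_edges.size - old_edges.size
  }
end

# Hypothetical intermediate step: inline B#munge_state into
# B#get_all_the_state so that it calls B#get_state directly.
before_step = {
  "B#get_all_the_state" => ["B#munge_state", "@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["@state0"]
}
after_step = {
  "B#get_all_the_state" => ["B#get_state", "@state1", "D#get_state"],
  "B#get_state"         => ["@state0"]
}

diff = graph_diff(before_step, after_step)
p diff[:removed]    # prints [["B#get_all_the_state", "B#munge_state"], ["B#munge_state", "B#get_state"]]
p diff[:added]      # prints [["B#get_all_the_state", "B#get_state"]]
p diff[:edge_delta] # prints -1
```

A step that was meant to simplify things but leaves edge_delta positive is one hint that the change may need rethinking.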

Once we’re all done, our graph should look like this:

Sample Graph 2

The green boxes are either new entries or old entries whose changes are complete. At this point, our code should look like this:

class A
  def report
    [ B.new.zeros, ones ]
  end

  def ones
    B.new.get_state
  end
end

class B
  def zeros
    @state0
  end

  def get_state
    @state1
  end
end

class C
  def report
    @state2
  end
end

module E
  def self.report
    [ A.new.report, C.new.report ]
  end
end

While contrived examples are nice for demonstrating a concept, they don’t do a good job of showing how well this technique works in practice. In my experience, it takes about two to three hours to construct the graph for a non-trivial stretch of code. I consider that time well spent, and I find that keeping the graph up to date as I work is surprisingly unobtrusive. In fact, I look forward to making my incremental changes and then watching the graph clean up and straighten out as I go.

Recently, I overhauled the configuration system in the Ruby agent. Here’s a teaser:

Sample Graph 3

If you’re curious, the dot file for these graphs can be found in the git history of the Ruby agent. You can browse through the history and recreate the graph at each point.

If you’d like to learn more, I encourage you to read Michael Feathers’ book, Working Effectively with Legacy Code, which covers this and other techniques for working with code that’s suffered the ravages of time and entropy.

About the author

jon@newrelic.com
