On returning from outer space

Over the years of my illustrious career, I’ve left several codebases a tattered wreckage. Not intentionally, mind you. Exuberance, eagerness, and a desire to learn new techniques on a slow day all contributed. Half finished implementations of different development techniques littered the landscape. The road to production is paved with good intentions, and a deadline. The code itself worked, shipped, improved but weird vestigial bits floated around. As someone who has fixed as many problems as they have created, you too can larn from my mistaks.

When one encounters and has to regularly work in a codebase that is creaking under the load of it’s implementation details, there are generally three options:

Since everyone is now an expert in dependency injection, we are going to focus on refactoring with an emphasis on dependency injection. The main reason refactoring is worth the attention is it can be done in a minimally disruptive manner. The other reason is that, while lying to yourself about the quality of your code and life is bad, rewriting a codebase is always worse.

It’s worth calling out before diving too deep that just because code is older it’s not necessarily bad. Sometimes the needs of the system change, or the time constraints of the past are not immediately obvious. Be careful of bulldozing through a system that you may have inherited. If you have access to previous authors, they are definitely an invaluable resource on potential pitfalls and any previous efforts towards code cleanup.

But first: tests!

I’m not sure I can put it any more succinctly than Hamlet D’Arcy did, but for the sake of completeness, do not start a refactor without adequate test code coverage. If the codebase you are attempting to clean up does not have reasonable¹ test coverage, this would be a good time to add some. In short: No tests? No refactoring!

Gentle guidelines and exceptional non-rules

Now that we have decided to make the world marginally better and have a safety net of tests to protect us from ourselves, let’s get down to a set of rules that invariably will have exceptions all the time.

Always have a reason

Refactoring for the sole reason of putting your own twist on the codebase is always a bad idea. With each PR generated as part of a refactoring effort, there should be a clear reason behind why a change is being applied. Formatting, linting, opinions, and epic power struggles are not forms of helpful refactoring.

Don’t get attached

Second only to opinionated blog posts, refactoring is probably the lowest positive value endeavor you could provide as a developer. As such, expect your pull requests to be ignored and stomped on by higher priority pull requests. Nothing done as part of a refactor effort should put delivering bug fixes or product features on hold. If the master branch diverges too far from a refactoring PR, throw it away, as the time wasted in merge conflict resolution is rarely worth it.

Progress by 1000 papercuts

Building on the previous two concepts, any attempt at refactoring should result in as small of a pull request as possible. When restructuring class hierarchies, dependencies, and constructor arguments the impact on a codebase ripples very wide, very quickly. By making a concerted effort to change as little as possible in a single pull request, it makes reviews easier and faster. In addition, minimizing the change set reduces the likelihood of getting into a situation where some random class no longer compiles and it becomes impossible to tell which part broke

If you find yourself down a rabbit hole and things are falling apart all around you, simply delete the branch (or start a fresh one) and re-apply the lowest impact changes.

Where do I start?

Just like the guidelines, this is very subjective. A lot of the time there are reasons that code was written a specific way. With that said, as you ease into this, ensure you get eyes on any code you see that feels strange. If you find something was done for performance reasons or something else, your refactoring effort could be just documenting the reasoning for the next reader.

Protect the Object Graph

As you migrate code from a non-DI style to a DI style, you should be looking for places where the object graph is broken. As you will recall, any time an injectable instantiates an internal class as a newable, the object graph breaks. If you notice a singleton being passed around directly, or another class passed in solely to be used to instantiate a child class these are good indicators that the object graph was broken too early.

public class SuperVillainLair {
  private final NuclearReactor reactor;

  public SuperVillainLair(AtomicClock atomicClock) {
    // The lair does not need a clock, this is a sign the clock should have
    // been injected
    this.reactor = new NuclearReactor(atomicClock);
  }
}

The best approach to this is to trace the call references backwards to find a single point where the object graph first broke:

public class SuperVillainRealtyGroup {
  private final List<SuperVillainLair> lairs;

  public SuperVillainRealtyGroup(int lairCount, AtomicClock atomicClock) {
    List<SuperVillainLair> lairs = new ArrayList<>();

    for (int i=0; i < lairCount; i++) {
      lairs.add(new SuperVillainLair(atomicClock));
    }
  }
}

Phase 1:

In the first pass, we move construction of the SuperVillainRealtyGroup into a Factory to allow it to be provided automatically by the injection framework.

public class SuperVillainRealtyGroup {
  private final List<SuperVillainLair> lairs;

  public static class Factory {
    private final AtomicClock atomicClock;

    @Inject
    public Factory(AtomicClock atomicClock) {
      this.atomicClock = atomicClock;
    }

    public SuperVillainRealtyGroup create(int lairCount) {
      // Note: the object graph is still broken here, but this
      // gives us an controlled point to re-attach later
      return new SuperVillainRealtyGroup(lairCount, atomicClock);
    }
  }

  private SuperVillainRealtyGroup(int lairCount, AtomicClock atomicClock) {
    List<SuperVillainLair> lairs = new ArrayList<>();

    for (int i=0; i < lairCount; i++) {
      lairs.add(new SuperVillainLair(atomicClock));
    }
  }
}

After this has changed we consult our peers, get it approved with the note that it is rebuilding the object graph and merge forward. Once again, never try to do too many changes in one pass, otherwise the review is tedious and can lead to a spiral of changes that nobody can track.

Also worth reminding is that we did not change the original target of SuperVillainLair as there could be several other classes that instantiate it.

Phase 1.n

Now that the SuperVillainRealtyGroup is an injectable, we will update all other classes that instantiate a SuperVillainLair as a newable to ensure they are all injectables themselves, one at a time.

Phase 2

It is now safe to update the SuperVillainLair class to become an injectable. Since the previous phase guaranteed that all places it was instantiated were called from an injectable, the change can be propagated up one level and still be very readable².

public class SuperVillainLair {
  private final NuclearReactor reactor;

  @Inject
  public SuperVillainLair(AtomicClock atomicClock) {
    // Once all classes that instantiate NuclearReactor are updated
    // this can go away
    this.reactor = new NuclearReactor(atomicClock);
  }

  // This is optional to allow for partial application of the new
  // injectable construction. However, if the code is tangled up
  // to the point where a single constructor can't be changed cleanly
  // you may want to refactor the blocking area elsewhere in the code
  // and save this for later
  @Deprecated
  public SuperVillainLair(AtomicClock atomicClock) {
    // The lair does not need a clock, this is a sign the clock should have
    // been injected
    this.reactor = new NuclearReactor(atomicClock);
  }
}

public class SuperVillainRealtyGroup {
  private final List<SuperVillainLair> lairs;

  public static class Factory {
    private final Provider<SuperVillainLair> superVillainLairProvider;

    @Inject
    public Factory(Provider<SuperVillainLair> superVillainLairProvider) {
      this.superVillainLairProvider = superVillainLairProvider;
    }

    public SuperVillainRealtyGroup create(int lairCount) {
      // The object graph has now been fixed
      return new SuperVillainRealtyGroup(lairCount, superVillainLairProvider);
    }
  }

  private SuperVillainRealtyGroup(int lairCount, Provider<SuperVillainLair> superVillainLairProvider) {
    List<SuperVillainLair> lairs = new ArrayList<>();

    for (int i=0; i < lairCount; i++) {
      lairs.add(superVillainLairProvider.get());
    }
  }
}

Phase 3

Now that the object graph has been repaired, we can make the minor change to NuclearReactor and push that up one level as well. If it turns out there was another class that instantiated a newable NuclearReactor, don’t try to squish it together. Reset and repeat phase 1 & 2 as needed.

public class SuperVillainLair {
  private final NuclearReactor reactor;

  @Inject
  public SuperVillainLair(Provider<NuclearReactor> nuclearReactorProvider) {
    this.reactor = nuclearReactorProvider.get();
  }
}

public class NuclearReactor {
  private final AtomicClock atomicClock;

  @Inject
  public NuclearReactor(AtomicClock atomicClock) {
    this.atomicClock = atomicClock;
  }
}

Commit. Commit. Commit.

I’m going to stress it again, make these changes as small as possible when pushing for review. Larger changes in an active codebase will create more merge conflicts, as the class hierarchy is changing in a pretty drastic way overall. This not only affects the developer working on the refactor, but every other teammate that has work in progress. The goal here is to make everyone’s life easier, not to ensure they have good merge skills.

Anonymous Classes

One common time sink when it comes to refactoring to repair an object graph is anonymous classes. Here is s normal example using an anonymous class:

public class Wrapper {

  Map<String, String> parentClassLookupTable;

  private void sendEnclosedEvent(Map<String, String> lookupTable) {
    // ... snip ...
  }

  public void makeInternalTimer() {
    Timer timer = new Timer();

    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        sendEnclosedEvent(parentClassLookupTable);
      }
    }, 1000);
  }
}

This is a very clear and concise use for an anonymous class. Testing that Timer calls run() correctly adds no value to our code, and the enclosedEvent() call may need access to some internal state.

We run into a problem when the object graph is required in any way by operations inside of sendEnclosedEvent(). This is really the same problem as before, we have broken the object graph. In this case, it’s a very trivial example and can be worked around. In reality, this can get more twisted and complex. When untangling, anonymous classes are fairly good candidates to get extracted for clarity. Hours can be wasted tracking down a non-obvious compile error when you are stuck inside of an anonymous inner class.

Like other refactoring exercises, a class extraction of any kind should exist as its own pull request with only fixes to tests included. Tracking other changes during a review can be very difficult and prone to turning into an unhelpful rubber stamp.

And stick the landing

Cleaning up and modernizing a codebase can be cathartic but it can also turn into a giant time sink if you’re not careful. Just remember, if you approach it in short discardable bursts you can get more done over time with the least amount of disruption to the forward momentum of bug fixes and feature development.

“Reasonable” is obviously at your team’s discretion, but it should be far higher than zero. ↩
I waver back and forth on the use of @Deprecated as opposed to a commit that just sweepingly cleans house. If @Deprecated actually supported attaching a reason as an argument, I’d probably feel better. ↩