Over the years of my illustrious career, I’ve left several codebases a tattered wreckage. Not intentionally, mind you. Exuberance, eagerness, and a desire to learn new techniques on a slow day all contributed. Half finished implementations of different development techniques littered the landscape. The road to production is paved with good intentions, and a deadline. The code itself worked, shipped, improved but weird vestigial bits floated around. As someone who has fixed as many problems as they have created, you too can larn from my mistaks.
When one encounters and has to regularly work in a codebase that is creaking under the load of it’s implementation details, there are generally three options:
Since everyone is now an expert in dependency injection, we are going to focus on refactoring with an emphasis on dependency injection. The main reason refactoring is worth the attention is it can be done in a minimally disruptive manner. The other reason is that, while lying to yourself about the quality of your code and life is bad, rewriting a codebase is always worse.
It’s worth calling out before diving too deep that just because code is older it’s not necessarily bad. Sometimes the needs of the system change, or the time constraints of the past are not immediately obvious. Be careful of bulldozing through a system that you may have inherited. If you have access to previous authors, they are definitely an invaluable resource on potential pitfalls and any previous efforts towards code cleanup.
But first: tests!
I’m not sure I can put it any more succinctly than Hamlet D’Arcy did, but for the sake of completeness, do not start a refactor without adequate test code coverage. If the codebase you are attempting to clean up does not have reasonable1 test coverage, this would be a good time to add some. In short: No tests? No refactoring!
Gentle guidelines and exceptional non-rules
Now that we have decided to make the world marginally better and have a safety net of tests to protect us from ourselves, let’s get down to a set of rules that invariably will have exceptions all the time.
Always have a reason
Refactoring for the sole reason of putting your own twist on the codebase is always a bad idea. With each PR generated as part of a refactoring effort, there should be a clear reason behind why a change is being applied. Formatting, linting, opinions, and epic power struggles are not forms of helpful refactoring.
Don’t get attached
Second only to opinionated blog posts, refactoring is probably the lowest positive value endeavor you could provide as a developer. As such, expect your pull requests to be ignored and stomped on by higher priority pull requests. Nothing done as part of a refactor effort should put delivering bug fixes or product features on hold. If the master
branch diverges too far from a refactoring PR, throw it away, as the time wasted in merge conflict resolution is rarely worth it.
Progress by 1000 papercuts
Building on the previous two concepts, any attempt at refactoring should result in as small of a pull request as possible. When restructuring class hierarchies, dependencies, and constructor arguments the impact on a codebase ripples very wide, very quickly. By making a concerted effort to change as little as possible in a single pull request, it makes reviews easier and faster. In addition, minimizing the change set reduces the likelihood of getting into a situation where some random class no longer compiles and it becomes impossible to tell which part broke
If you find yourself down a rabbit hole and things are falling apart all around you, simply delete the branch (or start a fresh one) and re-apply the lowest impact changes.
Where do I start?
Just like the guidelines, this is very subjective. A lot of the time there are reasons that code was written a specific way. With that said, as you ease into this, ensure you get eyes on any code you see that feels strange. If you find something was done for performance reasons or something else, your refactoring effort could be just documenting the reasoning for the next reader.
Protect the Object Graph
As you migrate code from a non-DI style to a DI style, you should be looking for places where the object graph is broken. As you will recall, any time an injectable instantiates an internal class as a newable, the object graph breaks. If you notice a singleton being passed around directly, or another class passed in solely to be used to instantiate a child class these are good indicators that the object graph was broken too early.
The best approach to this is to trace the call references backwards to find a single point where the object graph first broke:
Phase 1:
In the first pass, we move construction of the SuperVillainRealtyGroup
into a Factory to allow it to be provided automatically by the injection framework.
After this has changed we consult our peers, get it approved with the note that it is rebuilding the object graph and merge forward. Once again, never try to do too many changes in one pass, otherwise the review is tedious and can lead to a spiral of changes that nobody can track.
Also worth reminding is that we did not change the original target of SuperVillainLair
as there could be several other classes that instantiate it.
Phase 1.n
Now that the SuperVillainRealtyGroup
is an injectable, we will update all other classes that instantiate a SuperVillainLair
as a newable to ensure they are all injectables themselves, one at a time.
Phase 2
It is now safe to update the SuperVillainLair
class to become an injectable. Since the previous phase guaranteed that all places it was instantiated were called from an injectable, the change can be propagated up one level and still be very readable2.
Phase 3
Now that the object graph has been repaired, we can make the minor change to NuclearReactor
and push that up one level as well. If it turns out there was another class that instantiated a newable NuclearReactor
, don’t try to squish it together. Reset and repeat phase 1 & 2 as needed.
Commit. Commit. Commit.
I’m going to stress it again, make these changes as small as possible when pushing for review. Larger changes in an active codebase will create more merge conflicts, as the class hierarchy is changing in a pretty drastic way overall. This not only affects the developer working on the refactor, but every other teammate that has work in progress. The goal here is to make everyone’s life easier, not to ensure they have good merge skills.
Anonymous Classes
One common time sink when it comes to refactoring to repair an object graph is anonymous classes. Here is s normal example using an anonymous class:
This is a very clear and concise use for an anonymous class. Testing that Timer
calls run()
correctly adds no value to our code, and the enclosedEvent()
call may need access to some internal state.
We run into a problem when the object graph is required in any way by operations inside of sendEnclosedEvent()
. This is really the same problem as before, we have broken the object graph. In this case, it’s a very trivial example and can be worked around. In reality, this can get more twisted and complex. When untangling, anonymous classes are fairly good candidates to get extracted for clarity. Hours can be wasted tracking down a non-obvious compile error when you are stuck inside of an anonymous inner class.
Like other refactoring exercises, a class extraction of any kind should exist as its own pull request with only fixes to tests included. Tracking other changes during a review can be very difficult and prone to turning into an unhelpful rubber stamp.
And stick the landing
Cleaning up and modernizing a codebase can be cathartic but it can also turn into a giant time sink if you’re not careful. Just remember, if you approach it in short discardable bursts you can get more done over time with the least amount of disruption to the forward momentum of bug fixes and feature development.