Why DRY?
Introduction
In his ‘The Case Against Relying Solely on DRY’, Ashley Peacock argues that the concept of ‘Don’t Repeat Yourself’ is these days sometimes taken out of context, and too liberally applied. In the spirit of the principle, I won’t rehash his points here, only noting as an aside that I didn’t find his example particularly convincing (see the afterword, where I do actually rehash his points). This short article intends to discuss a different, but related, case which recently came up in my day-to-day, where DRY was again called into question.
DRY why?
Without going in to too much detail, the example involves (essentially) duplicating several classes we’d made with the goal of extending them towards a slightly different purpose to the original classes (which were also needed). I say ’essentially’ because not all attributes were taken from the original classes, and some new ones were added. I flagged this as not feeling very DRY, though was reassurred that the SRP (arguably a principle sitting above, and coming before DRY) guides us to this approach, since the two classes were to be used in different ways by the client.
It also occurred to me that another argument against violating DRY could be made here: typically the central reason for DRY is given as ease of synchronisation: when we change something in one class, we don’t want to have to go and change it in lots of other ones. Better to have one single helper class whose methods, through either composition or inheritance, are used repeatedly throughout (this is not the only application of DRY, but it is enlightening here). Consider the following very basic example with two Python classes
class Aidan:
name: str
age: int
favourite_colour: str
def print_details(self) -> str:
return f'name: {self.name}, age: {self.age}'
class Olly:
name: str
age: int
favourite_colour: str
def print_details(self) -> str:
return f'name: {self.name}, age: {self.age}'
this would rightly be flagged as non-DRY, because not only do Aidan
and Olly
have the same attributes, but also the
same methods. If I wanted to add a trait nickname
to each, or change what print_details
does, I’m required to change
my code in multiple places. This is a waste of time and increases the chances of bugs in code (what if I forget that I’d
also defined Lizzie
in another module?)
Instead, it would be suggested, they should inherit from a parent class Person
:
class Person:
name: str
age: int
favourite_colour: str
def print_details(self) -> str:
return f'name: {self.name}, age: {self.age}'
class Aidan(Person):
pass
class Olly(Person):
pass
Now when I want to make any of the above changes, I simply need to change it in one place – much nicer!
But you know all that: how does this relate to a potential reason for not DRY-ing your code? The key is that the above is premised on code mutability: over time our classes develop, new things get added or removed, and DRY makes this easier. If we could guarantee that code wouldn’t or couldn’t change, then there would be a much weaker case for DRY (and we might even start to turn against it, since having a function’s definition right in front of you is a lot easier than trawling modules for the helper module where it is defined). In mature production codebases, there is a degree of immutability to the core data structures: lots is built on top of them, thus changing them could involve having to change lots of things in lots of places, and comes with a high risk of outages (depending on your deployment workflow). Thus when working with very old data structures (in my head I’m picturing something like Wyvern from Pirate of the Caribbean), modification can come with a risk of breaking a lot, and thus or otherwise incurring a disproportionate amount of work, and it could be argued it then makes sense to get (at least a bit) WET.
Now this seemed convincing when I realised it in the middle of a Zoom call, but having reflected a bit, I’m not sure that the above really works. In particular, if your codebase is written in such a way that modifying old classes in any way will break lots of things, then you are probably in some way violating the open-closed principle, and Bertrand Meyer is on his way to your residence as we speak. But more concretely than this (since I had hoped this post would just be more than regurgitating SOLID principles), I think this problem could be better solved by a refactor which breaks up the original class into more atomic units, out of which it can be reconstructed (thus requiring no refactor, and carrying no risk of breaking any existing code), and also out of which we can construct our new ‘semi-duplicate’ class. Everyone’s a winner!
Afterword
Peacock defines the following two classes (written in Ruby):
class OrderPlacedEventPublisher
def publish(order)
publisher = EventPublisher.for_event(
'order_placed'
).with_data(
order_id: order.id,
items: order.items
)
add_location_context(
location_uuid: order.location_uuid,
publisher: publisher
)
publisher.publish
end
def add_location_context(location_uuid:, publisher:)
return if location_uuid.blank?
publisher.with_context(
name: 'location_context',
data: {location_uuid: location_uuid},
version: '1-0-0'
)
end
end
class ItemViewedEventPublisher
def publish(item)
publisher = EventPublisher.for_event(
'item_viewed'
).with_data(item_id: item.id)
add_location_context(
location_uuid: item.location_uuid,
publisher: publisher
)
publisher.publish
end
def add_location_context(location_uuid:, publisher:)
return if location_uuid.blank?
publisher.with_context(
name: 'location_context',
data: {location_uuid: location_uuid},
version: '1-0-0'
)
end
end
and argues that DRY would not be best practice (or at least wouldn’t represent a significant improvement). He states that adding a helper class to abstract the shared functionality would couple the classes together unnecessarily, and come at ’too high’ a cost.
I’m not particularly convinced by the latter because it feels like a very small refactor to rework the above, which
could carry benefits going forwards, as is the intention of DRY coding practices. Neither am I convinced by the former:
the classes are already coupled together by virtue of having the same functions used in both! If the functions do
diverge (which in fairness, they do a bit for publish
), then it makes sense to keep them separate, but they don’t, and
to design for the case that they might going forwards seems like premature optimisation to me.