On list comprehensions
Introduction
List comprehensions were introduced in Python 2.0 and have been upgraded and extended since then, including the addition of conditionals and the extension of the syntax to other iterable types, such as generators, sets and dicts. The world doesn't need another article espousing their greatness. Instead, I wanted to give an example that arose recently for me, where a naïve programmer (i.e. me before writing this article) with an insufficiently nuanced idea of what it means for code to be ‘pythonic’ (or of whether that is always even a good thing) might apply them incorrectly.
When all is well
I’ll begin with a very brief overview of some good applications of list comprehension, so we’re all on the same page.
The typical example involves a programmer who already has a list and wants to do something to each of its elements (for example, square them all) to arrive at a new list. To those yet to drink the Python kool-aid (and who aren’t functional programmers), the most natural way to do this might seem to be:
original_list = [1, 2, 3, 4, 5]
new_list = []
for i in original_list:
    new_list.append(i ** 2)
Alarm bells then go off in Python HQ, a car is dispatched, and a man shortly knocks on your door to confiscate your laptop and let you know your StackOverflow account has been permanently disabled. Obviously the nicer way to do this would be:
original_list = [1, 2, 3, 4, 5]
new_list = [i ** 2 for i in original_list]
Indeed, in many (if not most) situations where you see the structure for / list / append, it will make sense to replace your loop with a list comprehension. A slightly less obvious example is when you are constructing a new object to put into your list, perhaps with some conditional logic attached:
students = ['Aidan Gallagher', 'Lizzie', 'Olly']
students_summary = []
for student in students:
    if len(student.split()) > 1:
        last_name = student.split()[1]
    else:
        last_name = None
    info_dict = {
        'first_name': student.split()[0],
        'last_name': last_name
    }
    students_summary.append(info_dict)
The better way to do this would be to define a function which serialises a student in the required way, and then use that function in a list comprehension:
students = ['Aidan Gallagher', 'Lizzie', 'Olly']

def _serialise_student(student: str) -> dict[str, str | None]:
    # Students with only one name get a last_name of None
    if len(student.split()) > 1:
        last_name = student.split()[1]
    else:
        last_name = None
    return {'first_name': student.split()[0], 'last_name': last_name}

students_summary = [_serialise_student(student) for student in students]
Cool! (There are arguably a few more ways this code could be tidied up – ternary assignment, walrus operators, and such – but there’s no need to get into that now.)
As mentioned above, you can get a bit more advanced with your list comprehensions using conditionals and conditional expressions, if you’re into that sort of thing (see here for a good explainer on something I used to find confusing about this: the two different syntaxes for if and if else in comprehensions).
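To illustrate the difference between the two syntaxes, here is a minimal sketch with made-up data. A trailing if filters elements out, while an if/else conditional expression goes before the for and transforms every element. Writing if ... else at the end (e.g. `[n for n in numbers if n % 2 == 0 else 0]`) is a SyntaxError, which is usually where the confusion starts.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Filtering: a trailing `if` decides WHETHER an element is included,
# so the result can be shorter than the input.
evens = [n for n in numbers if n % 2 == 0]

# Conditional expression: `x if cond else y` goes BEFORE the `for` and
# decides WHAT each element becomes, so the result has the same length.
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]

print(evens)   # [2, 4, 6]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd', 'even']
```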
Why not
The problem a young programmer might run into (writes the young programmer) is that it’s easy to get the wrong idea about list comprehensions and believe that any time you see for / list / append, or even just for / list, you ought to coerce your code into this form.
This article gives some good examples of where this may not be the best approach, looking in particular at (1) code readability with nested comprehensions and (2) memory use. I think the point on (1) is a good one (so is (2); I’m just less concerned with it here): especially when working on larger, shared codebases, programmers should think about how easy their code will be to read and debug. If your coworkers have to parse and step through three levels of list comprehension, your code can quickly become very difficult to work with. Another example, which I came across recently, concerns side effects and also relates to shared codebases.
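Both points can be sketched briefly. The nested list below is made-up data; the comparison shows the same three-level logic as a single comprehension and as explicit loops, then contrasts a list comprehension with a generator expression. Note that `sys.getsizeof` reports only the size of the container object itself, not the integers it refers to, so it understates the list's true footprint; the comparison still holds.

```python
import sys

# Hypothetical data: a 2 x 2 x 2 nested list.
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Three levels in one expression: valid, but every clause has to be
# held in mind at once to understand what comes out.
flat_evens = [x for plane in cube for row in plane for x in row if x % 2 == 0]

# The same logic as explicit loops: longer, but each level gets its own
# line and indentation, which is easier to read and to step through.
flat_evens_loop = []
for plane in cube:
    for row in plane:
        for x in row:
            if x % 2 == 0:
                flat_evens_loop.append(x)

print(flat_evens == flat_evens_loop == [2, 4, 6, 8])  # True

# Memory: a list comprehension builds every element up front, whereas a
# generator expression yields them lazily, one at a time.
squares_list = [i ** 2 for i in range(1_000_000)]
squares_gen = (i ** 2 for i in range(1_000_000))
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True
```

If the values are only needed once (e.g. passed straight to `sum`), the generator expression avoids holding the whole list in memory.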
List comprehensions have their origins in the functional programming paradigm, one of the central tenets of which is that functions shouldn’t have side effects. This means a function shouldn’t modify state that sits outside its own scope, whether that is a global variable, a mutable argument, or a record in a database. The update_last_updated function below, for example, has the side effect of modifying an entry in the user table, which doesn’t sit inside the scope of the function. Apologies that this example isn’t entirely self-contained: assume everything means basically what you’d expect it to mean (User is a class we’ve defined to represent an entry in our database, get and write do what they say on the tin, and users in our database have user_id and last_updated attributes).
from datetime import datetime

def update_last_updated(user_id: str) -> User:
    user = User.get(user_id)
    user.last_updated = datetime.now()
    user.write()
    return user
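To make the idea of a side effect concrete without needing a database, here is a toy contrast between a pure and an impure function. The names square, square_and_count and the global call_count are hypothetical; the global counter simply stands in for "state outside the function", as the user table does above.

```python
call_count = 0  # hypothetical module-level state, standing in for a database

def square(x: int) -> int:
    # Pure: the result depends only on the argument, and nothing outside
    # the function is read or modified.
    return x ** 2

def square_and_count(x: int) -> int:
    # Impure: same return value, but it also mutates a global variable,
    # which is a side effect a caller might not expect.
    global call_count
    call_count += 1
    return x ** 2

print(square(4), call_count)            # 16 0
print(square_and_count(4), call_count)  # 16 1
```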
What both these things mean is that, in general, when people see list comprehensions they think functional programming, and so don’t expect side effects. This can be a source of bugs and confusion when others come to work with your code. For example, continuing the above example, the following would be a bad use of list comprehension:
user_ids = ['user_1', 'user_2', 'user_3']
updated_users = [update_last_updated(user_id) for user_id in user_ids]
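One option, offered as a suggestion rather than a rule, is to use a plain for loop for calls made for their side effects, since an explicit loop signals to readers that something is happening beyond building a list. The sketch below is runnable because it replaces the database with a hypothetical in-memory User class; its get and write methods are assumptions standing in for whatever the real model provides.

```python
from datetime import datetime

class User:
    """Hypothetical in-memory stand-in for a database-backed model."""
    _table: dict[str, "User"] = {}  # fake "user table", keyed by user_id

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.last_updated = None  # set by update_last_updated

    @classmethod
    def get(cls, user_id: str) -> "User":
        # Fetch the row, creating it on first access so the sketch runs.
        if user_id not in cls._table:
            cls._table[user_id] = cls(user_id)
        return cls._table[user_id]

    def write(self) -> None:
        # A real model would persist to the database here.
        User._table[self.user_id] = self

def update_last_updated(user_id: str) -> User:
    user = User.get(user_id)
    user.last_updated = datetime.now()
    user.write()
    return user

user_ids = ['user_1', 'user_2', 'user_3']

# An explicit loop makes it obvious that each call does something
# (writes to the "table"), not just computes a value.
updated_users = []
for user_id in user_ids:
    updated_users.append(update_last_updated(user_id))
```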
The relevant reason this is bad is not that it’s bad programming in the abstract (though it is that too, in that update_last_updated fails the single-responsibility principle, or SRP), but that when you work on a shared codebase, there is also a shared set of assumptions and biases your coworkers will have. Not all of these are necessarily justified or good, but insofar as they are common (which is usually a signal that they are justified and good), we ought to write code which conforms to them. If your list comprehensions have side effects, you will (or at least could) be failing to conform to the expectations of colleagues who see list comprehensions as functional-programming-esque.