I was having a chat over lunch a few days ago about lambda
abstractions. The guy I was talking to didn't fully understand what
they did and how they worked, but he's a smart guy so he understood it
pretty quickly. I explained them using the below example, and I
thought I'd have a shot at writing this up for other people to read.
Say you have the following functions in your code somewhere. The code is intentionally simple. If that's a problem, imagine that everywhere you see
double
, it actually reads encrypt
.
int double(int val); void doubleList(int values[], int valCount) { for (int i = 0; i < valCount; ++i) values[i] = double(values[i]); } int double(int val) { return val * 2; }
Now, it's pretty inconvenient to have to declare and define a whole other function for something as short as
double
. Excuse me
while I invent some syntax, but wouldn't it be convenient to be able
to do something like:
void doubleList(int values[], int valCount) { for (int i = 0; i < valCount; ++i) values[i] = function(int val) { return val * 2; }(values[i]); }
And that bit with
function...
is a lambda abstraction or
closure. What's happened here is that I've defined and called a
function all on the one line. This function doesn't have a name, so it
can't be called from anywhere else. But it is otherwise just like a
normal function; it appears on the call stack, it can return values
and it can even be passed parameters. Which is how values[i]
ends up in the val
local variable.
In fact the only unusual thing about this function is that it was defined 'inline.' And perhaps the syntax, but I invented that on the spot, so don't hold that against it. There are nicer ways to express this.
This sort of thing is known in some programming languages as a lambda abstraction. That name comes from something called the 'lambda calculus,' which doesn't really matter, but briefly it's a way of writing programs as a series of anonymous functions, each referred to as a lambda.
But why is it also called a closure? That's much more interesting because that name actually says something about what you can do with it.
Technically, it's called a closure because the anonymous function 'closes over' objects within the containing scope.
Which means almost nothing. Looking back at that inline doubling example above it did appear slightly unwieldy. Something like this is more readable:
void doubleList(int values[], int valCount) { for (int i = 0; i < valCount; ++i) values[i] = function() { return values[i] * 2; }(); }
Notice now how the anonymous function no longer takes any arguments. How can that still work? The anonymous function is a separate function on the callstack, with its own local variables. How can it access the
values
array, or the counter i
,
for that matter?
Imagine the following code:
class DoubleFunction { public: DoubleFunction(int values[], int valCount, int i) { mValues = values; mValCount = valCount; mI = i; } int apply() { return mValCount[mI] * 2; } private: int mValues[]; int mValCount; int mI; }; void doubleList(int values[], int valCount) { for (int i = 0; i < valCount; ++i) { DoubleFunction func(values, valCount, i); values[i] = func.apply(); } }
So, what's happening here? We've defined a class with a constructor and private members that happen to match exactly all the local variables and arguments in scope where we want to construct our closure, this class has only one member function called
apply
that matches the body of the closure.
This code is precisely equivalent to the version with the closure. And in fact, it goes further than that. This version is how a compiler could implement closures. A closure is an object with exactly one method:
apply
. So, anytime you see a closure in code just
imagine it as an object with a constructor that captures the current
scope and you're there.
A few details:
- Some may be jumping up and down to point out how inefficient the code with the class is. And they're right. But no compiler would actually do that, it's just a way to understand what's happening under the covers. Real closure implementations can be quite efficient.
- That class implementation takes copies of the values in scope. A real implementation would actually take a reference to the exact same value. This is important because it allows the value to change between when the closure is created and when it's evaluated, and have the new value be used at execution time.
What do I mean and what use is this anyway? Well, in the following code I'm using an invented syntax for closures. The part in the parentheses to the left of the
->
is the parameter list,
and the type to the right of the ->
is the return type.
void applyFunctionToEveryElement(int values[], int valCount, (int) -> int fn) { for (int i = 0; i < valCount; ++i) values[i] = fn(values[i]); } void doubleList(int values[], int valCount) { applyFunctionToEveryElement(values, valCount, function(int val) { return val * 2; } ); }
Here instead of evaluating the closure as soon as it is created, it's being passed into another function, which will evaluate it. Once for every element in the list. Which is exactly what you could do with an object. Big deal? Well, with that exact same
applyFunctionToEveryElement
function, you can also write
the following functions. Oh, and applyFunctionToEveryElement
is traditionally known as map
, so I've used that name below -
same function though, just a shorter name.
void printList(int values[], int valCount) { map(values, valCount, function(int v) { printf("%d\n", v); return v; }); } void scaleList(int values[], int valCount, int factor) { map(values, valCount, function(int val) { return val * factor; }); }
Both of these functions do completely different things to a list of integers, but use the exact same function to perform the mechanical walking the list part. Congratulations, you've abstracted out iteration. The first function is pretty straight forward, it prints every value and leaves the list unchanged.
The second function is a little more tricky. The closure created there is using a local variable, which the
map
function has no
knowledge of or access to. Remember that a closure captures all
objects in scope at the time it is created, and those objects can be
used anytime the object is evaluated. That's exactly what's happening
here.
How that works requires a few more tricks, so I'll be leaving it for another post. It's related to why continuations and closures commonly go hand in hand. Look out for my crash course in continuations and how to implement closures coming soon.
For now, just know that this works. Closures are good, if your current language has them, use them. You won't regret it. If you, like me, find your language doesn't have closures, then...
No comments:
Post a Comment