Niko Matsakis: Parallel Iterators Part 1: Foundations
Since giving a talk about Rayon at the Bay Area Rust meetup, I’ve been working off and on on the support for parallel iterators. The basic idea of a parallel iterator is that I should be able to take an existing iterator chain, which operates sequentially, and easily convert it to work in parallel. As a simple example, consider this bit of code that computes the dot-product of two vectors:
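```rust
// A sketch of the example: the dot-product of two integer slices.
fn dot_product(vec1: &[i32], vec2: &[i32]) -> i32 {
    vec1.iter()               // iterate over the first slice
        .zip(vec2.iter())     // pair each element with one from the second slice
        .map(|(i, j)| i * j)  // multiply each pair
        .sum()                // add up all of the products
}
```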
Using parallel iterators, all I have to do to make this run in parallel is change the iter calls into par_iter:
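```rust
// The same function using Rayon; this assumes `use rayon::prelude::*;`
// is in scope so that `par_iter` is available on slices.
fn dot_product(vec1: &[i32], vec2: &[i32]) -> i32 {
    vec1.par_iter()               // parallel iterator over the first slice
        .zip(vec2.par_iter())     // zip with a parallel iterator over the second
        .map(|(i, j)| i * j)
        .sum()
}
```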
This new iterator chain is now using Rayon’s parallel iterators instead of the standard Rust ones. Of course, implementing this simple idea turns out to be rather complicated in practice. I’ve had to iterate on the design many times as I tried to add new combinators. I wanted to document the design, but it’s too much for just one blog post. Therefore, I’m writing up a little series of blog posts that cover the design in pieces, starting with this one on the foundations.
Before we get to parallel iterators, let’s start by covering how
Rust’s sequential iterators work. The basic idea is that iterators
are lazy, in the sense that constructing an iterator chain does not
actually do anything until you execute
that iterator, either with a for loop or with a method like sum. In the example above, that means that the calls in the chain vec1.iter().zip(...).map(...) just build up an iterator, without actually doing anything. Only when we call sum do we start actually doing work.
In sequential iterators, the key to this is
the Iterator
trait. This trait is actually very simple; it
basically contains two members of interest:
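```rust
// Simplified: the real trait also has many defaulted helper methods.
trait Iterator {
    type Item;                                 // the type of item we yield
    fn next(&mut self) -> Option<Self::Item>;  // the next item, or `None` when done
}
```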
The idea is that, for each collection, we have a method that will
return some kind of iterator type which implements this Iterator
trait. So let’s walk through all the pieces of our example iterator
chain one by one (I’ve highlighted the steps in comments below):
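```rust
vec1.iter()               // 1. a slice iterator over `vec1`
    .zip(vec2.iter())     // 2. a zip iterator over the two slice iterators
    .map(|(i, j)| i * j)  // 3. a map iterator that multiplies each pair
    .sum()                // 4. executing the chain and adding up the results
```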
The very start of our iterator chain was a call to vec1.iter(). Here vec1 is a slice of integers, so it has a type like &[i32]. (A slice is a subportion of a vector or array.) But the iter() method (and the iterator it returns) is defined generically for slices of any type T. The method looks something like this (because this method applies to all slices in every crate, you can only write an impl like this in the standard library):
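```rust
// A sketch; only the standard library is allowed to write an inherent
// impl on the slice type `[T]` like this.
impl<T> [T] {
    pub fn iter<'iter>(&'iter self) -> SliceIter<'iter, T> {
        SliceIter { slice: self }
    }
}
```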
It creates and returns a value of the struct SliceIter, which is the type of the slice iterator (in the standard library, this type is std::slice::Iter, though it’s implemented somewhat differently). The definition of SliceIter looks something like this:
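```rust
pub struct SliceIter<'iter, T: 'iter> {
    slice: &'iter [T], // the items we have not yet yielded
}
```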
The SliceIter type has only one field, slice, which stores the slice we are iterating over. Each time we produce a new item, we will update this field to contain a subslice with just the remaining items. If you’re wondering what the 'iter notation means, it represents the lifetime of the slice, meaning the span of the code where that reference is in use. In general, lifetimes can be elided within function signatures and bodies, but they must be made explicit in type definitions. In any case, without going into too much detail here, the net effect of this annotation is to ensure that the iterator does not outlive the slice that it is iterating over.
Now, to use SliceIter as an iterator, we must implement the Iterator trait. We want to yield up a reference &T to each item in the slice in turn. The idea is that each time we call next, we will peel off a reference to the first item in self.slice, and then adjust self.slice to contain only the remaining items. That looks something like this:
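```rust
impl<'iter, T> Iterator for SliceIter<'iter, T> {
    // each call to `next` yields a reference into the underlying slice
    type Item = &'iter T;

    fn next(&mut self) -> Option<&'iter T> {
        // Copy the slice reference out of `self`, then use `split_first`
        // to divide it into its first element and the remaining items.
        let slice = self.slice;
        match slice.split_first() {
            Some((head, tail)) => {
                self.slice = tail; // keep only the remaining items
                Some(head)         // yield a reference to the first item
            }
            None => None, // the slice is empty: we are done
        }
    }
}
```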
Ok, so let’s return to our example iterator chain:
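```rust
vec1.iter()
    .zip(vec2.iter())
    .map(|(i, j)| i * j)
    .sum()
```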
We’ve now seen how vec1.iter() and vec2.iter() work, but what about zip? The zip iterator is an adapter that takes two other iterators and walks over them in lockstep. The return type of zip then is going to be a type ZipIter that just stores two other iterators:
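```rust
pub struct ZipIter<A: Iterator, B: Iterator> {
    a: A, // the left-hand iterator
    b: B, // the right-hand iterator
}
```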
Here the generic types A and B represent the types of the iterators being zipped up. Each iterator chain has its own type that determines exactly how it works. In this example we are going to zip up two slice iterators, so the full type of our zip iterator will be ZipIter<SliceIter<'a, i32>, SliceIter<'b, i32>> (but we never have to write that down; it’s all fully inferred by the compiler).
When implementing the Iterator trait for ZipIter, we just want the next method to draw the next item from a and b and pair them up, stopping when either is empty:
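```rust
impl<A: Iterator, B: Iterator> Iterator for ZipIter<A, B> {
    // each step yields one item from `a` paired with one item from `b`
    type Item = (A::Item, B::Item);

    fn next(&mut self) -> Option<(A::Item, B::Item)> {
        // draw the next item from each side; if either is exhausted,
        // the zip iterator is exhausted too
        match (self.a.next(), self.b.next()) {
            (Some(a_item), Some(b_item)) => Some((a_item, b_item)),
            _ => None,
        }
    }
}
```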
The next step in our example iterator chain is the call to map:
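```rust
vec1.iter()
    .zip(vec2.iter())
    .map(|(i, j)| i * j)  // <-- this step
    .sum()
```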
Map is another iterator adapter, this time one that applies a function
to each item we are iterating, and then yields the result of that
function call. The MapIter
type winds up with three generic types:
- ITER, the type of the base iterator;
- MAP_OP, the type of the closure that we will apply at each step (in Rust, closures each have their own unique type);
- RET, the return type of that closure, which will be the type of the items that we yield on each step.
The definition looks like this:
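```rust
use std::marker::PhantomData;

pub struct MapIter<ITER, MAP_OP, RET>
    where ITER: Iterator,
          MAP_OP: FnMut(ITER::Item) -> RET,
{
    base: ITER,      // the underlying iterator (field names here are illustrative)
    map_op: MAP_OP,  // the closure to apply to each item
    // Rust requires every type parameter of a struct to be used by some
    // field, so we record `RET` with a zero-sized marker.
    phantom: PhantomData<RET>,
}
```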
(As an aside, here I’ve switched to using a where clause to write out the constraints on the various parameters. This is just a stylistic choice: I find it easier to read if they are separated out.)
In any case, I want to focus on the second where clause for a second:
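```rust
MAP_OP: FnMut(ITER::Item) -> RET
```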
There’s a lot packed in here. First, we said that MAP_OP was the type of the closure that we are going to be mapping over: FnMut is one of Rust’s standard closure traits;
it indicates a function that will be called repeatedly in a sequential
fashion (notice I said sequential; we’ll have to adjust this later
when we want to generalize to parallel execution). It’s called FnMut
because it takes an &mut self
reference to its environment, and thus
it can mutate data from the enclosing scope.
The where clause also indicates the argument and return type of the
closure. MAP_OP will take one argument, ITER::Item – this is the type of item that our base iterator produces – and it will return values of type RET.
OK, now let’s write the iterator itself:
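```rust
impl<ITER, MAP_OP, RET> Iterator for MapIter<ITER, MAP_OP, RET>
    where ITER: Iterator,
          MAP_OP: FnMut(ITER::Item) -> RET,
{
    // each step yields whatever the closure returns
    type Item = RET;

    fn next(&mut self) -> Option<RET> {
        // draw the next item from the base iterator; if there is one,
        // apply the closure to it and yield the result
        match self.base.next() {
            Some(item) => Some((self.map_op)(item)),
            None => None,
        }
    }
}
```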
The final step is the actual summation. This turns out to be fairly
straightforward. The actual sum
method is designed to work over any
kind of type that can be added in a generic way, but in the interest
of simplicity, let me just give you a version of sum
that works on
integers (I’ll also write it as a free-function rather than a method):
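```rust
// A version of `sum` specialized to integers, written as a free function.
fn sum<ITER>(mut iter: ITER) -> i32
    where ITER: Iterator<Item = i32>,
{
    let mut result = 0;
    // repeatedly draw items until the iterator is exhausted
    while let Some(value) = iter.next() {
        result += value;
    }
    result
}
```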
Here we take in some iterator of type ITER. We don’t care what kind of iterator it is, but it must produce integers, which is what the Iterator<Item = i32> bound means. Next we repeatedly call next to draw all the items out of the iterator; at each step, we add them up.
There is one last piece of the iterator puzzle that I would like to
cover, because I make use of it in the parallel iterator design. In my
example, I created iterators explicitly by calling iter:
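```rust
vec1.iter()
    .zip(vec2.iter())  // explicit `iter()` call on `vec2`
    .map(|(i, j)| i * j)
    .sum()
```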
But you may have noticed that in idiomatic Rust code, this explicit call to iter can sometimes be elided. For example, if I were actually writing that iterator chain, I wouldn’t call iter() from within the call to zip:
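```rust
vec1.iter()
    .zip(vec2)  // `zip` converts `vec2` into an iterator for us
    .map(|(i, j)| i * j)
    .sum()
```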
Similarly, if you are writing a simple for loop that just goes over a container or slice, you can often elide the call to iter:
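```rust
for item in vec {   // equivalent to `for item in vec.iter()` when `vec` is a slice
    println!("{}", item);
}
```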
So what is going on here? The answer is that we have another trait called IntoIterator, which defines what types can be converted into iterators:
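```rust
trait IntoIterator {
    // the type of item the resulting iterator will yield
    type Item;

    // the type of iterator we will be converted into
    type IntoIter: Iterator<Item = Self::Item>;

    // convert this value into that iterator
    fn into_iter(self) -> Self::IntoIter;
}
```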
Naturally, anything which is itself an iterator implements IntoIterator automatically – it just gets converted into itself, since it is already an iterator. Container types also implement IntoIterator. The usual convention is that the container type itself implements IntoIterator so as to give ownership of its contents: e.g., converting Vec<T> into an iterator takes ownership of the vector and gives back an iterator yielding ownership of its T elements. However, converting a reference to a vector (e.g., &Vec<T>) gives back references to the elements (&T). Similarly, converting a borrowed slice like &[T] into an iterator also gives back references to the elements (&T). We can implement IntoIterator for &[T] like so:
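```rust
// As with `iter` above, only the standard library can really write this
// impl for slices; here it just reuses the `SliceIter` type we defined.
impl<'iter, T> IntoIterator for &'iter [T] {
    // iterating over a borrowed slice yields references to its elements
    type Item = &'iter T;
    type IntoIter = SliceIter<'iter, T>;

    fn into_iter(self) -> SliceIter<'iter, T> {
        SliceIter { slice: self }
    }
}
```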
Finally, the zip
helper method uses IntoIterator
to convert its
argument into an iterator:
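```rust
// `zip` can be written as a defaulted method on the `Iterator` trait,
// continuing the sketch from above.
trait Iterator {
    // ...the `Item` and `next` members shown earlier, plus:

    fn zip<B>(self, other: B) -> ZipIter<Self, B::IntoIter>
        where Self: Sized, B: IntoIterator,
    {
        ZipIter { a: self, b: other.into_iter() }
    }
}
```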
Now that we’ve covered the whole iterator chain, let’s take a moment
to reflect on some interesting properties of this whole setup. First,
notice that as we create our iterator chain, nothing actually happens until we call sum. That is, you might expect that calling vec1.iter().zip(vec2.iter()) would go and allocate a new vector that contains pairs from both slices, but, as we’ve seen, it does not. It just creates a ZipIter that holds references to both slices. In fact, no vector of pairs is ever created (unless you ask for one by calling collect). Thus iteration can be described as lazy, since the various effects described by an iterator take place at the last possible time.
The other neat thing is that while all of this code looks very
abstract, it actually optimizes to something very efficient. This is a
side effect of all those generic types that we saw before. They
basically ensure that the resulting iterator has a type that describes
precisely what it is going to do. The compiler will then generate a
custom copy of each iterator function tailored to that particular
type. So, for example, we wind up with a custom copy of ZipIter that is specific to iterating over slices, and a custom copy of MapIter that is specific to multiplying the results of that particular ZipIter. These copies can then be optimized independently. The end result is that our dot-product iteration chain winds up being optimized into some very tight assembly; in fact, it even gets vectorized. You can verify this yourself by looking at this example on play and clicking the ASM button (but don’t forget to select Release mode) and examining the inner loop.
Neat.
So let’s review the critical points of sequential iterators:
- Iterators are lazy: building up the chain does nothing until you call next, and then the iterator does the minimal amount of work it can to produce a result.
- No intermediate data structures are built along the way. The exception is a combinator like collect, which accumulates the iterator’s items into a vector or other data structure; building that data structure will require allocating memory.
So in summary, you get to write really high-level, convenient code with really low-level, efficient performance.
http://smallcultfollowing.com/babysteps/blog/2016/02/19/parallel-iterators-part-1-foundations/