In practice, what are the main uses for the yield from syntax in Python 3.3?
Let's get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let's face it, if all yield from does is expand the for loop, then it does not warrant adding yield from to the language and preclude a whole bunch of new features from being implemented in Python 2.x.
What yield from does is establish a transparent bidirectional connection between the caller and the sub-generator:
- The connection is transparent in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).
- The connection is bidirectional in the sense that data can be both sent from and to a generator.
(If we were talking about TCP, yield from g might mean "now temporarily disconnect my client's socket and reconnect it to this other server socket".)
BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first; they're very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley's Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.
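As a quick, hedged illustration of what "sending" means (the echo name and the prints are just for demonstration; they are not from Beazley's slides):

def echo():
    """A minimal coroutine: values passed to .send() appear at (yield)."""
    while True:
        received = yield      # execution pauses here until someone calls .send()
        print("Got:", received)

coro = echo()
next(coro)            # "prime" the coroutine: advance it to the first yield
coro.send("hello")    # prints: Got: hello
coro.send(42)         # prints: Got: 42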
Reading data from a generator using yield from
def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield "<< %s" % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v
wrap = reader_wrapper(reader())
for i in wrap:
    print(i)
# Result
<< 0
<< 1
<< 2
<< 3
Instead of manually iterating over reader(), we can just yield from it.
def reader_wrapper(g):
    yield from g
That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life-changing.
Sending data to a generator (coroutine) using yield from – Part 1
Now let's do something more interesting. Let's create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.
def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print(">> ", w)
Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?
def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)
# Expected result
>> 0
>> 1
>> 2
>> 3
The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration when the for loop is exhausted. Evidently just doing for x in coro: yield x won't do. Here is a version that works.
def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)   # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass
Or, we could do this.
def writer_wrapper(coro):
    yield from coro
That saves six lines of code, makes it much, much more readable, and it just works. Magic!
Sending data to a generator (coroutine) using yield from – Part 2 – Exception handling
Let's make it more complicated. What if our writer needs to handle exceptions? Let's say the writer handles a SpamException and prints *** if it encounters one.
class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print("***")
        else:
            print(">> ", w)
What if we don't change writer_wrapper? Does it work? Let's try:
# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # prime the coroutine
for i in [0, 1, 2, "spam", 4]:
    if i == "spam":
        wrap.throw(SpamException)
    else:
        wrap.send(i)
# Expected Result
>> 0
>> 1
>> 2
***
>> 4
# Actual Result
>> 0
>> 1
>> 2
Traceback (most recent call last):
... redacted ...
File ... in writer_wrapper
x = (yield)
__main__.SpamException
Um, it's not working because x = (yield) just raises the exception and everything comes to a crashing halt. Let's make it work, but by manually handling exceptions and sending or throwing them into the sub-generator (writer):
def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them."""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass
This works.
# Result
>> 0
>> 1
>> 2
***
>> 4
But so does this!
def writer_wrapper(coro):
    yield from coro
The yield from transparently handles sending values to, or throwing exceptions into, the sub-generator.
This still does not cover all the corner cases, though. What happens if the outer generator is closed? What about the case when the sub-generator returns a value (yes, in Python 3.3+, generators can return values): how should the return value be propagated? That yield from transparently handles all of these corner cases is really impressive; it just magically works.
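To make the return-value case concrete, here is a hedged sketch adapted from the pattern in PEP 380 (the averager and grouper names are illustrative, not from the answer above). The value a sub-generator returns becomes the value of the yield from expression:

def averager():
    """Sub-generator: averages the numbers sent to it, then returns the result."""
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:    # sentinel: stop averaging
            break
        total += value
        count += 1
    return total / count     # becomes the value of `yield from averager()`

def grouper(results, key):
    while True:
        # close() and throw() on grouper also propagate into averager
        results[key] = yield from averager()

results = {}
g = grouper(results, "a")
g.send(None)                 # prime; runs up to averager's first yield
for v in [10, 20, 30]:
    g.send(v)
g.send(None)                 # averager returns; 20.0 lands in results["a"]
print(results)               # {'a': 20.0}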
I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. There were other keywords proposed (like delegate), but they were rejected because adding a new keyword to the language is much more difficult than combining existing ones.
In summary, it's best to think of yield from as a transparent two-way channel between the caller and the sub-generator.
References:
- PEP 380 – Syntax for Delegating to a Subgenerator (Ewing) [v3.3, 2009-02-13]
- PEP 342 – Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]
What are the situations where yield from is useful?
Every situation where you have a loop like this:
for x in subgenerator:
    yield x
As the PEP describes, this is a rather naive attempt at using the subgenerator; it's missing several aspects, especially the proper handling of the .throw()/.send()/.close() mechanisms introduced by PEP 342. To do this properly, rather complicated code is necessary.
What is the classic use case?
Consider that you want to extract information from a recursive data structure. Let's say we want to get all leaf nodes in a tree:
def traverse_tree(node):
    if not node.children:
        yield node
    for child in node.children:
        yield from traverse_tree(child)
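The answer assumes some node type with a children attribute; for a runnable illustration, a minimal hypothetical Node class works:

class Node:
    """A minimal illustrative node type (not part of the answer itself)."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

tree = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b"),
])
print([leaf.value for leaf in traverse_tree(tree)])
# ['a1', 'a2', 'b']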
Even more important is the fact that, until yield from, there was no simple method of refactoring generator code. Suppose you have a (senseless) generator like this:
def get_list_values(lst):
    for item in lst:
        yield int(item)
    for item in lst:
        yield str(item)
    for item in lst:
        yield float(item)
Now you decide to factor out these loops into separate generators. Without yield from, this is ugly, up to the point where you will think twice whether you actually want to do it. With yield from, it's actually nice to look at:
def get_list_values(lst):
    for sub in [get_list_values_as_int,
                get_list_values_as_str,
                get_list_values_as_float]:
        yield from sub(lst)
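The three sub-generators are not shown in the answer; under the obvious assumption about what they do, they might look like this:

def get_list_values_as_int(lst):
    for item in lst:
        yield int(item)

def get_list_values_as_str(lst):
    for item in lst:
        yield str(item)

def get_list_values_as_float(lst):
    for item in lst:
        yield float(item)

print(list(get_list_values(["1", "2"])))
# [1, 2, '1', '2', 1.0, 2.0]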
Why is it compared to micro-threads?
I think what this section in the PEP is talking about is that every generator does have its own isolated execution context. Together with the fact that execution is switched between the generator-iterator and the caller using yield
and __next__()
, respectively, this is similar to threads, where the operating system switches the executing thread from time to time, along with the execution context (stack, registers, …).
The effect of this is also comparable: both the generator-iterator and the caller progress in their execution state at the same time; their executions are interleaved. For example, if the generator does some kind of computation and the caller prints out the results, you'll see the results as soon as they're available. This is a form of concurrency.
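A toy sketch of that interleaving (the sleep is just a stand-in for real work):

import time

def slow_squares(n):
    for i in range(n):
        time.sleep(0.5)   # pretend each result takes time to compute
        yield i * i

# Each result is printed as soon as the generator produces it; the
# generator and the caller take turns on a single OS thread.
for value in slow_squares(3):
    print("got", value)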
That analogy isn't anything specific to yield from, though; it's rather a general property of generators in Python.
In practice, what are the main uses for the yield from syntax in Python 3.3?
Wherever you invoke a generator from within a generator, you need a "pump" to re-yield the values: for v in inner_generator: yield v. As the PEP points out, there are subtle complexities to this which most people ignore. Non-local flow control like throw() is one example given in the PEP. The new syntax yield from inner_generator is used wherever you would have written the explicit for loop before. It's not merely syntactic sugar, though: it handles all of the corner cases that are ignored by the for loop. Being "sugary" encourages people to use it and thus get the right behaviors.
This message in the discussion thread talks about these complexities:
With the additional generator features introduced by PEP 342, that is no longer the case: as described in Greg's PEP, simple iteration doesn't support send() and throw() correctly. The gymnastics needed to support send() and throw() actually aren't that complex when you break them down, but they aren't trivial either.
I can't speak to a comparison with micro-threads, other than to observe that generators are a type of parallelism. You can consider the suspended generator to be a thread which sends values via yield to a consumer thread. The actual implementation may be nothing like this (and the actual implementation is obviously of great interest to the Python developers), but this does not concern the users.
The new yield from syntax does not add any additional capability to the language in terms of threading; it just makes it easier to use existing features correctly. Or, more precisely, it makes it easier for a novice consumer of a complex inner generator written by an expert to pass through that generator without breaking any of its complex features.