Before the lab
Read
Overview
Roughly speaking a generator
is a function that behaves like a list,
but only for sequential access. In this lab we'll start by trying to
impliment generators using closures, and then consider some
performance tradeoffs involved with their use.
Counter
- Time
- 20 minutes
- Activity
- Demo
make_counter
is the first
generator
example from the book. Try to predict the order of output from the following code.
Where is 'entering make_counter' and 'incrementing x' printed.
def make_counter(x):
print('entering make_counter')
while True:
yield x
print('incrementing x')
x = x + 1
print('first')
counter = make_counter(100)
print('second')
print(next(counter))
print('third')
print(next(counter))
print('last')
We can see there is some kind of delayed execution going on here. Let's try
tracing the code in the pudb3
debugger.
We can see that next
and yield
jump back and forth between two
sequences of execution.
To demystify this generator a bit, let's try to simulate it with a
closure by completing the definition of make_counter2
so that
counter=make_counter2(100); counter(); counter()
mimics
counter=make_counter(100); next(counter); next(counter)
.
Unlike a generator, our closure (the inner function) counter
always
starts at the top. We can fake the two entry points of the generator
by looking at the value of count
(note use of nonlocal)
def make_counter2(x):
count = x
def counter():
nonlocal count
return counter
Our functions should pass the following test (note this ignores output from print
):
def test_count():
counter1=counter.make_counter(100)
counter2=counter.make_counter2(100)
for j in range(0,100):
assert next(counter1) == counter2()
Fibonacci
- Time
- 30 minutes
- Activity
- individual
Here fib
is the
second generator example
from the book. Fill in the definition of fib2
to roughly mimic the
function of fib
. The difference is the handling of running out of
values; the generator version uses an exception, while we just return
None
. To replace yield
with return, you probably need to save the
value of a
before overriding it.
def fib(max):
a, b = 0, 1
while a < max:
yield a
a, b = b, a + b
def fib2(max):
a,b = 0,1
def next():
nonlocal a,b
else:
return None
return next
Your code should pass the following (somewhat clunky) test
def test_fib():
fun = fib.fib2(1000)
lst2 = []
while True:
n = fun()
if n!=None:
lst2.append(n)
else:
break
assert lst2 == list(fib.fib(1000))
Sieve
- Time
- 20 minutes
- Activity
- individual
We already looked at transforming the sieve
code to use list
comprehensions in Lab 15. In this section we'll consider
changing the code to use generators.
Start with the following version of sieve.py
.
#!/usr/bin/python3
from math import sqrt,ceil
def drop_divisible(n,lst):
return [ x for x in lst if x == n or x % n != 0 ]
def sieve_with(candidates, lst):
for c in candidates:
lst=drop_divisible(c,lst)
return lst
def sieve(n):
return sieve_with(range(2,ceil(sqrt(n))+1), range(2,n))
Copy sieve.py
to sievegen.py
, and rewrite drop_divisible
to use
a generator.
The following test makes sure we're getting the right answer; we'll understand the need for the list comprehension below.
import sieve
import sievegen
def test_sieve():
assert [x for x in sievegen.sieve(1000)] == sieve.sieve(1000)
Performance Comparison
- Time
- 15 minutes
- Activity
- Demo
One of the most common tradeoffs in computing is time versus memory. A technique called memoization uses more memory to cache answers. Here we're going to consider the opposite tradeoff, using more time for less memory.
Add the following code to the end of both modules.
if __name__ == '__main__':
primes = sieve(500000)
time the two versions with
$ /usr/bin/time python3 sieve.py $ /usr/bin/time python3 sievegen.py
Observe that the generate version is suspiciously faster. Let's run the following to understand more.
import sieve
import sievegen
import timeit
print('sieve={:s}'.format(str(sieve.sieve(10))))
print('sievegen={:s}'.format(str(sievegen.sieve(10))))
print(timeit.timeit("sieve.sieve(10000)","import sieve",number=100))
print(timeit.timeit("sievegen.sieve(10000)", "import sievegen",number=100))
print(timeit.timeit("[x for x in sievegen.sieve(10000)]", "import sievegen",number=100))
Use something like the list comprehension in the
sievegen
main block to force the generator to actually evaluate.Repeat the
/usr/bin/time
experiment, paying attention to both memory usage and time. What do you observe?
Estimating the tradeoff
- Time
- 15 minutes
- Activity
- individual
Try and estimate how much the time and memory consumption for each version grows when the parameter
n
is doubled. You may need to start with a smallern
to get a reasonable number of measurements.At what point does the generator version crash? Why?