How to Use Generators and Yield in Python
A Python generator is a special kind of function that lets us process large amounts of data efficiently, one item at a time. Generators are generally thought of as a complex concept in Python, which is why they come up so often in job interviews. But they aren't just an arcane bit of programming used in interviews: they are an invaluable tool for solving real-world problems efficiently. In this article we'll go over what generators and yield are, where to use them, and how they can process infinite sequences.
For simplicity's sake, all of the code in this tutorial (and all our Python courses) will be in Python 3.x.
What is a Generator?
To put it simply, a generator function is a function whose body contains the yield keyword. Any time you see yield inside a function, you know that function is a generator. Fair enough. But let's take it a step further. Here is an example of a basic generator function:
    def my_generator(some_value):
        print("start generating")
        for i in range(some_value):
            print("value before yield", i)
            yield i
            print("after yield", i)
        print("Generator is complete...")
As you can see, a generator is defined with the keyword def, just like any other function or method. Nothing new there. Next, we use a for loop to iterate over range(some_value), where some_value is provided by the caller. This is old hat, but the yield keyword is new and specific to generators.
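If you want to check the claim that any function containing yield is a generator function, the standard library's inspect module can tell you. The sketch below repeats my_generator (with plain quotes) and also shows something easy to miss: calling it runs none of its body yet. It only returns a generator object, and the code inside waits until a value is requested.

```python
import inspect
import types

def my_generator(some_value):
    print("start generating")
    for i in range(some_value):
        print("value before yield", i)
        yield i
        print("after yield", i)
    print("Generator is complete...")

# True: the function body contains yield
print(inspect.isgeneratorfunction(my_generator))

# Calling it prints nothing yet; it just creates a generator object
f = my_generator(3)
print(isinstance(f, types.GeneratorType))  # True
```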
What is Yield in Python?
Yield in Python is very similar to return. Recall that return outputs the value calculated by the function and then ends the function's execution. The difference is that return exits the function, while yield suspends it. This means you can ask a generator for values as many times as you need (until it runs out), and each time it resumes where it left off and provides the very next value. This is done with the generator method __next__(). (That is two underscores before and after "next". These are generally called "dunder methods" or "magic methods" in Python. In everyday code you would usually call the built-in next(f), which calls f.__next__() for you.) Let's look at an example using our previous function, my_generator().
    >>> f = my_generator(3)   # <- Here we made some_value 3.
    >>> f.__next__()
    start generating
    value before yield 0
    0                         # <- Value yielded
    >>> f.__next__()
    after yield 0
    value before yield 1
    1
    >>> f.__next__()
    after yield 1
    value before yield 2
    2
    >>> f.__next__()
    after yield 2
    Generator is complete...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
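In this example, we passed 3 into my_generator and called __next__() four times. Notice that each of the first three calls ran the function only as far as the next yield and provided the next value. As we said, the generator suspended its calculation until the next value was requested. The fourth call resumed after the last yield, let the function run to the end, and raised StopIteration because there were no values left (more on that exception later).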
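To see the suspend-and-resume behavior without reading interleaved printed output, here is a small sketch. It is a hypothetical variant of my_generator called traced_generator, which appends each step to a log list instead of printing it; the log parameter is an illustrative addition, not part of the original function. It also uses the built-in next() with a default value, which returns that default instead of raising StopIteration.

```python
from typing import Iterator, List

def traced_generator(some_value: int, log: List[str]) -> Iterator[int]:
    # Hypothetical variant of my_generator that records each step in `log`
    log.append("start generating")
    for i in range(some_value):
        log.append(f"value before yield {i}")
        yield i  # execution pauses here until the next value is requested
        log.append(f"after yield {i}")
    log.append("Generator is complete...")

log: List[str] = []
gen = traced_generator(2, log)
print(log)                 # []  (nothing has run yet)
print(next(gen))           # 0
print(log)                 # ['start generating', 'value before yield 0']
print(next(gen))           # 1
print(next(gen, "done"))   # done  (the default replaces StopIteration)
print(log)
```

The final log shows that each "after yield" step only runs when the following value is requested.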
Reflect for a minute on how this differs from a traditional function. Normally, a function would run its for loop to completion, gather up every result, and hand the whole answer back to the caller in one fell swoop. A generator, however, only computes the next value when asked and does not keep the values it has already produced. Let's take a look at why that's important.
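To put a rough number on that difference, here is a minimal comparison using two hypothetical helpers: squares_list builds and returns a full list the traditional way, while squares_gen yields the same values one at a time. sys.getsizeof only measures the container object itself (not the integers it refers to), but even that is enough to show the gap.

```python
import sys
from typing import Iterator, List

def squares_list(n: int) -> List[int]:
    # Traditional approach: compute every result, then return them all at once
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_gen(n: int) -> Iterator[int]:
    # Generator approach: compute one result each time a value is requested
    for i in range(n):
        yield i * i

n = 1_000_000
as_list = squares_list(n)
as_gen = squares_gen(n)

print(sys.getsizeof(as_list))       # millions of bytes, growing with n
print(sys.getsizeof(as_gen))        # a small, fixed size regardless of n
print(sum(as_list) == sum(as_gen))  # True: same values either way
```

Exact byte counts depend on the Python version and platform, so treat the printed sizes as illustrative.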
Why Do I Need Generators?
One of the core uses of Python is data analysis. This often means parsing records in a database that could potentially have millions upon millions of records. For example, let's say you need to look at each row in a database and multiply the value in that row's second column by two.
One might naively write a function that loops over every row with a for or while loop, collects the results, and returns them at the end. However, that approach stores every result in memory until the whole process is complete. What if the file were billions of rows long?
When the file is gigantic, this can exhaust the available memory and raise a MemoryError, so loading a huge data set into memory all at once is not an option. You don't need to store all the records in memory, just the current row you are performing calculations on. This is exactly the problem generators solve. The same property makes generators useful not only for parsing gigantic files, but also for calculating sequences that continue on into infinity. Let's look at a generator that can yield an infinite sequence of values.
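Before moving on to infinite sequences, here is a minimal sketch of the row-doubling task as a generator. It assumes the rows arrive as CSV text with a numeric second column; the function name doubled_second_column is hypothetical, and io.StringIO stands in for a real file so the example is self-contained. Because csv.reader reads its input line by line, only the current row is held in memory.

```python
import csv
import io
from typing import Iterable, Iterator

def doubled_second_column(lines: Iterable[str]) -> Iterator[float]:
    # Hypothetical helper: yield the second column of each CSV row times two.
    # `lines` can be an open file; rows are read and processed one at a time.
    for row in csv.reader(lines):
        yield float(row[1]) * 2

# Stand-in for a huge file: an in-memory text buffer with three rows
data = io.StringIO("a,1\nb,2.5\nc,10\n")

# list() is used only because this sample is tiny; with a real file you
# would loop over the generator directly
results = list(doubled_second_column(data))
print(results)  # [2.0, 5.0, 20.0]
```

With a real file, the same function would be used as for value in doubled_second_column(open_file): ..., so memory use stays flat no matter how many rows there are.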
How to Generate an Infinite Sequence in Python
Infinite sequences are commonly used by mathematicians to reason about large, indefinite quantities. A naive approach that tries to build the whole sequence in a list before returning it would never finish and would eventually run out of memory, since there is no definitive end point. That makes generators a perfect fit here. Remember that a generator suspends activity after every yield, so it only calculates as much as it needs to and nothing more. Here is an example of an infinite sequence generator:
    def infinite_calculation(starting_point):
        while True:
            yield starting_point
            starting_point += 1
For most programmers, the statement while True raises a red flag. You don't have to be Alan Turing to know that an ordinary while True loop with no break will run forever and never return. With yield, however, the loop pauses at every yield, so each request only advances it by one step: the generator remembers where it left off and simply produces the next value. Here is our infinite calculation code in action:
    >>> i = infinite_calculation(10)
    >>> i.__next__()
    10
    >>> i.__next__()
    11
We can keep doing this ad infinitum and we'll never run out of memory. For the sake of comparison, let's look at what happens when we write the same loop without yield:
    def infinite_calculation_wont_work(starting_point):
        while True:
            starting_point += 1

    z = infinite_calculation_wont_work(10)  # <- this call never returns
This won't work: the call never returns, and the program hangs forever. Feel free to try it yourself (press Ctrl+C to stop it). Hopefully this conveys the fundamental difference between generators and ordinary functions. Generators also behave as iterators, so let's take a quick look at using ours in a loop instead of calling __next__() all the time!
How to Use a Generator as an Iterator
Iterators in Python are objects that hand out values one at a time through their __next__() method, which is what lets us loop over them. All generators are iterators, but not all iterators are generators. The finer points of iterators are beyond the scope of this article, but it is important to know that they can be used in a for loop, which is exactly what we're about to do. Here is a code snippet of that:
    for i in infinite_calculation(20):
        print(i)  # prints 20, 21, 22, ... and never stops
Executing this code will print an endless stream of numbers to the screen. Press Ctrl+C to stop the iteration (on a Mac this is also Ctrl+C, not Cmd+C). The key thing to remember is that it will not run out of memory. That's because the generator only keeps its current state and discards each value once the loop has moved on to the next one. So far, every generator we have written has had only one yield. Generators can actually have as many yields as you want. Let's see how.
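Pressing Ctrl+C is not the only way to stop an infinite generator. The sketch below uses two standard-library tools: collections.abc.Iterator to confirm that a generator really is an iterator, and itertools.islice to take just the first few values so the loop ends on its own. (For this particular counting pattern, itertools.count(20) would produce the same sequence; infinite_calculation is repeated here so the block is self-contained.)

```python
from collections.abc import Iterator
from itertools import islice

def infinite_calculation(starting_point):
    # Same generator as above: counts upward from starting_point forever
    while True:
        yield starting_point
        starting_point += 1

gen = infinite_calculation(20)
print(isinstance(gen, Iterator))  # True: every generator is an iterator
print(iter(gen) is gen)           # True: an iterator returns itself from iter()

# islice stops after five values, so this loop terminates
for i in islice(gen, 5):
    print(i)                      # 20, 21, 22, 23, 24 on separate lines
```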
Can a Generator Have Multiple Yields?
A generator can indeed have multiple yields, and there are many cases where that comes in handy. For instance, let's say we needed to iterate through billions of records, but needed to do something slightly different to the odd ones and the even ones. Let's take a look at some code that would help us accomplish that:
    def many_yields(number_of_records):
        for i in range(1, number_of_records):
            if i % 2 == 0:  # <- Check whether the number is even
                print("multiply even number by 10")
                x = i * 10
                yield x  # <- First yield
            else:
                print("The number is odd, just yield the number...")
                yield i  # <- Second yield
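In this example, we are multiplying the current number by 10 if it's even, and simply yielding the current number if it is odd. Note that range(1, number_of_records) stops just before number_of_records, so many_yields(5) handles the numbers 1 through 4. Let's see it in action: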
    >>> b = many_yields(5)
    >>> b.__next__()
    The number is odd, just yield the number...
    1
    >>> b.__next__()
    multiply even number by 10
    20
    >>> b.__next__()
    The number is odd, just yield the number...
    3
    >>> b.__next__()
    multiply even number by 10
    40
    >>> b.__next__()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
Everything is going swimmingly. Odd numbers are yielded unchanged, while their even counterparts are multiplied by 10. However, take note of the exception raised at the end. This is the StopIteration exception, which Python uses to signal the end of a sequence. It occurs when you call __next__() on a generator (or any other iterator) that has no values left. A for loop handles StopIteration for you, and an infinite generator never raises it at all.
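Here is a short sketch of the two usual ways to deal with StopIteration. It repeats many_yields with the print calls removed for brevity. A for loop calls __next__() behind the scenes and stops quietly when the generator is exhausted, while calling next() by hand means catching StopIteration yourself.

```python
from typing import Iterator

def many_yields(number_of_records: int) -> Iterator[int]:
    # Same logic as above, without the print calls
    for i in range(1, number_of_records):
        if i % 2 == 0:
            yield i * 10  # even numbers are multiplied by 10
        else:
            yield i       # odd numbers pass through unchanged

# The for loop stops on StopIteration without showing an error
for value in many_yields(5):
    print(value)          # 1, 20, 3, 40 on separate lines

# Manual next() calls: handle StopIteration explicitly
b = many_yields(2)        # range(1, 2) contains only the number 1
print(next(b))            # 1
try:
    next(b)
except StopIteration:
    print("no values left")
```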
Generators are a notoriously difficult concept to grasp, so don't be alarmed if you don't completely understand what is going on right away. It's important to get your fingers on the keyboard and experiment with yield and generators. Try all sorts of variations until you understand the concept: put yields before and after calculations, and try multiple yields and infinite sequences.