Python’s Gen, Iter, and Iters: The Right Time

Python stands out in the programming world for its simplicity and readability, coupled with powerful features that enable efficient and effective coding practices. Among these features are generators, the iter() function, and iterable objects. Understanding when and how to use them can significantly enhance your coding proficiency in Python. Along the way, we will also touch on duck typing, the closely related idea that what matters is how an object behaves rather than what type it is declared to be.

Generators, Iterables, and Iterators rank among the most frequently utilized features in Python programming. Despite their widespread use, we seldom take a moment to delve into their mechanics or consider crafting our custom generators and iterables. Gaining an in-depth understanding of their capabilities opens up a realm of possibilities, enabling you to broaden your coding arsenal. By harnessing these tools effectively, you can elevate your code’s efficiency and embrace a more Pythonic coding style.

Iterables in Python: Unveiling the Power of Custom Iterators

In the early stages of learning Python, one of the fundamental concepts introduced is the ability to iterate through all elements of a list using a straightforward syntax.

>>> var = [1, 2, 3]
>>> for element in var:
...     print(element)
... 
1
2
3

Even after extensive experience with Python, you might take this syntax for granted without pausing to reflect on it. Eventually, you’ll discover that the same syntax applies seamlessly to tuples and dictionaries as well.

>>> var = {'a': 1, 'b':2, 'c':3}
>>> for key in var:
...     print(key)
... 
a
b
c

This syntax is also applicable to strings:

>>> var = 'abc'
>>> for c in var:
...     print(c)
... 
a
b
c

As you might envision, an iterable in Python is an object that enables traversing its elements one at a time. It appears that nearly any data type facilitating the grouping of information is iterable. Consequently, the next natural consideration is whether we can create our custom iterable.

The __getitem__ approach 

As you might be aware, when creating our custom types, we utilize classes. We can exemplify this with the following:

class Arr:
    def __init__(self):
        self.a = 0
        self.b = 1
        self.c = 2

And for its utilization:

>>> a = Arr()
>>> print(a.a)
0
>>> print(a.b)
1

You can quickly grasp the basics of using classes from the brief illustration. This simple example involves storing three distinct values and subsequently printing two of them. However, attempting the following would result in an error:

>>> for item in a:
...     print(item)
... 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'Arr' object is not iterable

Clearly, we cannot iterate over our object: Python has no way of knowing in which order its elements should be processed. It is therefore up to us to implement the iteration ourselves, as illustrated in the following example:

class Arr:
    def __init__(self):
        self.a = 0
        self.b = 1
        self.c = 2

    def __getitem__(self, item):
        if item == 0:
            return self.a
        elif item == 1:
            return self.b
        elif item == 2:
            return self.c
        raise StopIteration

The code is a bit verbose, but let's see how it works:

>>> a = Arr()
>>> print(a[0])
0
>>> print(a[1])
1
>>> for element in a:
...     print(element)
... 
0
1
2

By implementing the __getitem__ method, we enable indexed access to our object, just as we would with a list or tuple. This allows for straightforward indexing like a[0], a[1], and so on. Venturing further into this approach opens up numerous possibilities and questions. At this juncture, it becomes worthwhile to explore the concept of duck typing and how it can be effectively utilized in your code.

Additionally, we raise a StopIteration exception to handle attempts to access elements beyond the first three. This exception is the signal Python's iteration machinery uses to know that a loop has finished. Inside a for loop the iteration over the three elements ends cleanly, but accessing an index beyond that limit directly exposes behavior that differs from what a plain for loop shows us:

>>> a[3]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 14, in __getitem__
StopIteration

Had our object been a traditional list, encountering an IndexError would be the norm when attempting to access an out-of-bounds element, rather than the StopIteration exception we implemented. Let’s consider a different scenario, where we create an iterable that can handle a broader range of data instead of being limited to just three elements. Imagine an object named ‘Sentence’ designed to iterate over individual words in a sentence. With this construct, we could seamlessly traverse each word using a for loop in the following manner:

class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __getitem__(self, item):
        return self.words[item]

In this straightforward example, we take a block of text and split it at the spaces. The resulting words are stored in an attribute called words. Here, the __getitem__ method simply retrieves the correct item from this list of words. We can utilize it in the following way:

>>> s = Sentence('This is some form of text that I want to explore')
>>> for w in s:
...     print(w)
... 
This 
is 
some 
form 
of 
text 
that 
I 
want 
to 
explore

The behavior of our example is as anticipated. While straightforward, it does have its limitations, such as not accounting for punctuation. Nevertheless, the for loop concludes smoothly once it exhausts the list of words. Conversely, attempting to access a non-existent element, like s[100], correctly results in an IndexError, as expected.
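
A quick check in the interpreter, reusing the s instance defined above, confirms this behavior (the traceback is abbreviated here):

>>> s[100]
Traceback (most recent call last):
  ...
IndexError: list index out of range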

The Sentence class could be enhanced further, for instance, by implementing a __len__ method. However, delving into such improvements is outside the scope of this discussion.
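
For the curious, a minimal sketch of that particular enhancement could simply delegate to the underlying list of words:

class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __getitem__(self, item):
        return self.words[item]

    def __len__(self):
        # Report the number of words, so len(s) works as expected
        return len(self.words)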

To create an iterable object like the ones we’ve discussed, the essential requirement is a __getitem__ method that accesses elements using a 0-based indexing approach. The first element should be accessed with s[0], and so on. If your implementation relies on a list, as in the Sentence example, this setup naturally aligns with list behavior. However, in cases like our ‘Arr’ example, where a custom approach is taken, careful consideration is needed to ensure that the first element is indeed indexed with 0.

On-the-Fly Element Generation in Python

In our previous examples, we iterated through elements that were predefined at the instantiation of a class. However, this isn’t a necessity. Let’s explore how we can dynamically generate elements, such as random numbers, for as long as we desire:

import random 

class RandomNumbers:
    def __getitem__(self, item):
        return random.randint(0, 10)

This class can be used as follows:

>>> r = RandomNumbers()
>>> for a in r:
...     print(a)
...     if a == 10:
...         break
... 
4
1
5
2
5
7
2
7
9
9
3
10

Notice that we halt the loop when the RandomNumbers object produces a 10. Without this condition, the loop would continue indefinitely. You could alternatively use a timer, a specific iteration count, or allow it to run endlessly. The approach is flexible. Additionally, you can modify the class to internally limit value generation:

import random

class RandomNumbers:
    def __getitem__(self, item):
        number = random.randint(0, 10)
        if number == 10:
            # Stop the iteration from inside the class
            raise StopIteration
        return number

This class can be used as follows:

>>> r = RandomNumbers()
>>> for a in r:
...     print(a)
... 
4
1
5
2
5
7
2
7
9
9
3

Now, the loop stops naturally once a 10 is generated by the class.

This technique is particularly useful in scenarios like signal acquisition from devices generating sequential data, such as frames from a camera or analog readings from a sensor. I’ll delve deeper into device interfacing in a future tutorial.

Another example of on-demand value generation is file reading in Python. When opening a file, Python doesn’t load all contents into memory; instead, it allows iteration over each line:

>>> with open('39_use_arduino_with_python.md', 'r') as f:
...     for line in f:
...         print(line) 
...
Using Python to communicate with an Arduino
===========================================
    ...

This method is efficient for handling files larger than your available memory. To illustrate:

>>> import sys
>>> with open('39_use_arduino_with_python.md', 'r') as f:
...     print(sys.getsizeof(f))
...     size = 0
...     for line in f:
...         size += sys.getsizeof(line)
...     print(size)
...
216
56402

Here, the size of the f variable is significantly smaller than the total size of the file’s contents. While this isn’t a precise method for determining a file’s size, it highlights the benefits of using iterables for managing large data sets without overwhelming the computer’s memory.

Exploring Iterators in Python


In the previous section, we discussed how creating an iterable object in Python can be as simple as defining a suitable __getitem__ method. We also noted that several common objects, such as lists or files, are iterables. However, there’s an underlying mechanism at play here, worth exploring, known as iterators.

The distinction between an iterable and an iterator is nuanced, akin to the difference between a class and its instance. We understand that we can iterate over a list or a custom object, but what exactly is happening behind the scenes in Python during iteration? Let’s demystify this process using a basic list as an example:

>>> var = ['a', 1, 0.1]
>>> it = iter(var)
>>> next(it)
'a'
>>> next(it)
1
>>> next(it)
0.1
>>> next(it)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration

In the provided code, an iterator has been crafted using the built-in function iter. This function accepts an iterable as input and produces an iterator. The iterator is the entity that comprehends the meaning of “next.” Notably, the iterator itself is iterable.

>>> it = iter(var)
>>> for element in it:
...     print(element)
...
a
1
0.1

If more control over the process is desired, Python provides the means to achieve it. An iterator must implement two methods: __next__ and __iter__. If we want a class to use a particular iterator, the class itself should also define an __iter__ method. The Sentence class illustrates this; to make the point stand out, we will iterate through the words in reverse order. First, the iterator is defined as follows:

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index -= 1
        return word

    def __iter__(self):
        return self

The code above is quite transparent. The essential point is that the iterator keeps track of the current index internally: __next__ returns the word at that index and then updates it, ready for the following call. With the iterator in place, the next step involves updating the original Sentence class:

class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __iter__(self):
        return SentenceIterator(self.words)

Now, let’s dive into the enjoyable segment:

>>> text = "This is a text to test if our iterator returns values backward"
>>> s = Sentence(text)
>>> s[0]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'Sentence' object does not support indexing

Certainly, accessing the elements of our Sentence object by index is not possible, since we haven't defined a __getitem__ method. Nevertheless, the for loop functions as expected:

>>> for w in s:
...     print(w)
...     
This
backward
values
returns
iterator
our
if
test
to
text
a
is
This

As you observe, the printing order is reversed, as anticipated. The only discrepancy is the first word: because the index starts at 0 instead of at the last position, 'This' is printed first and then appears again at the end, when the index reaches -12 and points back to the beginning of the list. If you wish, fixing this takes only a bit of contemplation; one possible approach is sketched below.
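
One such fix, shown here only as a sketch while the rest of the article keeps the original version, is to start the index at the last word and stop once it drops below zero:

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = len(words) - 1   # start at the last word

    def __next__(self):
        if self.index < 0:            # every word has already been returned
            raise StopIteration
        word = self.words[self.index]
        self.index -= 1
        return word

    def __iter__(self):
        return self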

The provided example is also functional with iter:

>>> it = iter(s)
>>> next(it)
'This'
>>> next(it)
'backward'

It’s crucial to emphasize that once the iterator is depleted, no further actions can be taken with it. However, consistently traversing the elements in the Sentence object is always possible within a for loop.
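
A short interactive session, sketched here with most of the output elided, makes that distinction concrete:

>>> it = iter(s)
>>> for w in it:
...     pass              # consume the iterator completely
... 
>>> next(it)              # the exhausted iterator has nothing left to give
Traceback (most recent call last):
  ...
StopIteration
>>> for w in s:           # the Sentence hands out a fresh iterator every time
...     print(w)
... 
This
backward
    ...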

The final piece we lack is the capability to access words by index, but in the correct order. If we implement a __getitem__ method, what outcome do you anticipate for the for loop?

class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __iter__(self):
        return SentenceIterator(self.words)

    def __getitem__(self, item):
        return self.words[item]

and its utilization is demonstrated as follows:

>>> s = Sentence(text)
>>> s[0]
'This'
>>> s[1]
'is'
>>> for w in s:
...     print(w)
...     
This
backward
values
returns
iterator
our
if
test
to
text
a
is
This

So we can now access the elements by their index in the normal order, while the for loop still iterates over them in reverse.

Should the ‘Sentence’ Class Implement a __next__ Method?

It’s technically feasible to transform the ‘Sentence’ class into an iterator by adding a __next__ method. However, this approach might not be advisable. It’s important to remember that iterators are designed to run until they’re exhausted, maintaining an internal state as they progress. If you conflate the roles of an iterable and an iterator within the ‘Sentence’ class, you’re likely to encounter issues, particularly in scenarios involving nested loops.
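
A small, deliberately broken sketch (hypothetical, not part of the Sentence class above) shows the kind of problem that appears when an object acts as its own iterator:

class SelfIteratingSentence:
    def __init__(self, text):
        self.words = text.split(' ')
        self.index = 0

    def __iter__(self):
        return self                      # the object is its own iterator

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        word = self.words[self.index]
        self.index += 1
        return word

Both loops share a single index, so the inner loop exhausts it and the outer one stops after a single pass:

>>> s = SelfIteratingSentence('a b c')
>>> for w in s:
...     for v in s:
...         print(w, v)
... 
a b
a c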

Separating the iterable and iterator functionalities is generally a sound practice. This concept is thoroughly explored in Luciano Ramalho’s book “Fluent Python”, particularly in Chapter 14, which delves into this topic in detail.

For those working in fields like scientific instrumentation, dividing the iterable and iterator can offer distinct advantages. Consider scenarios where you need an iterator that behaves differently based on specific parameters. Take, for instance, data acquisition from a camera. You might envision a situation like:

>>> for frame in camera:
...     analyze(frame)

In its simplest form, imagine analyzing each frame produced by a camera. But, consider a scenario where frames are generated via an external trigger. In such cases, you might want to associate a timestamp with each frame. Alternatively, the camera might capture a finite number of frames at a specific framerate. Ideally, it’s the iterator’s responsibility to handle these variations.
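
As a purely illustrative sketch, with the Camera class, the read_frame method, and both parameters invented for this example rather than taken from any real device API, the iterable camera could delegate those decisions to its iterator:

import time

class CameraIterator:
    def __init__(self, camera, num_frames=None, add_timestamp=False):
        self.camera = camera
        self.num_frames = num_frames        # None means "keep acquiring"
        self.add_timestamp = add_timestamp
        self.acquired = 0

    def __next__(self):
        if self.num_frames is not None and self.acquired >= self.num_frames:
            raise StopIteration
        frame = self.camera.read_frame()    # stand-in for real hardware I/O
        self.acquired += 1
        if self.add_timestamp:
            return time.time(), frame
        return frame

    def __iter__(self):
        return self

class Camera:
    def __init__(self, num_frames=None, add_timestamp=False):
        self.num_frames = num_frames
        self.add_timestamp = add_timestamp

    def read_frame(self):
        return object()                     # placeholder for an actual frame

    def __iter__(self):
        return CameraIterator(self, self.num_frames, self.add_timestamp)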

Often, there’s no need to define a separate iterator class; we can effectively employ generators within our __iter__ method. This approach, leveraging the power and simplicity of generators for iteration, will be the focal point of our next discussion.

Generators in Python

The final crucial topic to delve into is the role of generators. In my view, generators are somewhat underutilized, yet they unlock new possibilities. A quintessential example is generating an infinite sequence of numbers. While it’s impractical to store an infinite list in memory, if we know the interval between numbers, we can continually determine the next number. This principle is at the heart of generators. To grasp this concept, let’s begin with a simple example to understand their mechanics:

def make_numbers():
    print('Making a number')
    yield 1
    print('Making a new number')
    yield 2
    print('Making the last number')
    yield 3

In the code above, note that we define a function using def, but instead of return, we use yield. The print statements are included to help track the execution flow. It’s intriguing that we have three yield statements, whereas in a standard function, you would typically see only one return. To use this generator, we proceed as follows:

>>> a = make_numbers()
>>> print(a)
<generator object make_numbers at 0x7f65a8ea8ed0>

When we call make_numbers, there are no print outputs, indicating that the execution hasn’t started yet. Now, let’s request the next element from the generator:

>>> next(a)
Making a number
1

The print statement executes, displaying “Making a number”, and we receive the first number. Continuing the process:

>>> next(a)
Making a new number
2
>>> next(a)
Making the last number
3
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 

Once we exhaust the yield statements, a StopIteration exception is raised. This is the same exception we encountered earlier with iterables, indicating that we can use a generator similarly. For instance, in a for-loop:

>>> b = make_numbers()
>>> for i in b:
...  print(i)
... 
Making a number
1
Making a new number
2
Making the last number
3

Generators can be expanded to perform various tasks. For example, let’s generate a series of equally spaced integers:

def make_numbers(start, stop, step):
    i = start
    while i<=stop:
        yield i
        i += step

Applying what we’ve learned, we can do:

>>> for i in make_numbers(1, 20, 2):
...     print(i)
1
3
5
7
9
11
13
15
17
19

Notably, each number is generated only when needed, hence the term ‘generators’. Observing the memory usage of our variables:

>>> import sys
>>> z = make_numbers(0, 1000000000, 1)
>>> sys.getsizeof(z)
128

The variable z spans from 0 to 1 billion in increments of 1. Yet, its memory footprint is a mere 128 bytes. This efficiency exemplifies the power of generator syntax.
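
Taken to its extreme, a generator does not even need a stopping condition: the infinite sequence of numbers mentioned at the start of this section fits in a few lines. The sketch below uses illustrative names, and itertools.islice only to take a handful of values from it:

from itertools import islice

def endless_numbers(start=0, step=1):
    # Yield numbers forever, spaced by `step`; nothing is ever stored
    current = start
    while True:
        yield current
        current += step

Only as many values as you request are ever produced:

>>> list(islice(endless_numbers(0, 5), 4))
[0, 5, 10, 15]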

Generators and Iterators: Enhancing Class Functionality

A fascinating pattern in Python programming involves the integration of generators with iterators within classes. This approach can significantly streamline and optimize the way we handle data in our classes. Let’s enhance the ‘Sentence’ class, enabling it to loop through its elements efficiently:

class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __iter__(self):
        for word in self.words:
            yield word

Using this class is straightforward:

>>> text = "This is a text to test our iterator"
>>> s = Sentence(text)
>>> for w in s:
...     print(w)
This
is
a
text
to
test
our
iterator

In this implementation, we forego defining a separate iterator class like SentenceIterator and don’t explicitly define a __next__ method. Nevertheless, we achieve seamless iteration through the sentence. While the example is basic, its potential becomes more apparent when applied to more complex scenarios, such as iterating through the words in a file.

Reading an entire file into a massive list of words can be memory-intensive, especially for large files. A more efficient approach is to read the file line by line:

class WordsFromFile:
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, 'r') as f:
            for line in f:
                words = line.split(' ')
                for word in words:
                    yield word

Here’s how you might use this class:

>>> words = WordsFromFile('22_Step_by_step_qt.rst.md')
>>> for w in words:
...     print(w)

This approach ensures that when the WordsFromFile class is instantiated, only the filename is stored without opening the file. The file is opened and read line by line only during iteration. This technique, which is also utilized in Python’s file handling, allows for efficient memory usage. We then split each line into words and yield them one by one.

An interesting consequence of using a generator in this way is the possibility of nesting loops. For instance:

>>> for w in words:
...     for c in words:
...         print(w, c)
... 
    ...
considerations very
considerations simple
considerations way.
considerations You
considerations could
considerations find
considerations better
considerations solutions,
considerations of
considerations course,
considerations but
considerations this
    ...

Although this is a simplistic example, it illustrates the capability of nested loops using generators. If we had used an internal index in the class to track elements, such nesting would not be feasible. This flexibility underscores the power of integrating generators with class iterators, offering a blend of efficiency and functionality.


Generators, Iterators, and Iterables in Python

Navigating the concepts of generators, iterators, and iterables in Python can be perplexing, especially after going through extensive guides. Let’s clarify these terms, broadening our understanding of their distinct roles in Python programming.

  • At its core, an iterable refers to any object over which iteration is possible. In layman’s terms, if you can move from one element to another within your object, it qualifies as iterable. Examples of iterables include lists, strings, and dictionaries, where you can traverse through their elements;
  • An iterator is a specialized class designed to iterate over elements of an iterable. The primary role of an iterator is to keep track of the current position and know when the end of the collection is reached. This is typically achieved by implementing methods like __iter__() and __next__();
  • Generators are a special category that resembles iterators but is defined using the yield syntax. They are particularly useful for generating values on demand, especially in scenarios where generating all values at once is impractical due to memory constraints or the need for real-time processing.
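
To put the three terms side by side, here is a compact sketch with illustrative names:

>>> numbers = [1, 2, 3]                 # an iterable: something you can loop over
>>> iterator = iter(numbers)            # an iterator: it knows what "next" means
>>> squares = (n * n for n in numbers)  # a generator: values are computed on demand
>>> next(iterator)
1
>>> next(squares)
1
>>> list(squares)
[4, 9]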

While there are subtle differences between iterators and generators – with iterators generally returning existing values and generators capable of creating values on the fly – the distinctions aren’t always clear-cut. In practice, the overlap between these concepts is substantial, and focusing too narrowly on their differences may not always be productive.

However, using the terms ‘generator’ and ‘iterator’ can provide clarity in certain contexts, particularly when explaining code. For instance, if capturing live images from a camera, referring to the process as ‘using a generator’ makes sense, as the images are generated in real-time and are not pre-existing. Conversely, when processing stored frames from a hard drive, the term ‘iterator’ is more fitting, as the data already exists and is simply being accessed sequentially.

Understanding these terms and their nuanced applications enhances our ability to write and explain code more effectively, making our Python programming more intuitive and accessible.

Conclusions

Iterators and generators stand out as indispensable tools in the realm of Python programming, especially valuable for managing continuous data streams or datasets larger than the available memory. Our journey has taken us through the creation of custom classes and context managers, and now, with the addition of generators and iterators, we unlock new dimensions in handling loops and iterations.

The significance of understanding generators extends beyond their direct application in your code. It’s a critical step towards deciphering the logic underpinning various external libraries and frameworks. Take Django, for instance, a widely-used Python web framework. A cursory exploration of its codebase reveals an extensive use of generators. Grasping how generators work not only enhances your coding skills but also provides insight into the intended usage of such sophisticated tools by their developers.