
Master Python Synchronization: A Comprehensive Guide


There’s a challenge that every developer has likely encountered during their coding journey: a time-consuming task that halts all other program activity. The program can’t be stopped gracefully, leaving no choice but to kill it with Ctrl+C.

Thankfully, Python offers a variety of solutions to this issue, and with the right approach, developers can achieve efficient performance without any hang-ups. In this guide, we’ll take an in-depth look into one such approach – using Threads to enhance programming flexibility.

Unlocking Flexibility with Threads

Threads play a critical role in creating responsive programs. Especially in designing user interfaces (UI), these independent components allow for multitasking, making your program available for other tasks even when one operation is ongoing.

Understanding Threads and learning how to deploy them creatively in your programs will not only improve efficiency but will also provide a more responsive and seamless user experience. After all, no user likes an application that becomes unresponsive due to a time-consuming operation.

This guide aims to streamline your threading knowledge, organizing and presenting the essential points so that you can effectively leverage threads in your Python applications.

Key Takeaways:

  • Threads are independent components that allow for multitasking;
  • Python supports the use of threads, allowing developers to craft more responsive and flexible programs;
  • Threads are crucial in creating more flexible, efficient, and user-oriented applications;
  • Our guide will provide you with a comprehensive understanding of threading in Python.

By appreciating the significance of threads and mastering their usage in Python, you can develop more responsive and efficient programs. This not only reduces the instances of frozen apps due to time-consuming operations but also enhances the user experience by ensuring seamless program execution.

Exploring the World of Threads in Python

For those unfamiliar with the term in the context of coding, ‘thread’ may conjure up images of a spool of cotton or the intricacies of a woven fabric. However, in the realm of computer programming, threads paint a very different picture.

Threads in programming are analogous to the various threads that construct a piece of fabric. They intersect and weave together, creating a cohesive whole. But unlike the threads in your favorite sweater, threads in a program represent logical paths, each traversing from inception to termination. What’s more exciting is that these paths are not singular – multiple threads can exist within a single process.

Diving Deeper: The Concept of Concurrent Computing

Worth noting, however, is that threads within the same process don’t run strictly simultaneously in Python. A single-core processor can only execute one computation at a time. It’s amusing to think that in an era before multi-core processors became commonplace, we could still have multiple programs open and functioning concurrently. So how is this possible?

To understand this, we delve into the world of concurrent computing. Each thread is capable of being executed in small segments, giving the computer the ability to quickly alternate between them. At one moment, the computer may be checking your typo in a document, and in the next, it could be loading a webpage, and shortly thereafter, writing data to the hard drive. This swift switching is what lends programs their fluidity, something we leveraged in our work with Qt to prevent application freezing.
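This interleaving can be sketched with two threads that each do small units of work, appending to a shared log (the task names here are purely illustrative):

```python
from threading import Thread
from time import sleep

def check_typos(log):
    # Simulates one program doing small units of work
    for _ in range(3):
        log.append('check typo')
        sleep(0.01)

def load_webpage(log):
    # Simulates another program doing small units of work
    for _ in range(3):
        log.append('load page')
        sleep(0.01)

log = []
t1 = Thread(target=check_typos, args=(log,))
t2 = Thread(target=load_webpage, args=(log,))
t1.start()
t2.start()
t1.join()
t2.join()
print(log)  # the two kinds of work interleave rather than running back to back
```

Run it a few times: the exact interleaving changes from run to run, which is precisely the scheduler at work.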

Threads – Not Always a Silver Bullet

However, threads have their limitations, particularly when a task is computationally demanding. For tasks like downloading data, waiting for user input, or writing to a hard drive, threads thrive: the processor is mostly idle during these operations, so switching between them costs little. By contrast, rendering an image (as seen in video games) requires a myriad of complex calculations that monopolize processing power, preventing the smooth execution of other threads.
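To see why I/O-bound work suits threads so well, here’s a small sketch in which fake_download (an illustrative stand-in that just sleeps instead of performing real I/O) runs five times concurrently:

```python
from threading import Thread
from time import sleep, perf_counter

def fake_download(delay):
    # Stand-in for an I/O-bound task: the thread mostly waits
    sleep(delay)

start = perf_counter()
threads = [Thread(target=fake_download, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = perf_counter() - start
# The five 0.2-second waits overlap, so the total is close to 0.2s, not 1s
print(f'{elapsed:.2f}s')
```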

Understanding these limitations and advantages is crucial, as we’ll see when we dive into more complex examples and consider different approaches.

Understanding the Nuances of Threads in Python: A Primer

Embarking on the journey to understand threads in Python might seem daunting, but the truth is, unraveling the essence of threads is like following a recipe. It starts with simpler elements and gradually builds on those to create a more complex, yet organized whole.

To create a strong foundation, let’s start with a simple example of Python threading.

from time import sleep

def print_numbers(number, delay=1):
    for i in range(number):
        print(i)
        sleep(delay)

This function, ‘print_numbers’, prints the numbers from 0 up to number - 1, pausing for delay seconds between each (one second by default). We’re choosing a simple, non-computationally-expensive function for our initial exploration of threading. If we run print_numbers(10), the call will take ten seconds to complete, halting all other program activity for that duration.

Creating a separate thread allows the function to run independently, while the rest of the program can continue performing other tasks.

from threading import Thread

t = Thread(target=print_numbers, args=(10,))
t.start()
print('Thread started')

In the code above, we create a thread, specify ‘print_numbers’ as the function to run within it, and kick start the thread with t.start(). The function itself is passed, not the result, which is why we omit the parentheses in ‘target=print_numbers’. Arguments are passed as a tuple.

Running the script will generate an output that looks something like this:

0
Thread started
1
2
3
4
5
6
7
8
9

Pro tip: You can also pass keyword arguments to the function, like this:
t = Thread(target=print_numbers, args=(10,), kwargs={'delay': .2})
t.start()
print('Thread started')

One caveat: you might notice that the ‘Thread started’ message doesn’t always appear after the initial ‘0’. This discrepancy is a result of the operating system’s scheduling, over which we have no direct control.

Moreover, Python threads provide a .join() method, which, when invoked, makes the execution halt until the concerned thread completes its task. This is especially useful when you want to proceed with subsequent lines of code only after a particular thread finishes its execution.

t = Thread(target=print_numbers, args=(10,), kwargs={'delay': .2})
t.start()
print('Thread started')
t.join()
print('Thread finished')

In the final print statement, ‘Thread finished’ will always appear after the function has completed execution.

You’re not restricted to creating a single thread; you can start as many as needed. For example:

t1 = Thread(target=print_numbers, args=(10,), kwargs={'delay': .5})
t2 = Thread(target=print_numbers, args=(5,))
t1.start()
t2.start()

t1.join()
t2.join()

In this case, numbers from both threads are printed concurrently, demonstrating the power of multi-threading. Although creating threads as t1, t2 might not be the most elegant solution, it effectively illustrates the key point.
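A tidier pattern when you need several threads is to keep them in a list. Here’s a sketch reusing print_numbers, with a short delay so it finishes quickly:

```python
from threading import Thread
from time import sleep

def print_numbers(number, delay=1):
    for i in range(number):
        print(i)
        sleep(delay)

threads = [
    Thread(target=print_numbers, args=(10,), kwargs={'delay': .05}),
    Thread(target=print_numbers, args=(5,), kwargs={'delay': .05}),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before moving on
```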

Understanding Shared Memory in Python Threads

Exploring threads in Python invariably leads to a critical topic: shared memory. As a developer, you’ve likely noticed that variables defined in one program are not accessible in another. Each program operates within a designated memory space. However, threads within the same program share the same memory space, thereby allowing access and modification of the same data.

A Closer Look at Shared Memory

An illustrative example involves the modification of elements in a NumPy array:

import numpy as np

def increment_by_one(array):
    array += 1

data = np.ones((100,1))
increment_by_one(data)
print(data[0])

In this code snippet, the increment_by_one function takes an array and increments each of its values by one. It’s worth noting that the function doesn’t return anything. Instead, it mutates the original array in place, an action made possible because arrays are mutable data types.

The behavior changes when you pass a number instead of an array as the argument. If a number is passed, the effect of incrementing won’t be reflected outside the function, as numbers are immutable data types.
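A quick sketch makes the contrast visible (the helper mirrors increment_by_one from above):

```python
import numpy as np

def increment_by_one(value):
    # += mutates a NumPy array in place, but for an immutable
    # type like int it only rebinds the local name 'value'
    value += 1

number = 5
increment_by_one(number)
print(number)    # still 5: the int outside the function is untouched

array = np.ones(3)
increment_by_one(array)
print(array[0])  # 2.0: the array itself was modified
```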

Shared Memory with Threads

Now let’s see how things change when we incorporate threads:

t = Thread(target=increment_by_one, args=(data,))
t.start()
t.join()
print(data[0])

Here, data defined in the main thread is passed as an argument to the child thread. The child thread modifies the data. Crucially, this modification is also applied to the data in the main thread because they share the same memory space.

This ability to share memory between threads allows for swift data extraction from a thread. If increment_by_one had returned a new array, as shown below, it would be impossible to fetch that new array from the thread.

def increment_by_one(array):
    new_arr = array + 1
    return new_arr

Understanding this behavior is critical when designing code that utilizes threads, especially when aiming to fetch information from them.
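Since Thread simply discards the target function’s return value, a common workaround is to have the thread write its result into a mutable container that the main thread also holds. A sketch (the results dict is our own illustrative convention):

```python
from threading import Thread
import numpy as np

def increment_by_one(array, results):
    # Instead of returning the new array, store it in a shared dict
    results['incremented'] = array + 1

results = {}
data = np.ones((100, 1))
t = Thread(target=increment_by_one, args=(data, results))
t.start()
t.join()
print(results['incremented'][0])  # the "returned" array, fetched via shared memory
print(data[0])                    # the original data is untouched
```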

Shared Memory Across Multiple Threads

Shared memory isn’t limited to just two threads; you can share data among more than two threads as well. Let’s consider an example:

from threading import Thread
import numpy as np

def increment_by_one(array):
    for i in range(10000):
        array += 1

def divide_by_factor(array):
    for i in range(10000):
        array /= 1.1

data = np.ones((100,1))
t1 = Thread(target=increment_by_one, args=(data,))
t2 = Thread(target=divide_by_factor, args=(data,))
t1.start()
t2.start()
t1.join()
t2.join()
print(data[0])
print(np.mean(data))

A More Extreme Example

NumPy is a powerful Python library that is efficient in performing extensive operations. When you increase or divide values in an array using NumPy, there’s an underlying loop that processes each element individually even though it’s not explicitly visible. This hidden complexity is handled by NumPy, which ensures that the loop’s execution is uninterrupted. Therefore, when working with this library, you won’t face a situation where some array elements are initially increased and then divided while others follow the opposite order.

To better understand the intricacies of threading and shared memory, let’s modify our operations. We’ll use two functions: one to increase values and another to divide them.

def increase_by_one(array):
    for i in range(len(array)):
        array[i] += 1

def divide(array):
    for i in range(len(array)):
        array[i] /= 1.1

These functions, though inefficient compared to our initial usage of NumPy, help illustrate our point more clearly. We can now execute the functions with threading:

data = np.ones((100000,1))

t = Thread(target=increase_by_one, args=(data,))
t2 = Thread(target=divide, args=(data,))
t.start()
t2.start()
t.join()
t2.join()
print(np.max(data))
print(np.min(data))

The printed output may show different maximum and minimum values in the array. Why does this happen? Because the two threads’ loops interleave unpredictably: some elements may be incremented first and then divided, while others are processed in the opposite order.

This outcome underscores one of the key challenges with threading: its unpredictability makes it difficult to anticipate the flow, leading to potentially varying outcomes for the same program with each execution.

Hence, poorly designed multi-threaded programs can be a nightmare to debug due to these threading subtleties. It’s vital to approach multi-threading with a clear strategy, proper design, and a deep understanding of synchronization to avoid unexpected results and tough debugging situations.
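The classic minimal form of this kind of bug (the bank-balance scenario below is illustrative, not taken from the examples above) is a non-atomic read-modify-write on shared state: both threads read the old value before either writes back, and one update is lost.

```python
from threading import Thread
from time import sleep

balance = {'value': 100}

def withdraw(amount):
    # Read, pause, write back: another thread can run in between
    current = balance['value']
    sleep(0.05)  # exaggerates the window in which a switch can occur
    balance['value'] = current - amount

t1 = Thread(target=withdraw, args=(30,))
t2 = Thread(target=withdraw, args=(50,))
t1.start()
t2.start()
t1.join()
t2.join()
# Typically both threads read 100 before either wrote back, so one
# withdrawal is lost: the result is 70 or 50 rather than the expected 20
print(balance['value'])
```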


Exploring Thread Synchronization with Locks in Python

In our prior examples, we observed that while executing multiple threads, the operating system dictates the execution sequence of each thread. Therefore, re-executing the code may lead to varying outcomes. To synchronize different threads, we could utilize a mechanism known as Locks.

What is a lock? A lock is a special object that can be either acquired or released. Attempting to acquire a lock that is already held makes the program wait until the lock is free, which means a lock can’t be held by more than one thread at a time. This allows you to control program flow, ensuring one operation completes before another starts.

Let us delve into a simple Python implementation based on the previous example:

from threading import Lock

lock = Lock()

def add_one(array):
    lock.acquire()
    for i in range(len(array)):
        array[i] += 1
    lock.release()

def divide_values(array):
    lock.acquire()
    for i in range(len(array)):
        array[i] /= 1.1
    lock.release()

Notice the creation of the lock object at the beginning. Every function now starts by attempting to acquire the lock. If it is already held, the function will wait until it’s released. This indicates that the loop, which either increments or divides each element, must finish running before the other is allowed to proceed.

Utilizing context managers simplifies the syntax further:

def add_one(array):
    with lock:
        for i in range(len(array)):
            array[i] += 1

def divide_values(array):
    with lock:
        for i in range(len(array)):
            array[i] /= 1.1

It’s also worth noting that we could acquire a lock in the main thread to defer the execution of the two functions until a particular moment. Here’s an example:

lock.acquire()
data = np.ones((100000,1))
t = Thread(target=add_one, args=(data,))
t2 = Thread(target=divide_values, args=(data,))
t2.start()
t.start()
print('Threads are still not running')
data += 10
lock.release()
t.join()
t2.join()
print(np.max(data))
print(np.min(data))

In this instance, the main thread acquires the lock, causing other threads to idle until the lock is relinquished. Only one thread executes at any given time. However, bear in mind that the sequence in which threads run depends on the operating system’s implementation.

Ensuring Thread-Safety with Reentrant Locks in Python

When working with threading in Python, locks can prove extremely useful in preventing data corruption by ensuring that a particular code block completes execution before another thread modifies the same data. However, there’s a particular scenario where the usage of locks can lead to an unexpected deadlock situation.

Let’s consider the functions add_one and divide_values from above, both of which acquire a lock. Suppose you wish to execute add_one in the main thread as well, while preventing other threads from running. You might set up your code like this:

lock.acquire()
data = np.ones((100000,1))
t = Thread(target=add_one, args=(data,))
t2 = Thread(target=divide_values, args=(data,))
t2.start()
t.start()
add_one(data)
lock.release()

Executing this code presents an unexpected outcome: the program hangs indefinitely, a puzzling situation especially for those relatively new to threading. To understand why, let’s dissect the code:

The lock is initially acquired in the main thread, protecting the data from modification by other threads. However, when we explicitly call add_one, it also attempts to acquire the lock. Since the main thread already holds that lock and will only release it after add_one returns, the program is stuck in a never-ending pause.

A solution to this deadlock comes in the form of another object: the Reentrant Lock, or RLock. Contrary to the basic lock, an RLock is thread-aware: it only blocks if the lock is requested from a different thread than the one currently holding it. Since add_one is executed from the main thread, which already holds the RLock, it will not be obstructed. Here’s how it works:

from threading import RLock
lock = RLock()
...

Implementing RLock in the earlier example will allow the program to execute as planned.

Reentrant locks are an excellent tool when you anticipate that certain functions may be executed from different threads, but are safe to run within the same lock context. However, caution is essential when designing your program to produce the expected behavior. Sometimes, reconfiguring your code can allow the substitution of RLocks with basic locks, and vice-versa. The choice between the two does not only rely on the current situation but also on what would be best for the codebase in the long run.
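A minimal sketch of the difference: the same thread may acquire an RLock repeatedly, as long as it releases it the same number of times, whereas a plain Lock would deadlock on the nested acquisition.

```python
from threading import RLock

lock = RLock()
ran = []

def outer():
    with lock:   # first acquisition by this thread
        inner()  # with a plain Lock, this nested call would deadlock

def inner():
    with lock:   # the same thread re-acquires the same RLock
        ran.append(True)

outer()
print('inner ran without deadlocking')
```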


Thread Management: Implementing Timeouts

When dealing with multi-threading in Python, it’s not uncommon to encounter unexpected execution scenarios. These range from events occurring prematurely to a software bug causing a thread to hang indefinitely. Such events can tie up resources for longer than necessary. One useful way to handle these sticky situations is to set timeouts on blocking operations.

Consider a simple Python thread that uses a lock:

from threading import Thread, Lock
from time import sleep
import numpy as np

lock = Lock()

def increment_values(array):
    if lock.acquire(timeout=1):
        for i in range(len(array)):
            array[i] += 1
        lock.release()

data = np.ones((100000,1))
t = Thread(target=increment_values, args=(data,))
lock.acquire()
t.start()
sleep(5)
lock.release()
t.join()
print(data[0])
print(np.mean(data))

This is quite similar to preceding examples, with one notable modification: increment_values attempts to acquire the lock, waiting for at most one second (the timeout period). If the lock isn’t acquired within that time, acquire returns False, the body of the if statement is skipped, and the thread simply finishes instead of waiting forever.

Upon execution, you’ll see that the increment never actually happens: the main thread holds the lock for five seconds, the worker gives up after one, and the array is left untouched (both printed values are 1.0). Crucially, though, the program no longer hangs. In a real-world application, a timed-out lock usually signals that an anticipated state hasn’t been reached. For instance, if the intention was to perform the increment exclusively before another operation, a timed-out lock would break that exclusivity, so the timeout should be handled explicitly.

The join method also accepts a timeout. Whenever you implement timeouts, design your code to handle the timeout case: if a timeout occurs, it might signify that your code didn’t execute as planned, and contingencies should be in place to keep your application healthy.
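The join timeout follows the same pattern: wait for a bounded time, then check is_alive to decide what to do next. A sketch with an illustrative slow worker:

```python
from threading import Thread
from time import sleep

def slow_worker():
    sleep(2)  # stands in for a task that may run long or hang

t = Thread(target=slow_worker)
t.start()
t.join(timeout=0.1)           # wait at most 0.1 seconds
still_running = t.is_alive()  # join itself doesn't tell you; check explicitly
if still_running:
    print('worker still running, taking another path')
t.join()  # in this sketch we eventually wait for it anyway
```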

Navigating Resources: Python Course Read the Docs

In your quest to master Python synchronization, it’s crucial to leverage the wealth of resources available. One invaluable tool at your disposal is the “Python Course Read the Docs.” This comprehensive online documentation provides a structured and well-documented guide to Python, offering insights into every facet of the language.

Within this expansive resource, you’ll discover detailed explanations of threading in Python, replete with code examples and best practices. Whether you’re a beginner seeking to grasp the fundamentals or an experienced developer looking to fine-tune your synchronization skills, the Python Course Read the Docs offers a plethora of information to suit your needs.

Combining this resource with the insights shared in this exhaustive guide, you’ll be well-equipped to navigate the intricate world of Python synchronization. Remember, mastering this skill not only enhances your programming versatility but also ensures that your applications deliver a responsive and uninterrupted user experience. Dive in, explore the depths of Python synchronization, and let your coding journey reach new heights of efficiency and innovation.

By seamlessly integrating the Python Course Read the Docs into your learning journey, you can further elevate your understanding of Python’s threading capabilities. It’s a valuable companion to this comprehensive guide, enhancing your grasp of synchronization and empowering you to create robust, responsive Python applications.

Conclusion

In our exploration of Python synchronization, we’ve unveiled the threads that empower responsive and efficient programming. From fundamentals to shared memory and synchronization techniques, you’re now equipped to create seamless, high-performance applications. With the Python Course Read the Docs as your guide, your coding journey reaches new heights of efficiency and innovation. Embrace the art of Python synchronization and unlock a world of possibilities.