The process of duplicating data or object structures in Python – which may seem simple at first glance – can surprisingly have complex implications on your code. Two primary methods exist for undertaking this process; either by duplicating the data entirely or using references for the objects, a method that significantly lowers memory usage. This comprehensive guide aims to simplify this complex subject, focusing on the differences between deep and shallow copies of objects in Python – and including Python’s custom classes.
A Brief Recap: Mutable Data Types
Discussions on deep and shallow copies are only significant in light of mutable data types. To comprehend this topic, an understanding of mutable data types is essential. In case of any confusion, a quick recap can be found in the linked article. Let’s delve into the basics of data copying, using a list as an example:
>>> a = [1, 2, 3]
>>> b = a
>>> print(b)
[1, 2, 3]
>>> a[0] = 0
>>> print(b)
[0, 2, 3]
This code demonstrates that if an element in list a is modified, the changes reflect in list b as well. However, to prevent this, we can make a copy of a:
>>> a = [1, 2, 3]
>>> b = list(a)
>>> a[0] = 0
>>> print(b)
[1, 2, 3]
Now, a and b are two independent objects. This independence can be confirmed by checking their identity with id(a) and id(b). However, the conversation doesn’t end here. The complexity of copying data increases with the complexity of data structures. Let’s examine a case with a list of lists:
>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
If you use the id() function to check the identities of a and b, you’ll find they’re different. However, if we manipulate a:
>>> a.append([7, 8, 9])
>>> print(b)
[[1, 2, 3], [4, 5, 6]]
Despite the append operation to a, b remains unaltered. This might seem like great news until we perform the following:
>>> a[0][0] = 0
>>> print(b)
[[0, 2, 3], [4, 5, 6]]
Modifying a list within a surprisingly alters b. Here, we delve into the definitions of deep and shallow copies.
Shallow vs. Deep Copies: An Exploration
When copying a to b using list(a), we introduced the concept of shallow copying. Here, although a new element was created (as indicated by the differing identities of a and b), the reference points within these lists remained the same. This can be confirmed by checking the identity of the first element of a and b:
>>> id(a[0]) # 140381216067976
>>> id(b[0]) # 140381216067976
Shallow copying, as the name suggests, only duplicates the outermost structure, not the underlying layers. These characteristics hold true for other data structures like dictionaries. Shallow copies of lists can also be made using slicing:
>>> b = a[:]
For dictionaries, you have the option to use either the .copy() method or the dict() function:
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> new_dict = my_dict.copy()
>>> other_option = dict(my_dict)
But if you intend to make a deep copy, you need to import Python’s copy module. Deep copies are an independent copy of the entire object hierarchy. First, let’s demonstrate a shallow copy using the copy module:
>>> import copy
>>> b = copy.copy(a)
>>> id(a[0]) # 140381216067976
>>> id(b[0]) # 140381216067976
As expected, the identities of a[0] and b[0] remain the same. To create a deep copy, the method deepcopy() is used:
>>> c = copy.deepcopy(a)
>>> id(c[0]) # 140381217929672
The Interplay of Deep Copy and Shallow Copy with Custom Classes in Python
So far, we have discussed the intricacies of shallow and deep copies in Python with regards to standard data structures such as lists and dictionaries. But how about user-defined classes? The uniqueness of Python’s copying mechanism comes to play in such instances.
Consider you define a class like below:
class MyClass:
def __init__(self, x, y):
self.x = x
self.y = y
If an instance of the class is created as my_class, and this instance is then simply assigned to my_new_class, like so my_new_class = my_class, both classes will reference the same object. This can be verified by comparing the id() of both classes:
my_class = MyClass([1, 2], [3, 4])
my_new_class = my_class
print(id(my_class)) # id - 140397059541368
print(id(my_new_class)) # id - 140397059541368
The id() values for both my_class and my_new_class are the same, implying that they point to the same object. Consequently, any modifications such as my_class.x[0] = 0 will also reflect in my_new_class.x:
print(my_new_class.x) will output [0, 2].
You might think that employing the copy module would effectively create a completely independent object. However, using copy.copy(), the outcome becomes a bit more nuanced:
import copy
my_class = MyClass([1, 2], [3, 4])
my_new_class = copy.copy(my_class)
print(id(my_class)) # id - 140129009113464
print(id(my_new_class)) # id - 140129008512416
The id() values are now different, indicating that my_new_class is a new, distinct object. However, any changes made to the mutable attributes of my_class will also reflect in my_new_class. For example, my_class.x[0] = 0 will result in print(my_new_class.x) outputting [0, 2] again.
On the other hand, employing copy.deepcopy() would create an entirely independent copy of my_class, including the mutable attributes. Therefore, changes to my_class after the deep copy will not affect my_new_class.
Interestingly, Python allows you to customize the behaviour of shallow and deep copies of objects, providing users with an incredible degree of flexibility and control.
Customizing Shallow and Deep Copies in Python
Python is renowned for the immense level of control it allows programmers over all aspects of coding, including the creation of shallow and deep copies. This is achieved by overriding the __copy__ and __deepcopy__ methods. Here’s how this can be done and why:
Consider a scenario where you need to create a copy of a class, maintaining all references except one, which needs to be independent. Python allows this using the following code:
class MyClass:
def __init__(self, x, y):
self.x = x
self.y = y
self.other = [1, 2, 3]
def __copy__(self):
new_instance = MyClass(self.x, self.y)
new_instance.__dict__.update(self.__dict__)
new_instance.other = copy.deepcopy(self.other)
return new_instance
Let’s break this down:
When copy.copy is used, the __copy__ method is executed with the object itself as the argument. The method returns the copied object. A fresh instance of the class is created first, which can also be done by replacing MyClass with type(self). All attributes of the original instance are then copied to the new one using the __dict__ attribute update, defining the standard behavior for a shallow copy of an object.
An additional step is added to achieve the specialized functionality of copying one specific attribute with a deep copy. This showcases how Python can provide granular control over data copying.
Now, when we try to repeat the tests:
my_class = MyClass([1, 2], [3, 4])
my_new_class = copy.copy(my_class)
print(id(my_class)) # Outputs - 139816535263552
print(id(my_new_class)) # Outputs - 139816535263720
my_class.x[0] = 0
my_class.y[0] = 0
my_class.other[0] = 0
print(my_new_class.x) # Outputs - [0, 2]
print(my_new_class.y) # Outputs - [0, 4]
print(my_new_class.other) # Outputs - [1, 2, 3]
The output confirms that the other attribute was deeply copied, and changing it in one class does not affect it in the other.
About the dict attribute
As is known, objects in Python are adorned with attributes, manifesting as variables akin to textual strings – entities you can discern and input via a keyboard.
This perspective allows us to envisage an object’s attribute ensemble as a lexicon, akin to a dictionary. Reflecting on the class discussed earlier, one can probe this notion through:
print(my_class.__dict__)
{'x': [0, 2], 'y': [0, 4], 'other': [1, 2, 3]}
The essence of this is becoming clear. One might even venture to modify the __dict__ directly:
my_class.__dict__['x'] = [1, 1]
print(my_class.x)
[1, 1]
Pycon Reflections
This implies a duality of interaction: the utilization of either .x or __dict__[‘x’] to engage with an identical element within your object. Additionally, this serves as a swift avenue to acquaint oneself with all the attributes your object encompasses.
Custom deep copy
Delving into the core theme of this discourse, the imperative lies in the customization of the profound duplication process within the class structure. It closely parallels the copy method, albeit with the addition of an extra parameter:
Consider this illustration:
class MyCategory:
def __init__(self, alpha, beta):
self.alpha = alpha
self.beta = beta
self.miscellaneous = [1, 2, 3]
def __deepcopy__(self, memodict={}):
new_instance = MyCategory(self.alpha, self.beta)
new_instance.__dict__.update(self.__dict__)
new_instance.alpha = copy.deepcopy(self.alpha, memodict)
new_instance.beta = copy.deepcopy(self.beta, memodict)
return new_instance
Python
At a cursory glance, it bears a striking semblance to the copy method. However, the exigency of an additional parameter, namely “memodict,” stems from the intrinsic essence of deep replication. Given that each object referenced within the primary class must undergo replication, there exists a looming specter of infinite recursion. This ominous outcome could be triggered if an object somehow establishes a self-referential link. Even in the absence of a bottomless recursive loop, the possibility of duplicating identical data multiple times cannot be overlooked. Memodict, in this context, functions as a sentinel, vigilantly monitoring the objects already subjected to duplication. It serves as the bastion against the unfathomable depths of recursive duplication, ensuring the integrity of the deepcopy method remains inviolate.
In the exemplar above, our stratagem revolves around thwarting the deep duplication process from generating a novel “miscellaneous” list. As a result, we achieve a heterogeneous deep copy scenario, wherein “alpha” and “beta” emerge as pristine entities, while the “miscellaneous” entity retains its original identity. In the event of executing the code snippet below:
my_category = MyCategory([1, 2], [3, 4])
my_new_category = copy.deepcopy(my_category)
print(id(my_category))
print(id(my_new_category))
my_category.alpha[0] = 0
my_category.beta[0] = 0
my_category.miscellaneous[0] = 0
print(my_new_category.alpha)
print(my_new_category.beta)
print(my_new_category.miscellaneous)
Python
The ensuing output manifests as follows:
139952436046312
139952436046200
[1, 2]
[3, 4]
[0, 2, 3]
Evidently, it becomes apparent that .alpha and .beta remain unaltered, while .miscellaneous dutifully mirrors the modifications wrought upon the original class.
Conclusion
Making copies of objects in Python is far from a simple operation. Even though the mechanism seems straightforward, the semantics of copying in Python can become complex, especially when dealing with mutable data types. A deep understanding of shallow versus deep copying is necessary to correctly navigate Python’s memory management, prevent unexpected behavior, and optimize code efficiency.
Customizing the behavior of shallow and deep copying methods further extends the power Python hands to programmers, allowing for fine-tuned control over every aspect of data handling. With this knowledge, Python developers can write more sophisticated, efficient, and reliable code. The power of Python’s object copying, especially the customizable behavior, paints a vivid picture of why Python continues to be a favorite among programmers.