Python variables - behind the scenes

We will now examine how Python stores objects in memory, and the link between variables and memory locations. You might be wondering why you need to worry about this, but understanding it is essential in order to make the best use of Python's capabilities and avoid bugs.

Assignment and modification

Consider the following two examples. First:

In [1]:
a = 2
b = a
print(a, b)
2 2
In [2]:
a = 4
print(a, b)
4 2

This should hopefully make sense so far.

Now consider the following example:

In [3]:
a = [2, 3, 4]
b = a
a.append(5)
print(a, b)
[2, 3, 4, 5] [2, 3, 4, 5]

In this case, modifying a modified b too! This is not as intuitive. But if we do:

In [4]:
a = 9
print(a, b)
9 [2, 3, 4, 5]

This time, changing a did not change b - what is happening?

The key is to understand that doing:

variable = something

will change which object variable is pointing to in memory (assignment). On the other hand, when calling a method with:

variable.method()

some (but not all) methods will modify the object that variable points to in place (more information below).
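As a minimal sketch of this distinction (the names numbers, alias, and seen_through_alias are illustrative and not part of the examples above), the following contrasts an in-place method call with a re-assignment:

```python
numbers = [2, 3, 4]
alias = numbers                    # both names refer to the same list object

numbers.append(5)                  # method call: the shared list is modified in place
seen_through_alias = list(alias)   # snapshot of what alias sees at this point
print(seen_through_alias)          # [2, 3, 4, 5]

numbers = 9                        # assignment: numbers now refers to a different object
print(numbers, alias)              # 9 [2, 3, 4, 5]
```

The append is visible through both names, while the assignment only changes what numbers refers to.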

Let's go over the examples above but this time with a graphical representation, where the yellow circles show the variables, and the blue rectangles show the objects in memory. If we do:

In [5]:
a = 2
b = a
a = 4

then what is happening is the following.

First, when doing a = 2 we create space in memory for the value 2 and we assign that location in memory to the variable a:


By doing b = a, we are now assigning the variable b to point at the same object as a:


And finally by doing a = 4 we re-assign a to point at a different place in memory (containing 4) but b still points at the same object (2):


Now if we follow the same logic for the second example:

In [6]:
a = [2, 3, 4]
b = a

we again start off by creating space in memory for the list [2, 3, 4], then we point the variable a to that location.


By doing b = a, we then point b to the same location as a, so the list exists only once in memory (this is very important):


We now modify, in place, the object that a is pointing to with a.append(5). The concept of modifying the object is very important: we are not creating a new list. It is still in the same place in memory, even if it now has one extra element:


This means that since b is pointing to the same place in memory, it will also see a list with (now) four elements!

Then, if one does a = 9, one is not modifying the list, but instead assigning a to point to a region in memory with the value 9:


In order to talk about this behavior, we use the terms copying and referencing. When we do:

variable = something

then variable is only a reference to something, not necessarily an entirely new object.

Another important point is that what is on the right hand side will get evaluated first, and in some cases will result in the creation of a new object. In each of the following examples, the term on the right hand side is new and creates a new object in memory:

In [7]:
a = 2
b = a + 1
c = b * 2
print(a, b, c)
2 3 6

while in other cases, the object on the right hand side already exists, in which case the object on the left hand side is a reference to the same object:

In [8]:
a = [2,3,4]
b = a  # b points to the same object as a


In some cases, the behavior described above is not desirable, and we want to make a true copy, not just a reference, because we want to change b without changing a:

In [9]:
from copy import deepcopy
a = [2,3,4]
b = deepcopy(a)
a.append(5)
print(a, b)
[2, 3, 4, 5] [2, 3, 4]
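For nested objects, it may help to see why deepcopy is used here rather than a shallow copy. The sketch below (with illustrative names nested, shallow, and deep) uses copy.copy, which only copies the outer list, alongside deepcopy, which also copies the objects inside it:

```python
from copy import copy, deepcopy

nested = [1, [2, 3]]
shallow = copy(nested)      # new outer list, but the inner list is shared
deep = deepcopy(nested)     # new outer list and a new inner list

nested[1].append(4)         # modify the inner list in place

print(nested)   # [1, [2, 3, 4]]
print(shallow)  # [1, [2, 3, 4]]  (the inner list is shared)
print(deep)     # [1, [2, 3]]     (fully independent)
```

For a flat list of numbers, as in the example above, a shallow copy would behave the same as a deep copy.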


As mentioned above, some methods modify the object in place:

In [10]:
a = [1,2,3]
a.append(5)  # modifies ``a``

and some will return a copy rather than modifying the object.

In [11]:
s = 'hello'
s.upper()  # returns a copy of the string in uppercase without modifying s

It should be clear from the documentation (e.g. s.upper?) how a particular method behaves.
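One practical hint, sketched below with illustrative variable names: many built-in methods that modify an object in place (such as list.sort) return None, while functions and methods that return a new object (such as sorted or str.upper) leave the original untouched:

```python
values = [3, 1, 2]
result = values.sort()      # sorts in place and returns None
print(values, result)       # [1, 2, 3] None

values = [3, 1, 2]
ordered = sorted(values)    # returns a new sorted list
print(values, ordered)      # [3, 1, 2] [1, 2, 3]

s = 'hello'
upper = s.upper()           # returns a new string; s is unchanged
print(s, upper)             # hello HELLO
```

This is a convention rather than a rule, so the documentation remains the reliable reference.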

Mutable vs immutable objects

Some objects are immutable, which means that they cannot be modified - examples include float, int, str. For instance, when doing:

In [12]:
a = 1.
a = 2. 

In the second line, a new location in memory is created for 2., and a points at that object, not at 1. (in other words, the float is not being changed, it is a that is pointing to a different object).

On the other hand, list, dict, and Numpy arrays are mutable, which means the object can be modified:

In [13]:
a = [1,2,3]
a.append(4)

After the second line, a still points at the same list, but the list has now been modified.
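The difference between mutable and immutable objects also shows up with augmented assignment. In the sketch below (names are illustrative), += on a string creates a new object, whereas += on a list extends the existing list in place:

```python
s = 'ab'
s_alias = s
s += 'c'                    # str is immutable: s now points at a new string
print(s, s_alias)           # abc ab
print(s is s_alias)         # False

items = [1, 2, 3]
items_alias = items
items += [4]                # list is mutable: the same list is extended in place
print(items, items_alias)   # [1, 2, 3, 4] [1, 2, 3, 4]
print(items is items_alias) # True
```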


A final but important point is that when passing variables to functions, the function receives a reference to the same object rather than a copy, so:

In [14]:
def do(x):
    x.append(1)  # modifies the list that was passed in

a = [1,2]
do(a)
print(a)
[1, 2, 1]

However, as before, if using the x = something notation, x is reassigned to a different memory location, so:

In [15]:
def do(x):
    x = 0  # re-assigns x to 0, but only in the function

a = [1,2]
do(a)
print(a)
[1, 2]
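If a function should not change the object passed by the caller, one common approach is to copy the argument inside the function. The following is a sketch with hypothetical function names (append_in_place and append_to_copy are not part of the examples above):

```python
def append_in_place(x):
    x.append(1)          # modifies the caller's list


def append_to_copy(x):
    x = list(x)          # re-assigns x to a new list; the caller's list is untouched
    x.append(1)
    return x


a = [1, 2]
append_in_place(a)
print(a)                 # [1, 2, 1]

b = [1, 2]
c = append_to_copy(b)
print(b, c)              # [1, 2] [1, 2, 1]
```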

Exploring further

It is important to always bear these distinctions in mind, as they can be the source of bugs if not correctly understood. If you want to explore these distinctions more, you may find the id(...) function useful: given a Python object, it returns an integer that identifies the object for its lifetime (in CPython, its memory address), so that two variables pointing at the same object will have the same id:

In [16]:
a = [1,2,3]
b = a
print(id(a), id(b))
4566334472 4566334472
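Rather than comparing id values by eye, the is operator can be used to test whether two variables point at the same object, while == only compares contents. A short sketch:

```python
a = [1, 2, 3]
b = a            # same object as a
c = [1, 2, 3]    # equal contents, but a separate object

print(a == c)            # True: the contents are equal
print(a is b)            # True: same object
print(a is c)            # False: different objects
print(id(a) == id(c))    # False
```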

Copying and Referencing Numpy arrays

With Numpy arrays, one has to be particularly careful with the copying/referencing distinction. With a few exceptions, slicing operations in Numpy return references (views) to the data, not copies, and assigning through a slice or mask modifies the original array in place:

In [17]:
import numpy as np
In [18]:
x = np.arange(10)
y = x
y[3] = 1
In [19]:
x
array([0, 1, 2, 1, 4, 5, 6, 7, 8, 9])

This is similar to lists, but now consider the following:

In [20]:
x = np.arange(10)
y = x[::2]
y[3] = 1
In [21]:
x
array([0, 1, 2, 3, 4, 5, 1, 7, 8, 9])

Even though we took a slice with a given start, end, and step, the resulting array is still just a reference, or view, of the data in the original array! (Note that for lists, x[::2] returns a copy.) In-place modification can be very handy when combined with masking:

In [22]:
x = np.arange(10)
x[x < 5] = 0.
x
array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])

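There is one exception to the referencing: indexing with a list or array of indices (or using a boolean mask to extract values rather than assign to them) returns a new array, not a view: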

In [23]:
x = np.arange(10)
y = x[[1,3,2,2]]  # returns a new array, not a view
y[0] = 9
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

As before, you can explore this further to understand in what cases references or copies are made. However, be aware that the id of a view will be different from the original array, even though the view is actually pointing to a subset of the original array.
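Since id cannot tell you whether two arrays share data, one option is np.shares_memory, together with the base attribute of an array. The sketch below assumes Numpy is installed; the names view, fancy, and masked are illustrative:

```python
import numpy as np

x = np.arange(10)
view = x[::2]              # basic slicing: a view of x
fancy = x[[1, 3, 2, 2]]    # indexing with a list of indices: a copy
masked = x[x < 5]          # extracting with a boolean mask: a copy

print(np.shares_memory(x, view))     # True
print(np.shares_memory(x, fancy))    # False
print(np.shares_memory(x, masked))   # False
print(view.base is x)                # True: the view is built on x
```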

In the case of Numpy arrays, one can force a copy by doing:

In [24]:
x = np.arange(10)
y = x.copy()
y[0] = -1
In [25]:
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [26]:
y
array([-1,  1,  2,  3,  4,  5,  6,  7,  8,  9])


The following questions test your understanding of variable assignment. You don't need to write any code: just try to work out what the output will be, then try it out to check whether you got it right:

What will a be after the following?

a = [1, 3., [1, 2, 3], 'hello']
b = a[0]
b = 4.

What will c be after the following?

c = [1, 3., [1, 2, 3], 'hello']
d = c[2]
d.append(4)

What will e be after the following?

e = [1, 3., [1, 2, 3], 'hello']
f = e[2]
f = [1, 2]

What will g be after the following?

g = [1, 2, 3, 4]
h = g[::2]
h[0] = 9

What will i be after the following?

import numpy as np
i = np.array([1, 2, 3, 4])
j = i[::2]
j[0] = 9
In [27]:
# You can try here to see if your guess is correct!