From what I have been reading Python has GIL that ensures only one thread can run the python code. So I am slightly confused when we say collections.dequeue is thread-safe. If only one thread is run at a time wouldn't objects be in a consistent state already?
It would be great if someone can give an example of how a list in python is not thread safe and using a dequeue possible counters that?
From what I have been reading Python has GIL that ensures only one thread can run the python code. So I am slightly confused when we say collections.dequeue is thread-safe. If only one thread is run at a time wouldn't objects be in a consistent state already?
It would be great if someone can give an example of how a list in python is not thread safe and using a dequeue possible counters that?
Share Improve this question asked Mar 21 at 19:42 Aniket ThakurAniket Thakur 69.1k41 gold badges289 silver badges297 bronze badges 02 Answers
Reset to default 4The GIL doesn't prevent race conditions, it only protects python internals from corruption, you cannot have a dangling pointer or a use-after-free inside the python interpreter. the only way they were able to make a GIL-free interpreter in python3.13 is to put a lock (or more than one lock) in every object in python, including things you don't think about like stack variables, and the stack frame.
You can have race conditions with the GIL for the same reason you needed locks on a CPU with 1 core, the thread can be suspended at any moment in time, python threads automatically drop the GIL and suspend themselves every few milliseconds to allow other threads to run, if you have 2 atomic operations, each one of them is atomic, the two operations together are not atomic.
CPython deque is similar to C++ deque, it is a linked list of small arrays (lists), currently 64 items per block.
A single append operation involves 2 operations
- put item at the
head
of the array - increment the
head
counter.
if the current head block is full it has to allocate a new empty block, and put it in the end of the linked list, to illustrate those 2 operations in python code
block = [None for i in range(64)] # one block in the linked list
head = 5
def append(obj):
block[head] = obj
# thread could get suspended here
head += 1
There is also a tail counter (called left and right in source code), so it can grow in 2 directions, the tail logic decrements instead of incrementing.
If you implemented that in python it won't be thread-safe without a lock, one thread will put its object at the current head
, then get suspended before it increments the head
counter, and another thread will end up overwriting this location before the first thread could increment the head count, then the first thread will increment it again, essentially the data of the first thread is lost, and there is now an empty slot in the middle of the deque.
In CPython without the GIL the entire deque is locked during this append operation, so the scenario above is not possible, and when the GIL is used, the GIL is not dropped until the append
operation is done, as the deque
is written in C, not in python. you can control the GIL in C extensions, but you cannot control it in python.
CPython list
has atomic operations too, its append
and pop
, are both thread-safe. however implementing a deque
using a simple list
will be very slow, popleft
done with pop(0)
will be an O(n)
operation, whereas python deque
makes it O(1)
Every other operation in the list like index
or remove
or indexing it are not thread-safe, and can remove or access the wrong element, demo: concurrent append + remove removing the wrong element, you'll need an external lock on all operations if you use those operations concurrently on a list or deque.
The GIL does not "ensure that only one thread can run..."
Think of a commercial kitchen. The recipes are the program, The cooks are the CPUs, and the orders-in-progress are the threads. The GIL ensures that no more than one cook can be in the kitchen at any given moment in time, but it does not prevent more than one order from being in-progress at the same time. The single cook still can divide their attention between several different orders, and that creates opportunities for conflict (e.g., two customers both want crêpes for breakfast, but the kitchen has only one crêpe pan.)
Having the GIL allows the authors of multi-threaded programs to be a little bit more relaxed as compared to other programming systems that have concerns with "memory visibility" and "happens before relationships", but the basic need for mutexes remains:
If the program has a collection of several variables, and
If one thread needs to perform a sequence of operations on those variables, and
If you would not want any other thread to use those variables when the sequence is only half-way done, then
You must use some kind of "mutual exclusion" mechanism (e.g., Python's
Threading.lock
objects) to prevent it...
...Even when the GIL is enabled.*
*The latest version of C Python no longer needs the GIL for its own self-consistency, but the GIL still is there because removing it would break some existing Python programs. If you build C Python yourself, an "experimental" configuration option allows you turn off the GIL.