(I know that this question has come up several times on Stack Overflow already, but I still think it is worth clarifying some details.)
Suppose that there are two independent Python processes, a.py and b.py, which modify a JSON file r.json through their respective functions funcA and funcB. Both funcA and funcB run many times, each time starting at an unpredictable moment, so it is essential to block funcB while funcA is running, and vice versa. As funcA and funcB belong to different processes, classical synchronisation techniques such as threading locks cannot be used. After some research on the internet, I have tried the following two methods: I/O locks (lock files) and flock from the fcntl module. (The fact that fcntl is POSIX-only is not a concern for me, as I work exclusively in a Linux-based environment.)
Method 1: I/O locks
We first define the following util functions.
from pathlib import Path

def acquire_file_lock(filename: str) -> None:
    LOCK_FILE_PATH = filename + ".lock"
    with open(LOCK_FILE_PATH, 'w') as _:
        pass

def file_lock_exists(filename: str) -> bool:
    LOCK_FILE_PATH = filename + ".lock"
    return Path(LOCK_FILE_PATH).is_file()

def remove_file_lock(filename: str) -> None:
    LOCK_FILE_PATH = filename + ".lock"
    Path(LOCK_FILE_PATH).unlink(missing_ok=True)
Here is the code in a.py.
import time

def funcA():
    while file_lock_exists("r.json"):
        time.sleep(0.1)
    try:
        acquire_file_lock("r.json")
        do_something_A()
        remove_file_lock("r.json")
    except:
        remove_file_lock("r.json")
The code in b.py is the same, except that funcA is replaced by funcB, which calls its own do_something_B() containing different logic.
Method 2: Flock from fcntl
We first define the following util class.
import fcntl

class FileLocker:
    def __init__(self, filename: str):
        self.filename_with_lock = filename

    def __enter__(self):
        self.fp = open(self.filename_with_lock)
        fcntl.flock(self.fp.fileno(), fcntl.LOCK_EX)

    def __exit__(self, _type, value, tb):
        fcntl.flock(self.fp.fileno(), fcntl.LOCK_UN)
        self.fp.close()
Here is the code in a.py.
def funcA():
    with FileLocker("r.json"):
        try:
            do_something_A()
        except:
            pass
As before, b.py contains the same code, except that funcA is replaced by funcB, which calls its own do_something_B() containing different logic.
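For reference, here is a minimal sketch of how Method 2 could be written as a generator-based context manager that releases the lock in a finally block. The names flocked and increment_counter are hypothetical (increment_counter is only a stand-in for do_something_A()), and locking a separate r.json.lock file instead of r.json itself is an assumption of this sketch, not something the code above does.

```python
import fcntl
import json
from contextlib import contextmanager


@contextmanager
def flocked(lock_path: str):
    """Hold an exclusive flock on lock_path for the duration of the with-block."""
    # Mode "a" creates the lock file if it is missing and never truncates it.
    with open(lock_path, "a") as fp:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            # Runs even if the body raised, so the lock is always released.
            fcntl.flock(fp.fileno(), fcntl.LOCK_UN)


def increment_counter(json_path: str) -> int:
    """Hypothetical stand-in for do_something_A(): read, modify, write back."""
    with flocked(json_path + ".lock"):
        with open(json_path) as f:
            data = json.load(f)
        data["count"] = data.get("count", 0) + 1
        with open(json_path, "w") as f:
            json.dump(data, f)
        return data["count"]
```

The reason for the separate lock file in this sketch is that the lock then does not depend on how r.json itself is opened, truncated or replaced inside the critical section. Both a.py and b.py would have to lock the same path for this to work.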
Comparison between the two methods
Both methods seem to work. Method 1 has the superficial advantage of being easier to understand. Nonetheless, Method 2 seems more reliable. Method 1 polls for the lock file every 0.1 seconds (the interval could be shortened to 0.01 or even 0.001 seconds, at the cost of more CPU usage), and because checking for the lock file and creating it are two separate steps, there is a small chance of a race condition if funcA and funcB start executing at almost the same instant: both could find no lock file and then both proceed. In Method 2, on the other hand, the waiting is handled by flock itself, which blocks until the lock becomes available.
Any thoughts on their pros and cons? Thanks.