python - Locking a json file across 2 independent processes - a comparison of two methods - Stack Overflow

(I know that this question has come up several times on Stack Overflow already, but I still think it is worth clarifying some details.)

Suppose that there are two independent Python processes, a.py and b.py, which modify a JSON file r.json through their respective functions funcA and funcB. Both functions run many times, each time starting at an unpredictable moment, so funcB must be blocked while funcA is running, and vice versa. As funcA and funcB belong to different processes, classical in-process synchronisation primitives such as threading locks cannot be used. After some research on the internet, I have tried the following two methods: I/O locks (lock files) and flock from the fcntl module. (The fact that fcntl is only available on POSIX systems is not a concern for me, as I work exclusively in a Linux-based environment.)

Method 1: I/O locks

We first define the following utility functions.

from pathlib import Path

def acquire_file_lock(filename: str) -> None:
    LOCK_FILE_PATH = filename + ".lock"
    with open(LOCK_FILE_PATH, 'w') as _:
        pass

def file_lock_exists(filename: str) -> bool:
    LOCK_FILE_PATH = filename + ".lock"
    return Path(LOCK_FILE_PATH).is_file()

def remove_file_lock(filename: str) -> None:
    LOCK_FILE_PATH = filename + ".lock"
    Path(LOCK_FILE_PATH).unlink(missing_ok=True)

Here is the code in a.py.

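import time

def funcA():
    # Poll until no lock file exists. This check and the creation of the
    # lock file below are two separate steps, so together they are not atomic.
    while file_lock_exists("r.json"):
        time.sleep(0.1)

    try:
        acquire_file_lock("r.json")
        do_something_A()
    except Exception:
        pass  # errors are swallowed, as in the original bare except
    finally:
        # Always remove the lock file, whether or not an error occurred.
        remove_file_lock("r.json")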

b.py contains the same code, except that funcA is replaced by funcB, which calls do_something_B() with its own set of changes.

Method 2: flock from fcntl

We first define the following utility class.

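import fcntl

class FileLocker:

    def __init__(self, filename: str):
        self.filename_with_lock = filename

    def __enter__(self):
        # Opened in read mode, so the file must already exist.
        self.fp = open(self.filename_with_lock)
        # Blocks until the exclusive lock can be acquired.
        fcntl.flock(self.fp.fileno(), fcntl.LOCK_EX)
        return self

    def __exit__(self, _type, value, tb):
        # Release the lock, then close the file object.
        fcntl.flock(self.fp.fileno(), fcntl.LOCK_UN)
        self.fp.close()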

Here is the code in a.py.

def funcA():
    with FileLocker("r.json"):
        try:
            do_something_A()
        except:
            pass

As before, b.py contains the same code, except that funcA is replaced by funcB, which calls do_something_B() with its own set of changes.
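One detail of Method 2 that may be worth checking: the lock is attached to the open file, so if do_something_A() ever replaces r.json (for instance by writing a temporary file and renaming it over the original), a process that opens r.json afterwards locks a different file than the one already held. A common way around this is to lock a separate, dedicated lock file. The following is only a minimal sketch under that assumption; the names locked and update_json and the ".lock" and ".tmp" suffixes are hypothetical and not part of the code above.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def locked(filename: str):
    """Hold an exclusive flock on filename + '.lock' while the block runs."""
    # Mode "a" creates the lock file if it is missing and never truncates it.
    with open(filename + ".lock", "a") as lock_fp:
        fcntl.flock(lock_fp.fileno(), fcntl.LOCK_EX)  # blocks until available
        try:
            yield
        finally:
            fcntl.flock(lock_fp.fileno(), fcntl.LOCK_UN)


def update_json(filename: str, key: str, value) -> None:
    """Read-modify-write of a JSON object, done entirely under the lock."""
    with locked(filename):
        try:
            with open(filename) as fp:
                data = json.load(fp)
        except FileNotFoundError:
            data = {}  # first writer starts from an empty object
        data[key] = value
        # Write to a temporary file, then rename it over the target, so a
        # reader never sees a half-written r.json.
        tmp = filename + ".tmp"
        with open(tmp, "w") as fp:
            json.dump(data, fp)
        os.replace(tmp, filename)
```

With this layout, funcA and funcB would both wrap their changes in with locked("r.json"):, and the lock file itself is never deleted or replaced, so every process always locks the same file.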

Comparison between the two methods

Both methods seem to work. Method 1 has the superficial advantage of relying on an implementation that is easier to understand. Nonetheless, Method 2 seems more reliable. Method 1 checks whether the lock file exists every 0.1 seconds (the polling interval could be shortened to 0.01 or even 0.001 seconds, at the cost of extra CPU usage). Because the existence check and the creation of the lock file are two separate steps, there is also a small chance of a race condition: if funcA and funcB start at almost the same instant, both can see that no lock file exists and both proceed. In Method 2, on the other hand, the waiting is handled by the flock call itself, which blocks until the lock becomes available.
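For reference, the race in Method 1 comes from checking for the lock file and creating it in two steps. If one wanted to keep the lock-file approach, the two steps could be merged by creating the file with os.open and the O_CREAT | O_EXCL flags, which fails if the file already exists. The sketch below assumes this variant; the function names try_acquire_file_lock, acquire_file_lock_blocking and release_file_lock are hypothetical, and it still polls and still leaves a stale lock file behind if the process is killed while holding it.

```python
import os
import time


def try_acquire_file_lock(filename: str) -> bool:
    """Create filename + '.lock' atomically; return False if it already exists."""
    lock_path = filename + ".lock"
    try:
        # O_CREAT | O_EXCL turns "check, then create" into a single step:
        # the call fails with FileExistsError if the lock file is present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def acquire_file_lock_blocking(filename: str, poll_interval: float = 0.1) -> None:
    """Poll until the lock file could be created by this process."""
    while not try_acquire_file_lock(filename):
        time.sleep(poll_interval)


def release_file_lock(filename: str) -> None:
    """Remove the lock file; do nothing if it is already gone."""
    try:
        os.unlink(filename + ".lock")
    except FileNotFoundError:
        pass
```

In funcA, acquire_file_lock_blocking("r.json") would replace both the while loop and the call to acquire_file_lock, with release_file_lock("r.json") in a finally clause.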

Any thoughts on their pros and cons? Thanks.
