
java - Array based Lock-Free stack. Is full fence necessary? - Stack Overflow


I'm new to lock-free algorithms and trying to implement Stack which is the simplest lock-free data structure. Here is my implementation of bounded array-based lock-free stack.

import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentArrayStack<T> implements Stack<T> {
    private final AtomicInteger size;
    private final T array[];

    public ConcurrentArrayStack(int maxCapacity) {
        this.array = (T[]) new Object[maxCapacity];
        this.size = new AtomicInteger(0);
    }

    @Override
    public T push(T t) {
        if(t == null) {
            throw new NullPointerException("Null values are not allowed");
        }

        int currentSize;
        T currentValue;
        do {
            currentSize = size.get();
            if(currentSize == array.length) {
                return null;
            }
            currentValue = array[currentSize];
        } while(currentValue != null || !size.compareAndSet(currentSize, currentSize + 1));
        array[currentSize] = t;
        VarHandle.fullFence(); // <-- full fence
        return t;
    }

    @Override
    public T pop() {
        int currentSize;
        T currentValue;
        do {
            currentSize = size.get();
            if(currentSize == 0) {
                return null;
            }
            currentValue = array[currentSize - 1];
        } while(currentValue == null || !size.compareAndSet(currentSize, currentSize - 1));
        array[currentSize - 1] = null;
        VarHandle.fullFence(); // <--- full fence
        return currentValue;
    }
}

The primary motivation to use fullFence at the end of each method is this:

  1. Act as a compiler barrier (flush any pending read/write on a compiler level)
  2. Flush store-buffer (x86) to avoid reading stale data from another core.
  3. We do not need a fence between load from array and store to array since loads are not reordered to earlier stores (x86).

QUESTION: The thing is I could not verify, that fullFence is strictly speaking necessary here and cannot be replaced with acquire-release. Is something like this possible?
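To make the acquire-release variant of the question concrete, here is a minimal sketch that keeps the same algorithm but gets per-element visibility from AtomicReferenceArray, whose get/set carry volatile (and therefore at least acquire/release) semantics, instead of a trailing fullFence. The class name ReleaseStack is hypothetical, the Stack interface from the question is dropped to keep it self-contained, and this is a sketch of the visibility mechanism only, not a correctness proof of the algorithm:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: same bounded-stack algorithm as in the question,
// but the element accesses themselves are volatile-strength, so no
// standalone VarHandle.fullFence() is issued.
public class ReleaseStack<T> {
    private final AtomicInteger size = new AtomicInteger(0);
    private final AtomicReferenceArray<T> array;

    public ReleaseStack(int maxCapacity) {
        this.array = new AtomicReferenceArray<>(maxCapacity);
    }

    public T push(T t) {
        if (t == null) {
            throw new NullPointerException("Null values are not allowed");
        }
        int currentSize;
        do {
            currentSize = size.get();
            if (currentSize == array.length()) {
                return null; // stack full
            }
            // array.get is a volatile read of the slot
        } while (array.get(currentSize) != null
                || !size.compareAndSet(currentSize, currentSize + 1));
        array.set(currentSize, t); // volatile write: publishes the element
        return t;
    }

    public T pop() {
        int currentSize;
        T currentValue;
        do {
            currentSize = size.get();
            if (currentSize == 0) {
                return null; // stack empty
            }
            currentValue = array.get(currentSize - 1); // volatile read
        } while (currentValue == null
                || !size.compareAndSet(currentSize, currentSize - 1));
        array.set(currentSize - 1, null); // volatile write: clear the slot
        return currentValue;
    }
}
```

The point of the swap is that the write that publishes the element and the read that consumes it are now the same memory locations with volatile semantics, which is what establishes happens-before between a pusher and a popper of the same slot.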

asked Feb 15 at 16:57 by Some Name, edited Feb 15 at 17:10
  • 4 @PeterCordes up to and including JDK23, a type parameter such as T here must be a reference type. Hence, it cannot be a long. It can be Long, but the 'value' of a Long is a pointer (to a small object that just holds a long), not, itself, a long. How many bits is a pointer? The JVM does not specify (depends on arch), but the JVM guarantees that ref writes are atomic, in the sense that you cannot see a sheared ref, whereas the JVM spec explicitly calls out that long may shear (but, on just about every modern JVM/arch combo, they don't). – rzwitserloot Commented Feb 16 at 0:05
  • 1 @rzwitserloot I agree with you up to a point, but the idea of avoiding synchronization at all is that volatile semantics are too strong just to prevent hardware read-write reordering to the same location on x86. Since I'm only targeting x86 it should be fine, shouldn't it? – Some Name Commented Feb 16 at 3:16
  • 3 @SomeName The questions you are asking have no answer in a fundamental sense. No answer is possible. The JVM is a specification and the specification doesn't specify things in terms that are mappable to specific ISA. Your question would need to be: "On this exact chip with this exact version of the JDK, how does this code behave?". Which you can ask, and that question would be answerable. You'd have to put a sizable bounty on it to get an answer I'd guess, though. – rzwitserloot Commented Feb 16 at 4:27
  • 1 @rzwitserloot: That's not what the OP is asking, though. Or at least, there are useful interpretations of the question like how to portably and safely get inter-thread visibility without volatile or full barriers, if you don't need as much ordering. (Answer, I think, is VarHandle .setOpaque or .setRelease, and .getOpaque. docs.oracle.com/javase/9/docs/api/java/lang/invoke/…). – Peter Cordes Commented Feb 16 at 6:30
  • 1 In asm terms, you don't need to manually drain the store buffer unless you want to block local reordering; it already drains itself as fast as it can. All you need for inter-thread visibility (on any modern ISA, not just x86) is for the store and load to actually happen in the asm, not be optimized away. That's what setOpaque gives you, but with guarantees in terms of the JMM. (I don't know Java well enough to explain the details, but it's very obviously a way to do relaxed and acquire or release atomic ops on plain variables.) – Peter Cordes Commented Feb 16 at 6:33
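The setRelease/getAcquire access modes mentioned in the comments look like this in practice. A minimal, hypothetical demo of a release-store paired with an acquire-load on an array element, using MethodHandles.arrayElementVarHandle (class and variable names are my own, not from the question):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo: a release store paired with an acquire load on an
// array element, the per-location alternative to a standalone fullFence.
public class AcquireReleaseDemo {
    private static final VarHandle SLOTS =
            MethodHandles.arrayElementVarHandle(Object[].class);

    public static void main(String[] args) {
        Object[] slots = new Object[4];
        // Release store: writes before this one become visible to any
        // thread whose acquire load observes "payload" in slots[0].
        SLOTS.setRelease(slots, 0, "payload");
        // Acquire load: reads after this one see data the releasing
        // thread published before its store.
        Object seen = SLOTS.getAcquire(slots, 0);
        System.out.println(seen);
    }
}
```
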

1 Answer


It appears to me to be utterly unworkable.

The problem is in the act of trusting that the execution of array[5] = t in thread A is somehow visible to thread B, as in, that you get the same value out when calling return array[5].

Whenever you write to a field from thread A and read it in thread B (and array elements act like fields for this rule), your code is broken unless [A] your code is written such that it does not matter which state you observe (incredibly rare and incredibly arrogant: You generate a 'branch' in your code logic, and one you cannot test. It only works if you're so good at programming, you don't write bugs. And you know it. Without even running your program. No mortal can claim this), or [B] you establish happens-before.

Happens-Before is explained in depth in JDK23 Java Language Specification §17.4.5: Happens-before Order and as far as I can tell from your code, you aren't establishing it.

Therefore, the JVM is free to let that array[5] = t; write have absolutely no effect whatsoever for days, as in, other threads still see the 'old' value when reading array[5].

fullFence does not fix this: all it prevents is re-ordering - in-same-thread re-ordering, and globally observed re-ordering. In other words, given:

class GlobalStorage {
  static int x = 10;
  static int y = 20;
}

class ThreadA {
  void run() {
    GlobalStorage.x = 100;
    VarHandle.fullFence();
    GlobalStorage.y = 200;
  }
}

class ThreadB {
  void run() {
    int a = GlobalStorage.x;
    int b = GlobalStorage.y;
    System.out.println(a + " " + b);
  }
}

Without the fullFence, the Java memory model guarantees that this code must output one of the following 4 answers:

  • 10 20
  • 10 200
  • 100 20
  • 100 200

If it outputs anything else, the JVM is broken. If it outputs one of those 4, it is not; any bug report that one of those is 'wrong' or that the JVM is inconsistent would be denied as 'works as designed'.

With the full fence instruction added, 10 200 is stricken from the list; that is no longer a valid answer. But 10 20 remains on the list. The only guarantee the full fence provides, as far as 'interaction between threads' goes, is that you cannot possibly observe a re-ordering of the assignments in the 'full fenced' code - which 10 200 would be; hence, we strike it.
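For contrast, here is a hedged sketch (with hypothetical names, reading y before x) of what making y volatile buys that the fence alone does not: a volatile write/read pair establishes happens-before, so a reader that observes y == 200 is guaranteed to also observe x == 100:

```java
// Hypothetical variant of the example above: y is volatile, so the
// write GlobalStorage2.y = 200 happens-before any read that sees 200.
class GlobalStorage2 {
    static int x = 10;
    static volatile int y = 20;
}

class Writer2 implements Runnable {
    @Override
    public void run() {
        GlobalStorage2.x = 100; // plain write, published by the volatile write below
        GlobalStorage2.y = 200; // volatile write
    }
}

class Reader2 implements Runnable {
    @Override
    public void run() {
        int b = GlobalStorage2.y; // volatile read comes first
        int a = GlobalStorage2.x;
        // If b == 200, the JMM guarantees a == 100: the "10 200"
        // outcome is impossible here, and unlike the fence version,
        // visibility of x is guaranteed, not just ordering.
        System.out.println(a + " " + b);
    }
}
```

Note that the read order matters: if the reader loaded x before the volatile load of y, observing y == 200 would promise nothing about x.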

Even going: Eh, it's fine, we don't need perfect nanosecond accuracy; if a call to push in thread A followed less than a second later by a pop in thread B doesn't actually result in data being transferred, that's fine with me - that does not work: the JVM is free to spend 5 days not relaying that push to thread B.

Except it gets much worse here - because you've got an AtomicInteger for size, that counter is definitely 'safely conveyed' across threads. So thread B (dealing with a pop()) can observe that the size has increased, and dutifully returns the entry from array at the appropriate index. The line in thread A (that push()ed this value) has definitely occurred, but you have no guarantee that thread B will witness the write to the array, and thus you may get an old value back; the JMM dictates that your JVM is not broken when it does this.

And to make matters worse, of course, if you never observe that on your architecture, with this OS, with this song playing on your music player, and this phase of the moon, then that's fine too. The JMM spells out that 10 20, 10 200, 100 20, 100 200 are all fine; it does not promise that you will see all 4.

NB: This stuff is hard, and one of the things that makes it hard is that you can't test it properly - Java is designed to be a thing that can be easily adapted to run swiftly and according to spec on any modern architecture, which means the JMM gives the JVM various freedoms... but not all of those freedoms are necessarily 'used' on all architectures, or, as you have tagged the question, on x86 or x64. Hence, you can write code that reliably works perfectly on x86 every time, today, but occasionally fails on other architectures. Even if x86 is all you care about (a weird thing to do if you write in Java), you're still not allowed to do this: by definition, whatever trickery you are using cannot possibly be guaranteed by the JMM. Therefore, a future version of Java may no longer run your app perfectly well every time on x86. You'd have to say: "I have thoroughly inspected how all JVMs I know of that run on x86 work, and for each and every one of them, this code works perfectly well, guaranteed, on x86 hardware. But do not run this on other arch and do not run this on different JVM versions, as I cannot possibly guarantee what happens at that point".

I'm not sure you'd want to go down that path.
