c - Using = operator on atomic variable?

Is newNode->next = NULL; an undefined behavior in this case?

struct node {
  int value;
  _Atomic(struct node*) next;
};

//at the initialization stage
struct node* newNode = malloc(sizeof(struct node));
newNode->next = NULL; //other threads are not able to access this node at this point

Is newNode->next = NULL; an undefined behavior in this case?

struct node {
  int value;
  _Atomic(struct node*) next;
};

//at the initialization stage
struct node* newNode = malloc(sizeof(struct node));
newNode->next = NULL; //other threads are not able to access this node at this point

Share Improve this question edited Nov 21, 2024 at 7:41 asked Nov 20, 2024 at 10:55 Mr.nerd3345678 476 bronze badges

There are already answers but I would like to add some detail that may not be obvious to you. For valid C code, the last 3 lines must be located inside a function. If any other thread calls that function at the same time, then the other thread will have its own newNode pointer. Concurrent accesses will only become a problem if you assign that address to some variable that can be accessed systemwide. – Gerhardh Commented Nov 20, 2024 at 11:46
1 ...in other words, it doesn't matter whether next is atomic or not because in your example, it is impossible for two different threads to execute newNode–>next=NULL for the same *newNode object. Every execution of that line happens for a unique, new object. – Solomon Slow Commented Nov 20, 2024 at 12:20
@SolomonSlow: That being the case, you could do a non-atomic init of the whole struct object by doing struct assignment instead of to the individual member. Unlike in C++, assigning the whole struct doesn't go member by member, it's like a memcpy of the struct, ignoring qualifiers like _Atomic on members. You want this for efficiency before you've published a pointer to the struct to any other thread, especially if you were going to just use = instead atomic_store_explicit(NULL, memory_order_relaxed); – Peter Cordes Commented Nov 20, 2024 at 17:42

Add a comment |

4 Answers 4

Sorted by: Reset to default 2

Since the definition of node is bound to a scope (otherwise you couldn't call malloc) it is indeed only accessible by the current thread. So if the allocation succeeds, the pointed-to object is only visible by that thread.

In consequence, the assignment as given is always done by the same thread that has done the allocation and there is no race on that assignment. Thereafter the member newNode is initialized and all other operations on that initialized atomic object then are atomic.

Assuming malloc doesn't return NULL, there's no UB, and actually room for optimization by avoiding an atomic seq_cst store (newNode->next = NULL; is equivalent to atomic_store).

malloc returns memory that no other thread has a pointer to. (Unless your program is already has UB, like use-after free, including via insufficient memory-ordering of stores with free()).

You're storing to that memory before giving other threads a pointer to that object, so data-race UB would be impossible even if the member wasn't _Atomic.

foo = val assignment to an _Atomic object is the same as atomic_store(&foo, val);, which in turn is equivalent to atomic_store_explicit(&foo, val, memory_order_seq_cst);.

You don't need any ordering wrt. other operations, because any way of publishing the newNode pointer to other threads will need to synchronize-with them (via release/acquire) so our malloc happens-before their access to the pointed-to memory. Any other operations we do between malloc and atomic_store_explicit(&shared, newNode, release) will also happen-before anything in a reader thread.

atomic_store_explicit(&newNode->next, NULL, memory_order_relaxed); // cheapest atomic way
*newNode = (struct node){0, NULL};  // assignment of whole non-atomic struct

So we could use a relaxed store, but that's still atomic which will prevent real compilers from combining it with init of the int value; member. (Especially on 32-bit machines or ILP32 ABIs where the whole struct is only 8 bytes. Or if we use long count so there's no padding on most 64-bit ABIs, other than Windows, or intptr_t so it's always a pair of same-size members. Compilers often avoid storing to padding, sometimes stopping themselves from using one wider store like AArch64 stp (store pair).)

#include <stdlib.h>
struct node {
  long value;
  _Atomic(struct node*) next;
};

struct node* alloc_and_init_orig(long val){
    struct node* newNode = malloc(sizeof(struct node));
    if (!newNode)
        return NULL;
    newNode->next = NULL; // atomic seq_cst store
    return newNode;
}

Clang for x86-64 uses xchg to do a seq_cst store, which is a full memory barrier, because x86 doesn't have anything that's only strong enough without being much stronger and more expensive. (AArch64 does, stlr).

# clang 19 -O3 -fno-plt
alloc_and_init_orig:
        push    rax           # align the stack
        mov     edi, 16
        call    qword ptr [rip + malloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_2
        xor     ecx, ecx
        xchg    qword ptr [rax + 8], rcx      # Slow.  The seq_cst store
.LBB0_2:
        pop     rcx
        ret

A relaxed atomic store would compile as cheaply as a non-atomic init of the pointer. But if we wanted to also init count, especially with 0, for maximum efficiency we want to let the compiler do both member non-atomically.

struct node* alloc_and_init_zero(){
    struct node* newNode = malloc(sizeof(struct node));
    if (!newNode)
        return NULL;
    *newNode = (struct node){0, NULL};
    // equivalent to
    //struct node tmp = {0, NULL};
    //*newNode = tmp;
    return newNode;
}

The whole struct node pointed to by newNode is not itself _Atomic, so this struct-assignment is non-atomic. There happens to be an _Atomic member, but C struct assignment just copies the whole thing, ignoring qualifiers like volatile or _Atomic on members. (So it's like a memcpy. I think it's well-defined to copy the object-representation of an _Atomic type, as long as you don't expect the copy itself to be _Atomic. It certainly works in practice on compilers where the object-representation of _Atomic T is the same as plain T, with non-lock-free using a separate hash table of spinlocks or mutexes.)

Clang is pretty clever, compiling the whole function into calloc(1, 16). (With int count instead of long, this optimization to calloc only happens with current nightly builds of clang, not clang 19).

If you had an atomic store, current compilers wouldn't optimize it away, defeating this optimization. (Why don't compilers merge redundant std::atomic writes?).

With a non-zero initializer, Clang for AArch64 compiles it to a single 16-byte stp (store-pair), again which doesn't happen with atomic_store_explicit(&p->next, NULL, relaxed) and a separate assignment to p->value. (That would be a legal optimization, but compilers don't do it.)

# clang -O3 -Wall -fomit-frame-pointer -mcpu=cortex-a77
alloc_and_init_const:
        str     x30, [sp, #-16]!   # save return address (link register)
        mov     w0, #16
        bl      malloc
        cbz     x0, .LBB3_2       # compare-and-branch if zero NULL check
        mov     w8, #123
        stp     x8, xzr, [x0]     # store 123 and the zero-register
.LBB3_2:
        ldr     x30, [sp], #16    # restore return address
        ret

All of these and a couple other examples on the Godbolt compiler explorer. Clang for x86-64 makes the weird choice to load a 16-byte vector constant from .rodata and store that, instead of doing two separate 8-byte mov stores like GCC does, or mov ecx, 123 / movd xmm0, ecx / movaps [rax], xmm0. So compilers are fully capable of shooting themselves in the foot when given more freedom to optimize.

The line newNode->next = NULL; is safe and does not cause undefined behavior until you miss these points given below:

The memory allocation succeeds.
Other threads cannot access newNode during initialization.

It is not a UB "per se”.

When Undefined Behavior Might Occur?

Undefined behaviour could occur if:

The object was not properly allocated (e.g., malloc fails and returns NULL).
The assignment happens concurrently in multiple threads without proper synchronization. In this case, atomic operations like atomic_store should be used.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

c - Using = operator on atomic variable? - Stack Overflow

4 Answers 4

与本文相关的文章

评论列表(0)