Is newNode->next = NULL;
an undefined behavior in this case?
struct node {
int value;
_Atomic(struct node*) next;
};
//at the initialization stage
struct node* newNode = malloc(sizeof(struct node));
newNode->next = NULL; //other threads are not able to access this node at this point
Is newNode->next = NULL;
an undefined behavior in this case?
struct node {
int value;
_Atomic(struct node*) next;
};
//at the initialization stage
struct node* newNode = malloc(sizeof(struct node));
newNode->next = NULL; //other threads are not able to access this node at this point
Share
Improve this question
edited Nov 21, 2024 at 7:41
Mr.nerd3345678
asked Nov 20, 2024 at 10:55
Mr.nerd3345678Mr.nerd3345678
476 bronze badges
3
|
4 Answers
Reset to default 2Since the definition of node
is bound to a scope (otherwise you couldn't call malloc
) it is indeed only accessible by the current thread. So if the allocation succeeds, the pointed-to object is only visible by that thread.
In consequence, the assignment as given is always done by the same thread that has done the allocation and there is no race on that assignment. Thereafter the member newNode
is initialized and all other operations on that initialized atomic object then are atomic.
Assuming malloc
doesn't return NULL
, there's no UB, and actually room for optimization by avoiding an atomic seq_cst
store (newNode->next = NULL;
is equivalent to atomic_store
).
malloc
returns memory that no other thread has a pointer to. (Unless your program is already has UB, like use-after free, including via insufficient memory-ordering of stores with free()
).
You're storing to that memory before giving other threads a pointer to that object, so data-race UB would be impossible even if the member wasn't _Atomic
.
foo = val
assignment to an _Atomic
object is the same as atomic_store(&foo, val);
, which in turn is equivalent to atomic_store_explicit(&foo, val, memory_order_seq_cst);
.
You don't need any ordering wrt. other operations, because any way of publishing the newNode
pointer to other threads will need to synchronize-with them (via release/acquire) so our malloc
happens-before their access to the pointed-to memory. Any other operations we do between malloc
and atomic_store_explicit(&shared, newNode, release)
will also happen-before anything in a reader thread.
atomic_store_explicit(&newNode->next, NULL, memory_order_relaxed); // cheapest atomic way
*newNode = (struct node){0, NULL}; // assignment of whole non-atomic struct
So we could use a relaxed
store, but that's still atomic which will prevent real compilers from combining it with init of the int value;
member. (Especially on 32-bit machines or ILP32 ABIs where the whole struct is only 8 bytes. Or if we use long count
so there's no padding on most 64-bit ABIs, other than Windows, or intptr_t
so it's always a pair of same-size members.
Compilers often avoid storing to padding, sometimes stopping themselves from using one wider store like AArch64 stp
(store pair).)
#include <stdlib.h>
struct node {
long value;
_Atomic(struct node*) next;
};
struct node* alloc_and_init_orig(long val){
struct node* newNode = malloc(sizeof(struct node));
if (!newNode)
return NULL;
newNode->next = NULL; // atomic seq_cst store
return newNode;
}
Clang for x86-64 uses xchg
to do a seq_cst store, which is a full memory barrier, because x86 doesn't have anything that's only strong enough without being much stronger and more expensive. (AArch64 does, stlr
).
# clang 19 -O3 -fno-plt
alloc_and_init_orig:
push rax # align the stack
mov edi, 16
call qword ptr [rip + malloc@GOTPCREL]
test rax, rax
je .LBB0_2
xor ecx, ecx
xchg qword ptr [rax + 8], rcx # Slow. The seq_cst store
.LBB0_2:
pop rcx
ret
A relaxed
atomic store would compile as cheaply as a non-atomic init of the pointer. But if we wanted to also init count
, especially with 0
, for maximum efficiency we want to let the compiler do both member non-atomically.
struct node* alloc_and_init_zero(){
struct node* newNode = malloc(sizeof(struct node));
if (!newNode)
return NULL;
*newNode = (struct node){0, NULL};
// equivalent to
//struct node tmp = {0, NULL};
//*newNode = tmp;
return newNode;
}
The whole struct node
pointed to by newNode
is not itself _Atomic
, so this struct-assignment is non-atomic. There happens to be an _Atomic
member, but C struct assignment just copies the whole thing, ignoring qualifiers like volatile
or _Atomic
on members. (So it's like a memcpy
. I think it's well-defined to copy the object-representation of an _Atomic
type, as long as you don't expect the copy itself to be _Atomic
.
It certainly works in practice on compilers where the object-representation of _Atomic T
is the same as plain T
, with non-lock-free using a separate hash table of spinlocks or mutexes.)
Clang is pretty clever, compiling the whole function into calloc(1, 16)
. (With int count
instead of long
, this optimization to calloc
only happens with current nightly builds of clang, not clang 19).
If you had an atomic store, current compilers wouldn't optimize it away, defeating this optimization. (Why don't compilers merge redundant std::atomic writes?).
With a non-zero initializer, Clang for AArch64 compiles it to a single 16-byte stp
(store-pair), again which doesn't happen with atomic_store_explicit(&p->next, NULL, relaxed)
and a separate assignment to p->value
. (That would be a legal optimization, but compilers don't do it.)
# clang -O3 -Wall -fomit-frame-pointer -mcpu=cortex-a77
alloc_and_init_const:
str x30, [sp, #-16]! # save return address (link register)
mov w0, #16
bl malloc
cbz x0, .LBB3_2 # compare-and-branch if zero NULL check
mov w8, #123
stp x8, xzr, [x0] # store 123 and the zero-register
.LBB3_2:
ldr x30, [sp], #16 # restore return address
ret
All of these and a couple other examples on the Godbolt compiler explorer. Clang for x86-64 makes the weird choice to load a 16-byte vector constant from .rodata
and store that, instead of doing two separate 8-byte mov
stores like GCC does, or mov ecx, 123
/ movd xmm0, ecx
/ movaps [rax], xmm0
. So compilers are fully capable of shooting themselves in the foot when given more freedom to optimize.
The line newNode->next = NULL;
is safe and does not cause undefined behavior until you miss these points given below:
- The memory allocation succeeds.
- Other threads cannot access
newNode
during initialization.
It is not a UB "per se”.
When Undefined Behavior Might Occur?
Undefined behaviour could occur if:
- The object was not properly allocated (e.g.,
malloc
fails and returnsNULL
). - The assignment happens concurrently in multiple threads without proper synchronization. In this case, atomic operations like
atomic_store
should be used.
newNode
pointer. Concurrent accesses will only become a problem if you assign that address to some variable that can be accessed systemwide. – Gerhardh Commented Nov 20, 2024 at 11:46next
is atomic or not because in your example, it is impossible for two different threads to executenewNode–>next=NULL
for the same*newNode
object. Every execution of that line happens for a unique, new object. – Solomon Slow Commented Nov 20, 2024 at 12:20_Atomic
on members. You want this for efficiency before you've published a pointer to the struct to any other thread, especially if you were going to just use=
insteadatomic_store_explicit(NULL, memory_order_relaxed);
– Peter Cordes Commented Nov 20, 2024 at 17:42