visual c++ - Why does MSVC x64 C use 8-byte int32 parameter alignment instead of 4-byte?

I'm writing a C compiler as a hobby and would like it to be able to link against C static libraries produced by MSVC.

I read the Microsoft x64 ABI, and it doesn't seem to have a strongly mandated alignment for integer primitive types. It "recommends" aligning them with their natural size, for example an int32 would be 4-byte aligned.

But when I compile a minimal program that passes many ints as parameters, it's clearly using 8-byte alignment for them, despite only referencing them as DWORDs.

int add_many_args(int a, int b, int c, int d, int e, int f, int g, int h) {
    return a + b + c + d + e + f + g + h;
}

a$ = 8
b$ = 16
c$ = 24
d$ = 32
e$ = 40
f$ = 48
g$ = 56
h$ = 64
add_many_args PROC
        mov     DWORD PTR [rsp+32], r9d
        mov     DWORD PTR [rsp+24], r8d
        mov     DWORD PTR [rsp+16], edx
        mov     DWORD PTR [rsp+8], ecx
        mov     eax, DWORD PTR b$[rsp]
        mov     ecx, DWORD PTR a$[rsp]
        add     ecx, eax
        mov     eax, ecx
        add     eax, DWORD PTR c$[rsp]
        add     eax, DWORD PTR d$[rsp]
        add     eax, DWORD PTR e$[rsp]
        add     eax, DWORD PTR f$[rsp]
        add     eax, DWORD PTR g$[rsp]
        add     eax, DWORD PTR h$[rsp]
        ret     0
add_many_args ENDP

First question is why would it do that? Why isn't it aligning them using the natural size, 4 bytes?

Second question is: as I try to write a compiler that aims to be able to link against C static libraries, how am I supposed to know what alignment the library used, so that my code can correctly pass stack parameters to library functions? I hear people say that the "C ABI is stable", so where are the rules for this written down?

asked Mar 26 at 6:42 by knutaf, edited Mar 26 at 11:24 by phuclv

A lot depends on what is natural for the processor AND for what runs quickly. I assume you compiled this using the “compile for size” flags instead of “compile for speed”. An x86-family processor naturally aligns on 16 bytes, but the processor is very happy to access integers on 8-byte boundaries. Remember, the folks at MS have spent a lot of time making things work right. This includes handling things like crossing cache line boundaries and individual instruction speeds, and a whole lot of tweaked heuristics for optimizations. But there is nothing wrong with working on a 4-byte boundary. – Dúthomhas Commented Mar 26 at 7:59
ABI sets minimum requirements. "Second question is: as I try to write a compiler that aims to be able to link against C static libraries" - your compiler and linker should be able to link against any alignment – 0___________ Commented Mar 26 at 8:05
3 These variables are function parameters and they are subject to platform calling conventions. You need to read about those to be able to generate functions or function calls. Frankly the calling conventions document here is not written very clearly and it is not evident that int (or short or char!) arguments are passed as if they were 8-byte ones. But it is a fact. – n. m. could be an AI Commented Mar 26 at 8:26
4 The x64 calling convention demands that parameters take at least 8 bytes. The stack is always aligned to 16. Couldn't easily find a good link that states this explicitly. – Hans Passant Commented Mar 26 at 10:28


1 Answer

To see how types are naturally aligned, look at local vars or struct layout rather than function args. The Windows x64 calling convention makes every arg take exactly 8 bytes (1 register or stack slot), so variadic functions are easy: just dump the 4 arg-passing regs to shadow space and index the args as an array.
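
As a rough illustration of the one-slot-per-arg rule, here is a minimal C sketch. The helper names (win64_arg_offset_at_entry, read_int_arg) are made up for this example and are not part of any real API. It assumes integer args only and models the arg area as an array of 8-byte slots, where only the low 4 bytes of a slot holding an int are meaningful.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset from RSP, at callee entry, of
 * argument `index` (0-based). The +8 skips the return address that
 * `call` pushed; after that, every argument owns one 8-byte slot. */
static int win64_arg_offset_at_entry(int index) {
    return 8 + 8 * index;
}

/* Hypothetical helper: read a 32-bit int argument out of a simulated
 * argument area. The upper 4 bytes of the slot are ignored, matching
 * the DWORD PTR loads in the question's listing. */
static int32_t read_int_arg(const uint64_t *slots, int index) {
    uint32_t low = (uint32_t)slots[index];  /* keep the low 32 bits */
    int32_t value;
    memcpy(&value, &low, sizeof value);     /* reinterpret as signed */
    return value;
}
```

With 8 int args this gives offsets 8, 16, ..., 64, which are the a$ through h$ values in the question's listing.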

It's normal for other calling conventions to make each arg take the stack space of a register, instead of having complicated rules for foo(int a, int64_t b, double c) to make sure the wider args are aligned.
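
To make the trade-off concrete, the sketch below compares the uniform 8-byte slots with a hypothetical packed scheme that aligns each arg only to its own size. packed_offsets is an invented function for illustration, not something Windows x64 does, and it assumes every size is a power of 2 so that alignment equals size.

```c
#include <stddef.h>

/* Layout used by Windows x64: argument i lives in 8-byte slot i. */
static size_t slot_offset(size_t index) {
    return index * 8;
}

/* HYPOTHETICAL packed layout (an assumption for comparison only):
 * each argument is aligned to its own size, so padding has to be
 * inserted whenever a wide argument follows a narrow one. */
static void packed_offsets(const size_t *sizes, size_t count,
                           size_t *offsets) {
    size_t next = 0;
    for (size_t i = 0; i < count; i++) {
        size_t align = sizes[i];                  /* power of 2 assumed */
        next = (next + align - 1) & ~(align - 1); /* round up to align */
        offsets[i] = next;
        next += sizes[i];
    }
}
```

For foo(int a, int64_t b, double c), with sizes 4, 8, 8, the packed offsets come out as 0, 8, 16, the same as the slot layout, so packing would save nothing there while still needing the extra alignment logic.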

The Windows x64 docs (https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing) don't clearly state that stack arg slots are always 8 bytes even for narrow types, but they are.

The normal reason for making stack args take a full stack slot is to allow narrow args to be written with push, but you don't normally do that in Windows x64 because shadow space goes below them. So normally you'd sub rsp, imm8 at the top of a function and use mov to store args, not constantly push and dealloc / realloc shadow space. I can't immediately think of a reason why packing narrow args wouldn't work, just enforcing that each one is aligned by at least alignof(T), but it's not a big deal. Especially since aligning RSP by 16 before a call would often mean rounding up the space needed for stack args.
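
The rounding mentioned above can be sketched as a small C calculation. win64_min_frame_for_call is a hypothetical name, and the sketch assumes the simplest possible caller: no locals, no pushed registers, and a single call taking nargs integer args, so the frame holds only the outgoing arg area (including the 32 bytes of shadow space).

```c
/* Hypothetical helper: the smallest `sub rsp, N` for a function whose
 * frame holds nothing but the outgoing argument area of one call.
 * Assumes RSP is 8 mod 16 at function entry (16-aligned before the
 * `call` that pushed the return address). */
static int win64_min_frame_for_call(int nargs) {
    int slots = nargs < 4 ? 4 : nargs;  /* shadow space: at least 4 slots */
    int area = slots * 8;               /* one 8-byte slot per argument */
    int rounded = (area + 15) & ~15;    /* round up to a multiple of 16 */
    return rounded + 8;                 /* +8 restores 16-byte alignment */
}
```

Under these assumptions, 5 and 6 args both need 56 bytes, so the padding added for alignment already covers an extra slot.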

Examples

Godbolt with MSVC and GCC -O2 -mabi=ms, and GCC -O2 targeting Linux (-mabi=sysv being the default for Godbolt's Linux compilers.)

int foo(){
    volatile int a = 1;
    volatile int b = 2;
    volatile int c = 3;
    return a+b+c;
}

Huh, strangely MSVC chooses to put each one in a separate 8-byte slot of the shadow space its caller reserved.

; x64 MSVC 19.40 -O2
c$ = 8               ; offsets from the return address where RSP points on function entry
b$ = 16
a$ = 24
int foo(void) PROC                                        ; foo, COMDAT
        mov     DWORD PTR a$[rsp], 1
        mov     DWORD PTR b$[rsp], 2
        mov     DWORD PTR c$[rsp], 3
        mov     ecx, DWORD PTR c$[rsp]
        mov     eax, DWORD PTR b$[rsp]      ; apparently it doesn't want to add eax, mem with volatile?
        add     ecx, eax
        mov     eax, DWORD PTR a$[rsp]
        add     eax, ecx
        ret     0

But GCC does what I expected:

# Linux GCC 14.2 -O2 -mabi=ms
foo():
        sub     rsp, 24             # unfortunately fails to use its shadow space
        mov     DWORD PTR [rsp+4], 1
        mov     DWORD PTR [rsp+8], 2
        mov     DWORD PTR [rsp+12], 3
        mov     eax, DWORD PTR [rsp+4]
        mov     ecx, DWORD PTR [rsp+8]
        mov     edx, DWORD PTR [rsp+12]  # volatile defeats add eax, mem
        add     rsp, 24
        add     eax, ecx
        add     eax, edx
        ret

In a debug build with more variables, MSVC will pack them only 4 bytes apart. In an optimized build with a bunch more unused volatile variables all =2 from copy/paste, it will store them all in the same place, [rsp+32]!! (I put an #if 0 in the godbolt link.)

struct int3{
    int a,b,c;
};

int bar(int3 st){
    return st.a + st.b + st.c;
}

; x64 MSVC -O2
int bar(int3) PROC                       ; bar, COMDAT
        mov     eax, DWORD PTR [rcx+8]
        add     eax, DWORD PTR [rcx+4]
        add     eax, DWORD PTR [rcx]
        ret     0

Windows x64 passes objects larger than 8 bytes (or whose size isn't 1, 2, 4, or 8 bytes) by pointer to space allocated by the caller. So it's like bar(int3 &st), except that the caller has to make a copy first, so changes the callee makes to the arg object aren't visible in the caller's original if its value is used after the call.
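
In plain C terms, the lowering looks roughly like the sketch below. bar_lowered and call_bar are invented names standing in for what the compiler generates, and the += 100 is there only to show that the callee's write lands in the temporary.

```c
struct int3 { int a, b, c; };

/* Roughly what the callee sees: a pointer to a temporary the caller
 * created. Writing through it changes only that temporary. */
static int bar_lowered(struct int3 *st) {
    st->a += 100;                    /* modifies the copy, not the original */
    return st->a + st->b + st->c;
}

/* Roughly what the caller does for `bar(s)`: copy, then pass the
 * address of the copy. */
static int call_bar(const struct int3 *s) {
    struct int3 tmp = *s;            /* caller-allocated copy */
    return bar_lowered(&tmp);
}
```

After call_bar returns, the caller's own struct still holds its original values.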

Just for fun, compare the x86-64 System V calling convention which passes structs up to 16 bytes in a pair of registers. In this case, the first two integer arg-passing regs for that convention, RDI and RSI:

# x86-64 Linux GCC -O2
bar(int3):
        mov     rax, rdi
        shr     rax, 32         # st.b
        add     eax, edi        # st.a
        add     eax, esi        # st.c
        ret
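
The same unpacking can be written in C as a sketch of what the System V listing does. bar_from_regs is a hypothetical name, and the two parameters stand in for the values the caller placed in RDI and RSI: st.a in the low half of RDI, st.b in the high half, st.c in the low half of RSI.

```c
#include <stdint.h>

/* Hypothetical model of the callee: rdi carries st.a (low 32 bits) and
 * st.b (high 32 bits); rsi carries st.c in its low 32 bits. */
static int bar_from_regs(uint64_t rdi, uint64_t rsi) {
    uint32_t a = (uint32_t)rdi;          /* edi */
    uint32_t b = (uint32_t)(rdi >> 32);  /* mov rax, rdi ; shr rax, 32 */
    uint32_t c = (uint32_t)rsi;          /* esi */
    return (int)(a + b + c);             /* small values assumed here */
}
```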
