Why doesn't the Windows x64 calling convention use XMM registers to pass more than 4 integer args?

Developer https://www.devze.com 2023-03-11 22:35 Source: Internet
The (Microsoft) x64 calling convention states:

The arguments are passed in registers RCX, RDX, R8, and R9. If the arguments are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L.

That's great, but why just floats/doubles? Why aren't integers (and maybe pointers) also passed via XMM registers?

Seems a little like a waste of available space, doesn't it?


Because most operations on non-FP values (i.e. integers and addresses) are designed to use general purpose registers.

SSE does have integer instructions, but they are arithmetic only.

So if the calling convention passed integers and addresses in SSE registers, the callee would almost always have to copy each value to a general-purpose register before using it.


Functions often want to use integer args with pointers (as indices, or to calculate an end-pointer as a loop bound), with other integer args in GP registers, or with other integers loaded from memory that they want to work with in GP registers.

You can't efficiently use an integer in an XMM reg as a loop counter or bound, because there's no packed-integer compare that sets integer flags for branch instructions. (pcmpgtd creates a mask of 0/-1 elements).
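To make the loop-counter point concrete, here is a minimal sketch using SSE2 intrinsics. The function names are made up for illustration, and what a compiler actually emits may differ; the point is only that the result of pcmpgtd is a 0/-1 mask that has to be copied to a GP register (movd here) before anything can branch on it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */

/* Hypothetical helper: returns 1 if the low 32-bit lane of `counter`
   is less than the low lane of `bound`, both held in XMM registers. */
int xmm_counter_below(__m128i counter, __m128i bound) {
    /* pcmpgtd: each lane becomes -1 where bound > counter, else 0. */
    __m128i mask = _mm_cmpgt_epi32(bound, counter);
    /* movd: copy lane 0 of the mask into a general-purpose register. */
    int low_lane = _mm_cvtsi128_si32(mask);
    /* Only now is there an integer value that test/jcc can branch on. */
    return low_lane != 0;
}

/* Hypothetical example: sums 0..n-1 while keeping the counter in XMM. */
int sum_with_xmm_counter(int n) {
    __m128i i = _mm_setzero_si128();
    __m128i bound = _mm_cvtsi32_si128(n);
    __m128i one = _mm_cvtsi32_si128(1);
    int total = 0;
    while (xmm_counter_below(i, bound)) {
        total += _mm_cvtsi128_si32(i);  /* another XMM -> GP copy */
        i = _mm_add_epi32(i, one);      /* paddd */
    }
    return total;
}
```

Every iteration pays for a compare, a mask extraction, and a value extraction, where a counter in a GP register would need a single cmp/jcc pair.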

See also "Why not store function parameters in XMM vector registers?" and the other answer here for more.


But even beyond that, this design idea is not even an option for Windows x64 fastcall / vectorcall.

Windows x64 chooses to waste space on purpose to simplify variadic functions. The register args can be dumped into the 32-byte "shadow space" / "home space" above the return address, to form an array of args.

This is why (for example) Windows x64 passes the 3rd arg in R8 or XMM2, regardless of the types of the earlier args. And why calls to variadic functions require FP args to also be copied to the corresponding integer register, so the function prologue can dump the arg regs without figuring out which variadic args were FP and which were integer.
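As an illustration of the kind of function this design caters to, here is a small variadic function in portable C. The name sum_mixed and its format-string scheme are invented for this sketch; nothing here is Windows-specific, and the code itself does not show the ABI at work.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums a mixed list of int and double arguments.
   `kinds` describes them, one character each: 'i' = int, 'd' = double. */
double sum_mixed(const char *kinds, ...) {
    assert(kinds != NULL);
    va_list ap;
    va_start(ap, kinds);
    double total = 0.0;
    for (const char *k = kinds; *k != '\0'; k++) {
        if (*k == 'd')
            total += va_arg(ap, double);  /* read the next arg as a double */
        else
            total += va_arg(ap, int);     /* read the next arg as an int */
    }
    va_end(ap);
    return total;
}
```

For a call like sum_mixed("idi", 1, 2.5, 3) on Windows x64, the format string goes in RCX, 1 in RDX, 2.5 in XMM2 and also in R8, and 3 in R9. Once the prologue has stored RCX, RDX, R8, and R9 to the shadow space, va_arg can walk one contiguous array of 8-byte slots without knowing which args were FP.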

To make the arg-array thing work, only 4 total args can be passed in registers, regardless of whether you have a mix of integer and FP args. There are enough GP integer regs to hold the max number of register args already, even if they're all integer.


(Unlike x86-64 System V, where the first up-to-8 FP args are passed in xmm0..7 regardless of how many integer/pointer arg-passing registers are used.)
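The difference between the two schemes can be sketched as a small model. This is a simplification that only covers scalar integer/pointer and float/double args (no structs, no variadic rules), and the type and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } arg_kind;  /* hypothetical classification */

static const char *const WIN_INT[4] = {"RCX", "RDX", "R8", "R9"};
static const char *const WIN_FP[4] = {"XMM0", "XMM1", "XMM2", "XMM3"};
static const char *const SYSV_INT[6] = {"RDI", "RSI", "RDX", "RCX", "R8", "R9"};
static const char *const SYSV_FP[8] = {"XMM0", "XMM1", "XMM2", "XMM3",
                                       "XMM4", "XMM5", "XMM6", "XMM7"};

/* Windows x64: the register depends only on the argument's position. */
const char *win64_location(const arg_kind *kinds, size_t index) {
    assert(kinds != NULL);
    if (index >= 4)
        return "stack";
    return kinds[index] == ARG_FP ? WIN_FP[index] : WIN_INT[index];
}

/* x86-64 System V: integer and FP args are counted independently. */
const char *sysv_location(const arg_kind *kinds, size_t index) {
    assert(kinds != NULL);
    size_t same_kind_before = 0;
    for (size_t i = 0; i < index; i++)
        if (kinds[i] == kinds[index])
            same_kind_before++;
    if (kinds[index] == ARG_FP)
        return same_kind_before < 8 ? SYSV_FP[same_kind_before] : "stack";
    return same_kind_before < 6 ? SYSV_INT[same_kind_before] : "stack";
}
```

For a signature like f(int, double, int, double, int), this model puts the third arg in R8 on Windows x64 but in RSI on System V, and the fifth arg on the stack on Windows x64 but in RDX on System V.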

See also "Why does Windows64 use a different calling convention from all other OSes on x86-64?"
