When I have per-CPU data structures, does it improve performance to have them on different pages?


开发者 (Devze) https://www.devze.com 2023-01-31 14:21 Source: Internet


I have a small struct of per-CPU data in a Linux kernel module, where each CPU frequently writes and reads its own data. I know that I need to make sure these items of data aren't on the same cache line, because if they were, the cores would be forever dirtying each other's caches. However, is there anything at the page level that I need to worry about from an SMP performance point of view? That is, would there be any performance impact from padding these per-CPU structures out to 4096 bytes and aligning them?

This is on Linux 2.6 on x86_64.

(Points about whether it's worth optimising and suggestions that I go benchmark it aren't needed -- what I'm looking for is whether there's any theoretical basis for worrying about page alignment).

Within a single NUMA node, different pages are only helpful if you want to apply different permissions, or map them individually into processes. For performance issues, being on different cachelines is sufficient.
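To make the cache-line point concrete, here is a minimal userspace sketch, not kernel code. It assumes a 64-byte cache line (typical on x86_64, but not stated in the question), and the names `percpu_slot`, `slots`, `NR_DEMO_CPUS` and `cache_line_of` are hypothetical. Aligning each slot to the line size is enough to keep two CPUs from ever writing to the same line, with no page-sized padding involved.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as is typical on x86_64. */
#define CACHE_LINE_SIZE 64u
#define NR_DEMO_CPUS 8u /* hypothetical CPU count for the demo */

/* Hypothetical per-CPU payload. Aligning the first member to the cache
 * line size raises the alignment of the whole struct, so sizeof() is
 * padded up to a multiple of the line size as well. */
struct percpu_slot {
    alignas(CACHE_LINE_SIZE) uint64_t counter;
    uint64_t last_seen;
};

/* Each slot occupies exactly one line under the assumptions above. */
static_assert(sizeof(struct percpu_slot) == CACHE_LINE_SIZE,
              "slot should fill exactly one cache line");

/* One slot per CPU, packed back to back with no page-sized padding. */
static struct percpu_slot slots[NR_DEMO_CPUS];

/* Index of the cache line that contains the byte at address p. */
static uintptr_t cache_line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE_SIZE;
}
```

With this layout, `cache_line_of(&slots[0])` and `cache_line_of(&slots[1])` always differ, while both fields of a single slot stay on one line. In kernel code the analogous annotation is `____cacheline_aligned_in_smp`.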

On NUMA architectures, you may want to place a CPU's per-CPU structure on a page that is local to that CPU's node - but you still wouldn't pad the structure out to a page size to achieve that, because you can place the structures for multiple CPUs within the same NUMA node on the same page.
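The packing described above can be sketched as a small layout calculation. This is a userspace illustration under stated assumptions, not an allocator: the topology table `cpu_node`, the 64-byte line size, and the helper `locate_slot` are all hypothetical. It places each CPU's slot on a page owned by that CPU's node, one cache line per CPU, so CPUs on the same node share a page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 64-byte cache lines; 4096-byte pages as in the question. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u
#define SLOTS_PER_PAGE (PAGE_SIZE_BYTES / CACHE_LINE_SIZE)

static_assert(PAGE_SIZE_BYTES % CACHE_LINE_SIZE == 0,
              "a page should hold a whole number of cache lines");

/* Hypothetical topology: 8 CPUs split evenly across 2 NUMA nodes. */
#define NR_DEMO_CPUS 8
static const int cpu_node[NR_DEMO_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Where a CPU's slot lives: which node, which of that node's pages,
 * and the byte offset inside that page. */
struct slot_location {
    int node;
    size_t page;
    size_t offset;
};

/* Pack the slots of CPUs that share a node onto the same node-local
 * page, one cache line each. Caller must pass 0 <= cpu < NR_DEMO_CPUS. */
static struct slot_location locate_slot(int cpu)
{
    size_t rank = 0; /* lower-numbered CPUs on the same node */
    for (int i = 0; i < cpu; i++)
        if (cpu_node[i] == cpu_node[cpu])
            rank++;

    struct slot_location loc = {
        cpu_node[cpu],
        rank / SLOTS_PER_PAGE,
        (rank % SLOTS_PER_PAGE) * CACHE_LINE_SIZE,
    };
    return loc;
}
```

In this made-up topology, CPUs 0 to 3 all land on page 0 of node 0 at offsets 0, 64, 128 and 192, and CPUs 4 to 7 land on page 0 of node 1. Under these assumptions one page holds 64 slots, so page-sized padding would only spend memory without changing which node the data lives on.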

Even on a NUMA system, you probably won't benefit much from allocating memory pages local to each CPU (see kmalloc_node() if you're curious).

Node-local memory is faster, but only for accesses that miss at every cache level. For anything used with any frequency, you probably won't be able to tell the difference. If you're allocating megabytes of CPU-local data, then it probably makes sense to allocate pages local to each CPU.

Well, I've read a fair bit about Linux having NUMA support these days. In a NUMA setup, it would be helpful if the data for each CPU were located on a page that is local to that CPU.

percpu generally makes sure that they don't share a cache line. Otherwise commits like 7489aec8eed4f2f1eb3b4d35763bd3ea30b32ef5 would have been pretty useless.