I know that some processors fail with misaligned data, and others like the oh-so-common x86, would just be slower with that.
My question is why? Why is it harder for an x86 processor to get the data f开发者_开发知识库rom the pointer 0x12345679
than it is from the pointer 0x12345678
? Just to be clear, I'm aware that page faults may happen if the data is in multiple pages, and I understand that more data may need to be fetched from memory (one part for the start of the value and one for the end), but that isn't always true and this isn't what my question is about. I'm asking, why is it always slower?
Suppose the memory starts at 0x10000000
. Why is it harder for the processor to get a 2-byte short
from 0x10000001
than it is from 0x10000002
? Why is it harder to get a 4-byte int
from 0x10000001
than it is from 0x10000000
? And so forth.
Because the data bus is wider than eight bits.
Let assume that the data bus is 32 bits. To get 16 bits from address 0x10000001, it has to get the four bytes that starts at 0x10000000 and shift the value to get the two bytes in the middle.
To get 16 bits from the address 0x10000003, it has to get the words that start at 0x10000000 and 0x10000004, and use one byte from each value.
The processor can only access memory in an aligned fashion. This is a consequence of how the interconnect between the processor and memory functions.
When a processor supports unaligned reads, what's really happening is the processor issuing two separate reads (or one read of larger size) and stitching the parts together, which is why it's slower than an aligned read.
One example: if the databus is 32 bits and a 32 bit value is not on a 32 bit boundary, the bytes will have to be fetched in more than one operation and moved around to load the value properly into a processor register.
精彩评论