I have to pack and unpack a Cardinal into four one-byte fields (in Delphi 2010).
I'm doing this across all the pixels of a large image, so I need it to be fast!
Can anyone show me how to write these two functions? (The const and out keywords are just for clarity. If they interfere with inline assembly, then I can remove them.)
procedure FromCardinalToBytes( const aInput: Cardinal;
out aByte1: Byte;
out aByte2: Byte;
out aByte3: Byte;
out aByte4: Byte); inline;
function FromBytesToCardinal( const aByte1: Byte;
const aByte2: Byte;
const aByte3: Byte;
const aByte4: Byte):Cardinal; inline;
I'd recommend not using a function at all; just use a variant record.
type
TCardinalRec = packed record
case Integer of
0: (Value: Cardinal;);
1: (Bytes: array[0..3] of Byte;);
end;
Then you can easily use this to obtain the individual bytes.
var
LPixel: TCardinalRec;
...
LPixel.Value := 123455;
//Then read each of the bytes using
B1 := LPixel.Bytes[0];
B2 := LPixel.Bytes[1];
//etc.
If you absolutely must, you can put this into a function, but it's trivial enough not to bother with the overhead of a function call.
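As a minimal sketch of both directions (unpacking and packing) through the same record, the following console program may help. The program name is made up for illustration, and the byte order shown assumes a little-endian x86 target, as with Delphi 2010 on Windows.

```pascal
program VariantRecordDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TCardinalRec = packed record
    case Integer of
      0: (Value: Cardinal);
      1: (Bytes: array[0..3] of Byte);
  end;

var
  LPixel: TCardinalRec;
begin
  // Unpack: write the Cardinal, then read the individual bytes.
  LPixel.Value := $12345678;
  // On little-endian x86, Bytes[0] is the least significant byte ($78)
  // and Bytes[3] is the most significant byte ($12).
  Writeln(LPixel.Bytes[0] = $78);
  Writeln(LPixel.Bytes[3] = $12);

  // Pack: write the four bytes, then read the Cardinal.
  LPixel.Bytes[0] := $DD;
  LPixel.Bytes[1] := $CC;
  LPixel.Bytes[2] := $BB;
  LPixel.Bytes[3] := $AA;
  Writeln(LPixel.Value = $AABBCCDD);
end.
```

No copying or shifting happens here: both variants overlay the same four bytes of memory, so "conversion" is just a different view of the same storage.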
EDIT
To illustrate the efficiency of the variant record approach consider the following (assuming you're reading your image from a Stream).
var
LPixelBuffer: array[0..1023] of TCardinalRec;
...
ImageStream.Read(LPixelBuffer, SizeOf(LPixelBuffer));
for I := Low(LPixelBuffer) to High(LPixelBuffer) do
begin
//Here each byte is accessible as:
//  LPixelBuffer[I].Bytes[0]
//  LPixelBuffer[I].Bytes[1]
//  LPixelBuffer[I].Bytes[2]
//  LPixelBuffer[I].Bytes[3]
end;
PS: Instead of an arbitrarily generic Bytes array, you could explicitly name each Byte in the variant record as Red, Green, Blue, (and whatever the fourth Byte means).
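A sketch of that named-field variant might look like the following. The type name TPixelRec, the channel order (Blue in the lowest byte) and the name Alpha for the fourth byte are all assumptions for illustration; the real order depends on your image format.

```pascal
program NamedChannelsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical layout: the channel order and the Alpha field are
  // assumptions, so adjust them to match your pixel format.
  TPixelRec = packed record
    case Integer of
      0: (Value: Cardinal);
      1: (Blue, Green, Red, Alpha: Byte);
  end;

var
  LPixel: TPixelRec;
begin
  LPixel.Value := $FF336699;
  // On little-endian x86 the first named field maps to the lowest byte.
  Writeln(LPixel.Blue = $99);
  Writeln(LPixel.Green = $66);
  Writeln(LPixel.Red = $33);
  Writeln(LPixel.Alpha = $FF);
end.
```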
There are many ways. The simplest is
function FromBytesToCardinal(const AByte1, AByte2, AByte3,
AByte4: byte): cardinal; inline;
begin
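// The Cardinal casts keep the arithmetic unsigned; without them the shifts are
// done as Integer, which may trip range checking ({$R+}) when AByte4 >= $80.
result := Cardinal(AByte1) + (Cardinal(AByte2) shl 8) + (Cardinal(AByte3) shl 16) + (Cardinal(AByte4) shl 24);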
end;
procedure FromCardinalToBytes(const AInput: cardinal; out AByte1,
AByte2, AByte3, AByte4: byte); inline;
begin
AByte1 := byte(AInput);
AByte2 := byte(AInput shr 8);
AByte3 := byte(AInput shr 16);
AByte4 := byte(AInput shr 24);
end;
Slightly more sophisticated (but not necessarily faster) is
function FromBytesToCardinal2(const AByte1, AByte2, AByte3,
AByte4: byte): cardinal; inline;
begin
PByte(@result)^ := AByte1;
PByte(NativeUInt(@result) + 1)^ := AByte2;
PByte(NativeUInt(@result) + 2)^ := AByte3;
PByte(NativeUInt(@result) + 3)^ := AByte4;
end;
procedure FromCardinalToBytes2(const AInput: cardinal; out AByte1,
AByte2, AByte3, AByte4: byte); inline;
begin
AByte1 := PByte(@AInput)^;
AByte2 := PByte(NativeUInt(@AInput) + 1)^;
AByte3 := PByte(NativeUInt(@AInput) + 2)^;
AByte4 := PByte(NativeUInt(@AInput) + 3)^;
end;
If you don't need the bytes to be byte variables, you can do even trickier things, like declaring
type
PCardinalRec = ^TCardinalRec;
TCardinalRec = packed record
Byte1,
Byte2,
Byte3,
Byte4: byte;
end;
and then just cast:
var
c: cardinal;
begin
c := $12345678;
PCardinalRec(@c)^.Byte3 // get or set byte 3 in c
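For completeness, here is a small self-contained sketch of the cast in action, both reading and writing a byte. The names TCardinalBytes and PCardinalBytes are made up here to avoid clashing with other declarations, and the byte positions assume little-endian x86.

```pascal
program CastDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PCardinalBytes = ^TCardinalBytes;
  TCardinalBytes = packed record
    Byte1,
    Byte2,
    Byte3,
    Byte4: Byte;
  end;

var
  c: Cardinal;
begin
  c := $12345678;
  // Get: Byte1 is the least significant byte, so Byte3 is $34.
  Writeln(PCardinalBytes(@c)^.Byte3 = $34);
  // Set: overwrite byte 3 in place; the other three bytes are untouched.
  PCardinalBytes(@c)^.Byte3 := $FF;
  Writeln(c = $12FF5678);
end.
```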
If you want it fast, you need to consider the 80x86 architecture.
The speed depends heavily on what you are doing with the bytes.
The x86 can access the bottom two bytes really fast, using the AL and AH registers (the least significant bytes of the 32-bit EAX register).
If you want to get at the two higher-order bytes, you do not want to access those directly, because you'll get an unaligned memory access, wasting CPU cycles and messing with the cache.
Making it faster
All this stuff messing with individual bytes is really not needed.
If you want to be really fast, work with 4 bytes at a time.
NewPixel:= OldPixel or $0f0f0f0f;
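As another illustration of the same idea, the sketch below halves all four channels of a pixel with one shift and one mask instead of four separate byte operations. The function name HalveChannels and the sample buffer are hypothetical; the mask $7F7F7F7F clears the bit that each shift would otherwise leak into the neighbouring byte.

```pascal
program WholePixelDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Halve every byte of the pixel at once (hypothetical helper).
// Shifting right by 1 moves each byte's lowest bit into the top bit of
// the byte below it; the mask clears those stray bits.
function HalveChannels(const APixel: Cardinal): Cardinal; inline;
begin
  Result := (APixel shr 1) and $7F7F7F7F;
end;

var
  LPixels: array[0..2] of Cardinal;
  I: Integer;
begin
  LPixels[0] := $FF804002;
  LPixels[1] := $00000000;
  LPixels[2] := $FFFFFFFF;

  // One Cardinal operation per pixel, no per-byte unpacking.
  for I := Low(LPixels) to High(LPixels) do
    LPixels[I] := HalveChannels(LPixels[I]);

  Writeln(LPixels[0] = $7F402001);
  Writeln(LPixels[1] = $00000000);
  Writeln(LPixels[2] = $7F7F7F7F);
end.
```

Whether this kind of trick applies depends on the operation: it works for bitwise operations and for arithmetic where carries between bytes can be masked off, but not for everything.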
If you want to process your pixels really fast, use inline MMX assembly and work with 8 bytes at a time.
Links:
Wikipedia: http://en.wikipedia.org/wiki/MMX_%28instruction_set%29
Explanation of the MMX instruction set: http://webster.cs.ucr.edu/AoA/Windows/HTML/TheMMXInstructionSet.html
Or re-ask your question on SO: How do I do this bitmap manipulation ... in MMX.
Really really fast
If you want it really, really fast, like 100 or 1000 times faster than MMX, your video card can do that. Search for CUDA or GPGPU.