I found this Q&A on the Web:
Q: Which is better a char, short or int type for optimization?
A: Where possible, it is best to avoid using char and short as local variables. For the types char and short the compiler needs to reduce the size of the local variable to 8 or 16 bits after each assignment. This is called sign-extending for signed variables and zero-extending for unsigned variables. It is implemented by shifting the register left by 24 or 16 bits, followed by a signed or unsigned shift right by the same amount, taking two instructions (zero-extension of an unsigned char takes one instruction). These shifts can be avoided by using int and unsigned int for local variables. This is particularly important for calculations which first load data into local variables and then process the data inside the local variables. Even if data is input and output as 8- or 16-bit quantities, it is worth considering processing them as 32-bit quantities.
Is this correct? I thought it is better to avoid char and short because of arithmetic conversion (most likely they will be converted to ints or longs, and this will cause the compiler to generate extra instructions).
Q: How to reduce function call overhead in ARM based systems?
A: Avoid functions with a parameter that is passed partially in a register and partially on the stack (split-argument). This is not handled efficiently by the current compilers: all register arguments are pushed on the stack.
· Avoid functions with a variable number of parameters. Varargs functions. ...
Concerning 'varargs' -- is this because the arguments will be passed over the stack? What is a function with args partially passed in registers, and partially via stack, could you provide example?
Can we say, that the way function arguments are passed (either by registers or stack) strongly depends on architecture?
Thanks !
Simply put: that advice on optimization is misleading. Not necessarily wrong, but incomplete.
It appears your source was CodeProject. The author states he's mostly talking about optimization for ARM.
First, how char and short are handled is highly processor-dependent. Depending on the architecture, conversions may cost little or nothing. It depends on when and how they occur (at load time, for instance), on the type of operation, and on which instructions can run in parallel and so may in effect be free given the rest of the code. The TI DSP c64 architecture, for example, can run 8 ops per cycle. Typically the most efficient choice is the "native" integer size, but it also depends on where the data comes from: it may be more efficient to load, modify and store back char/short data than to load and convert to int, modify, and store back as char/short. Or it may not; it depends on the architecture and the operations being performed. The compiler is often in a better position to decide this for you.
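As a rough illustration of the "load, work in int, store back narrow" pattern discussed above, here is a minimal sketch. The function name brighten and its parameters are made up for this example. Whether it beats a version with 8-bit locals depends entirely on the target and compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds delta to every 8-bit sample, clamping to 0..255.
   The data stays 8-bit in memory, but the working value is an int. */
void brighten(uint8_t *px, size_t n, int delta)
{
    for (size_t i = 0; i < n; i++) {
        int v = px[i] + delta;   /* one load, widened to int */
        if (v > 255) v = 255;    /* clamp high */
        if (v < 0)   v = 0;      /* clamp low */
        px[i] = (uint8_t)v;      /* narrowed once, on the store */
    }
}
```

The only narrowing happens at the store, which many processors do for free with a byte-store instruction.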
Second, in many, many architectures char and short are as fast as int, especially if the calculation avoids implicit conversions to int. Note: this is easy to mess up in C, like "x = y + 1" - that forces conversion up to int (assuming x & y are char or short), but the good thing is that almost all compilers are smart enough to optimize-away the conversion for you. Many other cases of having a local be char/short will cause the compiler to optimize-away any conversions depending on how it's used later. This is helped by the fact that in typical processors, the overflow/wrap-around of a char/short is the same result as calculating it as an int and converting on store (or by simply addressing that register as char/short in a later operation - getting the conversion for 'free').
In their example:
int wordinc (int a)
{
return a + 1;
}
short shortinc (short a)
{
return a + 1;
}
char charinc (char a)
{
return a + 1;
}
In many architectures/compilers, these will run equally fast in practice.
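The wrap-around point in the second item can be checked directly. The sketch below uses two made-up functions, mix_narrow and mix_wide: one narrows to 8 bits after every step, the other computes in unsigned int and narrows once at the end. For additions and multiplications the two agree, which is what lets a compiler drop the intermediate conversions. It assumes unsigned int is wider than 8 bits.

```c
#include <stdint.h>

/* Narrows to 8 bits after every assignment. */
uint8_t mix_narrow(uint8_t a, uint8_t b)
{
    uint8_t t = a + b;   /* computed as int, truncated to 8 bits */
    t = t * 3;           /* truncated again */
    t = t + 7;           /* and again */
    return t;
}

/* Computes in unsigned int and narrows only once. */
uint8_t mix_wide(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a + b;
    t = t * 3;
    t = t + 7;
    return (uint8_t)t;   /* single truncation at the end */
}
```

This equivalence does not hold for every operation: division, right shifts and comparisons on the intermediate value depend on the upper bits, so the compiler has to keep the narrowing there.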
Third, in some architectures char/short are faster than int. Embedded architectures with a natural size of 8 or 16 bits (admittedly not the sort of development you're thinking of nowadays) are an example.
Fourth, though not a big issue generally in modern ram-heavy, huge-cache processor environments, keeping local stack storage size down (assuming the compiler doesn't hoist it to a register) may help improve the efficiency of cache accesses, especially level-1 caches.
On the other hand, IF the compiler isn't smart enough to hide it from you, local char/shorts passed as arguments to other functions (especially functions that are not file-local 'static') may entail up-conversions to int. Again, per above, the compiler may well be smart enough to hide the conversion.
I do agree with this statement at the start of the site you quote:
Although a number of guidelines are available for C code optimization, there is no substitute for having a thorough knowledge of the compiler and machine for which you are programming.
- Yes, according to the standard almost all computations and comparisons are done with integral types that have at least the width of int. So using smaller types "only" saves space and may on the other hand have an overhead.
- Varargs have to use the stack, since the corresponding macros that process these arguments usually just use a pointer to keep track of the actual position of the argument that is processed.
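For reference, this is roughly what a variadic function looks like from the C side. The name sum_ints is made up for the example. The va_list object plays the role of the "pointer" described above; how it is actually implemented, and whether the arguments really end up on the stack, is up to the ABI and compiler.

```c
#include <stdarg.h>

/* Hypothetical example: sums `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);              /* position ap at the first unnamed argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);     /* read one int and advance */
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. The callee has no way to check that the caller really passed `count` ints.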
Concerning 'varargs' -- is this because the arguments will be passed over the stack? What is a function with args partially passed in registers, and partially via stack, could you provide example?
If you have a function like:
int my_func(int v1, int v2)
The compiler can use the processor's internal registers to pass the arguments v1 and v2 during the function call.
If you have:
int my_func(int v1, int v2, ...., int v10)
The parameters take up more space than the processor's argument registers provide, so some of them are passed in registers and the rest on the stack.
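A single argument can also straddle the boundary. The sketch below assumes a calling convention with four 32-bit argument registers (as I understand the ARM procedure call standard to work); the struct vec3 and both function names are invented for the example. Check your compiler's generated assembly before relying on any of this.

```c
/* Hypothetical 12-byte aggregate. */
struct vec3 { int x, y, z; };

/* a and b occupy two argument registers. Under the assumed convention,
   v (three words) only partly fits in the remaining two registers,
   so its last word would go on the stack: a split argument. */
int dot_split(int a, int b, struct vec3 v)
{
    return a * v.x + b * v.y + v.z;
}

/* Passing a pointer instead keeps every argument in one register each. */
int dot_ptr(int a, int b, const struct vec3 *v)
{
    return a * v->x + b * v->y + v->z;
}
```

Both functions compute the same value; only the way the arguments travel differs.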
Can we say, that the way function arguments are passed (either by registers or stack) strongly depends on architecture?
Yes, and it also strongly depends on the compiler.
I would think the reduction in size to 8 or 16 bits would only take place when assigning from a larger value. For example, if a function returns char, why would it need to modify the value at all when assigning to a char? There may be an exception if there were some operations that could only be done with larger variables, but depending on the compiler and processor, I'm not sure this would come up that often.
On some processors, unsigned char is the fastest type. On some, it will be consistently slower than int. On the ARM, an unsigned char which is stored in memory should run the same speed as an int stored in memory, but an unsigned char stored in a register will frequently have to be 'normalized' to the value 0-255 at the cost of an instruction; an unsigned short would have to be 'normalized' to 0-65535 at the cost of two instructions. I would expect that a good compiler could eliminate a lot of unnecessary normalizations either by working with 65536 times the value of interest, or by observing that upper bits aren't going to matter; I don't know to what extent actual compilers do either of those things.
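Written out in C, the 'normalization' steps look something like the sketch below. The function names are made up, and the instruction counts in the comments are assumptions about older ARM cores, where (as far as I know) 0xFF fits an immediate operand but 0xFFFF does not. The code assumes a 32-bit int.

```c
#include <stdint.h>

/* 8-bit case: a single AND with 0xFF. */
uint32_t normalize_u8(uint32_t reg)
{
    return reg & 0xFFu;
}

/* 16-bit case: shift left by 16, then logical shift right by 16,
   which clears the upper half in two steps. */
uint32_t normalize_u16(uint32_t reg)
{
    return (reg << 16) >> 16;
}
```

For example, normalize_u8(0x1FF) gives 0xFF and normalize_u16(0x10000) gives 0, matching the wrap-around an unsigned char or unsigned short must show.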
BTW, it's worth noting that while the C standard requires that adding 1 to a 16-bit unsigned integer that holds 65,535 must yield zero (not 65,536), there is no similar requirement for signed integers. A compiler would be free to regard a signed short or signed char as an int when it's held in a register, and as its proper-sized type when stored in memory. Thus, using signed types would avoid the need for extra value-truncation instructions.
It is target and/or compiler dependent. It may also depend on what you want to optimise, memory usage, code space, or execution time.
Regarding ARM function calls, the ARM ABI defines a standard with which most ARM compilers will comply. The quoted advice is rather useless, since you would not generally implement or call a variadic function unless you actually needed one.
Generally, let the compiler worry about efficient code generation; it is your expert system for the target. Get on with productive work. Worry about optimisation only when you know that it is needed (i.e. when the code is shown to be otherwise too slow or too large).