开发者

Programmatically determining max value of a signed integer type

开发者 https://www.devze.com 2023-02-07 12:37 出处:网络
This related question is about determining the max value of a signed type at compile-time: C question: off_t (and other signed integer types) minimum and maximum values

This related question is about determining the max value of a signed type at compile-time:

C question: off_t (and other signed integer types) minimum and maximum values

However, I've sin开发者_开发问答ce realized that determining the max value of a signed type (e.g. time_t or off_t) at runtime seems to be a very difficult task.

The closest thing to a solution I can think of is:

uintmax_t x = (uintmax_t)1<<CHAR_BIT*sizeof(type)-2;
while ((type)x<=0) x>>=1;

This avoids any looping as long as type has no padding bits, but if type does have padding bits, the cast invokes implementation-defined behavior, which could be a signal or a nonsensical implementation-defined conversion (e.g. stripping the sign bit).

I'm beginning to think the problem is unsolvable, which is a bit unsettling and would be a defect in the C standard, in my opinion. Any ideas for proving me wrong?


Let's first see how C defines "integer types". Taken from ISO/IEC 9899, §6.2.6.2:

6.2.6.2 Integer types
1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N−1, so that objects of that type shall be capable of representing values from 0 to 2N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.44)
2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2N) (two’s complement);
— the sign bit has the value −(2N − 1) (ones’ complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.

Hence we can conclude the following:

  • ~(int)0 may be a trap representation, i.e. setting all bits to is a bad idea
  • There might be padding bits in an int that have no effect on its value
  • The order of the bits actually representing powers of two is undefined; so is the position of the sign bit, if it exists.

The good news is that:

  • there's only a single sign bit
  • there's only a single bit that represents the value 1


With that in mind, there's a simple technique to find the maximum value of an int. Find the sign bit, then set it to 0 and set all other bits to 1.

How do we find the sign bit? Consider int n = 1;, which is strictly positive and guaranteed to have only the one-bit and maybe some padding bits set to 1. Then for all other bits i, if i==0 holds true, set it to 1 and see if the resulting value is negative. If it's not, revert it back to 0. Otherwise, we've found the sign bit.

Now that we know the position of the sign bit, we take our int n, set the sign bit to zero and all other bits to 1, and tadaa, we have the maximum possible int value.

Determining the int minimum is slightly more complicated and left as an exercise to the reader.



Note that the C standard humorously doesn't require two different ints to behave the same. If I'm not mistaken, there may be two distinct int objects that have e.g. their respective sign bits at different positions.



EDIT: while discussing this approach with R.. (see comments below), I have become convinced that it is flawed in several ways and, more generally, that there is no solution at all. I can't see a way to fix this posting (except deleting it), so I let it unchanged for the comments below to make sense.


Mathematically, if you have a finite set (X, of size n (n a positive integer) and a comparison operator (x,y,z in X; x<=y and y<=z implies x<=z), it's a very simple problem to find the maximum value. (Also, it exists.)

The easiest way to solve this problem, but the most computationally expensive, is to generate an array with all possible values from, then find the max.

Part 1. For any type with a finite member set, there's a finite number of bits (m) which can be used to uniquely represent any given member of that type. We just make an array which contains all possible bit patterns, where any given bit pattern is represented by a given value in the specific type.

Part 2. Next we'd need to convert each binary number into the given type. This task is where my programming inexperience makes me unable to speak to how this may be accomplished. I've read some about casting, maybe that would do the trick? Or some other conversion method?

Part 3. Assuming that the previous step was finished, we now have a finite set of values in the desired type and a comparison operator on that set. Find the max.

But what if...

...we don't know the exact number of members of the given type? Than we over-estimate. If we can't produce a reasonable over-estimate, than there should be physical bounds on the number. Once we have an over-estimate, we check all of those possible bit patters to confirm which bit patters represent members of the type. After discarding those which aren't used, we now have a set of all possible bit patterns which represent some member of the given type. This most recently generated set is what we'd use now at part 1.

...we don't have a comparison operator in that type? Than the specific problem is not only impossible, but logically irrelevant. That is, if our program doesn't have access to give a meaningful result to if we compare two values from our given type, than our given type has no ordering in the context of our program. Without an ordering, there's no such thing as a maximum value.

...we can't convert a given binary number into a given type? Then the method breaks. But similar to the previous exception, if you can't convert types, than our tool-set seems logically very limited.

Technically, you may not need to convert between binary representations and a given type. The entire point of the conversion is to insure the generated list is exhaustive.

...we want to optimize the problem? Than we need some information about how the given type maps from binary numbers. For example, unsigned int, signed int (2's compliment), and signed int (1's compliment) each map from bits into numbers in a very documented and simple way. Thus, if we wanted the highest possible value for unsigned int and we knew we were working with m bits, than we could simply fill each bit with a 1, convert the bit pattern to decimal, then output the number.

This relates to optimization because the most expensive part of this solution is the listing of all possible answers. If we have some previous knowledge of how the given type maps from bit patterns, we can generate a subset of all possibilities by making instead all potential candidates.

Good luck.


Update: Thankfully, my previous answer below was wrong, and there seems to be a solution to this question.

intmax_t x;
for (x=INTMAX_MAX; (T)x!=x; x/=2);

This program either yields x containing the max possible value of type T, or generates an implementation-defined signal.

Working around the signal case may be possible but difficult and computationally infeasible (as in having to install a signal handler for every possible signal number), so I don't think this answer is fully satisfactory. POSIX signal semantics may give enough additional properties to make it feasible; I'm not sure.

The interesting part, especially if you're comfortable assuming you're not on an implementation that will generate a signal, is what happens when (T)x results in an implementation-defined conversion. The trick of the above loop is that it does not rely at all on the implementation's choice of value for the conversion. All it relies upon is that (T)x==x is possible if and only if x fits in type T, since otherwise the value of x is outside the range of possible values of any expression of type T.


Old idea, wrong because it does not account for the above (T)x==x property:

I think I have a sketch of a proof that what I'm looking for is impossible:

  1. Let X be a conforming C implementation and assume INT_MAX>32767.
  2. Define a new C implementation Y identical to X, but where the values of INT_MAX and INT_MIN are each divided by 2.
  3. Prove that Y is a conforming C implementation.

The essential idea of this outline is that, due to the fact that everything related to out-of-bound values with signed types is implementation-defined or undefined behavior, an arbitrary number of the high value bits of a signed integer type can be considered as padding bits without actually making any changes to the implementation except the limit macros in limits.h.

Any thoughts on if this sounds correct or bogus? If it's correct, I'd be happy to award the bounty to whoever can do the best job of making it more rigorous.


I might just be writing stupid things here, since I'm relatively new to C, but wouldn't this work for getting the max of a signed?

unsigned x = ~0;
signed y=x/2;

This might be a dumb way to do it, but as far as I've seen unsigned max values are signed max*2+1. Won't it work backwards?

Sorry for the time wasted if this proves to be completely inadequate and incorrect.


Shouldn't something like the following pseudo code do the job?

signed_type_of_max_size test_values =
    [(1<<7)-1, (1<<15)-1, (1<<31)-1, (1<<63)-1];

for test_value in test_values:
    signed_foo_t a = test_value;
    signed_foo_t b = a + 1;
    if (b < a):
        print "Max positive value of signed_foo_t is ", a

Or much simpler, why shouldn't the following work?

signed_foo_t signed_foo_max = (1<<(sizeof(signed_foo_t)*8-1))-1;

For my own code, I would definitely go for a build-time check defining a preprocessor macro, though.


Assuming modifying padding bits won't create trap representations, you could use an unsigned char * to loop over and flip individual bits until you hit the sign bit. If your initial value was ~(type)0, this should get you the maximum:

type value = ~(type)0;
assert(value < 0);

unsigned char *bytes = (void *)&value;
size_t i = 0;
for(; i < sizeof value * CHAR_BIT; ++i)
{
    bytes[i / CHAR_BIT] ^= 1 << (i % CHAR_BIT);
    if(value > 0) break;
    bytes[i / CHAR_BIT] ^= 1 << (i % CHAR_BIT);
}

assert(value != ~(type)0);
// value == TYPE_MAX


Since you allow this to be at runtime you could write a function that de facto does an iterative left shift of (type)3. If you stop once the value is fallen below 0, this will never give you a trap representation. And the number of iterations - 1 will tell you the position of the sign bit.

Remains the problem of the left shift. Since just using the operator << would lead to an overflow, this would be undefined behavior, so we can't use the operator directly.

The simplest solution to that is not to use a shifted 3 as above but to iterate over the bit positions and to add always the least significant bit also.

type x;
unsigned char*B = &x;
size_t signbit = 7;
for(;;++signbit) {
  size_t bpos = signbit / CHAR_BIT;
  size_t apos = signbit % CHAR_BIT;
  x = 1;
  B[bpos] |= (1 << apos);
  if (x < 0) break;
}

(The start value 7 is the minimum width that a signed type must have, I think).


Why would this present a problem? The size of the type is fixed at compile time, so the problem of determining the runtime size of the type reduces to the problem of determining the compile-time size of the type. For any given target platform, a declaration such as off_t offset will be compiled to use some fixed size, and that size will then always be used when running the resulting executable on the target platform.

ETA: You can get the size of the type type via sizeof(type). You could then compare against common integer sizes and use the corresponding MAX/MIN preprocessor define. You might find it simpler to just use:

uintmax_t bitWidth = sizeof(type) * CHAR_BIT;
intmax_t big2 = 2;  /* so we do math using this integer size */
intmax_t sizeMax = big2^bitWidth - 1;
intmax_t sizeMin = -(big2^bitWidth - 1);

Just because a value is representable by the underlying "physical" type does not mean that value is valid for a value of the "logical" type. I imagine the reason max and min constants are not provided is that these are "semi-opaque" types whose use is restricted to particular domains. Where less opacity is desirable, you will often find ways of getting the information you want, such as the constants you can use to figure out how big an off_t is that are mentioned by the SUSv2 in its description of <unistd.h>.


For an opaque signed type for which you don't have a name of the associated unsigned type, this is unsolvable in a portable way, because any attempt to detect whether there is a padding bit will yield implementation-defined behavior or undefined behavior. The best thing you can deduce by testing (without additional knowledge) is that there are at least K padding bits.

BTW, this doesn't really answer the question, but can still be useful in practice: If one assumes that the signed integer type T has no padding bits, one can use the following macro:

#define MAXVAL(T) (((((T) 1 << (sizeof(T) * CHAR_BIT - 2)) - 1) * 2) + 1)

This is probably the best that one can do. It is simple and does not need to assume anything else about the C implementation.


Maybe I'm not getting the question right, but since C gives you 3 possible representations for signed integers (http://port70.net/~nsz/c/c11/n1570.html#6.2.6.2):

  • sign and magnitude
  • ones' complement
  • two's complement

and the max in any of these should be 2^(N-1)-1, you should be able to get it by taking the max of the corresponding unsigned, >>1-shifting it and casting the result to the proper type (which it should fit).

I don't know how to get the corresponding minimum if trap representations get in the way, but if they don't the min should be either (Tp)((Tp)-1|(Tp)TP_MAX(Tp)) (all bits set) (Tp)~TP_MAX(Tp) and which it is should be simple to find out.

Example:

#include <limits.h>
#define UNSIGNED(Tp,Val) \
    _Generic((Tp)0, \
            _Bool: (_Bool)(Val), \
            char: (unsigned char)(Val), \
            signed char: (unsigned char)(Val), \
            unsigned char: (unsigned char)(Val), \
            short: (unsigned short)(Val), \
            unsigned short: (unsigned short)(Val), \
            int: (unsigned int)(Val), \
            unsigned int: (unsigned int)(Val), \
            long: (unsigned long)(Val), \
            unsigned long: (unsigned long)(Val), \
            long long: (unsigned long long)(Val), \
            unsigned long long: (unsigned long long)(Val) \
            )
#define MIN2__(X,Y) ((X)<(Y)?(X):(Y))
#define UMAX__(Tp) ((Tp)(~((Tp)0)))
#define SMAX__(Tp) ((Tp)( UNSIGNED(Tp,~UNSIGNED(Tp,0))>>1 ))
#define SMIN__(Tp) ((Tp)MIN2__( \
                    (Tp)(((Tp)-1)|SMAX__(Tp)), \
                    (Tp)(~SMAX__(Tp)) ))
#define TP_MAX(Tp) ((((Tp)-1)>0)?UMAX__(Tp):SMAX__(Tp))
#define TP_MIN(Tp) ((((Tp)-1)>0)?((Tp)0): SMIN__(Tp))
int main()
{
#define STC_ASSERT(X) _Static_assert(X,"")
    STC_ASSERT(TP_MAX(int)==INT_MAX);
    STC_ASSERT(TP_MAX(unsigned int)==UINT_MAX);
    STC_ASSERT(TP_MAX(long)==LONG_MAX);
    STC_ASSERT(TP_MAX(unsigned long)==ULONG_MAX);
    STC_ASSERT(TP_MAX(long long)==LLONG_MAX);
    STC_ASSERT(TP_MAX(unsigned long long)==ULLONG_MAX);

    /*STC_ASSERT(TP_MIN(unsigned short)==USHRT_MIN);*/
    STC_ASSERT(TP_MIN(int)==INT_MIN);
    /*STC_ASSERT(TP_MIN(unsigned int)==UINT_MIN);*/
    STC_ASSERT(TP_MIN(long)==LONG_MIN);
    /*STC_ASSERT(TP_MIN(unsigned long)==ULONG_MIN);*/
    STC_ASSERT(TP_MIN(long long)==LLONG_MIN);
    /*STC_ASSERT(TP_MIN(unsigned long long)==ULLONG_MIN);*/

    STC_ASSERT(TP_MAX(char)==CHAR_MAX);
    STC_ASSERT(TP_MAX(signed char)==SCHAR_MAX);
    STC_ASSERT(TP_MAX(short)==SHRT_MAX);
    STC_ASSERT(TP_MAX(unsigned short)==USHRT_MAX);

    STC_ASSERT(TP_MIN(char)==CHAR_MIN);
    STC_ASSERT(TP_MIN(signed char)==SCHAR_MIN);
    STC_ASSERT(TP_MIN(short)==SHRT_MIN);
}


For all real machines, (two's complement and no padding):

type tmp = ((type)1)<< (CHAR_BIT*sizeof(type)-2);
max = tmp + (tmp-1);

With C++, you can calculate it at compile time.

template <class T>
struct signed_max
{
      static const T max_tmp =  T(T(1) << sizeof(T)*CO_CHAR_BIT-2u);    
      static const T value = max_tmp + T(max_tmp -1u);
};
0

精彩评论

暂无评论...
验证码 换一张
取 消