Can a pointer (address) ever be negative?

Developer https://www.devze.com 2023-01-08 03:02 Source: Internet
I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).

Currently it returns NULL for failure and -1 for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (Although, since the compiler lets me set an address to -1, this seems strange.)

[update]

Another idea I had (in the event that -1 was risky) is to malloc a char at global scope and use that address as a sentinel.


No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).

However, the point is mostly moot, since C only defines the meaning of < and > comparisons between pointers that point into the same object, or one past the end of the same array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C: if (p < NULL) has no well-defined semantics.

You should create a dummy object with static storage duration and use its address as your uninitialised value:

extern char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)

It's guaranteed to have a single, unique address across your program.
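
As a minimal sketch of how that sentinel might be used, assume a function named lookup and a small table guarded by a table_ready flag (all three are hypothetical and exist only for illustration). Here the sentinel is defined in the same file rather than declared extern:

```c
#include <stddef.h>

/* Definition of the sentinel object; only its address matters. */
char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)

/* Hypothetical state, for illustration only. */
static int table[4];
static int table_ready = 0;

void init_table(void)
{
    table_ready = 1;
}

/* Returns UNINITIALISED before init_table() has run,
   NULL for an out-of-range index, and a valid pointer otherwise. */
void *lookup(int index)
{
    if (!table_ready)
        return UNINITIALISED;
    if (index < 0 || index >= 4)
        return NULL;
    return &table[index];
}
```

Callers compare the result against NULL and UNINITIALISED with == only, which is well defined for any pair of pointers.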


The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.

More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.

For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.


It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:

int SomeFunction(SomeType **p)
{
    *p = NULL;
    if (/* check for uninitialized ... */)
        return UNINITIALIZED;
    if (/* check for failure ... */)
        return FAILURE;

    *p = yourValue;
    return SUCCESS;
}

You should also do typical argument checking (ensure that 'p' isn't NULL).
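
A compilable variant of the same idea, with the argument check included, might look like the sketch below. The enum values, the BAD_ARGUMENT code, the fail parameter, and the storage/initialized variables are all assumptions made up for the example:

```c
#include <stddef.h>

/* Hypothetical status codes; BAD_ARGUMENT is an addition for the NULL check. */
enum status { SUCCESS = 0, FAILURE, UNINITIALIZED, BAD_ARGUMENT };

/* Hypothetical state, for illustration only. */
static int storage = 42;
static int initialized = 0;

void initialize(void)
{
    initialized = 1;
}

/* Writes the result through p and reports the outcome in the return value.
   'fail' simulates an operation failure. */
enum status get_value(int **p, int fail)
{
    if (p == NULL)
        return BAD_ARGUMENT;   /* nowhere to store the result */
    *p = NULL;
    if (!initialized)
        return UNINITIALIZED;
    if (fail)
        return FAILURE;
    *p = &storage;
    return SUCCESS;
}
```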


The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.

If you have a pointer-returning function, then you cannot meaningfully return the value -1 from that function. In C, integer values (other than a constant zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is a constraint violation that must result in a diagnostic message. In short, it is an error. If your compiler allows it, it simply means that it doesn't enforce that constraint too strictly (most of the time compilers do this for compatibility with pre-standard code).

If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.

If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.


Pointers can be negative like an unsigned integer can be negative. That is, sure, in a two's-complement interpretation, you could interpret the numerical value to be negative because the most-significant-bit is on.


What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, then you probably want to redesign the interface to separate these two conditions.

Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:

void* func();

void* result=func();
if (result==0)
  /* handle error */
else if (result==(void*)-1)
  /* uninitialized */
else
  /* initialized */

Change this to

// sets *a to the returned object
// *a will be null if the object has not been initialized
// returns true on success, false otherwise
int func(void** a);

void* result;
if (!func(&result)){
  /* handle error */
  return;
}

/* do real stuff now */
if (!result){
  /* initialize */
}
/* continue using the result now that it's been initialized */


@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers also represent relative addresses to some point in memory, often a stack or frame pointer, and those can be both positive and negative.

So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.


James's answer is probably correct, but of course describes an implementation choice, not a choice that you can make.

Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less-than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If it's intuitively unsigned, ~0 may make a more intuitive special-case value - I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.

Pointers on 32-bit systems can use all 32 bits BTW, though -1 or ~0 is an extremely unlikely pointer to occur for a genuine allocation in practice. There are also platform-specific rules: for example, on 32-bit Windows a process by default has only 2GB of user-mode address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees).


Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer itself, but by a larger range of addresses (e.g., the first 64 KB). This helps catch errors such as

int* x = NULL;
x[10] = 1;

So, there are more addresses that are guaranteed to generate the NULL pointer exception when dereferenced. Now consider this code (made compilable for AndreyT):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Small integers used as error codes. This assumes they are never valid
   addresses and that a null pointer converts to the integer 0. */
#define ERR_NOT_ENOUGH_MEM 0
#define ERR_NEGATIVE       1
#define ERR_NOT_DIGIT      2

char* fn(int i){
    if (i < 0)
        return (char*)(intptr_t)ERR_NEGATIVE;
    if (i >= 10)
        return (char*)(intptr_t)ERR_NOT_DIGIT;
    /* room for the text, one digit, and the terminating '\0' */
    char* rez = malloc(strlen("Hello World ") + 2);
    if (rez)
        sprintf(rez, "Hello World %d", i);
    return rez;   /* NULL, i.e. ERR_NOT_ENOUGH_MEM, if malloc failed */
}

int main(void){
    char* rez = fn(3);
    switch((intptr_t)rez){
        case ERR_NOT_ENOUGH_MEM:    printf("Not enough memory!\n"); break;
        case ERR_NEGATIVE:          printf("The parameter was negative\n"); break;
        case ERR_NOT_DIGIT:         printf("The parameter is not a digit\n"); break;
        default:                    printf("we received %s\n", rez); free(rez);
    }
    return 0;
}

This could be useful in some cases. It won't work on some Harvard architectures, but will work on von Neumann ones.


Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.


You don't need to care about the signedness of a pointer, because it's implementation-defined. The real question here is "how to return special values from a function returning a pointer?", which I've explained in detail in my answer to the question Pointer address span on various platforms.

In summary, the all-ones bit pattern (-1) is (almost) always safe, because it is already at the end of the address range and data cannot be stored wrapping around to the first address, and the malloc family never returns -1. In fact, this value is even returned by many Linux system calls and Win32 APIs to indicate another state for the pointer. So if you need just failure and uninitialized, it's a good choice.
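
A minimal sketch of that convention, assuming a hypothetical macro PTR_UNINITIALIZED and a hypothetical function get whose state parameter only simulates the three outcomes. The integer-to-pointer conversion is implementation-defined, so this is not strictly portable C:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel with all bits set. The conversion from integer to
   pointer is implementation-defined. */
#define PTR_UNINITIALIZED ((void *)UINTPTR_MAX)

static int value = 7;

/* state: 0 = uninitialized, 1 = failure, anything else = success.
   The parameter only simulates the three outcomes. */
void *get(int state)
{
    if (state == 0)
        return PTR_UNINITIALIZED;
    if (state == 1)
        return NULL;
    return &value;
}
```

The caller must test for both special values with == before dereferencing the result.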

But you can return far more error states by using the fact that variables must be properly aligned (unless you specified some other options). For example, in a pointer to int32_t the low 2 bits are always zero, which means only ¹⁄₄ of the possible values are valid addresses, leaving all of the remaining bit patterns for you to use. So a simple solution would be just checking the lowest bit:

int* result = func();
if (!result)
    error_happened();
else if ((uintptr_t)result & 1)
    uninitialized();

This way you can return both a valid pointer and some additional data at the same time.
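
Setting and clearing such a flag could be sketched as follows. The helper names tag_ptr, is_tagged, and untag_ptr are hypothetical, and the sketch assumes int objects are at least 2-byte aligned and that converting through uintptr_t preserves the address bits, which holds on mainstream platforms but is implementation-defined:

```c
#include <stdint.h>

/* Set bit 0 to mark the pointer. The result must not be dereferenced. */
static inline int *tag_ptr(int *p)
{
    return (int *)((uintptr_t)p | (uintptr_t)1);
}

/* Nonzero if bit 0 is set. */
static inline int is_tagged(const int *p)
{
    return (int)((uintptr_t)p & (uintptr_t)1);
}

/* Clear bit 0 to recover the original, dereferenceable pointer. */
static inline int *untag_ptr(int *p)
{
    return (int *)((uintptr_t)p & ~(uintptr_t)1);
}
```

A tagged pointer must always go through untag_ptr before it is dereferenced.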

You can also use the high bits for storing data in 64-bit systems. On ARM there's a flag that tells the CPU to ignore the high bits in the addresses. On x86 there isn't a similar thing but you can still use those bits as long as you make it canonical before dereferencing. See Using the extra 16 bits in 64-bit pointers

See also

  • Is ((void *) -1) a valid address?


NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit; however, since pointers are controlled by the OS and not the program, I would not rely on this behavior.

Remember that a pointer (on a 32-bit platform) is basically a 32-bit value; whether this is a possibly negative or an always positive number is just a matter of interpretation, i.e. whether the 32nd bit is interpreted as the sign bit or as a data bit. So if you interpreted 0xFFFFFFFF as a signed number it would be -1, and if you interpreted it as an unsigned number it would be 4294967295. Technically, it is unlikely that a pointer would ever be this large, but this case should be considered anyway.
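
The two readings of the same 32 bits can be checked with fixed-width integers. This is only an illustration of the arithmetic, with a hypothetical helper named as_signed; it says nothing about what a real pointer may contain:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an unsigned value as a signed one.
   int32_t is required to be two's complement with no padding bits,
   so copying the bytes gives the signed reading of the same pattern. */
int32_t as_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}
```

With this helper, as_signed(0xFFFFFFFFu) is -1, while the same bits read as uint32_t are 4294967295.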

As an alternative, you could use an additional out parameter (returning NULL for all failures); however, this would require clients to create and pass a value even if they don't need to distinguish between specific errors.

Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.


Being positive or negative is not a meaningful property of a pointer type. Those notions pertain to signed integer types, including signed char, short, int, etc.

People talk about negative pointers mostly in situations that treat a pointer's machine representation as an integer type, e.g. reinterpret_cast<intptr_t>(ptr). In that case, they are actually talking about the converted integer, not the pointer itself.

In some scenarios I think a pointer is inherently unsigned: we talk about addresses in terms of below or above. 0xFFFF.FFFF is above 0x0AAA.0000, which is intuitive for human beings, although 0xFFFF.FFFF is actually "negative" when read as a signed integer, while 0x0AAA.0000 is positive.

But in other scenarios, such as pointer subtraction (ptr1 - ptr2), the result is a signed value of type ptrdiff_t. That is inconsistent with integer subtraction: signed_int_a - signed_int_b yields a signed int, and unsigned_int_a - unsigned_int_b yields an unsigned type. Pointer subtraction produces a signed type because its meaning is the distance between two pointers, measured in number of elements.
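
A short sketch of that contrast, using two hypothetical helpers named distance and udiff:

```c
#include <limits.h>
#include <stddef.h>

/* Distance in elements from 'from' to 'to'. Both pointers must point into
   (or one past the end of) the same array. The result is signed. */
ptrdiff_t distance(const int *from, const int *to)
{
    return to - from;
}

/* Unsigned subtraction never goes negative; it wraps modulo UINT_MAX + 1. */
unsigned int udiff(unsigned int a, unsigned int b)
{
    return a - b;
}
```

For an array a, distance(&a[6], &a[1]) is -5, whereas udiff(1, 6) wraps around to UINT_MAX - 4.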

In summary, I suggest treating a pointer type as a standalone type; every type has its own set of operations. For pointers (excluding function pointers, member function pointers, and void *):

  1. +, +=

    ptr + any_integer_type

  2. -, -=

    ptr - any_integer_type

    ptr1 - ptr2

  3. ++ both prefix and postfix

  4. -- both prefix and postfix

Note that there are no /, *, or % operations for pointers. That also supports the view that a pointer should be treated as a standalone type, rather than as "a type similar to int" or "a type whose underlying type is int, so it should look like int".
