C: Why do unassigned pointers point to unpredictable memory and NOT point to NULL?

A long time ago I used to program in C for school. I remember something that I really hated about C: unassigned pointers do not point to NULL.

I asked many people, including teachers, why in the world they would make the default behavior of an unassigned pointer not point to NULL, as it seems far more dangerous for it to be unpredictable.

The answer was supposedly performance but I never bought that. I think many many bugs in the history of programming could have been avoided had C defaulted to NULL.

Here is some C code to point out (pun intended) what I am talking about:

#include <stdio.h>

int main(void) {

  int * randomA;   /* deliberately left uninitialized */
  int * randomB;   /* deliberately left uninitialized */
  int * nullA = NULL;
  int * nullB = NULL;


  printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n", 
     randomA, randomB, nullA, nullB);

  return 0;
}

This compiles with warnings (it's nice to see that C compilers are much nicer than when I was in school) and outputs:

randomA: 0xb779eff4, randomB: 0x804844b, nullA: (nil), nullB: (nil)


Actually, it depends on the storage duration of the pointer. Pointers with static storage duration are initialized to null pointers. Pointers with automatic storage duration are not initialized. See ISO C99 6.7.8.10:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.

And yes, objects with automatic storage duration are not initialized for performance reasons. Just imagine initializing a 4K array on every call to a logging function (something I saw on a project I worked on, thankfully C let me avoid the initialization, resulting in a nice performance boost).
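
A minimal sketch of that difference, assuming a hosted implementation (the value printed for the automatic pointer is indeterminate, and even reading it is only done here for illustration):

#include <stdio.h>

int *file_scope_ptr;                /* static storage duration: null pointer */

int main(void)
{
    static int *local_static_ptr;   /* static storage duration: null pointer */
    int *automatic_ptr;             /* automatic storage duration: indeterminate */

    printf("file_scope_ptr:   %p\n", (void *)file_scope_ptr);   /* (nil) */
    printf("local_static_ptr: %p\n", (void *)local_static_ptr); /* (nil) */
    printf("automatic_ptr:    %p\n", (void *)automatic_ptr);    /* anything */
    return 0;
}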


Because in C, declaration and initialisation are deliberately separate steps; that is how the language is designed.

When you say this inside a function:

void demo(void)
{
    int *param;
    ...
}

You are saying, "my dear C compiler, when you create the stack frame for this function, please remember to reserve sizeof(int*) bytes for storing a pointer." The compiler does not ask what's going there - it assumes you're going to tell it soon. If you don't, maybe there's a better language for you ;)

Maybe it wouldn't be diabolically hard to generate some safe stack clearing code. But it'd have to be called on every function invocation, and I doubt that many C developers would appreciate the hit when they're just going to fill it themselves anyway. Incidentally, there's a lot you can do for performance if you're allowed to be flexible with the stack. For example, the compiler can make the optimisation where...

If your function1 calls another function2 and stores its return value, or maybe there are some parameters passed in to function2 that aren't changed inside function2... we don't have to create extra space, do we? Just use the same part of the stack for both! Note that this is in direct conflict with the concept of initialising the stack before every use.
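
A hedged sketch of the kind of stack reuse meant here, using two blocks whose locals never live at the same time (nothing in the standard requires it, but a typical compiler is free to put both arrays in the same stack slot, which would be impossible if the stack had to be cleared before every use):

void stack_reuse_demo(void)
{
    {
        char scratch_a[4096];   /* live only inside this block */
        scratch_a[0] = 'a';
        /* ... use scratch_a ... */
    }
    {
        char scratch_b[4096];   /* may occupy the same stack memory as scratch_a */
        scratch_b[0] = 'b';
        /* ... use scratch_b ... */
    }
}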

But in a wider sense, (and to my mind, more importantly) it's aligned with C's philosophy of not doing very much more than is absolutely necessary. And this applies whether you're working on a PDP11, a PIC32MX (what I use it for) or a Cray XT3. It's exactly why people might choose to use C instead of other languages.

  • If I want to write a program with no trace of malloc and free, I don't have to! No memory management is forced upon me!
  • If I want to bit-pack and type-pun a data union, I can! (As long as I read my implementation's notes on standard adherence, of course; there is a sketch of this after the list.)
  • If I know exactly what I'm doing with my stack frame, the compiler doesn't have to do anything else for me!
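
A small sketch of that second point, assuming a 32-bit IEEE-754 float (C99 TC3 and C11 permit reading a union member other than the one last stored, with the bytes reinterpreted, but as the list says, check your implementation's notes):

#include <stdio.h>
#include <stdint.h>

/* Inspect the bit pattern of a float by type-punning through a union. */
union float_bits {
    float    f;
    uint32_t u;
};

int main(void)
{
    union float_bits p;
    p.f = 1.0f;
    printf("1.0f has bit pattern 0x%08x\n", (unsigned)p.u);  /* 0x3f800000 on IEEE-754 */
    return 0;
}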

In short, when you ask the C compiler to jump, it doesn't ask how high. The resulting code probably won't even come back down again.

Since most people who choose to develop in C like it that way, it has enough inertia not to change. Your way might not be an inherently bad idea; it's just not something many other C developers are asking for.


It's for performance.

C was first developed around the time of the PDP-11, for which 60K was a common maximum amount of memory; many machines had a lot less. Unnecessary assignments would be particularly expensive in this kind of environment.

These days there are many, many embedded devices that use C for which 60K of memory would seem infinite; the PIC 12F675 has 1K of memory.


This is because when you declare a pointer, your C compiler just reserves the necessary space to hold it. So when you run your program, that space can already have a value in it, probably left over from data previously stored at that location.

The C compiler could assign this pointer a value, but in most cases this would be a waste of time, since you are expected to assign it a value yourself somewhere in the code.

That is why good compilers give a warning when you do not initialize your variables, so I don't think there are that many bugs because of this behavior. You just have to read the warnings.
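
As a small illustration (a sketch, assuming gcc or clang; the warning flags and the exact wording of the diagnostic vary by compiler):

/* warn.c -- compile with warnings enabled, e.g. "cc -Wall warn.c" */
#include <stdio.h>

int main(void)
{
    int *p;              /* never initialized: typical compilers warn that
                            p is used uninitialized in the call below */
    int *q = NULL;       /* initialized: no warning, and safe to test */

    printf("%p %p\n", (void *)p, (void *)q);
    return 0;
}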


Pointers are not special in this regard; other types of variables have exactly the same issue if you use them uninitialised:

int a;
double b;

printf("%d, %f\n", a, b);

The reason is simple: requiring the runtime to set uninitialised values to a known value adds an overhead to each function call. The overhead might not be much with a single value, but consider if you have a large array of pointers:

int *a[20000];
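
The initialization is available if you ask for it explicitly; a sketch of what that opt-in looks like and why it has a per-call cost:

void with_explicit_init(void)
{
    int *a[20000] = {0};   /* every element becomes a null pointer,
                              so this memory is cleared on every call */
    (void)a;
}

void without_init(void)
{
    int *a[20000];         /* indeterminate contents; no clearing work is done */
    (void)a;
}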


When you declare a (pointer) variable at the beginning of the function, the compiler will do one of two things: set aside a register to use as that variable, or allocate space on the stack for it. For most processors, allocating the memory for all local variables on the stack is done with one instruction; it figures out how much memory all the local vars will need, and pulls down (or pushes up, on some processors) the stack pointer by that much. Whatever is already in that memory at the time is not changed unless you explicitly change it.

The pointer is not "set" to a "random" value. Before allocation, the stack memory below the stack pointer (SP) contains whatever is there from earlier use:

         .
         .
 SP ---> 45
         ff
         04
         f9
         44
         23
         01
         40
         . 
         .
         .

After it allocates memory for a local pointer, the only thing that has changed is the stack pointer:

         .
         .
         45
         ff |
         04 | allocated memory for pointer.
         f9 |
 SP ---> 44 |
         23
         01
         40
         . 
         .
         .

This allows the compiler to allocate all local vars in one instruction that moves the stack pointer down the stack (and free them all in one instruction, by moving the stack pointer back up), but forces you to initialize them yourself, if you need to do that.

In C99, you can mix code and declarations, so you can postpone your declaration in the code until you are able to initialize it. This will allow you to avoid having to set it to NULL.
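
A brief sketch of that C99 style, declaring the pointer only at the point where a meaningful value is available:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 100;

    /* C99: the declaration appears where the initializer is available,
       so the pointer never exists in an uninitialized state. */
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return 1;

    buf[0] = 42;
    printf("%d\n", buf[0]);
    free(buf);
    return 0;
}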


First, forced initialization doesn't fix bugs. It masks them. Using a variable that doesn't have a valid value (and what that is varies by application) is a bug.

Second, you can often do your own initialization. Instead of int *p;, write int *p = NULL; or int *p = 0;. Use calloc() (which initializes memory to zero) rather than malloc() (which doesn't). (No, all bits zero doesn't necessarily mean NULL pointers or floating-point values of zero. Yes, it does on most modern implementations.)
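
For example (a sketch; as noted above, the all-bits-zero memory from calloc() reads back as 0 for the int elements here, but would not necessarily give null pointers or 0.0 for pointer or floating-point elements on every implementation):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *m = malloc(10 * sizeof *m);   /* contents indeterminate */
    int *c = calloc(10, sizeof *c);    /* all bytes zero, so each int reads as 0 */

    if (m == NULL || c == NULL)
        return 1;

    printf("c[0] = %d\n", c[0]);       /* prints 0 */
    /* printf("%d\n", m[0]);              reading m[0] here would be a bug */

    free(m);
    free(c);
    return 0;
}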

Third, the C (and C++) philosophy is to give you the means to do something fast. Suppose you have the choice of implementing, in the language, a safe way to do something and a fast way to do something. You can't make a safe way any faster by adding more code around it, but you can make a fast way safer by doing so. Moreover, you can sometimes make operations fast and safe, by ensuring that the operation is going to be safe without additional checks - assuming, of course, that you have the fast option to begin with.

C was originally designed to write an operating system and associated code in, and some parts of operating systems have to be as fast as possible. This is possible in C, but less so in safer languages. Moreover, C was developed when the largest computers were less powerful than the telephone in my pocket (which I'm upgrading soon because it's feeling old and slow). Saving a few machine cycles in frequently used code could have visible results.


So, to sum up what ninjalj explained: if you change your example program slightly, your pointers will in fact be initialized to NULL:

#include <stdio.h>

// Change the "storage" of the pointer-variables from "stack" to "bss"  
int * randomA;
int * randomB;

int main(void) 
{
  int * nullA = NULL;
  int * nullB = NULL;

  printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n", 
     randomA, randomB, nullA, nullB);

  return 0;
}

On my machine this prints

randomA: 00000000, randomB: 00000000, nullA: 00000000, nullB: 00000000


I think it comes from the following: there's no reason why memory should contain any specific value (0, NULL or whatever) when powered up. So, unless it has been specifically written beforehand, a memory location can contain any value, which from your point of view is effectively random (that very location could have been used before by other software and so held a value that was meaningful to that application, e.g. a counter, but from "your" point of view it is just a random number). Initializing it to a specific value costs at least one extra instruction, and there are situations where you don't need that initialization at all, e.g. v = malloc(x) will assign to v a valid address or NULL no matter what v contained before. So initializing it up front could be considered a waste of time, and a language (like C) can choose not to do it.

Of course, nowadays this is mostly insignificant, and there are languages where uninitialized variables have default values (null for pointers, when supported; 0/0.0 for numeric types, and so on); lazy initialization also makes it inexpensive to initialize an array of, say, 1 million elements, since elements are initialized for real only if they are accessed before an assignment.
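
Returning to the v = malloc(x) point above, a tiny sketch of why an up-front initialization of v would be wasted work (v and x are just the names used in the paragraph):

#include <stdlib.h>

void example(size_t x)
{
    void *v;          /* indeterminate, but never read in that state */
    v = malloc(x);    /* the first use of v is an assignment:
                         a valid address or NULL, whatever v held before */
    /* ... use v after checking for NULL ... */
    free(v);
}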


The idea that this has anything to do with random memory contents when a machine is powered up is bogus, except on embedded systems. Any machine with virtual memory and a multiprocess/multiuser operating system will initialize memory (usually to 0) before giving it to a process. Failure to do so would be a major security breach. The 'random' values in automatic-storage variables come from previous use of the stack by the same process. Similarly, the 'random' values in memory returned by malloc/new/etc. come from previous allocations (that were subsequently freed) in the same process.


For it to point to NULL, it would have to have NULL assigned to it (even if that were done automatically and transparently).

So, to answer your question, the reason a pointer can't be both unassigned and NULL is because a pointer can not be both not assigned and assigned at the same time.
