What exactly is meant by "de-referencing a NULL pointer"?

I am a complete novice to C, and during my university work I've come across comments in code that often refer to de-referencing a NULL pointer. I do have a background in C#, and I've been getting by on the assumption that this is similar to a "NullReferenceException" that you get in .NET, but now I am having serious doubts.

Can someone please explain to me in layman's terms exactly what this is and why it is bad?

A NULL pointer points to memory that doesn't exist. This may be address 0x00000000 or any other implementation-defined value (as long as it can never be a real address). Dereferencing it means trying to access whatever is pointed to by the pointer. The * operator is the dereferencing operator:

int a, b, c; // some integers
int *pi;     // a pointer to an integer

a = 5;
pi = &a; // pi points to a
b = *pi; // b is now 5
pi = NULL;
c = *pi; // this is a NULL pointer dereference

This is exactly the same thing as a NullReferenceException in C#, except that pointers in C can point to any data object, even elements inside an array.
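Because C does not check pointers for you, a common defensive habit is to compare a pointer against NULL before applying *. The following is only a minimal sketch of that habit; the helper name read_or_default is hypothetical and does not come from the question.

```c
#include <stddef.h>

/* Hypothetical helper: returns the int that p points to,
   or fallback when p is a null pointer. */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL) {
        return fallback;   /* never evaluate *p for a null pointer */
    }
    return *p;             /* p is not null; the caller must still pass a valid pointer */
}
```

Note that the check only catches null pointers. A pointer to memory that has already been freed would pass the test and still be invalid to dereference.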

Dereferencing just means accessing the memory value at a given address. So when you have a pointer to something, to dereference the pointer means to read or write the data that the pointer points to.

In C, the unary * operator is the dereferencing operator. If x is a pointer, then *x is what x points to. The unary & operator is the address-of operator. If x is anything, then &x is the address at which x is stored in memory. The * and & operators are inverses of each other: if x is any data, and y is any pointer, then these equations are always true:

*(&x) == x
&(*y) == y
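As a small illustration, the two identities can be checked for one concrete variable and one pointer that points to a valid object. The function name identities_hold is made up for this sketch.

```c
/* Returns 1 when both identities hold for a sample int and a pointer to it. */
int identities_hold(void)
{
    int x = 42;
    int *y = &x;                 /* y points to a valid object */

    int first  = (*(&x) == x);   /* take the address, then dereference */
    int second = (&(*y) == y);   /* dereference, then take the address */

    return first && second;
}
```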

A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).

In most implementations, you will get a "segmentation fault" or "access violation" if you try to do so, which will almost always result in your program being terminated by the operating system. Here's one way a null pointer could be dereferenced:

int *x = NULL;  // x is a null pointer
int y = *x;     // CRASH: dereference x, trying to read it
*x = 0;         // CRASH: dereference x, trying to write it

And yes, dereferencing a null pointer is pretty much exactly like a NullReferenceException in C# (or a NullPointerException in Java), except that the C# language standard is a little more helpful here. In C#, dereferencing a null reference has well-defined behavior: it always throws a NullReferenceException. There's no way that your program could continue working silently or erase your hard drive like in C (unless there's a bug in the language runtime, but again that's incredibly unlikely).

It means

myclass *p = NULL;
*p = ...;  // illegal: dereferencing NULL pointer
... = *p;  // illegal: dereferencing NULL pointer
p->meth(); // illegal: equivalent to (*p).meth(), which is dereferencing NULL pointer

myclass *p = /* some legal, non-NULL pointer */;
*p = ...;  // Ok
... = *p;  // Ok
p->meth(); // Ok, if myclass::meth() exists

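Basically, almost anything involving (*p) or implicitly involving (*p), e.g. p->..., which is shorthand for (*p). ..., dereferences the pointer. The exception is the pointer declaration itself, where the * is part of the type and nothing is dereferenced.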
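The snippets above use C++-style member functions. In plain C the same equivalence between -> and (*p). can be sketched with a struct; the type struct point and both function names are hypothetical, chosen only for this illustration.

```c
/* Hypothetical struct used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Both functions dereference p, so p must not be a null pointer. */
int x_via_arrow(const struct point *p)
{
    return p->x;      /* arrow form */
}

int x_via_star(const struct point *p)
{
    return (*p).x;    /* explicit dereference, same meaning as p->x */
}
```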

From wiki

A null pointer has a reserved value, often but not necessarily the value zero, indicating that it refers to no object
..

Since a null-valued pointer does not refer to a meaningful object, an attempt to dereference a null pointer usually causes a run-time error.

int val =1;
int *p = NULL;
*p = val; // Whooosh!!!!

Quoting from wikipedia:

A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer.

Dereferencing is done by applying the unary * operator on the pointer.

int x = 5;
int * p;      // pointer declaration
p = &x;       // pointer assignment
*p = 7;       // pointer dereferencing, example 1
int y = *p;   // pointer dereferencing, example 2

"Dereferencing a NULL pointer" means performing *p when the p is NULL

A NULL pointer points to memory that doesn't exist, and dereferencing it will typically raise a segmentation fault. There's an even shorter way to dereference a NULL pointer; take a look.

int main(int argc, char const *argv[])
{
    *(int *)0 = 0; // Segmentation fault (core dumped)
    return 0;
}

Since a null pointer never refers to a valid object, a fault occurs on typical systems with memory protection:

SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL}

Lots of confusion and confused answers here. First of all, there is strictly speaking nothing called a "NULL pointer". There are null pointers, null pointer constants and the NULL macro.

Start by studying my answer from Codidact: What's the difference between null pointers and NULL? Quoting some parts of it here:

There are three different, related concepts that are easy to mix up:

null pointers

null pointer constants

the NULL macro

Formal definitions

The first two of these terms are formally defined in C17 6.3.2.3/3:

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.⁶⁷⁾ If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

In other words, a null pointer is a pointer of any type pointing at a well-defined "nowhere". Any pointer can turn into a null pointer when it is assigned a null pointer constant.

The standard mentions 0 and (void*)0 as two valid null pointer constants, but note that it says "an integer constant expression with the value 0". This means that things like 0u, 0x00 and other variations are also null pointer constants. These are particular special cases that can be assigned to any pointer type, regardless of the various type compatibility rules that would normally apply.

Notably, both object pointers and function pointers can be null pointers. Meaning that we must be able to assign null pointer constants to them, no matter the actual pointer type.
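A short sketch of this, under the assumption of a conforming C compiler: several spellings of a zero-valued integer constant expression are assigned to pointers of different types, including a function pointer, and each pointer then tests as null. The function name all_are_null_pointers is made up for the example.

```c
/* Returns 1 if every pointer below is a null pointer. */
int all_are_null_pointers(void)
{
    int    *a = 0;            /* plain integer constant 0 */
    char   *b = 0x00;         /* hexadecimal spelling of 0 */
    double *c = 0u;           /* unsigned spelling of 0 */
    void   *d = (void *)0;    /* 0 cast to void * */
    void  (*f)(void) = 0;     /* function pointers can be null too */

    /* !p is true exactly when p compares equal to a null pointer */
    return !a && !b && !c && !d && !f;
}
```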

NULL

The note 67) from above adds (not normative):

⁶⁷⁾ The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.

where 7.19 simply defines NULL as (normative):

NULL which expands to an implementation-defined null pointer constant;

In theory this could perhaps be something other than 0 and (void*)0, but the implementation-defined part is more likely saying that NULL can either be #define NULL 0 or #define NULL (void*)0 or some other integer constant expression with the value zero, depending on the C library used. But all we need to know and care about is that NULL is a null pointer constant.

NULL is also the preferred null pointer constant to use in C code, because it is self-documenting and unambiguous (unlike 0). It should only be used together with pointers and not for any other purpose.

Additionally, do not mix this up with "null termination of strings", which is an entirely separate topic. Null termination of strings is just a value zero, often referred to either as nul (one L) or '\0' (an octal escape sequence), just to separate it from null pointers and NULL.
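To make the distinction concrete, here is a minimal sketch with a hypothetical helper, ends_with_nul. A valid string is a non-null pointer whose last element is the character value zero, while a null pointer is not a string at all.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if s is a non-null pointer to a string whose terminating
   character has the value 0, and 0 if s is a null pointer. */
int ends_with_nul(const char *s)
{
    if (s == NULL) {
        return 0;                  /* a null pointer is not a string */
    }
    return s[strlen(s)] == '\0';   /* true for any valid C string */
}
```

Even the empty string "" is a valid, non-null pointer: it points to a single nul character.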

Dereferencing

Having cleared that up, we cannot access what a null pointer points at because, as mentioned, it is a well-defined "nowhere". The process of accessing what a pointer points at is known as dereferencing, and is done in C (and C++) through the unary * indirection operator. The C standard section specifying how this operator works simply states (C17 6.5.3.2):

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined

Where an informative note adds:

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

And this would be where "segmentation faults" or "null pointer/reference exceptions" might be thrown. The cause is almost always an application bug, such as in these examples:

int* a = NULL; // create a null pointer by initializing with a null pointer constant
*a = 1;        // null pointer is dereferenced, undefined behavior

int* b = 0;    // create a null pointer by initializing with a null pointer constant
               // not to be confused with similar looking dereferencing and assignment:
*b = 0;        // null pointer is dereferenced, undefined behavior

Let's look at an example of dereferencing a NULL pointer, and talk about it.

Here is an example of dereferencing a NULL pointer, taken from the duplicate question titled "uint32_t *ptr = NULL;":

#include <stddef.h> // NULL
#include <stdint.h> // uint32_t

int main(void)
{
    uint32_t *ptr = NULL;

    // `*ptr` dereferences the NULL ptr
    *ptr = 0;

    return 0;
}

No memory has been allocated for a uint32_t here, so evaluating *ptr, which "dereferences" the pointer ptr (in other words, accesses memory at the null address, which is usually 0 but is implementation-defined), is illegal. It is "undefined behavior", i.e. a bug.

So, you should statically (preferred, where possible), or dynamically allocate space for a uint32_t and then only dereference a pointer which points to valid memory, as follows.

Here is how to statically allocate memory and use it with a pointer. Note that even the memory for the pointer itself is statically allocated in my example:

// allocate enough memory for a 4-byte (32-bit) variable
uint32_t variable;

// allocate enough memory for a pointer, which is **usually** 2 bytes on an
// 8-bit microcontroller such as Arduino, or usually 4 bytes on a 32-bit
// architecture, or usually 8 bytes on a 64-bit Linux computer, for example 
uint32_t* ptr;

// assign the address of `variable` to the pointer; you can now say that
// `ptr` "points to" the variable named `variable`; in literal terms, `ptr` now
// contains the numerical value of the address of the first byte of the
// variable `variable`
ptr = &variable;

// Store a number into the 4-byte variable named `variable`, via a pointer to it
*ptr = 1234;
// OR, same exact thing as just above: store a number into that 4-byte
// variable, but this time via the variable name, `variable`, directly
variable = 1234;

Note, dynamic allocation is fine too, but static memory allocation is safer, deterministic, faster, better for memory-constrained embedded systems, blah blah blah. The point is simply that you cannot legally dereference any pointer (meaning: put an asterisk "dereference operator" in front of it, like *ptr) which does not point to a chunk of allocated memory. I generally allocate memory statically by declaring a variable.
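For completeness, the dynamic variant might look roughly like the sketch below. It assumes the standard malloc and free from <stdlib.h>; the helper name store_on_heap and its out parameter are hypothetical. The key point is that malloc returns a null pointer on failure, so the result is checked before it is dereferenced.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores value in heap memory through a pointer,
   copies it back to *out, and frees the memory.
   Returns 0 on success, -1 if the allocation failed.
   The caller must pass a valid, non-null out pointer. */
int store_on_heap(uint32_t value, uint32_t *out)
{
    uint32_t *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        return -1;      /* allocation failed: ptr is null, do not dereference */
    }

    *ptr = value;       /* safe: ptr points to allocated memory */
    *out = *ptr;        /* read the value back through the pointer */

    free(ptr);          /* after this, ptr must not be dereferenced again */
    return 0;
}
```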