
What does it mean to be "terminated by a zero"?

Developer https://www.devze.com 2022-12-27 04:06 Source: the web
I am getting into C/C++ and a lot of unfamiliar terms are popping up. One of them is a variable or pointer that is terminated by a zero. What does it mean for a space in memory to be terminated by a zero?


Take the string Hi in ASCII. Its simplest representation in memory is two bytes:

0x48
0x69

But where does that piece of memory end? Unless you're also prepared to pass around the number of bytes in the string, you don't know - pieces of memory don't intrinsically have a length.

So C has a convention that strings end with a zero byte, also known as a NUL character:

0x48
0x69
0x00

The string is now unambiguously two characters long, because there are two characters before the NUL.
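As a minimal sketch of how code finds the end of such a string, the function below walks the bytes until it reaches the zero byte. The name count_until_nul is made up for illustration; the standard library's strlen() behaves the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): counts the bytes that
   come before the terminating zero byte, the way strlen() does. */
size_t count_until_nul(const char *s)
{
    assert(s != NULL);

    size_t n = 0;
    while (s[n] != '\0')    /* stop at the NUL terminator */
        n++;
    return n;
}
```

Calling count_until_nul("Hi") yields 2, while sizeof("Hi") is 3, because the string literal occupies the two characters plus the terminating zero byte.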


It's a reserved value to indicate the end of a sequence of (for example) characters in a string.

This is more correctly known as null (or NUL) terminated, because the value used is zero rather than the character code for '0'. To clarify the distinction, check out a table of the ASCII character set.

This is necessary because languages like C have a char data type, but no string data type. Therefore it is left to the developer to decide how to manage strings in their application. The usual way of doing this is to have an array of chars with a null value used to terminate (i.e. signify the end of) the string.

Note that there is a distinction between the length of the string, and the length of the char array that was originally declared.

char name[50];

This declares an array of 50 characters. However, these values will be uninitialised. So if I want to store the string "Hello" (5 characters long) I really don't want to bother setting the remaining 45 characters to spaces (or some other value). Instead I store a NUL value after the last character in my string.
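A rough sketch of that idea follows. The helper store_string is a hypothetical name, not a standard function; it copies a string into a fixed-size buffer and writes the NUL right after the last character, leaving the rest of the buffer untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes),
   truncating if needed so that the terminating NUL always fits.
   Returns the number of characters stored, not counting the NUL. */
size_t store_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;     /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';            /* mark the end of the string */
    return len;
}
```

With char name[50], the call store_string(name, sizeof name, "Hello") uses only the first six bytes: five characters and the NUL. The array is still 50 bytes long, but strlen(name) reports 5.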

Other languages such as Pascal, Java and C# have a specific string type defined. These have a header value to indicate the number of characters in the string. This has a couple of benefits: firstly, you don't need to walk to the end of the string to find out its length; secondly, your string can contain null characters.
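To make the contrast concrete, here is a sketch of a length-prefixed string written in C. The struct counted_string and the function make_counted are invented for this illustration, and the fixed 32-byte capacity is an arbitrary assumption; real string types in those languages are more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length is stored next to the
   bytes, so no terminator is needed. */
struct counted_string {
    size_t length;      /* number of bytes in use */
    char   bytes[32];   /* storage; the capacity is an arbitrary choice */
};

/* Builds a counted string from raw bytes, truncating to the capacity. */
struct counted_string make_counted(const char *data, size_t length)
{
    assert(data != NULL);

    struct counted_string s = { 0 };
    if (length > sizeof s.bytes)
        length = sizeof s.bytes;
    memcpy(s.bytes, data, length);
    s.length = length;
    return s;
}
```

After make_counted("ab\0cd", 5), the length field reads 5 without any scanning and the embedded zero byte is ordinary data, whereas strlen() on the same bytes stops at 2.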

Wikipedia has further information in the String (computer science) entry.


In C, arrays and strings are usually handled through a pointer to their first element. The pointer tells you where the array starts, but the end of the array is not recorded anywhere. For a character array that holds a string, the end is marked by a zero byte.

So, in memory, the string hello is written as:

68 65 6c 6c 6f 00                                 |hello|


It refers to how C strings are stored in memory. The NUL character, written as \0 in string literals, is present at the end of a C string in memory. There is no other metadata associated with a C string, such as a length. Note the different spelling of the NUL character and the NULL pointer.


There are two common ways to handle arrays that can have varying-length contents (like strings). The first is to separately keep the length of the data stored in the array. Languages like Fortran and Ada, and C++'s std::string, do this. The disadvantage to doing this is that you somehow have to pass that extra information to everything that is dealing with your array.

The other way is to reserve an extra non-data element at the end of the array to serve as a sentinel. For the sentinel you use a value that should never appear in the actual data. For strings, 0 (or "NUL") is a good choice, as that is unprintable and serves no other purpose in ASCII. So what C (and many languages copied from C) does is to assume that all strings end with (or "are terminated by") a 0.

There are several drawbacks to this. For one thing, it is slow. Any time a routine needs to know the length of the string, it is an O(n) operation (searching through the entire string looking for the 0). Another problem is that you may one day want to put a 0 in your string for some reason, so now you need a whole second set of routines that ignore the NUL and take an explicit length anyway (e.g. memcpy() and memcmp() in place of strcpy() and strcmp()). The third big problem is that if someone forgets to put that 0 at the end (or it gets wiped out somehow), the next string operation to do a length check will go merrily marching through memory until it either happens to randomly find another 0, crashes, or the user loses patience and kills it. Such bugs can be a serious PITA to track down.
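One common defence against the missing-terminator problem is to cap the search at the size of the buffer. The sketch below assumes the caller knows that size; bounded_length is a hypothetical name, and it relies only on the standard memchr() function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: like strlen(), but never reads more than max
   bytes. Returns max when no zero byte is found within the buffer. */
size_t bounded_length(const char *buf, size_t max)
{
    assert(buf != NULL);

    const char *end = memchr(buf, '\0', max);   /* first zero byte, if any */
    return end != NULL ? (size_t)(end - buf) : max;
}
```

For a three-byte buffer holding 'x', 'y', 'z' with no terminator, bounded_length(buf, 3) returns 3 instead of running off the end. It still stops at the first zero byte, so data with an embedded 0 needs an explicitly tracked length.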

For all these reasons, the C approach is generally viewed with disfavor.


C-style strings are terminated by a NUL character ('\0'). This provides a marker for functions that operate on strings (e.g. strlen, strcpy) to use to identify the end of the string.


While the classic example of "terminated by a zero" is that of strings in C, the concept is more general. It can be applied to any list of things stored in an array, the size of which is not known explicitly.

The trick is simply to avoid passing around an array size by appending a sentinel value to the end of the array. Typically, some form of a zero is used, but it can be anything else (like a NAN if the array contains floating point values).

Here are three examples of this concept:

  1. C strings, of course. A single zero character is appended to the string: "Hello" is encoded as 48 65 6c 6c 6f 00.

  2. Arrays of pointers naturally allow zero termination, because the null pointer (the one that points to address zero) is defined to never point to a valid object. As such, you might find code like this:

    Foo *list[] = { somePointer, anotherPointer, NULL };
    bar(list);
    

    instead of

    Foo *list[] = { somePointer, anotherPointer };
    bar(sizeof(list)/sizeof(*list), list);
    

    This is why execvpe() only needs three arguments, two of which pass arrays of user-defined length. Since all that's passed to execvpe() are (possibly lots of) strings, this little function actually sports two levels of zero termination: null pointers terminating the string lists, and null characters terminating the strings themselves.

  3. Even when the element type of the array is a more complex struct, it may still be zero terminated. In many cases, one of the struct members is defined to be the one that signals the end of the list. I have seen such function definitions, but I can't unearth a good example of this right now, sorry. Anyway, the calling code would look something like this:

    Foo list[] = {
        { someValue, somePointer },
        { anotherValue, anotherPointer },
        { 0, NULL }
    };
    bar(list);
    

    or even

    Foo list[] = {
        { someValue, somePointer },
        { anotherValue, anotherPointer },
        {}    // An empty initializer list zeros the object (C++ and C23; older C needs { 0 }).
    };
    bar(list);
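
The receiving side of examples 2 and 3 could look roughly like the sketch below. The names struct entry, count_strings and lookup are all hypothetical, chosen only for illustration. One real interface that follows the struct pattern is getopt_long(), whose array of struct option must end with an all-zero element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element type: a name/value pair. */
struct entry {
    const char *name;   /* a NULL name marks the end of the table */
    int         value;
};

/* Counts the strings in a list terminated by a null pointer, the
   layout used for argv and for the arrays passed to execvpe(). */
size_t count_strings(const char *const *list)
{
    assert(list != NULL);

    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* Searches a table whose last element has name == NULL.
   Returns fallback when the name is not present. */
int lookup(const struct entry *table, const char *name, int fallback)
{
    assert(table != NULL && name != NULL);

    for (const struct entry *e = table; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0)
            return e->value;
    }
    return fallback;
}
```

Neither function takes a size argument: each loop stops at the sentinel, so the caller must not forget to append it.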
    