Are string literals const?

开发者 https://www.devze.com 2023-01-31 17:44 Source: web
Neither GCC nor Clang complains if I assign a string literal to a char*, even with a lot of pedantic options (-Wall -W -pedantic -std=c99):

char *foo = "bar";

while they (of course) do complain if I assign a const char* to a char*.

Does this mean that string literals are considered to be of char* type? Shouldn't they be const char*? After all, modifying them is undefined behavior!

And (an unrelated question) what about command line parameters (i.e. argv): is argv considered to be an array of string literals?


They are of type char[N] where N is the number of characters including the terminating \0. So yes you can assign them to char*, but you still cannot write to them (the effect will be undefined).
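A small sketch of the char[N] point, assuming a hosted C99 compiler; the function names are made up for illustration. sizeof sees the whole array, including the terminating \0, while strlen only sees the characters before it:

```c
#include <stddef.h>
#include <string.h>

/* sizeof applied directly to a literal measures the char[N] array,
   so the terminating '\0' is counted. */
size_t literal_array_size(void)
{
    return sizeof("bar");      /* 4 bytes: 'b', 'a', 'r', '\0' */
}

/* Once the array has decayed to a pointer, only the string length
   can be recovered, and strlen does not count the '\0'. */
size_t literal_string_length(void)
{
    const char *p = "bar";     /* array-to-pointer decay happens here */
    return strlen(p);          /* 3 */
}
```

The pointer is declared const char * here only as a precaution; char * would compile as well, as the question observes.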

As for argv: it points to an array of pointers to strings. Those strings are explicitly modifiable. You can change them and they are required to hold the last stored value.


For completeness' sake, the C99 draft standard (C89 and C11 have similar wording) says in section 6.4.5 String literals, paragraph 5:

[...]a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;[...]

So a string literal has static storage duration (it lasts for the lifetime of the program), its type is char[] (not char *), and its length is the size of the string literal plus the appended zero. Paragraph 6 says:

If the program attempts to modify such an array, the behavior is undefined.

So attempting to modify a string literal is undefined behavior regardless of the fact that they are not const.
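When the characters do need to change, one common way around this is to initialize a char array from the literal, which gives a private, writable copy. A minimal sketch (the function name is hypothetical):

```c
/* Returns the first character after editing a writable copy of "bar". */
char first_char_after_edit(void)
{
    char buf[] = "bar";   /* char[4]: the literal's bytes are copied into buf */

    /* char *p = "bar"; p[0] = 'B';   would be undefined behavior:
       that write targets the literal's own storage. */

    buf[0] = 'B';         /* fine: this modifies the array, not the literal */
    return buf[0];
}
```

The array lives wherever buf is declared (here, automatic storage), so it must not be returned by address from the function.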

With respect to argv, section 5.1.2.2.1 Program startup, paragraph 2 says:

If they are declared, the parameters to the main function shall obey the following constraints:

[...]

- The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So argv is not considered an array of string literals and it is ok to modify the contents of argv.
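As an illustration, here is a sketch of a helper that rewrites the argument strings in place. The name uppercase_args is an assumption for this example; it is meant to be called from main as uppercase_args(argc, argv):

```c
#include <ctype.h>
#include <stddef.h>

/* Upper-cases every argument string in place. This is allowed for the
   strings that argv points to, because the standard requires them to be
   modifiable; it would not be allowed for string literals. */
void uppercase_args(int argc, char **argv)
{
    for (int i = 0; i < argc; i++) {
        for (char *p = argv[i]; *p != '\0'; p++) {
            /* Cast to unsigned char first: toupper is undefined for
               negative values other than EOF. */
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
```

Note that the helper only overwrites existing characters; writing past the end of an argument string is still out of bounds.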


With the -Wwrite-strings option you will get:

warning: initialization discards qualifiers from pointer target type

Irrespective of that option, GCC will put literals into a read-only memory section, unless told otherwise by using -fwritable-strings (however, this option has been removed from recent GCC versions).

Command line parameters are not const; they typically live on the stack.
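A sketch of the usual way to keep code clean under -Wwrite-strings: declare pointers to literals as const char *. The function name is hypothetical:

```c
/* With -Wwrite-strings, GCC treats string literals in C as const char[N],
   so a const-qualified pointer matches and no qualifier is discarded. */
const char *get_foo(void)
{
    const char *foo = "bar";

    /* foo[0] = 'B';   is now rejected at compile time, instead of being
       undefined behavior at run time. */

    return foo;   /* safe to return: the literal has static storage duration */
}
```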


(Sorry, I've only just noticed this question is tagged as c, not c++. Maybe my answer isn't so relevant to this question after all!)

String literals are not quite const or not-const; there is a special, strange rule for literals.

(Summary: Literals can be taken by reference-to-array as foo( const char (&)[N]) and cannot be taken as the non-const array. They prefer to decay to const char *. So far, that makes it seem like they are const. But there is a special legacy rule which allows literals to decay to char *. See experiments below.)

(Following experiments done on clang3.3 with -std=gnu++0x. Perhaps this is a C++11 issue? Or specific to clang? Either way, there is something strange going on.)

At first, literals appear to be const:

#include <iostream>

void foo( const char  * ) { std::cout << "const char *" << std::endl; }
void foo(       char  * ) { std::cout << "      char *" << std::endl; }

int main() {
        const char arr_cc[3] = "hi";
        char arr_c[3] = "hi";

        foo(arr_cc); // const char *
        foo(arr_c);  //       char *
        foo("hi");   // const char *
}

The two arrays behave as expected, demonstrating that foo is able to tell us whether the pointer is const or not. Then "hi" selects the const version of foo. So it seems like that settles it: literals are const ... aren't they?

But, if you remove void foo( const char * ) then it gets strange. First, the call to foo(arr_cc) fails with an error at compile time. That is expected. But the literal call (foo("hi")) works via the non-const call.

So, literals are "more const" than arr_c (because they prefer to decay to const char *, unlike arr_c). But literals are "less const" than arr_cc because they are willing to decay to char * if needed.

(Clang gives a warning when it decays to char *).

But what about the decaying? Let's avoid it for simplicity.

Let's take the arrays by reference into foo instead. This gives us more 'intuitive' results:

void foo( const char  (&)[3] ) { std::cout << "const char (&)[3]" << std::endl; }
void foo(       char  (&)[3] ) { std::cout << "      char (&)[3]" << std::endl; }

As before, the literal and the const array (arr_cc) use the const version, and the non-const version is used by arr_c. And if we delete foo( const char (&)[3] ), then we get errors with both foo(arr_cc); and foo("hi");. In short, if we avoid the pointer-decay and use reference-to-array instead, literals behave as if they are const.

Templates?

In templates, the system will deduce const char * instead of char * and you're "stuck" with that.

template<typename T>
void bar(T *t) { // will deduce   const char   when a literal is supplied
    foo(t);
}

So basically, a literal behaves as const at all times, except in the particular case where you directly initialize a char * with a literal.
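For comparison, since the question is tagged c: C has no overloading, but a C11 _Generic selection allows a roughly similar experiment. This is only a sketch; the macro and function names are made up, and it assumes a compiler that applies array-to-pointer decay to the controlling expression (current GCC and Clang do) and that -Wwrite-strings is not in effect:

```c
#include <string.h>

/* Maps an expression to a description of its type after decay. */
#define TYPE_NAME(x) _Generic((x),        \
        char *:       "char *",           \
        const char *: "const char *",     \
        default:      "other")

/* In C a string literal is a plain char[N], so it selects "char *". */
int literal_selects_char_pointer(void)
{
    return strcmp(TYPE_NAME("hi"), "char *") == 0;
}

/* An array declared const selects "const char *", as in the C++ test. */
int const_array_selects_const_pointer(void)
{
    const char arr_cc[3] = "hi";
    return strcmp(TYPE_NAME(arr_cc), "const char *") == 0;
}
```

So in C the literal is not "more const" than a plain char array; the preference for const char * described above is specific to C++.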


Johannes' answer is correct concerning the type and contents. But in addition to that, yes, it is undefined behavior to modify contents of a string literal.

Concerning your question about argv:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.


In both C89 and C99, string literals have type array of char, which decays to char * rather than const char * (for historical reasons, as I understand it). You are correct that trying to modify one results in undefined behavior. GCC has a specific warning flag, -Wwrite-strings (which is not part of -Wall), that will warn you when a string literal is assigned to a non-const char *.

As for argv, the arguments are copied into your program's address space, and can safely be modified in your main() function.

EDIT: Whoops, had -Wno-write-strings copied by accident. Updated with the correct (positive) form of the warning flag.


String literals have formal type char [] but semantic type const char []. The purists hate it but this is generally useful and harmless, except for bringing lots of newbies to SO with "WHY IS MY PROGRAM CRASHING?!?!" questions.


In C++ they are arrays of const char, but there was a specific exclusion (deprecated in C++03 and removed in C++11) allowing them to be assigned to char* for legacy code that existed before const did; in C they are plain char arrays. And the command line arguments are definitely not literals; they are created at run time.
