The code below is said to give a segmentation violation:
#include <stdio.h>
#include <string.h>

void function(char *str) {
    char buffer[16];
    strcpy(buffer, str);    /* no bounds check: overflows buffer */
}

int main() {
    char large_string[256];
    int i;
    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    function(large_string);
    return 1;
}
It's compiled and run like this:
gcc -Wall -Wextra hw.cpp && a.exe
But it produces no output.
NOTE
The above code does overwrite the return address (and more), if you understand what's going on underneath.
The return address will be 0x41414141, to be specific.
Important: this requires profound knowledge of the stack.
You're just getting lucky. There's no reason that code has to generate a segmentation fault (or any other kind of error). It's still probably a bad idea, though. You can probably get it to fail by increasing the size of large_string.
Probably in your implementation buffer is immediately below large_string on the stack. So when the call to strcpy overflows buffer, it's just writing most of the way into large_string without doing any particular damage. It will write at least 255 bytes, but whether it writes more depends on what's above large_string (and on the uninitialised value of the last byte of large_string). It seems to have stopped before doing any damage or segfaulting.
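The uninitialised last byte matters because the loop in main never writes a terminator, so large_string is not a valid C string. A minimal sketch of filling the array so that it is terminated is shown here; fill_with_a is a hypothetical helper name, not something from the question:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the original code): fill s, which has
 * room for `size` bytes, with 'A' characters and terminate it. */
static void fill_with_a(char *s, size_t size)
{
    if (size == 0)
        return;                 /* no room even for the terminator */
    memset(s, 'A', size - 1);   /* size - 1 visible characters */
    s[size - 1] = '\0';         /* explicit terminator in the last byte */
}
```

Calling fill_with_a(large_string, sizeof large_string) in main would make strlen(large_string) exactly 255. The strcpy into the 16-byte buffer would still overflow; this only removes the dependence on whatever happens to follow large_string in memory.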
By fluke, the return address of the call to function isn't being trashed. Either it's below buffer on the stack or it's in a register, or maybe the function is inlined; I can't remember what no optimisation does. If you can't be bothered to check the disassembly, I can't either ;-). So you're returning and exiting without problems.
Whoever said that code would give a segfault probably isn't reliable. It results in undefined behaviour. On this occasion, the behaviour was to output nothing and exit.
[Edit: I checked on my compiler (GCC on cygwin), and for this code it is using the standard x86 calling convention and entry/exit code. And it does segfault.]
You're compiling a .cpp (C++) program by invoking gcc (instead of g++)... not sure if this is the cause, but on a Linux system (it appears you're running on Windows, given the default .exe output) it throws the following error when trying to compile as you have stated:
/tmp/ccSZCCBR.o:(.eh_frame+0x12): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
It's UB (undefined behavior). strcpy may have copied more bytes than fit into the memory pointed to by buffer, and that might not cause a problem at that moment.
It's undefined behavior, which means that anything can happen. The program can even appear to work correctly.
It seems that you just happen not to overwrite any parts of memory that are still needed by the rest of the (short) program (or that are outside the program's address space, write-protected, ...), so nothing special happens. At least nothing that would lead to any output.
There's a zero byte on the stack somewhere that stops the strcpy(), and there's enough room on the stack not to hit a protected page. Try printing out strlen(buffer) in that function. In any case the result is undefined behavior.
Get into the habit of using the strlcpy(3) family of functions.
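strlcpy is not part of standard C, so it may be missing on some platforms. As a rough sketch of the same idea using only the standard library, a bounded copy can be built on snprintf; copy_bounded is a hypothetical name chosen for illustration, and this is a swapped-in technique, not strlcpy itself:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity dst_size), truncating
 * if necessary and always terminating dst when dst_size > 0.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n;

    if (dst_size == 0)
        return 0;                       /* nothing can be stored */

    /* snprintf writes at most dst_size - 1 characters plus '\0' and
     * returns the length the full output would have had. */
    n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size;
}
```

In function(), copy_bounded(buffer, sizeof buffer, str) would store at most 15 characters plus the terminator and report the truncation, instead of writing past the end of buffer. It still requires str to be a properly terminated string.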
You can test this in other ways:
#include <stdlib.h>

int main() {
    int *a = (int *)malloc(10 * sizeof(int));
    int i;
    /* deliberately writes far past the 10 allocated ints */
    for (i = 0; i < 1000000; i++)
        a[i] = i;
    return 0;
}
On my machine, this causes SIGSEGV only at around i = 37000! (tested by inspecting the core with gdb).
To guard against these problems, test your programs using a malloc debugger... and use lots of mallocs, since there are no memory debugging libraries that I know of that can look into static memory. Example: Electric Fence
gcc -g -Wall docore.c -o c -lefence
And now the SIGSEGV is triggered as soon as i=10, as would be expected.
As everyone says, your program has undefined behaviour. In fact your program has more bugs than you thought it did, but after it's already undefined it doesn't get any further undefined.
Here's my guess about why there was no output. You didn't completely disable optimization. The compiler saw that the code in function() doesn't have any defined effect on the rest of the program. The compiler optimized out the call to function().
Odds are that the long string is, in fact, terminated by the zero byte in i. Assuming that the variables in main are laid out in the order they are declared -- which isn't required by anything in the language spec that I know of but seems likely in practice -- then large_string would be first in memory, followed by i. The loop sets i to 0 and counts up to 255. Whether i is stored big-endian or little-endian, either way it has a zero byte in it. So in traversing large_string, at either byte 256 or 257 you'll hit a null byte.
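The claim about i can be checked without relying on any stack layout. The sketch below counts the zero bytes in the object representation of an int; count_zero_bytes is a hypothetical helper, and the comment about the result assumes the usual representation with 8-bit bytes and no padding bits:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the zero bytes in the in-memory
 * representation of value, whatever the byte order is. */
static size_t count_zero_bytes(int value)
{
    unsigned char bytes[sizeof value];
    size_t i, zeros = 0;

    memcpy(bytes, &value, sizeof value);   /* well-defined byte access */
    for (i = 0; i < sizeof value; i++)
        if (bytes[i] == 0)
            zeros++;
    return zeros;
}
```

After the loop in main, i holds 255, which fits in a single byte, so count_zero_bytes(255) is sizeof(int) - 1 on such platforms. Whether those zero bytes sit right after large_string still depends on how the compiler lays out the locals.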
Beyond that, I'd have to study the generated code to figure out why this didn't blow up. As you seem to indicate, I'd expect that the copy to buffer would overwrite the return address of function, so when it tried to return you'd be going into deep space somewhere and would quickly blow up on something.
But as others say, "undefined" means "unpredictable".
There may be anything in your 'char buffer[16]', including \0, but that doesn't limit the copy: strcpy copies until it finds the first \0 in the source string, so it can still write past your boundary of 16 characters.