Finding a string in a compiled executable

Developer https://www.devze.com 2023-02-21 22:32 Source: Internet

I have a very simple program as below:

#include <stdio.h>

int main(){
    char* mystring = "ABCDEFGHIJKLMNO";
    puts(mystring);

    char otherstring[15];
    otherstring[0]  = 'a';
    otherstring[1]  = 'b';
    otherstring[2]  = 'c';
    otherstring[3]  = 'd';
    otherstring[4]  = 'e';
    otherstring[5]  = 'f';
    otherstring[6]  = 'g';
    otherstring[7]  = 'h';
    otherstring[8]  = 'i';
    otherstring[9]  = 'j';
    otherstring[10] = 'k';
    otherstring[11] = 'l';
    otherstring[12] = 'm';
    otherstring[13] = 'n';
    otherstring[14] = 'o';
    puts(otherstring);

    return 0;
}

Compiler was MS VC++.

Whether I build this program with or without optimisations, I can find the string "ABCDEFGHIJKLMNO" in the executable using a hex editor.

However, I cannot find the string "abcdefghijklmno".

What is the compiler doing that is different for otherstring?

The hex editor I used was Hexedit, but I tried others and still couldn't find otherstring. Does anyone have any idea why not, or how to find it?

By the way I am not doing this for hacking reasons.

This is what my gcc did with this code; I assume your compiler does something similar. The string constant is stored in the read-only data section, and mystring is initialized with its address.
The individual chars are stored directly into their array locations on the stack, so the letters exist in the executable only as immediate operands of separate instructions, not as one contiguous string. Also note that otherstring is not null-terminated when you call puts with it.

            .file   "test.c"
            .section        .rodata
    .LC0:
            .string "ABCDEFGHIJKLMNO"
            .text
    .globl main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            movq    %rsp, %rbp
            .cfi_offset 6, -16
            .cfi_def_cfa_register 6
            subq    $48, %rsp
            movq    %fs:40, %rax
            movq    %rax, -8(%rbp)
            xorl    %eax, %eax
    /* here is where mystring is loaded with the address of "ABCDEFGHIJKLMNO" */
            movq    $.LC0, -40(%rbp)
    /* this is the call to puts */
            movq    -40(%rbp), %rax
            movq    %rax, %rdi
            call    puts
    /* here is where the bytes are loaded into otherstring on the stack */            
            movb    $97, -32(%rbp)  //'a'
            movb    $98, -31(%rbp)  //'b'
            movb    $99, -30(%rbp)  //'c'
            movb    $100, -29(%rbp) //'d'
            movb    $101, -28(%rbp) //'e'
            movb    $102, -27(%rbp) //'f'
            movb    $103, -26(%rbp) //'g'
            movb    $104, -25(%rbp) //'h'
            movb    $105, -24(%rbp) //'i'
            movb    $106, -23(%rbp) //'j'
            movb    $107, -22(%rbp) //'k'
            movb    $108, -21(%rbp) //'l'
            movb    $109, -20(%rbp) //'m'
            movb    $110, -19(%rbp) //'n'
            movb    $111, -18(%rbp) //'o'
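
The missing terminator mentioned above can be fixed by giving the array one extra byte. The following is a minimal sketch, not the asker's code: the helper name put_terminated is made up for illustration, and the loop (used only to keep the example short) assumes an ASCII-style character set in which 'a' through 'o' are consecutive.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): builds the same 15 letters
   in a 16-byte array so there is room for the terminating '\0'. */
size_t put_terminated(void)
{
    char otherstring[16];                   /* 15 letters + 1 terminator */

    for (int i = 0; i < 15; i++)
        otherstring[i] = (char)('a' + i);   /* assumes consecutive letter codes */

    otherstring[15] = '\0';                 /* without this, puts() reads past the array */
    puts(otherstring);
    return strlen(otherstring);             /* well defined now that the string is terminated */
}
```

With the terminator in place, puts stops after the letter 'o' instead of reading whatever bytes happen to follow the array on the stack.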

The compiler is likely storing the numeric value of each character into each array position, exactly as you wrote it, without recognizing that the stores together form a string. Remember that in C a single character is no different from a number, so you could even write the ASCII codes instead of 'a'. In a hex editor I would expect you to still see those letters, just spaced out a bit.
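
To illustrate the point that a character constant is just a number, here is a small sketch. The function name same_bytes is hypothetical, and the numeric values assume an ASCII execution character set, which is what the assembly listing above uses.

```c
#include <stdio.h>

/* Hypothetical helper: stores the same three bytes once as character
   constants and once as ASCII codes, then compares them.
   Returns 1 when both arrays hold identical values. */
int same_bytes(void)
{
    char by_letter[3];
    char by_number[3];

    by_letter[0] = 'a';  by_letter[1] = 'b';  by_letter[2] = 'c';
    by_number[0] = 97;   by_number[1] = 98;   by_number[2] = 99;   /* ASCII codes */

    /* Printed as integers, the character constants show their numeric values. */
    printf("%d %d %d\n", by_letter[0], by_letter[1], by_letter[2]);

    return by_letter[0] == by_number[0]
        && by_letter[1] == by_number[1]
        && by_letter[2] == by_number[2];
}
```

On an ASCII system both forms produce the same stores, which is why the listing shows $97, $98, and so on rather than letters.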

In the first case the compiler initializes data with the exact string "ABC...".

In the second case, each assignment is done sequentially, so the compiler generates code to perform each assignment. In the executable you should see 15 repeating byte sequences where only the initializer ('a', 'b', 'c'...) and the destination offset change.
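
If the letters really are embedded in repeating instruction sequences, one way to look for them is to search for the bytes at a fixed spacing instead of back to back. The following is a rough sketch under that assumption: find_strided is a hypothetical helper, the caller is expected to have read the executable into memory already, and the right spacing depends on the instruction encoding your compiler chose. An optimizing build may also merge the stores into wider ones, in which case this approach will not match.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first place in data where
   the bytes of needle appear exactly stride bytes apart, or -1 if there is
   no such place. A stride of 1 is an ordinary contiguous search. */
long find_strided(const unsigned char *data, size_t len,
                  const char *needle, size_t needle_len, size_t stride)
{
    if (needle_len == 0 || stride == 0)
        return -1;

    /* Number of bytes covered by one match, first letter to last letter. */
    size_t span = (needle_len - 1) * stride + 1;
    if (len < span)
        return -1;

    for (size_t start = 0; start + span <= len; start++) {
        size_t k = 0;
        while (k < needle_len &&
               data[start + k * stride] == (unsigned char)needle[k])
            k++;
        if (k == needle_len)
            return (long)start;     /* every letter matched at this spacing */
    }
    return -1;
}
```

Calling it over the file's bytes with "abcdefghijklmno" and a few small stride values is a reasonable first attempt; which stride matches, if any, is specific to the compiler and target.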