Assembly language: trying to understand a small function

Developer (开发者) https://www.devze.com 2022-12-21 04:50 Source: Internet
For my work, I need to reverse-engineer what this portion of code (ARM9) is doing. I'm a Java developer and I really don't understand this code, which belongs to a single function.

Of course, I'm asking for help because the original source code is no longer available. Can anyone help me understand what this code is doing, with a small algorithm in any high-level language? That would be nice. I have tried for many hours without results.

sub_FFFF7B38
    PUSH    {LR}
    ADDS    R2, R0, #0
    LDRB    R3, [R2]
    CMP     R3, #0
    BEQ     loc_FFFF7B52
    SUBS    R1, #1
    BCC     loc_FFFF7B52

loc_FFFF7B46:
    ADDS    R0, #1
    LDRB    R3, [R0]
    CMP     R3, #0
    BEQ     loc_FFFF7B52
    SUBS    R1, #1
    BCS     loc_FFFF7B46

loc_FFFF7B52:
    SUBS    R0, R0, R2
    POP     {R1}


Except for the last two lines, it could be something like the following.
Please don't hit me if I am not 100% correct.

If
R0 is p (initially p0),
R1 is n,
R2 is a saved copy of p0 (edited; first I thought: i or address of p0[i]), and
R3 is a temporary value,

then:

sub_FFFF7B38
          PUSH {LR}           ; save return address
          ADDS R2, R0, #0     ; move R0 to R2
          LDRB R3, [R2]       ; load *p0
          CMP R3, #0          ; if *p0==0 
          BEQ loc_FFFF7B52    ; then jump to loc_FFFF7B52 
          SUBS R1, #1         ; decrement n
          BCC loc_FFFF7B52    ; if there was a borrow (i.e. n was 0): jump to loc_FFFF7B52


loc_FFFF7B46:
          ADDS R0, #1         ; increment p
          LDRB R3, [R0]       ; load *p
          CMP R3, #0          ; if *p==0
          BEQ loc_FFFF7B52    ; jump to loc_FFFF7B52
          SUBS R1, #1         ; decrement n
          BCS loc_FFFF7B46    ; if there was no borrow (i.e. n was not 0): jump to loc_FFFF7B46


loc_FFFF7B52:
          SUBS R0, R0, R2     ; calculate p - p0
          POP {R1}            ; ??? I don't understand the purpose of this
                              ; isn't something missing here?

or in C:

int f(char *p0, unsigned int n)
{
  char *p;

  if (*p0==0 || n--==0)
    return 0;

  for(p=p0; *++p && n>0; n--)
  {
  }
  return p - p0;
}
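
To double-check a reading like this, the listing can also be transcribed register by register. The following is only a minimal sketch in C, assuming the value in R1 is an unsigned count. The function name emulate_sub_FFFF7B38 and the explicit carry variable are made up for illustration and are not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register-level model of sub_FFFF7B38.
   After SUBS, the carry flag means "no borrow": it is set when the old
   value of R1 was at least 1 (unsigned), clear when it was 0. */
size_t emulate_sub_FFFF7B38(const char *s, uint32_t n)
{
    const char *r0 = s;        /* R0: running pointer            */
    uint32_t    r1 = n;        /* R1: remaining count            */
    const char *r2 = r0;       /* ADDS R2, R0, #0                */
    int carry;

    if (*r2 == 0)              /* LDRB R3, [R2] / CMP / BEQ      */
        goto done;
    carry = (r1 >= 1);         /* SUBS R1, #1                    */
    r1 -= 1;
    if (!carry)                /* BCC: taken when n was 0        */
        goto done;
loop:
    r0 += 1;                   /* ADDS R0, #1                    */
    if (*r0 == 0)              /* LDRB R3, [R0] / CMP / BEQ      */
        goto done;
    carry = (r1 >= 1);         /* SUBS R1, #1                    */
    r1 -= 1;
    if (carry)                 /* BCS: loop while no borrow      */
        goto loop;
done:
    return (size_t)(r0 - r2);  /* SUBS R0, R0, R2                */
}
```

Under these assumptions, calling it with "hello" and a count of 3 gives 3, and with a count of 10 gives 5, which is the same result the C function above produces.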


Here are the instructions commented line by line

sub_FFFF7B38
    PUSH    {LR}          ; save LR (link register) on the stack
    ADDS    R2, R0, #0    ; R2 = R0 + 0 and set flags (could just have been MOV?)
    LDRB    R3, [R2]      ; Load R3 with a single byte from the address at R2
    CMP     R3, #0        ; Compare R3 against 0...
    BEQ     loc_FFFF7B52  ; ...branch to end if equal
    SUBS    R1, #1        ; R1 = R1 - 1 and set flags
    BCC     loc_FFFF7B52  ; branch to end if carry is clear, which for a subtraction
                          ; means a borrow occurred (R1 was 0 before the decrement)

loc_FFFF7B46:
    ADDS    R0, #1        ; R0 = R0 + 1 and set flags
    LDRB    R3, [R0]      ; Load R3 with byte from address at R0
    CMP     R3, #0        ; Compare R3 against 0...
    BEQ     loc_FFFF7B52  ; ...branch to end if equal
    SUBS    R1, #1        ; R1 = R1 - 1 and set flags
    BCS     loc_FFFF7B46  ; loop if carry is set, which for a subtraction means
                          ; no borrow (R1 was at least 1 before the decrement)

loc_FFFF7B52:
    SUBS    R0, R0, R2    ; R0 = R0 - R2
    POP     {R1}          ; Pop the previously saved value of LR into R1
                          ; Presumably the missing next line is BX R1 (or MOV PC, R1)
                          ; to return from the function.

So in very basic C code:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;
    if (r3 == '\0')
        goto end;
    if (--r1 < 0)          /* BCC: r1 was 0 before the decrement */
        goto end;

loop:
    r3 = *++r0;
    if (r3 == '\0')
        goto end;
    if (--r1 >= 0)         /* BCS: r1 was at least 1 before the decrement */
        goto loop;

end:
    return (int)(r0 - r2);
}

Adding some control structures to get rid of the gotos:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;

    if (r3 != '\0')
    {
        if (--r1 >= 0)
        {
            do
            {
                if (*++r0 == '\0')
                    break;
            } while (--r1 >= 0);
        }
    }

    return (int)(r0 - r2);
}

Edit: Now that my confusion about the carry bit and SUBS has been cleared up this makes more sense.
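
For anyone else tripped up by the same thing: on ARM, the carry flag after a subtraction is the inverse of a borrow. A minimal sketch of that rule in C follows; arm_subs and struct subs_result are hypothetical names chosen for illustration, modelling a 32-bit SUBS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the result of a 32-bit ARM SUBS. */
struct subs_result {
    uint32_t value;  /* a - b, wrapped modulo 2^32        */
    bool     carry;  /* true means "no borrow occurred"   */
};

struct subs_result arm_subs(uint32_t a, uint32_t b)
{
    struct subs_result r;
    r.value = a - b;      /* unsigned arithmetic wraps around */
    r.carry = (a >= b);   /* C set when a >= b (unsigned)     */
    return r;
}
```

So SUBS R1, #1 leaves the carry set whenever R1 was at least 1, and BCC is only taken when R1 was already 0.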

Simplifying:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;

    while (*r0 != '\0' && --r1 >= 0)
        r0++;

    return (int)(r0 - r2);
}

In words: this finds the index of the first NUL within the first r1 chars of the string pointed to by r0, or returns r1 if there is none.


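Filip has provided some pointers; you also need to read up on the ARM calling convention (that is to say, which registers contain the function arguments on entry and which holds the return value). Under the standard ARM procedure call standard, the first four arguments arrive in R0-R3 and the result is returned in R0.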

From a quick reading I think this code is strnlen or something closely related to it.
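
One way to test that guess is to compare the recovered loop with a bounded-length reference. The sketch below assumes an unsigned count; bounded_len and recovered_len are hypothetical names, and bounded_len is written with memchr so that it needs only standard C.

```c
#include <stddef.h>
#include <string.h>

/* Reference: length of s, capped at maxlen (the value strnlen returns). */
size_t bounded_len(const char *s, size_t maxlen)
{
    const char *nul = memchr(s, '\0', maxlen);
    return nul ? (size_t)(nul - s) : maxlen;
}

/* The loop recovered from the disassembly, using an unsigned count. */
size_t recovered_len(const char *s, size_t n)
{
    const char *p = s;
    while (*p != '\0' && n-- != 0)
        p++;
    return (size_t)(p - s);
}
```

Both return the same value for the same inputs. One difference worth keeping in mind: the assembly, like this loop, loads the byte at index n before giving up, whereas strnlen is specified to examine only the first n bytes.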


How about this: Instruction set for ARM

Some hints / simplified asm

  • PUSH - puts something on the "stack" (memory)
  • ADD - usually "add" as in +
  • POP - retrieves something from the "stack" (memory)
  • CMP - short for "compare"; it compares one value with another.

X: or Whatever: is a label. It names the code that follows it (often the start of a "subroutine") so that other instructions can jump there. Ever used a labeled break or continue in Java, or "goto" in C? It is similar to that, actually.

If you have the following (ignore whether it is correct ARM asm, it's just pseudocode):

PUSH 1
x:     
    POP %eax

First it would put 1 on the stack and then pop it back into eax (short for "extended AX", an x86 register that holds 32 bits of data).

Now, what does the x: do then? Well, let's assume that there are 100 lines of asm before that as well; then you could use a "jump" instruction to navigate to x:.
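
The same idea of a label plus a jump can be sketched in C, which has a real goto. This is only an illustration; count_to is a hypothetical function and has nothing to do with the code in the question.

```c
/* Counts from 0 up to limit using a label and a jump instead of a loop. */
int count_to(int limit)
{
    int i = 0;
x:                      /* a label: just a named position in the code   */
    if (i < limit) {
        i++;
        goto x;         /* the C analogue of a branch/jump instruction  */
    }
    return i;
}
```

The conditional branches in the listing (BEQ, BCC, BCS) work the same way, except that the jump only happens when the named condition flag matches.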

That's a little bit of introduction to asm. Simplified.

Try to understand the above code and examine the instruction-set.


My ASM is a bit rusty, so no rotten tomatoes please. Assuming this starts at sub_FFFF7B38:

The command PUSH {LR} preserves the link register, which is a special register which holds the return address during a subroutine call.

ADDS sets the flags (like CMN would). Here, ADDS R2, R0, #0 adds 0 to R0 and stores the result in R2, i.e. it copies R0 into R2. (Correction from Charles in comments)

LDRB R3, [R2] loads a single byte from the memory address held in R2 into register R3. LDRB only loads a single byte; the upper three bytes of R3 are zeroed upon loading. Basically, it fetches the first character of the string that R0 (and now R2) points to.
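
In C terms, LDRB reads one byte from memory at the address in the base register and zero-extends it into a 32-bit register. A minimal sketch, where model_ldrb is a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical model of LDRB Rd, [Rn]: read the byte at addr and
   zero-extend it to 32 bits (the upper 24 bits become 0). */
uint32_t model_ldrb(const void *addr)
{
    return (uint32_t)*(const uint8_t *)addr;
}
```

A byte of 0xFF therefore ends up as 255 in the register, never as a negative number, and the terminating NUL of a string loads as 0, which is what the following CMP checks for.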

CMP R3, #0 performs a subtraction between the two operands and sets the register flags, but does not store a result. Those flags lead to...

BEQ loc_FFFF7B52, which means "If the previous comparison was equal, go to loc_FFFF7B52" or if(R3 == 0) {goto loc_FFFF7B52;}

So if R3 isn't zero, then the SUBS R1, #1 command subtracts one from R1 and sets a flag.

BCC loc_FFFF7B52 will cause execution to jump to loc_FFFF7B52 if the carry flag is clear, which after SUBS means a borrow occurred (R1 was 0 before the subtraction).

( snip )

Finally, POP {R1} pops the return address that PUSH {LR} saved at the start of this code into R1; the listing stops before the instruction that would actually return through it.

Edit - While I was in the car, Curd spelled out just about what I was thinking when I was trying to write out my answer and ran out of time.
