Is it safe to modify the content of a string variable through a pointer?

Suppose I have a procedure with a Str parameter passed by reference, and I want to modify the content of the given variable inside the procedure, e.g.

procedure Replace(var Str: string);
var
  PStr: PChar;
  i: Integer;
begin
  PStr := @Str[1];
  for i := 1 to Length(Str) do begin
    PStr^ := 'x';
    Inc(PStr);
  end;
end;

Is it an acceptable pointer usage? I'm not sure whether it has a memory leak.

What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?

Is this kind of code optimization worth it?

Is it an acceptable pointer usage?

You need to make sure that you don't call

PStr := @Str[1];

for an empty string, as that would crash. The easiest way to do that is to replace that line with

PStr := PChar(Str);

so that the compiler makes sure that either a pointer to the first character of the string or a pointer to #0 is returned. As Ken correctly pointed out in a comment, there is no call to UniqueString() in this case, so you would need to make that call yourself before writing through the pointer.
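A minimal sketch of the procedure with both changes applied (the PChar() cast plus an explicit UniqueString() call). The program name, the test variables A, B and E, and the FPC compatibility directive are assumptions added for illustration; they are not part of the original question.

```pascal
program ReplaceDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // assumption: only needed when building with Free Pascal
{$APPTYPE CONSOLE}

// Overwrites every character of Str with 'x' through a PChar.
procedure Replace(var Str: string);
var
  PStr: PChar;
  i: Integer;
begin
  // The PChar() cast does not make the string unique, so do it explicitly
  // before writing through the pointer.
  UniqueString(Str);
  // For an empty string the cast yields a pointer to #0, not nil.
  PStr := PChar(Str);
  for i := 1 to Length(Str) do
  begin
    PStr^ := 'x';
    Inc(PStr);
  end;
end;

var
  A, B, E: string;
begin
  A := 'hello';
  B := A;        // B refers to the same string data as A
  Replace(B);    // must change B only
  Writeln(A);    // A still holds 'hello'
  Writeln(B);    // B now holds 'xxxxx'

  E := '';
  Replace(E);    // the loop body never runs, so nothing is written
  Writeln(Length(E));
end.
```

Without the UniqueString() call, writing through the pointer could alter data that other string variables (or a string constant) still refer to.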

I'm not sure whether it has a memory leak.

No, there is no memory leak. Obtaining a pointer to a string character with @Str[1] will call UniqueString() internally, but that happens for write access to a string character too, so there's nothing special about the character pointer.

What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?

No, it just makes sure that the string is unique (so that write access through the pointer does not change the contents of any other string that shares the same data). Afterwards it returns a pointer to that character in the string, which you can then treat as any other PChar variable, pass it to API functions, increment it and so on.

Is this kind of code optimization worth it?

It is not only worth it, it is necessary to really achieve good performance for large strings. The reason for this is that the compiler is not smart enough to only call UniqueString() once, but it will insert calls to it for each write access to a character in the string. So if you process a large string character by character you will have a big overhead from all these calls.

Yes, it's safe, as long as you don't go beyond the bounds of the string. The string has metadata attached that tells how long it is, and if you write beyond the length of the string, you won't leak memory, but you could corrupt it.

If Str is passed by reference, why would you need another pointer to the string? Apart from that, there should be no memory leak: PStr is initialized with the address of the first element of the string and then incremented, so it will always point to one of the characters in your string.

The compiler does not make a copy of Str internally. One of the uses for pointers is to avoid making copies. When you say

PStr := @Str[1]

it means that PStr will now store the address of Str[1], that is, the address of the first char in the string.

I am sure this will work for AnsiString and PAnsiChar, but will it still work for Unicode strings in Delphi 2009 and above? I think it should, because both a char of a string (str[i]) and the char pointed to by a PChar should be 2 bytes in size.

Could somebody with more experience with Unicode strings please confirm this?
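One way to check this yourself is a small sketch like the following, which compares the byte distance covered by Inc() on a PChar with SizeOf(Char). The program name and variable names are assumptions for illustration. Under Delphi 2009 and later, where string is UnicodeString, SizeOf(Char) should be 2; under compilers where Char is an AnsiChar it will be 1, but the two values should still match.

```pascal
program CharSizeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // assumption: only needed when building with Free Pascal
{$APPTYPE CONSOLE}

var
  S: string;
  P, Q: PChar;
begin
  S := 'abcd';
  UniqueString(S);   // make sure S owns its data before taking pointers
  P := PChar(S);
  Q := P;
  Inc(Q);            // advances by one Char, not by one byte

  Writeln('SizeOf(Char) = ', SizeOf(Char));
  // Casting to PAnsiChar measures the distance in bytes.
  Writeln('Bytes moved by Inc = ', PAnsiChar(Q) - PAnsiChar(P));
  // Q now points at the second character of S.
  Writeln('Q^ = S[2]: ', Q^ = S[2]);
end.
```

Because Inc() on a typed pointer steps by the element size, the loop in the question walks the string one character at a time regardless of how wide Char is.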

As of D2010, it looks like the code generator applies copy-on-write to such a construct:

Unit9.pas.34: S := 'abcd';
004B32EF 8D45F4           lea eax,[ebp-$0c]
004B32F2 BA98334B00       mov edx,$004b3398
004B32F7 E89C35F5FF       call @UStrLAsg
Unit9.pas.35: P := @S[1];
004B32FC 8D45F4           lea eax,[ebp-$0c]
004B32FF E8343FF5FF       call @UniqueStringU    ; <== here you are
004B3304 8945F0           mov [ebp-$10],eax
Unit9.pas.36: Exit;
004B3307 EB61             jmp $004b336a

By the way, a plain reference such as P := @S does not emit a call to UniqueString.

In conclusion, I do not recommend relying on the code generator's internals. Use the recommended PChar(S) construct instead (it emits one xStrToPxChar call as overhead).