
How to merge two string arrays in Delphi

Developer https://www.devze.com 2023-03-09 11:02 Source: the web
I have two or more dynamic string arrays that are filled with a large amount of data, and I want to merge them into one array. I know I can do it with a for loop like this:

var
  Arr1, Arr2, MergedArr: Array of string;
  I: Integer;
begin
  // Arr1:= 5000000 records
  // Arr2:= 5000000 records

  // Fill MergedArr by Arr1
  MergedArr:= Arr1;

  // Set length of MergedArr to Length(Arr1) + Length(Arr2), i.e. High(Arr1) + High(Arr2) + 2
  SetLength(MergedArr, High(Arr1)+ High(Arr2)+2);

  // Add Arr2 to MergedArr
  for I := Low(Arr2)+1 to High(Arr2)+1 do
    MergedArr[High(Arr1)+ i]:= Arr2[i-1];
end;

But this is slow on large data. Is there a faster way, such as copying the arrays' memory directly?


First of all, string is special, so it should be treated specially: don't try to outsmart the compiler; keep your code unchanged. String is special because it's reference counted. Every time you copy a string from one place to another, its reference count is incremented. When the reference count reaches 0, the string is destroyed. Your code plays nice because it lets the compiler know what you're doing, and in turn the compiler gets the chance to properly increment all reference counts.
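To make the reference counting visible, here is a minimal sketch. It assumes a compiler whose System unit provides StringRefCount (Delphi 2009 or later, or Free Pascal); the program and variable names are made up for illustration.

```pascal
program RefCountDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S1, S2: string;
begin
  // IntToStr builds a new heap-allocated string, so it carries a real
  // reference count (string literals are handled differently).
  S1 := IntToStr(12345);
  WriteLn('Before sharing: ', StringRefCount(S1));

  // A plain assignment copies only the pointer and adds a reference.
  S2 := S1;
  WriteLn('After S2 := S1: ', StringRefCount(S1));

  // Clearing S2 releases its reference again.
  S2 := '';
  WriteLn('After S2 := '''': ', StringRefCount(S1));
end.
```

The count should go up by one after the assignment and back down after S2 is cleared. That bookkeeping is what the compiler does for every element assignment in the loop.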

Sure, you can play all sorts of tricks, as suggested in the comments to gabr's answer, like filling the old arrays with zeros so the reference counts in the new array remain valid, but you can't do that if you actually need the old arrays as well. It is also a bit of a hack (albeit one that will probably remain valid for the foreseeable future, and, to be noted, I actually like this hack).

Anyway, and this is the important part of my answer: your code is most likely not slow in copying the strings from one array to the other; it's most likely slow somewhere else. Here's a short console application that creates two arrays, each with 5M random strings, then merges the two arrays into a third and displays the time the merge took. Merging only takes about 300 milliseconds on my machine. Filling the arrays takes much longer, but I'm not timing that:

program Project26;

{$APPTYPE CONSOLE}

uses SysUtils, Windows;

var a, b, c: array of string;
    i: Integer;

    Freq: Int64;
    Start, Stop: Int64;
    Ticks: Cardinal;

const count = 5000000;

begin
  SetLength(a,count);
  SetLength(b,count);
  for i:=0 to count-1 do
  begin
    // Random(1) always returns 0; use a wide range to get varied strings
    a[i] := IntToStr(Random(MaxInt));
    b[i] := IntToStr(Random(MaxInt));
  end;

  WriteLn('Moving');

  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(Start);

  SetLength(c, Length(a) + Length(b));
  for i:=0 to High(a) do
    c[i] := a[i];
  for i:=0 to High(b) do
    c[i+Length(a)] := b[i];

  QueryPerformanceCounter(Stop);
  WriteLn((Stop - Start) div (Freq div 1000), ' milliseconds');
  ReadLn;

end.
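Since the question mentions "2 or more" arrays, the same plain-assignment loop can be wrapped in a helper that accepts any number of arrays. This is only a sketch: TStrArray and MergeArrays are hypothetical names introduced here, not part of the RTL.

```pascal
program MergeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical alias; a named type is needed to pass the arrays around.
  TStrArray = array of string;

// Hypothetical helper: concatenates the given arrays using ordinary element
// assignments, so the compiler keeps every string reference count correct.
function MergeArrays(const Parts: array of TStrArray): TStrArray;
var
  P, I, Offset, Total: Integer;
begin
  // First pass: compute the final length so SetLength runs only once.
  Total := 0;
  for P := 0 to High(Parts) do
    Inc(Total, Length(Parts[P]));
  SetLength(Result, Total);

  // Second pass: copy each part to its offset in the result.
  Offset := 0;
  for P := 0 to High(Parts) do
  begin
    for I := 0 to High(Parts[P]) do
      Result[Offset + I] := Parts[P][I];
    Inc(Offset, Length(Parts[P]));
  end;
end;

var
  A, B, Merged: TStrArray;
  I: Integer;
begin
  SetLength(A, 2);
  A[0] := 'a0';
  A[1] := 'a1';
  SetLength(B, 3);
  B[0] := 'b0';
  B[1] := 'b1';
  B[2] := 'b2';

  Merged := MergeArrays([A, B]);
  for I := 0 to High(Merged) do
    WriteLn(Merged[I]);
end.
```

Sizing the result once up front avoids repeated reallocation; the source arrays stay valid and usable afterwards.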


You can use the built-in Move procedure, which copies a block of memory to another location. Its parameters are the source block, the target block, and the number of bytes to copy.

Because you are copying strings, Move duplicates the string pointers without incrementing their reference counts. The source arrays must therefore be invalidated after the merge by filling them with zeroes; otherwise the refcounts will be wrong, causing havoc and destruction later in the program.

var
  Arr1, Arr2, MergedArr: Array of string;
  I: Integer;
begin
  SetLength(Arr1, 5000000);
  for I := Low(Arr1) to High(Arr1) do
    Arr1[I] := IntToStr(I);

  SetLength(Arr2, 5000000);
  for I := Low(Arr2) to High(Arr2) do
    Arr2[I] := IntToStr(I);

  // Set length of MergedArr to Length(Arr1) + Length(Arr2), i.e. High(Arr1) + High(Arr2) + 2
  SetLength(MergedArr, High(Arr1)+ High(Arr2)+2);

  // Add Arr1 to MergedArr
  Move(Arr1[Low(Arr1)], MergedArr[Low(MergedArr)], Length(Arr1)*SizeOf(Arr1[0]));

  // Add Arr2 to MergedArr
  Move(Arr2[Low(Arr2)], MergedArr[High(Arr1)+1], Length(Arr2)*SizeOf(Arr2[0]));

  // Cleanup Arr1 and Arr2 without touching string refcount.
  FillChar(Arr1[Low(Arr1)], Length(Arr1)*SizeOf(Arr1[0]), 0);
  FillChar(Arr2[Low(Arr2)], Length(Arr2)*SizeOf(Arr2[0]), 0);

  // Test
  for I := Low(Arr1) to High(Arr1) do begin
    Assert(MergedArr[I] = IntToStr(I));
    Assert(MergedArr[I] = MergedArr[Length(Arr1) + I]);
  end;

  // Clear the array to see if something is wrong with refcounts
  for I := Low(MergedArr) to High(MergedArr) do
    MergedArr[I] := '';
end;


An excellent maxim is that the fastest code is that which never runs. Since copying is expensive, you should look to avoid the cost of copying altogether.

You can do this with a virtual array. Create a class which holds an array of array of string. In your example the outer array would hold two string arrays.

  • Add a Count property that returns the total number of strings in all of the arrays.
  • Add a default indexed property that operates by working out which of the outer arrays the index refers to and then returns the appropriate value from the inner array.
  • For extra points, implement an enumerator to make for..in work.
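The list above could be sketched roughly as follows. TStrArray, TVirtualStringArray and AddPart are hypothetical names chosen for this illustration; the enumerator is left out to keep it short, and the class is read-only.

```pascal
program VirtualArrayDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TStrArray = array of string;  // hypothetical alias

  // Hypothetical class: presents several string arrays as one sequence
  // without copying any strings.
  TVirtualStringArray = class
  private
    FParts: array of TStrArray;
    function GetCount: Integer;
    function GetItem(Index: Integer): string;
  public
    procedure AddPart(const Part: TStrArray);
    property Count: Integer read GetCount;
    property Items[Index: Integer]: string read GetItem; default;
  end;

procedure TVirtualStringArray.AddPart(const Part: TStrArray);
begin
  // Assigning a dynamic array stores a reference to it; no strings are copied.
  SetLength(FParts, Length(FParts) + 1);
  FParts[High(FParts)] := Part;
end;

function TVirtualStringArray.GetCount: Integer;
var
  P: Integer;
begin
  Result := 0;
  for P := 0 to High(FParts) do
    Inc(Result, Length(FParts[P]));
end;

function TVirtualStringArray.GetItem(Index: Integer): string;
var
  P: Integer;
begin
  // Walk the outer array, subtracting each part's length until the index
  // falls inside the current part.
  if Index >= 0 then
    for P := 0 to High(FParts) do
    begin
      if Index < Length(FParts[P]) then
      begin
        Result := FParts[P][Index];
        Exit;
      end;
      Dec(Index, Length(FParts[P]));
    end;
  raise ERangeError.Create('Index out of range');
end;

var
  A, B: TStrArray;
  V: TVirtualStringArray;
  I: Integer;
begin
  SetLength(A, 2);
  A[0] := 'a0';
  A[1] := 'a1';
  SetLength(B, 3);
  B[0] := 'b0';
  B[1] := 'b1';
  B[2] := 'b2';

  V := TVirtualStringArray.Create;
  try
    V.AddPart(A);
    V.AddPart(B);
    for I := 0 to V.Count - 1 do
      WriteLn(V[I]);
  finally
    V.Free;
  end;
end.
```

The trade-off is that each lookup walks the outer array, which is negligible for two or three parts. Because the parts are held by reference, later element writes to A or B remain visible through the virtual array.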