开发者

how was Array.Sort implemented in .NET?

开发者 https://www.devze.com 2023-01-22 09:56 出处:网络
I am using structures in my programming and I sort the structure according to a value in the structure using ICom开发者_开发技巧parer.

I am using structures in my programming and I sort the structure according to a value in the structure using ICom开发者_开发技巧parer.

How did Microsoft implement the Array.Sort() method? Is there any documentation (references) for this? Is it the same for all types of Sort() in Visual Basic?

This is a simple example for what I want.

Dim MyArray(6) As Integer
    MyArray(0) = 1
    MyArray(1) = 45
    MyArray(2) = 45
   ' Some Code.....
    '.........
    '..........
    MyArray(3) = 1
    MyArray(4) = 10
    ' Some Code.....
    '.........
    '..........
    MyArray(5) = 1
    MyArray(6) = 57

    Array.Sort(MyArray)

Array.Sort() will sort this array as: (1 1 1 10 45 45 57)

How does number 1 get sorted? Is it bringing to the end the first one or keeps the old one in the same index?

In my original example (before sorting), MyArray(0) = 1 and after sorting MyArray(0) = 1.

Is this the same original 1 or this another 1 (the newest one added to the array) moved to that position?

In case the MyArray(0) = 1 after sorting should be MyArray(5) = 1 before sorting.


It uses the Quicksort algorithm, which is not stable when implemented efficiently (in place). Meaning that it doesn't guarantee that values which are equal retain their prior relative position after sorting.

For example, if you have a bunch of points:

Point[] points = new Point[]
{
   new Point(0, 1),
   new Point(0, 2),
   new Point(0, 3),
   new Point(1, 1),
   new Point(1, 2),
   new Point(1, 3)
};

And you sort these points by x-coordinate only, using this comparer:

private int CompareByX(Point a, Point b)
{
    return a.X - b.X;
}

It will only guarantee that the points are sorted by their x-coordinate, meaning you could easily end up with a mixed up order (when looking at the y-coordinate):

Point(0, 3)
Point(0, 2)
Point(0, 1)
Point(1, 3)
Point(1, 2)
Point(1, 1)

[Edit]

This doesn't mean that the sorting algorithm is non-deterministic (random). For same input data, you will get same output data on each run. You can also predict the actual way it will be reorganized if you examine the algorithm precisely, but it is unnecessary. It is sufficient just to know that this happens when using the sort routine.

Here is a working example for your problem, try changing the test data sizes (first line in Main) and watch how the array gets reorganized on each run:

class Program
{
    static void Main()
    {
        Point[] points = CreateTestData(1, 4).ToArray();
        DisplayItems("Before", points);
        Array.Sort(points, CompareByX);
        DisplayItems("After", points);
        Console.ReadLine();
    }

    private class Point
    {
        public int X { get; private set; }
        public int Y { get; private set; }
        public override string ToString()
        { return string.Format("({0},{1})", X, Y); }
        public Point(int x, int y)
        { X = x; Y = y; }
    }

    private static int CompareByX(Point a, Point b)
    { return a.X - b.X; }

    private static IEnumerable<Point> CreateTestData(int maxX, int maxY)
    {
        for (int x = 0; x <= 1; x++)
            for (int y = 0; y <= 4; y++)
                yield return new Point(x, y);
    }

    private static void DisplayItems(string msg, Point[] points)
    {
        Console.WriteLine(msg);
        foreach (Point p in points)
            Console.WriteLine(p.ToString());
        Console.WriteLine();
    }
}

Of course, if you extend the comparer delegate to include the Y coordinate, you will not have this problem:

    private static int CompareByX(Point a, Point b)
    {
         if (a.X == b.X) 
            return a.Y - b.Y;
         else
            return a.X - b.X;
    }


Array.Sort is an unstable sort, so the order of elements which are the same is undefined and not conserved. The article on Array.Sort in MSDN states:

This method uses the QuickSort algorithm. This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

LINQ's OrderBy methods on the other hand are stable. The article on OrderBy in the MSDN states:

This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved. In contrast, an unstable sort does not preserve the order of elements that have the same key.


Use .Net Reflector and see for yourself... From the method names it looks like they are using the QuickSort algorithm: System.Array+SorterObjectArray.QuickSort


Array.Sort(), like most built-in sorters, uses a QuickSort implementation in a helper class behind the scenes. The sort is relatively efficient, and customizable using the IComparable and IComparer interfaces, but it's unstable; the three 1s in your example may end up in a different relative order than they were before the sort. You can see this if you use a more complex structure:

struct TestStruct
{
   int a;
   int b;
}

...

//As declared, this array is already sorted by both "a" and "b" properties
var myStructAray = new [] {new TestStruct{a=1,b=1}, new TestStruct{a=1,b=2}, new TestStruct{a=1,b=3});

//QuickSorts myStructArray based on the comparison of the lambda for each element
var newArray = Array.Sort(myStructArray, x=>x.a); 

//newArray may have a different order as myStructArray at this time
for(var i=0;i<myStructArray.Count();i++)
{
   //NUnit assertion; will almost surely fail given a sufficient array length
   Assert.AreEqual(myStructArray[i].b, newArray[i].b);
}


First of all, let's address several issues in your current plan with regards to best practices for .Net (VB or C#):

  1. Prefer Class over Structure unless you have a good reason to do otherwise
  2. Avoid using Arrays
  3. You can build that array as a one-liner: Dim MyArray() As Integer = {1, 45, 45, 1, 10, 1, 57}

As to your question of whether it's the "same" value 1, the answer is that it depends on how you look at it. For the general case, the answer is whether or not the sorting algorithm is considered stable. .Net's sorting algorithm in not stable.

For this specific case, you're asking the wrong question. 1 is 1 is 1. There is no distinction between them. If you feel like it matters, I challenge you to provide code to detect a difference between any two of the "1s" from that list in your original code (aside from array index).


Other answers are based on old documentation, so here is an updated answer. According to the latest documentation (emphasis mine):

The .NET Framework 4 and earlier versions used only the Quicksort algorithm. Now, Array.Sort uses the introspective sort (introsort) algorithm as follows:

  • If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.

  • If the number of partitions exceeds 2 * Log N, where N is the range of the input array, it uses a Heapsort algorithm.

  • Otherwise, it uses a Quicksort algorithm.

It is still an unstable sort.

0

精彩评论

暂无评论...
验证码 换一张
取 消