
Generating a dataset with few unique values

Developer https://www.devze.com 2023-03-28 08:17 Source: Internet

Note: This is part 2 of a 2 part question.

Part 1 here

I want to learn more about sorting algorithms, and what better way to do that than to code! So I figure I need some data to work with.

My approach to creating some "standard" data will be as follows: create a set number of items. I'm not sure how large to make it, but I want to have fun and make my computer groan a little bit :D

Once I have that list, I'll push it into a text file and just read off that to run my algorithms against. I should have a total of 4 text files filled with the same data but just sorted differently to run my algorithms against (see below).
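The save-and-reload step described above could look something like this. It is only a minimal sketch, assuming one integer per line; the class name `DataFiles` and the method names `Save` and `Load` are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DataFiles
{
    // Writes one integer per line to the given path.
    public static void Save(string path, int[] data)
    {
        File.WriteAllLines(path, data.Select(x => x.ToString()));
    }

    // Reads the file back into an array, expecting one integer per line.
    public static int[] Load(string path)
    {
        return File.ReadAllLines(path)
                   .Select(line => int.Parse(line))
                   .ToArray();
    }
}
```

Calling `Save` once per scenario (random, reversed, nearly sorted, few unique) would give the four files, and `Load` would hand each algorithm the same input on every run.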

Correct me if I'm wrong but I believe I need 4 different types of scenarios to profile my algorithms.

  • Randomly sorted data (for this I'm going to use the Knuth shuffle)
  • Reversed data (easy enough)
  • Nearly sorted (not sure how to implement this)
  • Few unique (once again not sure how to approach this)
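For reference, the first three scenarios in the list could be sketched roughly as follows. The class and method names are hypothetical, and "nearly sorted" is taken here to mean "sorted, then a handful of random swaps", which is just one possible definition and not necessarily the one Part 1 settles on.

```csharp
using System;

public static class Scenarios
{
    // Knuth (Fisher-Yates) shuffle: walk from the end, swapping each
    // element with a randomly chosen element at or before it.
    public static void Shuffle(int[] data, Random rng)
    {
        for (int i = data.Length - 1; i > 0; i--)
        {
            int j = rng.Next(0, i + 1); // upper bound is exclusive
            Swap(data, i, j);
        }
    }

    // Reversed data: sort ascending, then reverse in place.
    public static void SortDescending(int[] data)
    {
        Array.Sort(data);
        Array.Reverse(data);
    }

    // Nearly sorted (assumed definition): sort, then swap a small
    // number of randomly chosen pairs.
    public static void NearlySort(int[] data, int swaps, Random rng)
    {
        Array.Sort(data);
        if (data.Length < 2) return;
        for (int s = 0; s < swaps; s++)
        {
            Swap(data, rng.Next(data.Length), rng.Next(data.Length));
        }
    }

    private static void Swap(int[] data, int a, int b)
    {
        int tmp = data[a];
        data[a] = data[b];
        data[b] = tmp;
    }
}
```

Passing the `Random` in as a parameter makes it possible to use a fixed seed, so the same "random" file can be regenerated later.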

This question is for generating a list with a few unique items of data.

Which approach is best for generating a dataset with only a few unique items?


Answering my own question here. Don't know if this is the best but it works.

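    // Requires: using System;
    // Shared generator; the method below expects a field named _random.
    private static readonly Random _random = new Random();

    // Returns returnSize values picked at random from 0..uniqueCount-1.
    public static int[] FewUnique(int uniqueCount, int returnSize)
    {
        Random r = _random;

        // The pool of allowed values: 0, 1, ..., uniqueCount - 1.
        int[] values = new int[uniqueCount];
        for (int i = 0; i < uniqueCount; i++)
        {
            values[i] = i;
        }

        int[] array = new int[returnSize];
        for (int i = 0; i < returnSize; i++)
        {
            // Next's upper bound is exclusive, so the index is always valid.
            array[i] = values[r.Next(0, values.Length)];
        }

        return array;
    }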


It might be worth having a look at NBuilder. It's a framework designed to generate objects for testing, and it sounds like just what you need.

You could deal with the "few unique" items with some code like this:

    var products = Builder<YourClass>.CreateListOfSize(1000)
       .WhereAll().AreConstructedWith("some fixed value")
       .WhereRandom(20).AreConstructedWith("some other fixed value")
       .Build();

There are plenty of other variations you can use to get the data the way you want it. Have a look at some of the samples on the site for more ideas.


http://pages.cs.wisc.edu/~bart/fuzz/

That page is all about fuzz testing, which focuses on semi-random data. It should be straightforward to adapt this approach to your problem.


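I guess your solution is OK. I would only modify it slightly, so that the unique values are drawn at random from a range instead of always being 0..uniqueCount-1: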

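    // _random is assumed to be a shared static Random field.
    public static int[] FewUnique(int uniqueCount, int low, int high, int returnSize)
    {
        Random r = _random;

        // Draw the pool of allowed values from [low, high).
        // Note: the same value can be drawn twice, so the pool may
        // hold fewer than uniqueCount distinct values.
        int[] values = new int[uniqueCount];
        for (int i = 0; i < uniqueCount; i++)
        {
            values[i] = r.Next(low, high);
        }

        int[] array = new int[returnSize];
        for (int i = 0; i < returnSize; i++)
        {
            // Next's upper bound is exclusive, so the index is always valid.
            array[i] = values[r.Next(0, values.Length)];
        }

        return array;
    }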

For some algorithms this might make a difference.
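One thing to watch when the unique values themselves are drawn at random: `r.Next(low, high)` can return the same number more than once, so the pool may end up with fewer than `uniqueCount` distinct values. Below is a minimal sketch of one way around that, assuming a `HashSet<int>` is used to reject repeats. The names `FewUniqueData` and `FewUniqueDistinct` are hypothetical, and the `Random` is passed in as a parameter purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FewUniqueData
{
    public static int[] FewUniqueDistinct(
        int uniqueCount, int low, int high, int returnSize, Random rng)
    {
        if (uniqueCount < 1)
            throw new ArgumentOutOfRangeException(nameof(uniqueCount));

        // Use long arithmetic so a very wide range does not overflow.
        if ((long)high - low < uniqueCount)
            throw new ArgumentException(
                "Range [low, high) is too small for uniqueCount distinct values.");

        // Keep drawing until the pool holds uniqueCount distinct values.
        var seen = new HashSet<int>();
        while (seen.Count < uniqueCount)
        {
            seen.Add(rng.Next(low, high)); // Add ignores duplicates
        }
        int[] values = seen.ToArray();

        // Fill the result by picking from the pool at random.
        int[] array = new int[returnSize];
        for (int i = 0; i < returnSize; i++)
        {
            array[i] = values[rng.Next(values.Length)];
        }
        return array;
    }
}
```

The retry loop is simple but slows down when `uniqueCount` is close to the size of the range. The output is still only guaranteed to contain at most `uniqueCount` distinct values, since a small `returnSize` may not hit every value in the pool.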
