I've got a List
of Objects, this list could potentially contain thousands of elements.
I want to 开发者_StackOverflowobtain 10, 20, 34, 56 (whatever size portion the user chooses) subset of those, this subset must be randomly selected and I can't have duplicates.
Would Collections.shuffle()
be sufficient for large Lists of POJOs? Or is there a more efficient/safer way of doing this?
Taking my example here, if myStrings
had 50,000 Strings in it, would calling Collections.shuffle()
be a good idea if you only wanted 5 items?
public class ShuffleMe
{
public static void main(String[] args)
{
int NUM_OF_ELEMENTS_TO_PICK = 3;
List<String> myStrings = new ArrayList<String>();
myStrings.add("A");
myStrings.add("B");
myStrings.add("C");
myStrings.add("D");
myStrings.add("E");
myStrings.add("F");
Collections.shuffle(myStrings);
for (int i = 0; i < NUM_OF_ELEMENTS_TO_PICK; i++)
{
System.out.println(myStrings.get(i));
}
}
}
Shuffling the whole list would be a bit of a waste of resources if what you want is significantly smaller. I'd personally just pick n unique random numbers between 0..size and use the objects at those indexes for the randomised subset.
If you're talking about picking a random subset very close to the size of the whole collection, then chances are you're better of just calling Collections.shuffle()
and choosing the first n
entries. But if we're talking about ~5 / 50,000, definitely use the above approach.
If the number of items you want is much less than the size of the collection, just draw them at random:
Set<Integer> randSubSet = new HashSet<Integer>();
while(randSubSet.size() < NUM_OF_ELEMENTS_TO_PICK) {
randSubSet.add((int)(Math.random()*myStrings.size()));
}
for (int i : randSubSet) {
System.out.println(myStrings.get(i));
}
Use a Fisher-Yates shuffle, but only run it far enough to pick the number of elements you need.
精彩评论