I am working on a survey application. I have a sql server database holding upto 3000 survey questions. I call database for a set of 10 questions. I have to make sure that none of the of the questions are repeated for the longest time possible. What is the best possible approach I can take here? Write a custom randomization algorithm or SQL server provides some 开发者_JAVA技巧functionality.
Could you simply keep track of the last question id retrieved. Then always retrieve the next question with a higher id than the previous. If nothing is returned, get the first/lowest.
It is not going to scale well, but then it doesn't sound like you need to scale.
SELECT question_id
FROM questions
ORDER BY NEWID()
gives you all the question ids in random order. store the list, and work your way down.
There isn't a way to ensure the longest time between repeats without storing the order. You could make it another column on the same table...
EDIT: a variation on Mr Jacques' idea:
how about a column named random? Shuffle the order like so:
update questions set random = (
SELECT question_id
FROM questions
ORDER BY NEWID()
)
then get each new question like this:
select * where random = 'prev_question_id';
Then you could shuffle the order at intervals, every night maybe.
In reality, getting a random set is only half the problem. What you really want is a new random set that contains entries that haven't been selected before.
Here is an article by Jeff Atwood that talks about "shuffling" that may help you out: Shuffling and a follow up article The Danger of Naïveté.
You could choose questions based on a combination of a random generator and a frequency counter on the question itself.
If you always needed 10 questions, start with a set of questions that has the lowest use frequency, then apply a random number generator to pick from that set. Increment the frequency of those questions. These most recently asked questions have a higher frequency number than the rest of the pool, so these are ignored in the next pass.
The process continues until you have all of the questions stepped to the next frequency, and the whole pool becomes available again.
This kind of sounds like homework.. If the emphasis really is on making sure the question is never repeated until absolutely necessary, then I would suggest adding a column or another table with a TimesUsed field.
At first you would have all 3000 records with a TimesUsed of 0. Grab the top 10
SELECT TOP 10 QUESTIONS WHERE TimesUsed = (SELECT MIN(TimesUsed) FROM Questions)
and UPDATE the table setting these question's TimesUsed to TimesUsed +1.
This is a simple approach, and you'd need to account for if you didn't have an amount of questions evenly divisble by 10, but I'll leave that to you :)
Of course none of this really matters if you aren't deadset on never repeating one until they've all been used.
精彩评论