I've got about 25 tables that I'd like to update with random data that's picked from a subset of data. I'd like the data to be picked at random but meaningful -- like changing all the first names in a database to new first names at random. So I don't want random garbage in the fields, I'd like to pull from a temp table that's populated ahead of time.
The only way I can think of to do this is with a loop and some dynamic sql.
- insert pick-from names into temp table with id field
- foreach table name in a list of
tables:
- build开发者_Python百科 a dynamic sql that updates all first name fields to be a name picked at random from the temp table based on rand() * max(id) from temp table
But anytime I think "loop" in SQL I figure I'm doing something wrong.
The database in question has a lot of denormalized tables in it, so that's why I think I'd need a loop (the first name fields are scattered across the database).
Is there a better way?
Red Gate have a product called SQL Data Generator that can generate fake names and other fake data for testing purposes. It's not free, but they have a trial so you can test it out, and it might be faster than trying to do it yourself.
(Disclaimer: I have never used this product, but I've been very happy with some of their other products.)
I wrote a stored procedure to do something like this a while back. It is not as good as the Red Gate product and only does names, but if you need something quick and dirty, you can download it from
http://www.joebooth-consulting.com/products/
The script name is GenRandNames.sql
Hope this helps
Breaking the 4th wall a bit by answering my own question.
I did try this as a sql script. What I learned is that SQL pretty much sucks at random. The script was slow and weird -- functions that referenced views that were only created for the script and couldn't be made in tempdb.
So I made a console app.
- Generate your random data, easy to do with the Random class (just remember to only use one instance of Random).
- Figure out what columns and table names that you'd like to update via a script that looks at information_schema.
- Get the IDs for all the tables that you're going to update, if possible (and wow will it be slow if you have a large table that doesn't have any good PKs).
- Update each table 100 rows at a time. Why 100? No idea. Could be 1000. I just picked a number. Dictionary is handy here: pick a random ID from the dict using the Random class.
Wash, rinse, repeat. I updated about 2.2 million rows in an hour this way. Maybe it could be faster, but it was doing many small updates so it didn't get in anyone's way.
精彩评论