开发者

suggest a method for updating data in many tables with random data?

开发者 https://www.devze.com 2022-12-20 04:09 出处:网络
I\'ve got about 25 tables that I\'d like to update with random data that\'s picked from a subset of data. I\'d like the data to be picked at random but meaningful -- like changing all the first names

I've got about 25 tables that I'd like to update with random data that's picked from a subset of data. I'd like the data to be picked at random but meaningful -- like changing all the first names in a database to new first names at random. So I don't want random garbage in the fields, I'd like to pull from a temp table that's populated ahead of time.

The only way I can think of to do this is with a loop and some dynamic sql.

  1. insert pick-from names into temp table with id field
  2. foreach table name in a list of tables:
    1. build开发者_Python百科 a dynamic sql that updates all first name fields to be a name picked at random from the temp table based on rand() * max(id) from temp table

But anytime I think "loop" in SQL I figure I'm doing something wrong.

The database in question has a lot of denormalized tables in it, so that's why I think I'd need a loop (the first name fields are scattered across the database).

Is there a better way?


Red Gate have a product called SQL Data Generator that can generate fake names and other fake data for testing purposes. It's not free, but they have a trial so you can test it out, and it might be faster than trying to do it yourself.

(Disclaimer: I have never used this product, but I've been very happy with some of their other products.)


I wrote a stored procedure to do something like this a while back. It is not as good as the Red Gate product and only does names, but if you need something quick and dirty, you can download it from

http://www.joebooth-consulting.com/products/

The script name is GenRandNames.sql

Hope this helps


Breaking the 4th wall a bit by answering my own question.

I did try this as a sql script. What I learned is that SQL pretty much sucks at random. The script was slow and weird -- functions that referenced views that were only created for the script and couldn't be made in tempdb.

So I made a console app.

  1. Generate your random data, easy to do with the Random class (just remember to only use one instance of Random).
  2. Figure out what columns and table names that you'd like to update via a script that looks at information_schema.
  3. Get the IDs for all the tables that you're going to update, if possible (and wow will it be slow if you have a large table that doesn't have any good PKs).
  4. Update each table 100 rows at a time. Why 100? No idea. Could be 1000. I just picked a number. Dictionary is handy here: pick a random ID from the dict using the Random class.

Wash, rinse, repeat. I updated about 2.2 million rows in an hour this way. Maybe it could be faster, but it was doing many small updates so it didn't get in anyone's way.

0

精彩评论

暂无评论...
验证码 换一张
取 消