This question is somewhat similar to this one, but more specific. I would like to test an ETL process by visualizing the differences between two dump files. The dump files contain the entire database. The differences will not be in the schema, since such comparisons are easy to make manually, but rather slight differences in the data.
Are there any tools for doing this? The visualization I imagine could be something like:
Column1 has 0.02% difference in 10 rows.
It should of course also be possible to switch to a verbose mode to see the actual differences in each row.
Does such a tool exist?
Text utilities are usually your best bet.
But if I were testing an ETL process, I wouldn't want to test the entire dump at once. (In my case, that would be millions of lines.) I'd rather automate dumping each table into a separate file. Then it's easy to tell whether two versions of the data from a table are identical.
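A minimal sketch of that per-table split, assuming a PostgreSQL plain-text dump in which each table's rows sit between a line of the form COPY <table> (<columns>) FROM stdin; and a line containing only \. (the function name split_dump and the .data file suffix are hypothetical; other dump formats would need a different pattern):

```shell
# Hypothetical helper: split a PostgreSQL plain-text dump into one file per table.
# Assumes each data block looks like:
#   COPY public.users (id, name) FROM stdin;
#   1<TAB>alice
#   \.
split_dump() {
    dump=$1      # path to the dump file
    outdir=$2    # directory that receives one <table>.data file per table
    mkdir -p "$outdir" || return 1
    awk -v out="$outdir" '
        # start of a data block: $2 is the (schema-qualified) table name
        file == "" && /^COPY .* FROM stdin;$/ { file = out "/" $2 ".data"; next }
        # a line containing only \. ends the block
        file != "" && /^\\\.$/ { close(file); file = ""; next }
        # any other line inside a block is a data row
        file != "" { print > file }
    ' "$dump" || return 1
    # Row order in a dump may differ between runs,
    # so sort each file (byte order) before comparing with cmp or diff.
    for f in "$outdir"/*.data; do
        [ -e "$f" ] || continue      # no tables found: the glob did not match
        LC_ALL=C sort -o "$f" "$f" || return 1
    done
}
```

For example, split_dump old.sql old followed by split_dump new.sql new leaves one sorted file per table in the directories old and new. Sorting gives up the original row order in exchange for stable comparisons, which is usually the right trade when the ETL process does not guarantee an order.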
cmp table.old table.new
cmp produces no output if the files are identical. diff will tell you where the differences are.
diff table.old table.new
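With one file per table, a small loop can run cmp -s (silent mode, which only sets the exit status) over every table and list the ones that differ, so diff only has to be run on those. A sketch, assuming both directories were produced the same way; compare_tables is a hypothetical name:

```shell
# Hypothetical helper: report, per table file, whether two dump directories agree.
# cmp -s prints nothing; exit status 0 means identical, 1 means different.
compare_tables() {
    old_dir=$1
    new_dir=$2
    status=0
    for old in "$old_dir"/*; do
        [ -e "$old" ] || continue           # empty directory: the glob did not match
        name=${old##*/}                     # file name without the directory part
        new=$new_dir/$name
        if [ ! -e "$new" ]; then
            echo "MISSING $name"
            status=1
        elif cmp -s "$old" "$new"; then
            echo "SAME    $name"
        else
            echo "DIFF    $name"
            status=1
        fi
    done
    # Tables that exist only in the new directory.
    for new in "$new_dir"/*; do
        [ -e "$new" ] || continue
        name=${new##*/}
        [ -e "$old_dir/$name" ] || { echo "NEW     $name"; status=1; }
    done
    return "$status"
}
```

Running compare_tables old new prints one SAME, DIFF, MISSING, or NEW line per table and returns a non-zero status if anything differs, which makes it easy to use in an automated test; each DIFF line is the cue to run diff on that pair of files.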
I use Cygwin when I have to do this stuff under Windows.
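The per-column summary described in the question can be approximated with the same text utilities. The sketch below (column_summary is a hypothetical name) assumes two tab-separated files whose rows are already aligned, that is, the same rows in the same order in both; it reports columns by position rather than by name and compares values as strings. Inserted or deleted rows shift the alignment, so for those a plain diff remains the better tool.

```shell
# Hypothetical helper: per-column difference summary for two aligned,
# tab-separated files (same number of rows, same row order).
column_summary() {
    awk -F '\t' -v newfile="$2" '
    {
        # Read the matching row from the new file.
        if ((getline line < newfile) <= 0) {
            print "new file has fewer rows" > "/dev/stderr"
            mismatch = 1
            exit
        }
        n = split(line, other, "\t")
        rows++
        max = (NF > n) ? NF : n
        for (i = 1; i <= max; i++)
            if (($i "") != (other[i] "")) diff[i]++   # force string comparison
        if (max > cols) cols = max
    }
    END {
        if (!mismatch && (getline line < newfile) > 0) {
            print "new file has more rows" > "/dev/stderr"
            mismatch = 1
        }
        for (i = 1; i <= cols; i++)
            if (diff[i] > 0)
                printf "Column%d has %.2f%% difference in %d rows.\n", i, 100 * diff[i] / rows, diff[i]
        if (mismatch) exit 1
    }' "$1"
}
```

For example, column_summary old/public.users.data new/public.users.data prints one line per differing column in the format Column2 has 50.00% difference in 2 rows., and exits with a non-zero status if the two files have different row counts.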