I'm working with an incompletely documented DBMS, and I'm looking for a general-purpose software tool that will examine the values in each column and return a description of what's in it.
I guess I'm looking for a cross between DESCRIBE, SELECT DISTINCT col, SELECT MIN(col), MAX(col), and other summary statistics.
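For reference, much of that combination can be approximated with plain SQL. The following is a minimal sketch, assuming SQLite accessed through Python's sqlite3 module. PRAGMA table_info stands in for DESCRIBE here, and another DBMS would need its own catalog query (for example information_schema.columns). The summarize_table and quote_ident helpers are hypothetical names used only for illustration.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def summarize_table(conn: sqlite3.Connection, table: str) -> list:
    """Hypothetical helper: one summary dict per column of `table`."""
    t = quote_ident(table)
    # In SQLite, PRAGMA table_info plays the role of DESCRIBE:
    # each row is (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({t})").fetchall()
    summaries = []
    for _cid, name, decl_type, _notnull, _default, _pk in columns:
        c = quote_ident(name)
        # COUNT(col) and COUNT(DISTINCT col) both skip NULLs,
        # so COUNT(*) - COUNT(col) is the number of NULLs.
        total, non_null, distinct, lo, hi = conn.execute(
            f"SELECT COUNT(*), COUNT({c}), COUNT(DISTINCT {c}), "
            f"MIN({c}), MAX({c}) FROM {t}"
        ).fetchone()
        summaries.append({
            "column": name,
            "type": decl_type,
            "rows": total,
            "nulls": total - non_null,
            "distinct": distinct,
            "min": lo,
            "max": hi,
        })
    return summaries


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, joined TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ann Lee", "2001-02-01"), ("Bob", "2024-01-01"), (None, "2010-06-15")],
    )
    for row in summarize_table(conn, "people"):
        print(row)
```

This gives DESCRIBE-style declared types next to NULL counts, distinct counts and ranges. Note that MIN/MAX on TEXT columns compare as strings, which happens to order ISO-style dates correctly.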
Ideally I'd like it to detect, for example, a text column and report something like "This column is UTF-8 text, 5% NULL, 15% one word, 30% two words, 35% three words, and the rest something else."
Or "This column is a datestamp. Values lie in the range 2001-02-01 : 2024-01-01, with no NULLs."
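The kind of report described above can be roughly sketched in Python. This assumes the column's values have already been fetched into a list of Python objects, so decoding to str has already happened and UTF-8 detection is not attempted. It also assumes datestamps are date/datetime objects or strings that date.fromisoformat can parse, and that a "word" is a whitespace-separated token. describe_column and as_date are hypothetical helpers, not part of any existing tool.

```python
from collections import Counter
from datetime import date, datetime

# Labels for word-count buckets; bucket 4 means "four or more".
WORD_LABELS = {0: "empty", 1: "one word", 2: "two words",
               3: "three words", 4: "four or more words"}


def as_date(value):
    """Return a date for date/datetime objects or ISO-format strings, else None."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        try:
            return date.fromisoformat(value.strip())
        except ValueError:
            return None
    return None


def describe_column(values):
    """Describe a column's values in plain English (hypothetical helper)."""
    total = len(values)
    if total == 0:
        return "This column is empty."
    non_null = [v for v in values if v is not None]
    null_pct = 100 * (total - len(non_null)) / total
    if not non_null:
        return "This column is entirely NULL."

    # Datestamp check: every non-NULL value must parse as a date.
    dates = [as_date(v) for v in non_null]
    if all(d is not None for d in dates):
        nulls = "with no NULLs" if null_pct == 0 else f"{null_pct:.0f}% NULL"
        return (f"This column is a datestamp. Values lie in the range "
                f"{min(dates).isoformat()} : {max(dates).isoformat()}, {nulls}.")

    # Text check: bucket values by whitespace-separated word count.
    if all(isinstance(v, str) for v in non_null):
        buckets = Counter(min(len(v.split()), 4) for v in non_null)
        parts = [f"{null_pct:.0f}% NULL"]
        for n in sorted(buckets):
            # Percentages are of all rows, NULLs included, so they sum to ~100.
            parts.append(f"{100 * buckets[n] / total:.0f}% {WORD_LABELS[n]}")
        return "This column is text: " + ", ".join(parts) + "."

    # Fallback: report which Python types were seen.
    kinds = sorted({type(v).__name__ for v in non_null})
    return f"This column holds {', '.join(kinds)} values, {null_pct:.0f}% NULL."
```

For example, passing it ["2001-02-01", "2024-01-01", "2010-06-15"] yields the datestamp sentence from the question: "This column is a datestamp. Values lie in the range 2001-02-01 : 2024-01-01, with no NULLs."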
Does this tool exist anywhere? Any suggestions? Thank you.
It sounds like you're looking for a Data Profiling tool.
There's an open-source product called Talend Open Profiler that can be used to profile data. There are also several commercial products available, for example from Informatica and Microsoft.
DbVisualizer: http://www.dbvis.com/
You can get a community edition for free.