Extract data from PostgreSQL DB without using pg_dump

There is a PostgreSQL database on which I only have limited access (e.g., I can't use pg_dump). I am trying to create a local "mirror" by exporting certain tables from the database. I do not have the permissions needed to just dump a table as SQL from within psql. Right now, I just have a Python script that iterates through my table_names, selects all fields, and then exports them as a CSV:

import os

for table_name, file_name in zip(table_names, file_names):
    # Pipe a \copy command through psql and gzip the CSV output.
    cmd = """echo "\\\\copy (select * from %s) to stdout with csv header" | psql -d remote_db | gzip > ./%s/%s.gz""" % (table_name, dir_name, file_name)
    os.system(cmd)

I would like to avoid CSV if possible, as I lose the field types and the encoding can get messed up. The best option would probably be some way of getting the SQL code that generates the table via \copy. Next best would be XML, ideally with some way of preserving the field types. If neither works, I think the final option might be two queries: one to get the field data types, the other to get the actual data.

Any thoughts or advice would be greatly appreciated - thanks!


The bit about "I do not have the permissions needed to just dump a table as SQL from within psql" puzzles me. pg_dump runs standalone, outside psql (both are clients), and if you have permission to connect to the database and select from a table, I'd guess you'd also be able to dump it using pg_dump -t <table>. Am I missing something?
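
If pg_dump does turn out to work for you, here is a minimal sketch of driving it from the same Python loop as the question (table_names, file_names, dir_name, and the database name remote_db are taken from the question; everything else is an assumption):

import subprocess

for table_name, file_name in zip(table_names, file_names):
    # Dump one table's schema and data as SQL, then gzip it.
    with open('%s/%s.sql.gz' % (dir_name, file_name), 'wb') as out:
        dump = subprocess.Popen(['pg_dump', '-d', 'remote_db', '-t', table_name],
                                stdout=subprocess.PIPE)
        subprocess.run(['gzip'], stdin=dump.stdout, stdout=out, check=True)
        dump.stdout.close()
        dump.wait()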


If you use psycopg2, you can use cursor.description to get the column names, and use the fetched data types to convert each value into an acceptable string representation.

This code creates INSERT statements that you can use not only with PostgreSQL but also with other databases (though you will probably have to change the date format):

import datetime

# table_name is interpolated directly, so it must come from a trusted list.
cursor.execute("SELECT * FROM %s" % (table_name,))

# Column names come from the cursor metadata.
column_names = [c[0] for c in cursor.description]
insert_prefix = 'insert into %s (%s) values' % (table_name, ', '.join(column_names))

for row in cursor.fetchall():
    row_data = []
    for rd in row:
        if rd is None:
            row_data.append('NULL')
        elif isinstance(rd, datetime.datetime):
            row_data.append("'%s'" % rd.strftime('%Y-%m-%d %H:%M:%S'))
        elif isinstance(rd, str):
            # Double any embedded single quotes so the literal stays valid SQL.
            row_data.append("'%s'" % rd.replace("'", "''"))
        else:
            row_data.append(repr(rd))
    print('%s (%s);' % (insert_prefix, ', '.join(row_data)))

psycopg2 even has built-in support for COPY; see the COPY-related methods (copy_from(), copy_to(), copy_expert()) in its documentation.
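
For example, here is a minimal sketch using cursor.copy_expert() to reproduce the question's gzipped CSV export without shelling out (the connection string, table_name, and output file name are placeholders):

import gzip
import psycopg2

conn = psycopg2.connect('dbname=remote_db')  # placeholder connection string
cur = conn.cursor()

# COPY ... TO STDOUT streams the rows straight into the file object.
with gzip.open('table_name.csv.gz', 'wt') as f:
    cur.copy_expert('COPY (SELECT * FROM table_name) TO STDOUT WITH CSV HEADER', f)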

If you prefer working from metadata, you can use my recipe, "Dump PostgreSQL db schema to text", which is based on "Extracting META information from PostgreSQL" by Lorenzo Alberton.


You could use these queries (obtained by running psql --echo-hidden and "\d <table>") to get the base metadata:

-- GET OID
SELECT oid FROM pg_class WHERE relname = '<YOUR_TABLE_NAME>';

-- GET METADATA
SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
   a.attnotnull, a.attnum
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = <YOUR_TABLES_OID_FROM_PG_CLASS> AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;

This gives you the name, data type, default, null flag, and field order within the row. To get the actual data, your best bet is still CSV: the built-in COPY table TO STDOUT WITH CSV HEADER is very robust. But if you are worried about encoding, be sure to get the values of server_encoding and client_encoding just before dumping the CSV data. That, combined with the metadata from the query above, should give you enough information to interpret a CSV dump properly.
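
A minimal sketch tying this together with psycopg2 (the connection string and table_name are placeholders, and the default-expression subselect from the query above is omitted for brevity):

import psycopg2

conn = psycopg2.connect('dbname=remote_db')  # placeholder connection string
cur = conn.cursor()

# Get the table's OID.
cur.execute("SELECT oid FROM pg_class WHERE relname = %s", (table_name,))
(table_oid,) = cur.fetchone()

# Get name, type, null flag, and position for each live column.
cur.execute("""
    SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod),
           a.attnotnull, a.attnum
    FROM pg_catalog.pg_attribute a
    WHERE a.attrelid = %s AND a.attnum > 0 AND NOT a.attisdropped
    ORDER BY a.attnum""", (table_oid,))
columns = cur.fetchall()

# Record the encodings alongside the CSV dump so it can be interpreted later.
cur.execute("SHOW server_encoding")
server_encoding = cur.fetchone()[0]
cur.execute("SHOW client_encoding")
client_encoding = cur.fetchone()[0]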
