Experience and feedback on pg_restore --data-only - Mailing list pgsql-general

From Dimitrios Apostolou
Subject Experience and feedback on pg_restore --data-only
Date
Msg-id 53760c70-4a87-a453-9e02-57abc9cb2e54@gmx.net
Whole thread Raw
Responses Re: Experience and feedback on pg_restore --data-only
Re: Experience and feedback on pg_restore --data-only
Re: Experience and feedback on pg_restore --data-only
List pgsql-general
Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it makes often sense to get rid of the churn and create a
clean database by running the SQL schema definition from version control,
and then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
   AUTHORIZATION (depending on command line switches).  Both of these
   require superuser privilege, but in my case this is not really needed.
   Dbowner has CREATEROLE and is the one who creates all the roles (WITH
   SET TRUE), and their private schemata in the specific database.  Things
   would work if pg_restore did "SET ROLE" instead of "SET SESSION
   AUTHORIZATION" to switch user. Is this a straightforward change or there
   are issues I don't see?

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
   before I try again.  I wrote my own function for that. It would help if
   pg_restore would optionally truncate before COPY.  I believe it would
   require superuser privilege for it, that could achieve using the
   --superuser=username option used today for disabling the triggers.

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on this
   one.

* Indices: Could pg_restore have a switch to DROP indices before each
   COPY, and re-CREATE them after, exactly as they were?  This would speed
   up the process quite a bit.


Any feedback for improving my process? Should I put these ideas somewhere
as ideas for improvement on pg_restore?

Thank you in advance,
Dimitris




pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: After upgrading libpq, the same function(PQftype) call returns a different OID
Next
From: Guillaume Lelarge
Date:
Subject: Re: Export operation efficiency in read replica