Thread: Re: Experience and feedback on pg_restore --data-only
On Thu, 20 Mar 2025, Dimitrios Apostolou wrote: > Rationale: > > When restoring a backup in an emergency situation, it's fine to run > pg_restore as superuser and get an exact replica of the dumped db. > AFAICT pg_restore (without --data-only) is optimised for such case. > > But pg_dump/restore can be used as a generic data-copying utility, and in > those cases it makes often sense to get rid of the churn and create a clean > database by running the SQL schema definition from version control, and then > copy the data for only the tables created. > > For this case, I choose to run pg_restore --data-only, and run it as the user > who owns the database (dbowner), not as a superuser, in order to avoid > changes being introduced under the radar. Another important reason I use --data-only: tablespaces. The new host has different storage configuration and tablespaces and the schema has been updated to reflect those. Therefore I must create the database using the updated schema definitions on the new host. > > Things that made my life hard: > > * plenty of permission denials for both ALTER OWNER or SET SESSION > AUTHORIZATION (depending on command line switches). Both of these > require superuser privilege, but in my case this is not really needed. > Dbowner has CREATEROLE and is the one who creates all the roles (WITH > SET TRUE), and their private schemata in the specific database. Things > would work if pg_restore did "SET ROLE" instead of "SET SESSION > AUTHORIZATION" to switch user. Is this a straightforward change or there > are issues I don't see? > > * After each failed attempt, I need to issue a TRUNCATE table1,table2,... > before I try again. I wrote my own function for that. It would help if > pg_restore would optionally truncate before COPY. I believe it would > require superuser privilege for it, that could achieve using the > --superuser=username option used today for disabling the triggers. > > Performance issues: (important as my db size is >5TB) > > * WAL writes: I didn't manage to avoid writing to the WAL, despite having > setting wal_level=minimal. I even wrote my own function to ALTER all > tables to UNLOGGED, but failed with "could not change table T to > unlogged because it references logged table". I'm out of ideas on this > one. > > * Indices: Could pg_restore have a switch to DROP indices before each > COPY, and re-CREATE them after, exactly as they were? This would speed > up the process quite a bit. > > > Any feedback for improving my process? Should I put these ideas somewhere as > ideas for improvement on pg_restore? > > Thank you in advance, > Dimitris > >
On Fri, Mar 21, 2025 at 2:36 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
On Thu, 20 Mar 2025, Dimitrios Apostolou wrote:
> Rationale:
>
> When restoring a backup in an emergency situation, it's fine to run
> pg_restore as superuser and get an exact replica of the dumped db.
How often do you have emergencies requiring database restore???? In my seven years managing PG systems, I've had TWO.
> AFAICT pg_restore (without --data-only) is optimised for such case.
[snip]
> Any feedback for improving my process?
Yes: don't use a logical backup tool like pg_dump to backup production databases.
PgBackRest (the tool I have experience with; there are others, though) has mandatory features like PITR, incremental and differential backups, delta restores and encryption. Use that instead.
To get rid of the cruft in your database, go through the schema and manually drop unused tables.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!