Thread: The trap when using pg_dumpall
Years ago, I started backing up my table data in individual groups (typically SCHEMAs). More recently, I have also been backing up the entire cluster using pg_dumpall.
Today I thought:
Why not stop the dumps of individual groups? Why not continue to dump with pg_dumpall, & then use pg_restore to restore all or just portions of the database, as needed?
That sounded good until I did a bit of research. Despite this site https://www.postgresqltutorial.com/postgresql-restore-database/ saying that you could use pg_restore with pg_dumpall, the authoritative documentation says that you can't.
So, assuming that's true, what is the best way to accomplish what I want? I see no effective way of filtering out individual SCHEMAs when restoring from the output of a pg_dumpall. That means:
- Using pg_restore, which has the capability of filtering the restore.
- Using pg_dump with archive output.
Unfortunately, as far as I know, pg_dump doesn't dump roles & possibly other data that I need. I presently have script files that I keep updated (hopefully) in parallel, for the complete manual recreation of the database, but there's nothing like backing up everything.
So, here's my ultimate question(s):
- Can I do a pg_dumpall to back up just the parts that pg_dump omits, & then be able to do a complete restore by restoring the non-data with psql, & then restoring the data with pg_restore?
- If so, what are the appropriate options to pg_dumpall?
I'm thinking the following will work, but an authoritative answer would be nice:
pg_dumpall -rs (I don't use tablespaces)
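Roughly, the split I have in mind would look like the sketch below (the database name mydb is made up, & the commands are wrapped in functions so nothing runs until called against a live cluster):

```shell
# Sketch of the proposed split: cluster-wide objects via pg_dumpall,
# per-database data via a pg_dump archive.

backup_globals() {
    # Roles & other cluster-wide objects that pg_dump omits.
    pg_dumpall --globals-only > globals.sql
}

backup_database() {
    # Custom-format archive, so pg_restore can filter on restore.
    pg_dump --format=custom --file=mydb.dump mydb
}

restore_everything() {
    # Non-data first with psql, then the data with pg_restore.
    psql --file=globals.sql postgres
    pg_restore --create --dbname=postgres mydb.dump
}
```

Restoring a single SCHEMA would then be `pg_restore --schema=myschema --dbname=mydb mydb.dump`, which is exactly the filtering that pg_dumpall's plain-text output doesn't allow.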
On 2021-07-03 13:18, Dean Gibson (DB Administrator) wrote:
I'm thinking the following will work, but an authoritative answer would be nice:
pg_dumpall -rs (I don't use tablespaces)
Turns out "-rs" is a mistake. "-r" & "-g" override "-s", & "-s" contains everything that "-g" does, & more.
I NEVER use pg_dumpall for my databases. No way I want to dump everything in text format. For big databases that is excruciatingly slow.
I use pg_dumpall -g to get the globals and then use the -Fd directive of pg_dump/pg_restore to take advantage of parallel processing of dumps and loads for the databases. This is SOOOOoooo much faster and takes up much less disk space.
When clusters were relatively small a couple decades ago, I reckon pg_dumpall sufficed, but not today in my real-world experiences.
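In sketch form (database name & job count made up), that recipe is roughly:

```shell
# Globals as plain SQL, databases as directory-format (-Fd) archives.
# Functions only; call them against a real cluster.

dump_parallel() {
    pg_dumpall --globals-only > globals.sql
    # Directory format is the only pg_dump format that supports
    # parallel dump jobs (-j); it is also compressed by default.
    pg_dump --format=directory --jobs=4 --file=mydb.dir mydb
}

restore_parallel() {
    psql --file=globals.sql postgres
    # pg_restore can parallelize custom- & directory-format archives.
    pg_restore --jobs=4 --create --dbname=postgres mydb.dir
}
```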
Regards,
Michael Vitale
Dean Gibson (DB Administrator) wrote on 7/3/2021 5:13 PM:
I'm thinking the following will work, but an authoritative answer would be nice:
pg_dumpall -rs (I don't use tablespaces)
Turns out "-rs" is a mistake. "-r" & "-g" override "-s", & "-s" contains everything that "-g" does, & more.
Well, I never store the output of pg_dumpall directly; I pipe it through gzip, & the resultant size differs by about 1% from the size from pg_dump in custom-archive format.
I also found that pg_dumpall -g doesn't get the triggers; pg_dumpall -s does. I don't know if pg_dump gets the triggers.
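In sketch form (file name made up), that arrangement is just:

```shell
# Compress pg_dumpall's plain-text output on the fly instead of
# storing it directly. Functions only; nothing runs until called.

dump_compressed() {
    pg_dumpall | gzip > cluster.sql.gz
}

restore_compressed() {
    # gunzip -c decompresses to stdout, leaving the archive in place.
    gunzip -c cluster.sql.gz | psql postgres
}
```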
On 2021-07-03 17:42, MichaelDBA wrote:
I NEVER use pg_dumpall for my databases. No way I want to dump everything in text format. For big databases that is excruciatingly slow.
I use pg_dumpall -g to get the globals and then use the -Fd directive of pg_dump/pg_restore to take advantage of parallel processing of dumps and loads for the databases. This is SOOOOoooo much faster and takes up much less disk space.
When clusters were relatively small a couple decades ago, I reckon pg_dumpall sufficed, but not today in my real-world experiences.
Regards,
Michael Vitale
On Sat, Jul 03, 2021 at 11:04:26PM -0700, Dean Gibson (DB Administrator) wrote:
> Well, I never store the output of pg_dumpall directly; I pipe it through
> gzip, & the resultant size differs by about 1% from the size from pg_dump
> in custom-archive format.
>
> I also found that pg_dumpall -g doesn't get the triggers; pg_dumpall -s
> does. I don't know if pg_dump gets the triggers.

Triggers are inside a database, so a normal pg_dump handles them.

Generally, I stand firmly in the position that one should never use pg_dumpall. The size of the dump is one thing, but the inability to sanely filter what you will load is the deal breaker. Plus, with modern pg_dump and -Fd, both dumping time and loading time can be significantly reduced thanks to parallelism. I wrote about it: https://www.depesz.com/2019/12/10/how-to-effectively-dump-postgresql-databases/

depesz
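To illustrate the filtering point, here is a sketch (with made-up schema, table, & file names) of the selective restores an archive format permits:

```shell
# Selective restore from a custom- or directory-format archive.
# Functions only; call them against a real cluster.

restore_one_schema() {
    pg_restore --schema=myschema --dbname=mydb mydb.dump
}

restore_one_table() {
    pg_restore --table=mytable --data-only --dbname=mydb mydb.dump
}

restore_arbitrary_subset() {
    # -l lists the archive's table of contents; edit the list &
    # pass it back with -L to restore exactly the items you keep.
    pg_restore -l mydb.dump > contents.list
    pg_restore -L contents.list --dbname=mydb mydb.dump
}
```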
> On Jul 4, 2021, at 2:04 AM, Dean Gibson (DB Administrator) <postgresql@mailpen.com> wrote:
>
> Well, I never store the

Well, as long as we're stating "I never"s, let me state mine: I never consider a logical backup (pg_dump) a proper backup. Logical backups have their use cases and are a very useful tool; however, I would consider a production database without a physical backup as not being backed up. Logical backups don't allow for PITR recovery and are very slow in comparison. I only use logical backups for migrations, etc., more as a tool than as a backup strategy.

A good practice would be to have a separate instance that can instantiate a point-in-time recovery. This would allow for reviewing/exporting the data needed to resolve a given issue.
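A minimal physical base backup, the kind argued for above, might be sketched as follows; pg_basebackup needs a replication-capable connection, the target path here is made up, & continuous WAL archiving for full PITR would be configured separately:

```shell
# Physical base backup with pg_basebackup. This alone is not PITR;
# it is the base on which WAL archiving/replay builds.
# Function only; call it against a real cluster.

base_backup() {
    pg_basebackup --pgdata=/var/backups/pg_base \
                  --format=tar --gzip \
                  --wal-method=stream \
                  --progress
}
```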