The trap when using pg_dumpall

From: "Dean Gibson (DB Administrator)"
Years ago, I started backing up my table data in individual groups (typically SCHEMAs).  More recently, I have also been backing up the entire cluster using pg_dumpall.

Today I thought:

Why not stop the dumps of individual groups?  Why not just continue to dump with pg_dumpall, & then use pg_restore to restore all or just portions of the database, as needed?

That sounded good until I did a bit of research.  Despite this site https://www.postgresqltutorial.com/postgresql-restore-database/ saying that you can use pg_restore with the output of pg_dumpall, the authoritative documentation says that you can't.

So, assuming that's true, what is the best way to accomplish what I want?  I see no effective way of filtering out individual SCHEMAs when restoring from the output of a pg_dumpall.  That means:

  1. Using pg_restore, which has the capability of filtering the restore.
  2. Using pg_dump with archive output.

Unfortunately, as far as I know, pg_dump doesn't dump roles & possibly other data that I need.  I presently have script files that I keep updated (hopefully) in parallel, for the complete manual recreation of the database, but there's nothing like backing up everything.

So, here are my ultimate questions:

  1. Can I do a pg_dumpall to back up just the parts that pg_dump omits, & then do a complete restore by restoring the non-data with psql, & then restoring the data with pg_restore?
  2. If so, what are the appropriate options to pg_dumpall?

I'm thinking the following will work, but an authoritative answer would be nice:

pg_dumpall  -rs  (I don't use tablespaces)
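For illustration, a minimal sketch of the two-step restore in question 1 above, assuming roles are the only globals needed (database & file names are hypothetical):

  pg_dumpall -r > roles.sql        # roles only, plain SQL, restorable with psql
  pg_dump -Fc -f mydb.dump mydb    # one database, custom-archive format

  psql -f roles.sql postgres       # restore the non-data globals with psql
  createdb mydb
  pg_restore -d mydb mydb.dump     # restore (& optionally filter) with pg_restore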





Re: The trap when using pg_dumpall

From: "Dean Gibson (DB Administrator)"
On 2021-07-03 13:18, Dean Gibson (DB Administrator) wrote:
Years ago, I started backing up my table data in individual groups (typically SCHEMAs).  [snip]

I'm thinking the following will work, but an authoritative answer would be nice:

pg_dumpall  -rs  (I don't use tablespaces)

Turns out "-rs" is a mistake.  "-r" & "-g" override "-s", & "-s" contains everything that "-g" does, & more.
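A quick way to see the difference for yourself (file names hypothetical):

  pg_dumpall -g > globals.sql   # global objects only (roles & tablespaces)
  pg_dumpall -s > schema.sql    # per the note above: everything -g emits, plus all object definitions, no data
  diff globals.sql schema.sql   # shows what -s adds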

Re: The trap when using pg_dumpall

From: MichaelDBA
I NEVER use pg_dumpall for my databases.  No way I want to dump everything in text format.  For big databases that is excruciatingly slow.

I use pg_dumpall -g to get the globals and then use the -Fd option of pg_dump/pg_restore to take advantage of parallel processing of dumps and loads for the databases.  This is SOOOOoooo much faster and takes up much less disk space.
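A minimal sketch of that workflow, assuming a hypothetical database mydb and four parallel jobs:

  pg_dumpall -g > globals.sql          # roles & other global objects, plain SQL
  pg_dump -Fd -j 4 -f mydb.dir mydb    # directory-format dump, 4 parallel jobs

  psql -f globals.sql postgres         # restore globals first
  createdb mydb
  pg_restore -j 4 -d mydb mydb.dir     # parallel restore into the new database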

When clusters were relatively small a couple of decades ago, I reckon pg_dumpall sufficed, but not today, in my real-world experience.

Regards,
Michael Vitale


Dean Gibson (DB Administrator) wrote on 7/3/2021 5:13 PM:
[snip]


Re: The trap when using pg_dumpall

From: "Dean Gibson (DB Administrator)"
Well, I never store the output of pg_dumpall directly; I pipe it through gzip, & the resultant size differs by about 1% from that of a pg_dump in custom-archive format.

I also found that pg_dumpall -g doesn't get the triggers; pg_dumpall -s does.  I don't know if pg_dump gets the triggers.
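For reference, that gzip pipeline looks something like this (file name hypothetical):

  pg_dumpall | gzip > cluster.sql.gz         # compressed plain-SQL dump of the whole cluster
  gunzip -c cluster.sql.gz | psql postgres   # restore by replaying the SQL with psql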


On 2021-07-03 17:42, MichaelDBA wrote:
I NEVER use pg_dumpall for my databases.  No way I want to dump everything in text format.  For big databases that is excruciatingly slow.

I use pg_dumpall -g to get the globals and then use the -Fd option of pg_dump/pg_restore to take advantage of parallel processing of dumps and loads for the databases.  [snip]



Re: The trap when using pg_dumpall

From: hubert depesz lubaczewski
On Sat, Jul 03, 2021 at 11:04:26PM -0700, Dean Gibson (DB Administrator) wrote:
> Well, I never store the output of pg_dumpall directly; I pipe it through
> gzip, & the resultant size differs by about 1% from the size from pg_dump in
> custom-archive format.
> 
> I also found that pg_dumpall -g doesn't get the triggers; pg_dumpall -s
> does.  I don't know if pg_dump gets the triggers.

Triggers are inside the database, so a normal pg_dump handles them.

Generally, I stand firmly in the position that one should never use
pg_dumpall.

The size of the dump is one thing, but the inability to sanely filter what
you will load is a deal breaker.
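For example, pg_restore can limit what it loads from a pg_dump archive (schema and file names hypothetical):

  pg_restore -n myschema -d mydb mydb.dump     # restore only objects in schema myschema

  pg_restore -l mydb.dump > items.list        # or: list the archive contents,
  pg_restore -L items.list -d mydb mydb.dump  # edit the list, and load just those items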

Plus, with modern pg_dump and -Fd, both dumping time and loading time
can be significantly reduced thanks to parallelism.

I wrote about it here:
https://www.depesz.com/2019/12/10/how-to-effectively-dump-postgresql-databases/

depesz



Re: The trap when using pg_dumpall

From: Rui DeSousa
> On Jul 4, 2021, at 2:04 AM, Dean Gibson (DB Administrator) <postgresql@mailpen.com> wrote:
>
> Well, I never store the

Well, as long as we’re stating “I never”s, let me state mine: I never consider a logical backup (pg_dump) a proper backup.  They have their use cases and are a very useful tool; however, I would consider a production database without a physical backup as not being backed up.  Logical backups don’t allow for point-in-time recovery (PITR) and are very slow in comparison.  I only use logical backups for migrations, etc. - more as a tool than as a backup strategy.

A good practice would be to have a separate instance that can instantiate a point-in-time recovery.  This would allow for reviewing/exporting the data needed to resolve a given issue.
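For completeness, a minimal sketch of taking such a physical base backup with pg_basebackup (target directory hypothetical; WAL archiving for PITR must be configured separately):

  pg_basebackup -D /backups/base -Ft -z -Xs -P   # tar-format, gzipped base backup, streamed WAL, progress shown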