Hello Tom!
I noticed you are currently working on improvements to pg_dump.
Some time ago I experimented with dumping a customer database in parallel directory mode (-F directory -j 2..4)
and noticed it took quite long to complete.
Further investigation showed that in this mode, with multiple jobs, the tables are processed in order of decreasing size,
which makes sense: it avoids a long tail where one big table left to a single job prolongs the overall dump time.
Exactly one table took a very long time, yet it seemed to be of only moderate size.
But the size determination fails to take TOAST tables into account, and this table had a big associated TOAST table
holding bytea column(s).
Even after an ANALYZE at load time there was no size information for the TOAST table in the catalogs.
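To illustrate the discrepancy, here is a quick query (a sketch, using only the stock pg_relation_size()/pg_table_size() functions) that contrasts the heap-only size that a relpages-based sort sees with the total size including TOAST:

```sql
-- Compare the main-relation size with the total size including TOAST.
-- relpages is the planner's estimate for the heap alone; pg_table_size()
-- additionally counts the TOAST table, its index, and the FSM/VM forks.
SELECT c.relname,
       c.relpages,                          -- heap-only estimate
       pg_relation_size(c.oid) AS heap_bytes,
       pg_table_size(c.oid)    AS total_bytes
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY pg_table_size(c.oid) DESC;
```

A table whose total_bytes vastly exceeds its heap_bytes is exactly the kind that gets mis-scheduled by a size order based on the heap alone.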
I thought of the following alternatives to ameliorate this:
1. Using the pg_table_size() function in the catalog query
Pos: This reflects the correct size of every relation
Neg: This goes out to disk and may have a huge impact on databases with very many tables
2. Teaching VACUUM to set the TOAST-table size like it sets it on normal tables
3. Having a command/function for occasionally setting the (approximate) size of TOAST tables
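For alternative 1, the idea could be sketched like this (a hypothetical illustration, not pg_dump's actual catalog query); alternatively, if the TOAST relations' relpages were kept up to date per alternatives 2/3, the catalog-only join shown in the comment would avoid the disk access entirely:

```sql
-- Sketch: order tables for parallel dump by total size including TOAST.
-- pg_table_size() stats the relation's files on disk, hence the concern
-- about overhead on databases with very many tables.
SELECT c.oid, c.relname,
       pg_table_size(c.oid) AS total_bytes
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY total_bytes DESC;

-- Catalog-only variant (no disk access), usable once the TOAST table's
-- relpages is maintained: c.relpages + coalesce(t.relpages, 0)
-- via LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
```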
I think, with further work under way (not yet ready), pg_dump can really profit from the parallel/non-compressing mode,
especially considering the huge amount of bytea/blob/string data in many big customer scenarios.
Thoughts?
Hans Buschmann