Re: Patch: dumping tables data in multiple chunks in pg_dump - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Patch: dumping tables data in multiple chunks in pg_dump
Date
Msg-id CAFj8pRAp1Feh3O6NJkXuJQnG62YK-RJfgMfGQkXpe2PiPJS-Bg@mail.gmail.com
In response to Patch: dumping tables data in multiple chunks in pg_dump  (Hannu Krosing <hannuk@google.com>)
List pgsql-hackers
Hi

On Fri, 12 Dec 2025 at 9:02, Hannu Krosing <hannuk@google.com> wrote:
Attached is a patch that adds the ability to dump table data in multiple chunks.

Looking for feedback at this point:
 1) what have I missed
 2) should I implement something to avoid single-page chunks

The new flag --huge-table-chunk-pages tells the directory-format dump to
dump tables whose main fork has more pages than this value in multiple
chunks of the given number of pages.
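
To make the chunk boundaries concrete, here is a rough sketch in plain C
(not the actual patch code) of how the data query for one chunk could be
built, assuming a chunk is fetched as a ctid page range; the patch may
construct this differently:

#include <stdio.h>

/*
 * Sketch only: build the COPY query for one chunk covering heap pages
 * [start_page, end_page).  Assumes the chunk is fetched with a TID range
 * restriction on ctid; the table name and page numbers are hypothetical.
 */
static int
build_chunk_query(char *buf, size_t buflen, const char *qualified_table,
                  unsigned int start_page, unsigned int end_page)
{
    return snprintf(buf, buflen,
                    "COPY (SELECT * FROM %s "
                    "WHERE ctid >= '(%u,0)'::tid AND ctid < '(%u,0)'::tid) "
                    "TO stdout",
                    qualified_table, start_page, end_page);
}

int
main(void)
{
    char query[512];

    /* first chunk of a hypothetical table: pages 0..131071, i.e. 1 GiB
     * of heap with the default 8 kB block size */
    build_chunk_query(query, sizeof(query), "public.big_table", 0, 131072);
    puts(query);
    return 0;
}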

The main use case is speeding up parallel dumps when there is one HUGE
table or a small number of them, so that parts of these can be dumped in
parallel.

It will also help where the target file system has limitations on file
sizes (4 GB for FAT, 5 TB for GCS).
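
As a rough guide for picking the chunk size under such a limit: with the
default 8 kB block size, a chunk of fewer than 524288 pages covers less
than 4 GiB of heap (the resulting dump file can still differ in size,
depending on the data's text or compressed representation). The
arithmetic, as a trivial check:

#include <stdio.h>

int
main(void)
{
    const unsigned long long page_size = 8192;                        /* default BLCKSZ */
    const unsigned long long fat_limit = 4ULL * 1024 * 1024 * 1024;   /* 4 GiB */

    /* prints 524288 */
    printf("heap pages per 4 GiB: %llu\n", fat_limit / page_size);
    return 0;
}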

Currently no tests are included in the patch, and there is no extra
documentation beyond what is printed by pg_dump --help. Also, the
pg_log_warning lines with "CHUNKING" are there for debugging and need
to be removed before committing.

As implemented, no changes are needed for pg_restore, as all chunks are
already associated with the table in the .toc and are thus restored into
that table.

The attached README shows how I verified that it works, and the textual
file created from the directory-format dump in the last step there.
I took a first look at this patch, and there are some whitespace issues.

Regards

Pavel
 

--
Hannu
