Re: Patch: dumping tables data in multiple chunks in pg_dump - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Patch: dumping tables data in multiple chunks in pg_dump
Date
Msg-id CAMT0RQQr7KtPAY903+F42csiHc1EPHo70Xji-znkxEhwdoKa6w@mail.gmail.com
In response to Re: Patch: dumping tables data in multiple chunks in pg_dump  (Hannu Krosing <hannuk@google.com>)
Responses Re: Patch: dumping tables data in multiple chunks in pg_dump
List pgsql-hackers
Ran another test with a 53GB database where most of the data is in TOAST:

CREATE TABLE just_toasted(
  id serial primary key,
  toasted1 char(2200) STORAGE EXTERNAL,
  toasted2 char(2200) STORAGE EXTERNAL,
  toasted3 char(2200) STORAGE EXTERNAL,
  toasted4 char(2200) STORAGE EXTERNAL
);

The TOAST fields were populated in a somewhat randomised order.
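
I didn't include the exact load script; a rough sketch of one way to
generate this kind of data is below (the repeat()-based values and the
6 million row count are placeholders chosen to land near 53GB, not the
real script):

INSERT INTO just_toasted (toasted1, toasted2, toasted3, toasted4)
SELECT repeat(md5(i::text), 68),        -- ~2.2kB per value, stored
       repeat(md5((i+1)::text), 68),    -- out-of-line and uncompressed
       repeat(md5((i+2)::text), 68),    -- because of STORAGE EXTERNAL
       repeat(md5((i+3)::text), 68)
FROM generate_series(1, 6000000) AS g(i)
ORDER BY random();                      -- randomises insertion order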

Here the results are as follows:

Parallelism | chunk size (pages) | time (sec)
          1 |                  - |        240
          2 |               1000 |        129
          4 |               1000 |         64
          8 |               1000 |         36
         16 |               1000 |         30

          4 |               9095 |         78
          8 |               9095 |         42
         16 |               9095 |         42
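
For scale, at the default 8kB page size a 1000-page chunk covers about
8MB of the main fork and a 9095-page chunk about 71MB (the 131072-page
chunks in the earlier 408GB test were 1GB each).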

The reason larger chunk sizes performed worse was that they often left
one or two straggler chunks running at the end, which kept a single
worker busy while the other workers sat idle.
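
Each chunk covers a contiguous range of main-fork pages, so a worker
dumping a 1000-page chunk is doing roughly the equivalent of the TID
range scan sketched below (an illustration only, not necessarily the
literal query the patch issues):

-- rows stored in main-fork pages 0..999, i.e. the first 1000-page chunk;
-- ctid range filters like this get a TID Range Scan on PostgreSQL 14+
SELECT * FROM just_toasted
WHERE ctid >= '(0,0)'::tid AND ctid < '(1000,0)'::tid;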

Detailed run results below:

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres -f
/tmp/ltoastdb-1-plain.dump largetoastdb
real    3m59.465s
user    3m43.304s
sys     0m15.844s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 4 -f /tmp/ltoastdb-4.dump
largetoastdb
real    1m18.320s
user    3m49.236s
sys     0m19.422s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 8 -f /tmp/ltoastdb-8.dump
largetoastdb
real    0m42.028s
user    3m55.299s
sys     0m24.657s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 16 -f /tmp/ltoastdb-16.dump
largetoastdb
real    0m42.575s
user    4m11.011s
sys     0m26.110s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 16 -f /tmp/ltoastdb-16-1kpages.dump
largetoastdb
real    0m29.641s
user    6m16.321s
sys     0m49.345s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 8 -f /tmp/ltoastdb-8-1kpages.dump
largetoastdb
real    0m35.685s
user    3m58.528s
sys     0m26.729s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 4 -f /tmp/ltoastdb-4-1kpages.dump
largetoastdb
real    1m3.737s
user    3m50.251s
sys     0m18.507s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 2 -f /tmp/ltoastdb-2-1kpages.dump
largetoastdb
real    2m8.708s
user    3m57.018s
sys     0m18.499s

On Thu, Nov 13, 2025 at 7:39 PM Hannu Krosing <hannuk@google.com> wrote:
>
> Going up to 16 workers did not improve performance, but this is
> expected, as the disk behind the database can only do 4TB/hour of
> reads, which is now the bottleneck (408/352*3600 = 4172 GB/h).
>
> $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
> real    5m44.900s
> user    53m50.491s
> sys     5m47.602s
>
> And 4 workers showed a near-linear speedup over a single worker
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
> real    10m32.074s
> user    38m54.436s
> sys     2m58.216s
>
> The database runs on a 64vCPU VM with 128GB RAM, so most of the table
> will be read in from the disk.
>
>
> On Thu, Nov 13, 2025 at 7:02 PM Hannu Krosing <hannuk@google.com> wrote:
> >
> > I just ran a test by generating a 408GB table and then dumping it both ways
> >
> > $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
> > /tmp/plain.dump largedb
> >
> > real    39m54.968s
> > user    37m21.557s
> > sys     2m32.422s
> >
> > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
> >
> > real    5m52.965s
> > user    40m27.284s
> > sys     3m53.339s
> >
> > So parallel dump with 8 workers using 1GB (128k pages) chunks runs
> > almost 7 times faster than the sequential dump.
> >
> > This was a table that had no TOAST part. I will run some more tests
> > with TOASTed tables next and expect similar or better improvements.
> >
> >
> >
> > On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > Hi Hannu,
> > >
> > > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
> > > >
> > > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > > >
> > > > Looking for feedback at this point:
> > > >  1) what have I missed
> > > >  2) should I implement something to avoid single-page chunks
> > > >
> > > > The flag --huge-table-chunk-pages tells the directory-format
> > > > dump to dump tables whose main fork has more pages than this in
> > > > multiple chunks of the given number of pages.
> > > >
> > > > The main use case is speeding up parallel dumps when there is one
> > > > HUGE table, or a small number of them, so that parts of these can
> > > > be dumped in parallel.
> > >
> > > Have you measured the speedup? Can you please share the numbers?
> > >
> > > --
> > > Best Wishes,
> > > Ashutosh Bapat


