Re: Patch: dumping tables data in multiple chunks in pg_dump - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Patch: dumping tables data in multiple chunks in pg_dump
Date
Msg-id CAMT0RQRtLwi_CrOcD7KxYL0Gm1nGXb-HWmerVg=ajEs6JP7m+w@mail.gmail.com
In response to Patch: dumping tables data in multiple chunks in pg_dump  (Hannu Krosing <hannuk@google.com>)
Responses Re: Patch: dumping tables data in multiple chunks in pg_dump
List pgsql-hackers
The issue is that currently the value is given in "main table pages"
and it would be somewhat deceptive, or at least confusing, to try to
express this in any other unit.

As I explained in the commit message:

---------8<-------------------8<-------------------8<----------------
This --max-table-segment-pages number specifically applies to main table
pages, which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.
---------8<-------------------8<-------------------8<----------------

And I can think of no cheap and reliable way to change that equation.
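
To make the unit concrete, here is a minimal sketch of what a page-based
limit does and does not tell you. It is not code from the patch: the
function names are hypothetical, and the 8 kB block size is an assumption
(the default build setting). Converting a size such as '10GB' to pages is
easy, and so is splitting a table into page ranges, but neither step says
anything about how many bytes each range will produce in the dump.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (8 kB)


def size_to_pages(size_bytes, block_size=BLOCK_SIZE):
    """Round a byte count up to a whole number of main-table pages."""
    return -(-size_bytes // block_size)  # ceiling division


def page_ranges(relpages, max_segment_pages):
    """Split pages [0, relpages) into half-open (start, end) ranges,
    each covering at most max_segment_pages main-table pages."""
    if max_segment_pages <= 0:
        raise ValueError("max_segment_pages must be positive")
    return [
        (start, min(start + max_segment_pages, relpages))
        for start in range(0, relpages, max_segment_pages)
    ]


if __name__ == "__main__":
    limit = size_to_pages(10 * 1024**3)  # '10GB' expressed in pages
    print(limit)
    print(page_ranges(relpages=10, max_segment_pages=4))
```

Each range bounds only the number of heap pages scanned. A range full of
dead tuples dumps nothing, and a range full of TOAST pointers can dump far
more than its page count suggests.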

I'd be very happy to hear any good ideas for improving the flag name,
or even a way to better estimate the resulting dump file size, so that
we could give the chunk size in better units.

---
Hannu





On Sat, Mar 28, 2026 at 12:26 PM Michael Banck <mbanck@gmx.net> wrote:
>
> Hi,
>
> On Tue, Jan 13, 2026 at 03:27:25PM +1300, David Rowley wrote:
> > Perhaps --max-table-segment-pages is a better name than
> > --huge-table-chunk-pages as it's quite subjective what the minimum
> > number of pages required to make a table "huge" is.
>
> I'm not sure that's better - without looking at the documentation,
> people might confuse segment here with the 1GB split of tables into
> segments. As pg_dump is a very common and basic user tool, I don't think
> implementation details like pages/page sizes and blocks should be part
> of its UX.
>
> Can't we just make it a storage size, like '10GB' and then rename it to
> --table-parallel-threshold or something? I agree it's bikeshedding, but
> I personally don't like either --max-table-segment-pages or
> --huge-table-chunk-pages.
>
>
> Michael


