My expectation was that, since chunking is mainly useful for really
huge tables, ANALYZE would have been run "recently enough" for
relpages to be usable. Maybe we should call pg_relation_size() only
once relpages has already shown the table to be large enough to
possibly warrant chunking? Say, at least 1/2 of the requested chunk
size?
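For context, relpages is only refreshed by VACUUM / ANALYZE (and a few
other maintenance operations), while pg_relation_size() returns the
actual on-disk size, so the two can drift apart on a table with heavy
churn. A quick way to compare them (the table name is made up):

    -- relpages: planner estimate, refreshed by VACUUM / ANALYZE
    -- pg_relation_size(): actual on-disk size of the main fork
    SELECT c.relpages,
           pg_relation_size(c.oid) / current_setting('block_size')::int
               AS actual_pages
    FROM pg_class c
    WHERE c.oid = 'some_big_table'::regclass;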
My reasoning was to avoid putting extra load on pg_dump when chunking
is not required. But of course we can use the presence of a chunking
request to decide whether to run pg_relation_size(), assuming the
overhead won't be too large in that case.
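As a sketch of that hybrid approach (the 131072-page chunk size is
just an example value for --huge-table-chunk-pages, not anything from
the patch): use relpages as the cheap first cut and only pay the
pg_relation_size() cost for tables that already reach half the
requested chunk size:

    -- hypothetical candidate-selection query: only tables passing the
    -- cheap relpages filter pay the pg_relation_size() cost
    SELECT c.oid::regclass AS table_name,
           pg_relation_size(c.oid) / current_setting('block_size')::int
               AS actual_pages
    FROM pg_class c
    WHERE c.relkind = 'r'
      AND c.relpages >= 131072 / 2;  -- half the requested chunk size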
On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
> >
> > Attached is a patch that adds the ability to dump table data in multiple chunks.
> >
> > Looking for feedback at this point:
> > 1) what have I missed
> > 2) should I implement something to avoid single-page chunks
> >
> > The patch adds a flag, --huge-table-chunk-pages, which tells the
> > directory-format dump to dump tables whose main fork has more pages
> > than this in multiple chunks of the given number of pages.
> >
> > The main use case is speeding up parallel dumps when there are one
> > or a small number of HUGE tables, so that parts of these can be
> > dumped in parallel.
> >
>
> +1 for the idea. I haven't done a detailed review, but while going
> through the patch I noticed that we use pg_class->relpages to decide
> whether to chunk a table. That should be fine, but don't you think
> that using a direct size-calculation function like pg_relation_size()
> would give a better picture and not depend on whether the stats are
> up to date? That would make the chunking behavior more deterministic.
>
> --
> Regards,
> Dilip Kumar
> Google