I have rebased the patch and attached the new version. Should I add it to
the July commitfest?

On Fri, 4 Apr 2025, Dimitrios Apostolou wrote:
> Hello list,
>
> based on the delays I experienced in pg_restore, as described at:
>
> https://www.postgresql.org/message-id/flat/6bd16bdb-aa5e-0512-739d-b84100596035@gmx.net
>
> I noticed that every one of the pg_restore worker processes exhibited
> the same seek-and-read behaviour, in parallel, making the situation even
> worse. With this patch I moved this phase to the parent process, before
> fork(), so that the children have the necessary information from birth.
>
> Copying the commit message:
>
> A pg_dump custom format archive without offsets in the table of
> contents is usually generated when pg_dump writes to stdout instead of
> a file. When doing a parallel pg_restore (-j) from such a file, every
> worker process scanned the full archive sequentially in order to
> build the offset table and find the parts it was assigned to restore.
> This led to the worker processes competing for I/O.
>
> This patch moves this offset-table building phase to the parent process,
> before forking the worker processes.
>
> The upside is that we now have only one extra scan of the file.
> And this scan happens without other competing I/O, so it completes
> faster.
>
> The downside is that there is a delay before the children are spawned
> and jobs are assigned to them.
>
>
> What do you think?
>
> Thanks,
> Dimitris
>
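
To illustrate the idea in miniature, here is a toy sketch in Python. It is
not the patch and not pg_restore's code: the archive layout (an 8-byte header
holding a block id and a payload length, followed by the payload) and the
names write_archive, build_offset_table and restore_in_parallel are all
made up for this example. It only shows the general pattern: scan the file
once in the parent, then fork() workers that inherit the offset table and
seek straight to their block. It assumes a POSIX system where os.fork() is
available.

```python
import os
import struct
import tempfile

# Hypothetical toy header: (block id, payload length). Not pg_dump's format.
HEADER = struct.Struct("<II")


def write_archive(path, blocks):
    """Write blocks back to back with no offset index, like a dump sent to stdout."""
    with open(path, "wb") as f:
        for block_id, payload in blocks.items():
            f.write(HEADER.pack(block_id, len(payload)))
            f.write(payload)


def build_offset_table(path):
    """Scan the archive once, recording where each block's payload starts."""
    offsets = {}
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of archive
            block_id, length = HEADER.unpack(header)
            offsets[block_id] = (f.tell(), length)
            f.seek(length, os.SEEK_CUR)  # skip over the payload
    return offsets


def restore_in_parallel(path, block_ids):
    """Build the table in the parent, then fork one worker per block."""
    offsets = build_offset_table(path)  # the single scan, before any fork()
    children = []
    for block_id in block_ids:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `offsets` was inherited from the parent, so no rescan.
            try:
                os.close(read_fd)
                start, length = offsets[block_id]
                with open(path, "rb") as f:
                    f.seek(start)
                    os.write(write_fd, f.read(length))
            finally:
                os._exit(0)
        os.close(write_fd)
        children.append((block_id, pid, read_fd))

    results = {}
    for block_id, pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            results[block_id] = pipe.read()
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "toy.dump")
    write_archive(demo_path, {1: b"alpha", 2: b"beta", 3: b"gamma"})
    print(restore_in_parallel(demo_path, [1, 2, 3]))
```

The trade-off is the same as described in the commit message: the workers
cannot start until build_offset_table() has finished, but the file is scanned
once instead of once per worker.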