Hello list,
based on the delays I experienced in pg_restore, as described at:
https://www.postgresql.org/message-id/flat/6bd16bdb-aa5e-0512-739d-b84100596035@gmx.net
I noticed that every one of the pg_restore worker processes exhibited
the same seek-and-read behaviour, in parallel, which made the situation
even worse. With this patch I moved this phase to the parent process,
before fork(), so that the children have the necessary information from
birth.
Copying the commit message:
A pg_dump custom format archive without offsets in the table of
contents is usually generated when pg_dump writes to stdout instead of
a file. When doing parallel pg_restore (-j) from such a file, every
worker process was scanning the full archive sequentially, in order to
build the offset table and find the parts it was assigned to restore.
This led to the worker processes competing for I/O.
This patch moves this offset-table building phase to the parent process,
before forking the worker processes.
The upside is that we now have only one extra scan of the file, and
this scan happens without other competing I/O, so it completes faster.
The downside is that there is a delay before the children are spawned
and jobs start being assigned to them.
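
To illustrate the idea outside of pg_restore, here is a minimal,
self-contained sketch. It is not the patch itself and does not use the
real custom archive format: the "archive" is a toy file of
length-prefixed blocks, and all names (write_toy_archive,
build_offset_table, worker_restore_block, restore_demo) are
hypothetical. The parent scans the file once to build the offset table,
then forks; each child inherits the table and seeks directly to its
block through its own file handle.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_BLOCKS 16

/* Toy archive (assumption): each block is a 4-byte length plus payload. */
static int write_toy_archive(const char *path, int nblocks)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    for (int i = 0; i < nblocks; i++) {
        uint32_t len = 100u * (uint32_t) (i + 1);
        fwrite(&len, sizeof len, 1, f);
        for (uint32_t j = 0; j < len; j++)
            fputc('a' + i, f);
    }
    return fclose(f);
}

/* One sequential pass that records where each block starts. */
static int build_offset_table(const char *path, long *offsets, int max)
{
    FILE *f = fopen(path, "rb");
    uint32_t len;
    int n = 0;
    if (f == NULL)
        return -1;
    while (n < max) {
        long pos = ftell(f);
        if (fread(&len, sizeof len, 1, f) != 1)
            break;                      /* end of archive */
        offsets[n++] = pos;
        if (fseek(f, (long) len, SEEK_CUR) != 0)
            break;                      /* skip the payload */
    }
    fclose(f);
    return n;
}

/* Child side: open a private handle and seek straight to block i. */
static int worker_restore_block(const char *path, const long *offsets, int i)
{
    FILE *f = fopen(path, "rb");
    uint32_t len;
    int ok;
    if (f == NULL)
        return 0;
    ok = fseek(f, offsets[i], SEEK_SET) == 0
        && fread(&len, sizeof len, 1, f) == 1
        && len == 100u * (uint32_t) (i + 1)
        && fgetc(f) == 'a' + i;
    fclose(f);
    return ok;
}

/* Returns the number of workers that found their block, or -1 on error. */
int restore_demo(const char *path, int nblocks)
{
    long offsets[MAX_BLOCKS];
    int spawned = 0, ok = 0;

    if (nblocks > MAX_BLOCKS || write_toy_archive(path, nblocks) != 0)
        return -1;
    /* Parent: the single scan happens here, before any fork(). */
    if (build_offset_table(path, offsets, MAX_BLOCKS) != nblocks)
        return -1;
    for (int i = 0; i < nblocks; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)                   /* child inherits offsets[] */
            _exit(worker_restore_block(path, offsets, i) ? 0 : 1);
        spawned++;
    }
    for (int i = 0; i < spawned; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    remove(path);
    return ok;
}
```

Calling restore_demo() with a scratch file path and a block count runs
the whole sequence. Without the build_offset_table() call in the parent,
each child would have to repeat the sequential pass itself, which is the
competing I/O described above.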
What do you think?
Thanks,
Dimitris