[PATCH] parallel pg_restore: move offset-building phase to before forking - Mailing list pgsql-hackers

From: Dimitrios Apostolou
Subject: [PATCH] parallel pg_restore: move offset-building phase to before forking
Msg-id: b51f7c7a-f31b-f0e1-fc17-5bb4c3057ef5@gmx.net
List: pgsql-hackers

Hello list,

based on the delays I experienced in pg_restore, as described at:

https://www.postgresql.org/message-id/flat/6bd16bdb-aa5e-0512-739d-b84100596035@gmx.net

I noticed that every one of the pg_restore worker processes exhibited the
same seek-and-read behaviour, all in parallel, which made the situation even
worse. This patch moves that phase to the parent process, before fork(),
so that the children have the necessary information from birth.

Copying the commit message:

A pg_dump custom-format archive without offsets in the table of
contents is usually generated when pg_dump writes to stdout instead of
a file. When doing a parallel pg_restore (-j) from such a file, every
worker process scanned the full archive sequentially in order to
build the offset table and find the parts it was assigned to restore.
This led to the worker processes competing for I/O.

This patch moves this offset-table building phase to the parent process,
before forking the worker processes.

The upside is that we now have only one extra scan of the file.
And this scan happens without other competing I/O, so it completes
faster.

The downside is that there is a delay before the children are spawned
and jobs start being assigned to them.
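
To illustrate the idea only, here is a toy sketch in Python. It is not the
patch and does not touch pg_restore's code. The archive layout (an 8-byte
header with a block id and a payload length, followed by the payload) and
all function names are hypothetical, made up for this example. It shows the
parent doing the single sequential scan, and the forked children inheriting
the resulting offset table so that each one can seek straight to its block.

```python
import os
import struct

# Toy block header: (block id, payload length). Hypothetical layout,
# NOT the real pg_dump custom format.
HEADER = struct.Struct(">II")


def write_toy_archive(path, blocks):
    """Write blocks back to back with no offset table, as if streamed."""
    with open(path, "wb") as f:
        for block_id, payload in blocks.items():
            f.write(HEADER.pack(block_id, len(payload)))
            f.write(payload)


def build_offset_table(path):
    """One sequential pass that records where each payload starts."""
    offsets = {}
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            block_id, length = HEADER.unpack(header)
            offsets[block_id] = (f.tell(), length)
            f.seek(length, os.SEEK_CUR)  # skip over the payload
    return offsets


def read_block(path, offsets, block_id):
    """Seek directly to a block using the precomputed table."""
    pos, length = offsets[block_id]
    with open(path, "rb") as f:
        f.seek(pos)
        return f.read(length)


def parallel_restore(path, jobs):
    """Scan once in the parent, then fork one worker per job."""
    offsets = build_offset_table(path)  # the only full scan of the file
    children = []
    for block_id in jobs:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `offsets` was inherited through fork(), so no rescan.
            try:
                os.close(r)
                os.write(w, read_block(path, offsets, block_id))
                os.close(w)
            finally:
                os._exit(0)
        os.close(w)
        children.append((pid, block_id, r))

    results = {}
    for pid, block_id, r in children:
        with os.fdopen(r, "rb") as pipe:
            results[block_id] = pipe.read()
        os.waitpid(pid, 0)
    return results
```

In this sketch the children do no scanning at all. The cost is the one
mentioned above: no child is forked until build_offset_table() has
finished reading the whole file.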


What do you think?

Thanks,
Dimitris

