Re: [PATCH] parallel pg_restore: move offset-building phase to before forking - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] parallel pg_restore: move offset-building phase to before forking
Date
Msg-id 674858.1761083905@sss.pgh.pa.us
In response to [PATCH] parallel pg_restore: move offset-building phase to before forking  (Dimitrios Apostolou <jimis@gmx.net>)
List pgsql-hackers
Dimitrios Apostolou <jimis@gmx.net> writes:
> A pg_dump custom-format archive without offsets in the table of
> contents is usually generated when pg_dump writes to stdout instead of
> a file. When doing parallel pg_restore (-j) from such a file, every
> worker process was scanning the full archive sequentially, in order to
> build the offset table and find the parts assigned to restore. This led
> to the worker processes competing for I/O.

> This patch moves this offset-table building phase to the parent process,
> before forking the worker processes.

> The upside is that we now have only one extra scan of the file.
> And this scan happens without other competing I/O, so it completes
> faster.

> The downside is that there is a delay before spawning the children and
> starting to assign jobs to them.

> What do you think?

I can't say that I love this proposal: it's basically doing more work
earlier in hopes of avoiding work later, so it is sometimes going to
lose, maybe badly.  (Worst case would be a selective restore where all
the wanted objects appear early in a large archive file.)
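
To make that worst case concrete, here is a toy cost model. It is only a sketch under stated assumptions, not pg_restore code: the archive is modelled as a list of data-block sizes, the cost is the number of bytes that must be passed over, and the function names and the 1000-block archive are hypothetical.

```python
def eager_scan_cost(block_sizes):
    """Bytes passed over if the whole archive is indexed before any work starts."""
    return sum(block_sizes)


def lazy_scan_cost(block_sizes, wanted):
    """Bytes passed over if scanning stops at the end of the last wanted block."""
    if not wanted:
        return 0
    last = max(wanted)
    return sum(block_sizes[: last + 1])


# Hypothetical archive: 1000 data blocks of 1 MiB each,
# with a selective restore that wants only the first three.
sizes = [1 << 20] * 1000
wanted = {0, 1, 2}

eager = eager_scan_cost(sizes)
lazy = lazy_scan_cost(sizes, wanted)
print(f"eager: {eager} bytes, lazy: {lazy} bytes")
```

In this made-up case the up-front scan covers the full 1000 MiB while an on-demand scan could stop after 3 MiB. Real numbers depend on the archive layout and on whether the input is seekable.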

I think also that the potential performance win might have changed
fairly drastically in the wake of the performance fixes we just
finished up in the other thread [1].  So at the very least we need
some new testing based on current pg_dump and pg_restore.

I wonder whether there isn't a less brute-force way to get the
same effect.  I agree there is probably no value in having several
different workers scanning through the same data, but could we
nominate one worker to find the next TOC object start position
and let the rest sleep until it's available?  We'd probably
have to transmit the discovered positions via the leader process,
but we already have a communications mechanism that could be used.
Keeping the TOC in shared memory might be another way.  (Or, dare I
say it, maybe parallel dump/restore should be converted to use
threads everywhere, not just on Windows?)
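
The "one scanner, everyone else sleeps" idea could look roughly like the sketch below. It is an illustration only: it uses Python threads and a condition variable as a stand-in for whatever mechanism pg_restore would really use (worker processes reporting through the leader, or shared memory), and every name in it (OffsetTable, scan_archive, run_demo) is hypothetical.

```python
import threading


class OffsetTable:
    """Shared map of TOC entry id -> byte offset, filled in by a single scanner."""

    def __init__(self):
        self._offsets = {}
        self._done = False
        self._cond = threading.Condition()

    def publish(self, entry_id, offset):
        # Called by the scanner each time it discovers a block's start position.
        with self._cond:
            self._offsets[entry_id] = offset
            self._cond.notify_all()

    def finish(self):
        # Called by the scanner at end of archive so that waiters cannot hang.
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def wait_for(self, entry_id):
        # Sleep until the wanted offset is known, or the scan ended without it.
        with self._cond:
            self._cond.wait_for(lambda: entry_id in self._offsets or self._done)
            return self._offsets.get(entry_id)


def scan_archive(block_sizes, table):
    """The one nominated scanner: walk the blocks in order, publishing offsets."""
    offset = 0
    for entry_id, size in enumerate(block_sizes):
        table.publish(entry_id, offset)
        offset += size
    table.finish()


def run_demo(block_sizes, wanted):
    """Start one waiting worker per wanted entry, then a single scanner."""
    table = OffsetTable()
    results = {}

    def worker(entry_id):
        results[entry_id] = table.wait_for(entry_id)

    workers = [threading.Thread(target=worker, args=(e,)) for e in wanted]
    for w in workers:
        w.start()
    scanner = threading.Thread(target=scan_archive, args=(block_sizes, table))
    scanner.start()
    scanner.join()
    for w in workers:
        w.join()
    return results
```

Calling run_demo([10, 20, 30, 40], [0, 2, 3]) gives each waiter the start offset of its block, and the archive is walked only once. A waiter whose entry can be served early is woken as soon as that offset is published, which is the property the up-front full scan gives up.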

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/2edb7a57-b225-3b23-a680-62ba90658fec@gmx.net


