Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Joachim Wieland
Subject Re: patch for parallel pg_dump
Date
Msg-id CACw0+11UzETW2PHyY=Rq2MB1a_-WCoO3m0q8--ELVO5QdiJjyQ@mail.gmail.com
In response to Re: patch for parallel pg_dump  (Andrew Dunstan <adunstan@postgresql.org>)
List pgsql-hackers
On Wed, Mar 14, 2012 at 4:39 PM, Andrew Dunstan <adunstan@postgresql.org> wrote:
> I've just started looking at the patch, and I'm curious to know why it
> didn't follow the pattern of parallel pg_restore which created a new worker
> for each table rather than passing messages to looping worker threads as
> this appears to do. That might have avoided a lot of the need for this
> message passing infrastructure, if it could have been done. But maybe I just
> need to review the patch and the discussions some more.

The main reason for this design has since been overcome by the
flexibility of the synchronized snapshot feature, which allows a
worker to adopt the snapshot of another transaction even if that
transaction has already been running for quite some time. In
previously proposed implementations of this feature, workers had to
connect at the same time and then could not close their transactions
without losing the snapshot.

The other drawback of the fork-per-TocEntry approach is the somewhat
limited bandwidth of information from the worker back to the master:
it's basically just the return code. That's fine if there is no error,
but if there is, the master can't report any further details (e.g.
"could not get lock on table foo", or "could not write to file bar: no
space left on device").
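
To illustrate the point about the return code, here is a minimal
sketch. It is not code from the patch: fake_dump_entry() and
run_worker() are hypothetical names, and the worker always fails with
a fixed message. Without a pipe the master learns only the exit
status; with one, the error text comes back as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body: pretends to dump one TOC entry and fails. */
int fake_dump_entry(char *errbuf, size_t errlen)
{
    snprintf(errbuf, errlen, "could not get lock on table foo");
    return 1;
}

/*
 * Run the worker in a forked child and return its exit status.
 * If msg is non-NULL, a pipe additionally carries the error text
 * back to the master; otherwise the status is all the master gets.
 */
int run_worker(char *msg, size_t msglen)
{
    int     fds[2];
    int     status;
    pid_t   pid;

    assert(msg == NULL || msglen > 0);

    if (msg != NULL && pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        if (msg != NULL)
        {
            close(fds[0]);
            close(fds[1]);
        }
        return -1;
    }

    if (pid == 0)
    {
        /* worker */
        char    errbuf[128] = "";
        int     rc = fake_dump_entry(errbuf, sizeof(errbuf));

        if (msg != NULL)
        {
            close(fds[0]);
            if (rc != 0 && write(fds[1], errbuf, strlen(errbuf)) < 0)
                rc = 2;         /* could not even report the error */
            close(fds[1]);
        }
        _exit(rc);
    }

    /* master */
    if (msg != NULL)
    {
        size_t  total = 0;
        ssize_t n;

        close(fds[1]);
        while (total < msglen - 1 &&
               (n = read(fds[0], msg + total, msglen - 1 - total)) > 0)
            total += (size_t) n;
        msg[total] = '\0';
        close(fds[0]);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_worker(NULL, 0) tells the master only that something
failed; calling it with a buffer also yields the reason.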

This restriction applies to more than just error messages. For
example, what I'd also like to have in pg_dump is checksums on a
per-TocEntry basis. The individual workers would calculate the
checksums when writing the file and then send them back to the master
for integration into the TOC. I don't see how such a feature could be
implemented in a straightforward way without a message passing
infrastructure.
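
As a rough sketch of the checksum idea, a worker could send a small
fixed-size result message back over a pipe. Everything here is an
assumption made for illustration: the WorkerResult struct, the
dump_with_checksum() function and the use of FNV-1a as a stand-in
checksum are hypothetical and not part of the patch or of pg_dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result message; not pg_dump's actual format. */
typedef struct
{
    int         dump_id;    /* which TOC entry this result belongs to */
    uint32_t    checksum;   /* checksum of the data the worker wrote */
} WorkerResult;

/* 32-bit FNV-1a, used here only as a stand-in checksum. */
uint32_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = 2166136261u;

    while (len-- > 0)
    {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Fork a worker that "dumps" data for dump_id and reports the checksum
 * to the master through a pipe. Returns 0 on success, -1 on failure.
 */
int dump_with_checksum(int dump_id, const char *data, WorkerResult *res)
{
    int     fds[2];
    int     status;
    pid_t   pid;
    ssize_t n;

    assert(res != NULL);

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* worker: compute the checksum and send the result message */
        WorkerResult r;

        close(fds[0]);
        memset(&r, 0, sizeof(r));
        r.dump_id = dump_id;
        r.checksum = fnv1a(data, strlen(data));
        _exit(write(fds[1], &r, sizeof(r)) == (ssize_t) sizeof(r) ? 0 : 1);
    }

    /* master: read the message, then reap the worker */
    close(fds[1]);
    n = read(fds[0], res, sizeof(*res));
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t) sizeof(*res) ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

The master could then store res.checksum in the TOC entry identified
by res.dump_id, which an exit status alone has no room to carry.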

