Re: WIP patch for parallel pg_dump - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: WIP patch for parallel pg_dump
Msg-id: 4881.1291681329@sss.pgh.pa.us
In response to: Re: WIP patch for parallel pg_dump  (Josh Berkus <josh@agliodbs.com>)
Responses: Re: WIP patch for parallel pg_dump  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
           Re: WIP patch for parallel pg_dump  (Gurjeet Singh <singh.gurjeet@gmail.com>)
List: pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
>> However, if you were doing something like parallel pg_dump you could
>> just run the parent and child instances all against the slave, so the
>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>> worrying about this.  When would you really need to be able to do it?

> If you had several standbys, you could distribute the work of the
> pg_dump among them.  This would be a huge speedup for a large database,
> potentially, thanks to parallelization of I/O and network.  Imagine
> doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive.  But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach.  Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.
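
To make the pass-the-snapshot-through-the-client idea concrete, here is a
minimal sketch of the client-side coordination, written against psycopg2.
The server functions pg_export_snapshot_text() and pg_import_snapshot_text()
are purely hypothetical placeholders (nothing like them is implied by the
patch); the point is only that the dump coordinator reads the snapshot out of
one session and hands it to worker sessions, which may sit on different
standbys, without anything going through the WAL stream.

import psycopg2

# Leader session: open a transaction and pull its snapshot back to the client.
leader = psycopg2.connect("host=standby1 dbname=mydb")
leader.autocommit = True                    # manage BEGIN/COMMIT explicitly
lcur = leader.cursor()
lcur.execute("BEGIN ISOLATION LEVEL REPEATABLE READ")
lcur.execute("SELECT pg_export_snapshot_text()")          # hypothetical function
snapshot = lcur.fetchone()[0]               # e.g. 'xmin:1000 xmax:1042 xip:1003,1007'

# Worker sessions, possibly on different standbys, adopt the same snapshot.
workers = []
for host in ("standby1", "standby2", "standby3"):
    conn = psycopg2.connect("host=%s dbname=mydb" % host)
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("BEGIN ISOLATION LEVEL REPEATABLE READ")
    cur.execute("SELECT pg_import_snapshot_text(%s)", (snapshot,))   # hypothetical
    workers.append((conn, cur))

# All sessions now see identical data; each worker can dump a disjoint set of
# tables in parallel.

The contrast with the server-side alternative is that here the snapshot
contents themselves pass through the coordinating client, which is exactly
the exposure Robert objects to, but it is also what lets the workers live on
different standbys.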

While I see Robert's point about preferring not to expose the snapshot
contents to clients, I don't think it outweighs all other considerations
here; and every other one is pointing to doing it the other way.
        regards, tom lane

