Re: Parallel pg_restore versus old dump files - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Parallel pg_restore versus old dump files
Msg-id AANLkTimu-yJJBB_AhQ9o_wojjfIy4ef_X-LlwE1AERGo@mail.gmail.com
In response to Parallel pg_restore versus old dump files  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jun 22, 2010 at 9:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 3. Perhaps pg_dump ought to emit a warning when it can't seek, instead
> of just silently not writing the data offsets.  That behavior was okay
> before when lack of data offsets didn't really matter that much, but
> lack of data offsets is a serious performance handicap for parallel
> restore even after we fix the outright failure condition (because each
> worker is going to read through a lot of data to find what it needs).
>

I'm not terribly familiar with the pg_dump format, but... the usual
strategy for storing a TOC on a non-seekable output stream is to store
it at the end of the file. You just accumulate all the offsets in
memory as you generate the file and then write the TOC at the end. Of
course, you then need a seekable input stream when you load it, but it
would narrow the slow case to when you have a non-seekable output
stream when dumping *and* a non-seekable input stream on restore.
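A minimal sketch of the trailing-TOC strategy described above, in Python. The archive layout (data blocks, then TOC entries, then an 8-byte footer holding the TOC's start offset) and every name here (write_archive, read_toc, read_item, WriteOnlyPipe) are hypothetical illustrations, not pg_dump's actual custom-format layout. The writer never calls seek() or tell(); it derives offsets by counting bytes written, so it works on a pipe. Only the reader needs to seek.

```python
import io
import struct

# Hypothetical layout, NOT pg_dump's real custom format:
#   [data blocks ...][TOC entries ...][footer: 8-byte TOC start offset]
ENTRY = struct.Struct("<HQQ")   # name length, data offset, data length
FOOTER = struct.Struct("<Q")    # absolute offset where the TOC begins


def write_archive(out, items):
    """Write (name, payload) pairs to a stream that need not be seekable.

    Offsets come from counting bytes written, so out.seek()/out.tell()
    are never called; the TOC is appended after all the data.
    """
    pos = 0
    toc = []
    for name, payload in items:
        toc.append((name, pos, len(payload)))  # remember where this block starts
        out.write(payload)
        pos += len(payload)
    toc_start = pos
    for name, off, length in toc:
        encoded = name.encode("utf-8")
        out.write(ENTRY.pack(len(encoded), off, length))
        out.write(encoded)
    out.write(FOOTER.pack(toc_start))


def read_toc(inp):
    """Load the trailing TOC; this is the step that needs a seekable input."""
    toc_end = inp.seek(-FOOTER.size, io.SEEK_END)  # footer starts where the TOC ends
    (toc_start,) = FOOTER.unpack(inp.read(FOOTER.size))
    inp.seek(toc_start)
    toc = {}
    while inp.tell() < toc_end:
        name_len, off, length = ENTRY.unpack(inp.read(ENTRY.size))
        toc[inp.read(name_len).decode("utf-8")] = (off, length)
    return toc


def read_item(inp, toc, name):
    """Jump directly to one item, as a parallel restore worker could."""
    off, length = toc[name]
    inp.seek(off)
    return inp.read(length)


class WriteOnlyPipe:
    """Stand-in for a pipe: accepts writes but refuses seek()/tell()."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def seekable(self):
        return False

    def seek(self, *args):
        raise io.UnsupportedOperation("seek")

    def tell(self):
        raise io.UnsupportedOperation("tell")

    def getvalue(self):
        return b"".join(self._chunks)


if __name__ == "__main__":
    pipe = WriteOnlyPipe()
    write_archive(pipe, [("table_a", b"aaaa"), ("table_b", b"bbbbbb")])
    archive = io.BytesIO(pipe.getvalue())  # restore side: a seekable file
    toc = read_toc(archive)
    print(toc)                                # {'table_a': (0, 4), 'table_b': (4, 6)}
    print(read_item(archive, toc, "table_b"))  # b'bbbbbb'
```

In this sketch the dump side streams to a non-seekable sink without losing any offsets; the cost moves to the restore side, where read_toc must seek to the end of the input to find the TOC. If the input cannot seek either, a reader would have to fall back to scanning the data sequentially.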

On the other hand, if we didn't notice this dependency when there was
only one variable, making it depend on two variables would make it
that much more obscure when the slow case hits and users wonder why
the restore is taking so long.

--
greg