Re: Parallel pg_restore versus old dump files - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Parallel pg_restore versus old dump files
Date	June 22, 2010 22:52:58
Msg-id	AANLkTimu-yJJBB_AhQ9o_wojjfIy4ef_X-LlwE1AERGo@mail.gmail.com
In response to	Parallel pg_restore versus old dump files (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Tue, Jun 22, 2010 at 9:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 3. Perhaps pg_dump ought to emit a warning when it can't seek, instead
> of just silently not writing the data offsets.  That behavior was okay
> before when lack of data offsets didn't really matter that much, but
> lack of data offsets is a serious performance handicap for parallel
> restore even after we fix the outright failure condition (because each
> worker is going to read through a lot of data to find what it needs).
>

I'm not terribly familiar with the pg_dump format, but the usual
strategy for storing a TOC on a non-seekable output stream is to store
it at the end of the file. You just accumulate all the offsets in
memory as you generate the file and then write the TOC at the end. Of
course, you then need a seekable input stream when you load it, but that
would narrow the slow case to when you have a non-seekable output
stream when dumping *and* a non-seekable input stream on restore.
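
A minimal sketch of the trailing-TOC strategy described above, in Python.
This is not the pg_dump archive format: the fixed-size trailer, the JSON
encoding of the TOC, and the names write_archive and read_entry are all
hypothetical, chosen only to illustrate the idea. The writer uses nothing
but write(), so it would work on a pipe; the reader assumes a seekable
input.

```python
import io
import json
import struct

# Hypothetical fixed-size trailer: (toc_offset, toc_length), little-endian.
TRAILER = struct.Struct("<QQ")


def write_archive(out, entries):
    """Write (name, bytes) entries using only out.write(); never seek."""
    offset = 0
    toc = {}
    for name, data in entries:
        out.write(data)
        # Track offsets in memory by counting the bytes written so far.
        toc[name] = (offset, len(data))
        offset += len(data)
    # The TOC goes after all the data, followed by a trailer locating it.
    toc_bytes = json.dumps(toc).encode("utf-8")
    out.write(toc_bytes)
    out.write(TRAILER.pack(offset, len(toc_bytes)))


def read_entry(inp, name):
    """Fetch one entry from a seekable input without scanning the data."""
    inp.seek(-TRAILER.size, io.SEEK_END)
    toc_offset, toc_length = TRAILER.unpack(inp.read(TRAILER.size))
    inp.seek(toc_offset)
    toc = json.loads(inp.read(toc_length).decode("utf-8"))
    start, length = toc[name]
    inp.seek(start)
    return inp.read(length)
```

With this layout a reader that can seek jumps straight to the entry it
wants. A reader that cannot seek still has to read through everything,
which is the remaining slow case.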

On the other hand, if we didn't notice this dependency when there was
only one variable, making it depend on two variables would make it that
much more obscure when the slow case hits and users wonder why the
restore is taking so long.

--
greg
