Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Stephen Frost
Subject Re: pg_dump additional options for performance
Date
Msg-id 20080726164355.GH16005@tamriel.snowman.net
Whole thread Raw
In response to Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
> >> The key problem is that pg_restore is broken:
>
> > The key capability here is being able to split the dump into multiple
> > pieces. The equivalent capability on restore is *not* required, because
> > once the dump has been split the restore never needs to be.
>
> This argument is nonsense.  The typical usage of this capability, IMHO,
> will be
>
>     pg_dump -Fc >whole.dump
>     pg_restore --schema-before-data whole.dump >before.sql
>     pg_restore --data-only whole.dump >data.sql
>     pg_restore --schema-after-data whole.dump >after.sql
>
> followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach.  At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.  We have >60G dump files sometimes, and there's no way I'm going
to dump that into a single file first if I can avoid it.  What we end up
doing today is --schema-only followed by vi'ing it and splitting it up
by hand, etc, then doing a seperate --data-only dump.

> One reason
> is that this gives you a consistent dump, whereas three successive
> pg_dump runs could never guarantee any such thing.

While this is technically true, in most cases people have control over
the schema bits and would likely be able to ensure that the schema
doesn't change during the time.  At that point it's only the data, which
is still done in a transactional way.

> Another reason is that you may well not know when you prepare the
> dump that you will need split output, because the requirement to edit
> the dump is likely to be realized only when you go to load it.

This is a good point.  My gut reaction is that, at least in my usage, it
would be more about "if it's larger than a gig, I might as well split it
out, just in case I need to touch something".  Honestly, it's rare that
I don't have to make *some* change.  Often that's the point of dumping
it out.

    Thanks,

        Stephen

Attachment

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump additional options for performance
Next
From: Simon Riggs
Date:
Subject: Re: pg_dump additional options for performance