Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Stephen Frost
Subject Re: pg_dump additional options for performance
Date
Msg-id 20080720214335.GB18846@tamriel.snowman.net
Whole thread Raw
In response to Re: pg_dump additional options for performance  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: pg_dump additional options for performance  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:
> On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
> > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
> > >   - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

> > >   - Documentation
> > >     When writing the documentation I would stress that "pre-schema" and
> > >     "post-schema" be defined in terms of PostgreSQL objects and why they
> > >     are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are.  I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do.  Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
 <para>
   Pre-Data Load - Minimum amount of the schema required before data
   loading can begin.  This consists mainly of creating the tables
   using the <command>CREATE TABLE</command>.
   This part can be requested using <option>--schema-pre-load</>.
 </para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
 <para>
   Post-Data Load - The rest of the schema definition, including keys,
   indexes, etc.  By putting keys and indexes after the data has been
   loaded the whole process of restoring data is much faster.  This is
   because it is faster to build indexes and check keys in bulk than
   piecemeal as the data is loaded.
   This part can be requested using <option>--schema-post-load</>.
 </para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading.  Where do functions go?  What about types?

A couple of additional points:

  - The command-line help hasn't been updated.  Clearly, that also needs
    to be done to consider the documentation aspect complete.

  - There appears to be a bit of mistakenly included additions.  The
    patch to pg_restore.sgml attempts to add in documentation for
    --superuser.  I'm guessing that was unintentional, and looks like
    just a mistaken extra copy&paste.

> > >   - Technically, the patch needs to be updated slightly since another
> > >     pg_dump-related patch was committed recently which also added
> > >     options and thus causes a conflict.

I think this might have just happened again, funny enough.  It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

    Thanks,

        Stephen

Attachment

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump lock timeout
Next
From: Simon Riggs
Date:
Subject: Re: pg_dump additional options for performance