Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Simon Riggs
Subject Re: pg_dump additional options for performance
Msg-id 1216621040.19656.867.camel@ebony.2ndQuadrant
In response to Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * daveg (daveg@sonic.net) wrote:
> >> One observation, indexes should be built right after the table data
> >> is loaded for each table, this way, the index build gets a hot cache
> >> for the table data instead of having to re-read it later as we do now.
>
> > That's not how pg_dump has traditionally worked, and the point of this
> > patch is to add options to easily segregate the main pieces of the
> > existing pg_dump output (main schema definition, data dump, key/index
> > building).  Your suggestion brings up an interesting point that should
> > pg_dump's traditional output structure change the "--schema-post-load"
> > set of objects wouldn't be as clear to newcomers since the load and the
> > indexes would be interleaved in the regular output.

Stephen: Agreed.

> Yeah.  Also, that is pushing into an entirely different line of
> development, which is to enable multithreaded pg_restore.  The patch
> at hand is necessarily incompatible with that type of operation, and
> wouldn't be used together with it.
>
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
>     * stuff needed before you can load the data
>     * the data
>     * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Tom: Agreed.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data" and "post-data"?

OK by me. Any other takers?



I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any combination of the parts in a single dump.
--data-only and --schema-only are negative options (each one excludes
everything else), so they don't allow this.
(I don't like those names either, just thinking about capabilities)
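
To make the capability concrete, here is a rough sketch in Python rather
than pg_dump's C. It is not taken from the patch: the function names are
hypothetical, and the rule that giving no --want-* option selects all
three parts is an assumption made for illustration only.

```python
import argparse

# The three parts discussed in this thread, in dump order.
PARTS = ("pre-schema", "data", "post-schema")


def build_parser():
    """Build a parser with one positive --want-* flag per part."""
    parser = argparse.ArgumentParser(prog="dump-sketch")
    for part in PARTS:
        parser.add_argument(
            "--want-" + part,
            action="store_true",
            help="include the " + part + " part in the output",
        )
    return parser


def selected_parts(argv):
    """Return the parts requested on the command line, in dump order.

    Assumption: when no --want-* flag is given, every part is selected,
    which mirrors a plain dump with no filtering options.
    """
    args = build_parser().parse_args(argv)
    wanted = [
        part
        for part in PARTS
        # argparse stores --want-pre-schema as args.want_pre_schema
        if getattr(args, "want_" + part.replace("-", "_"))
    ]
    return wanted or list(PARTS)
```

Because each flag adds a part instead of excluding the others,
selected_parts(["--want-pre-schema", "--want-post-schema"]) picks the
first and third parts in one run, a combination that --schema-only and
--data-only cannot express.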

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

