Re: pg_dump additional options for performance - Mailing list pgsql-hackers

From Tom Dunstan
Subject Re: pg_dump additional options for performance
Msg-id ca33c0a30802250106w5c9d2b8ey56ea358c6c950f43@mail.gmail.com
In response to Re: pg_dump additional options for performance  ("Jochem van Dieten" <jochemd@gmail.com>)
List pgsql-hackers
On Sun, Feb 24, 2008 at 6:52 PM, Jochem van Dieten <jochemd@gmail.com> wrote:
>  Or we could have a switch that specifies a directory and have pg_dump
>  split the dump not just in pre-schema, data and post-schema, but also
>  split the data in a file for each table. That would greatly facilitate
>  a parallel restore of the data through multiple connections.

<delurk>

I'll admit to thinking something similar while reading this thread,
mostly because having to specify multiple filenames just to do a dump,
and then specify them all again on the way back in, seemed horrible. My
idea was to stick the multiple streams into a structured container file
rather than a directory, though - a zip file a la JAR/ODF leapt to mind.
That has the nice property of being a single dump file with optional
built-in compression that could store all the data as separate streams,
and it would allow a smart restore program to do as much in parallel as
makes sense. Mucking around with directories or three different
filenames or whatever is a pain. I'll bet most users want to say
"pg_dump --dump-file=foo.zip foo", back up foo.zip as appropriate, and
when restoring say "pg_restore --dump-file=foo.zip -j 4" or whatever,
and have pg_restore do the rest. The other nice thing about using a zip
file as a container is that you can inspect it with standard tools if
you need to.
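To make the idea concrete, here is a minimal sketch of such a container. The layout (a "manifest.json", "pre-schema.sql"/"post-schema.sql" members, and one "data/<table>.copy" stream per table) is entirely hypothetical - it is not a real pg_dump format - but it shows that a plain zip file gives you named, individually compressed streams that a restore tool can pull back in parallel:

```python
# Hypothetical zip-container dump: one stream per table plus a manifest.
# The member names and manifest schema are made up for illustration.
import json
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def write_dump(path, pre_schema, tables, post_schema):
    """Write a single container file holding schema and per-table data."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps({"tables": sorted(tables)}))
        zf.writestr("pre-schema.sql", pre_schema)
        for name, rows in tables.items():
            zf.writestr(f"data/{name}.copy", rows)
        zf.writestr("post-schema.sql", post_schema)


def restore_dump(path, jobs=4):
    """Read the per-table streams with a worker pool, a la 'pg_restore -j'."""
    with zipfile.ZipFile(path) as zf:
        manifest = json.loads(zf.read("manifest.json"))

        def load(table):
            # A real restore would feed this stream to a COPY on its
            # own connection; here we just decode it.
            return table, zf.read(f"data/{table}.copy").decode()

        with ThreadPoolExecutor(max_workers=jobs) as pool:
            return dict(pool.map(load, manifest["tables"]))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dump.zip")
    write_dump(path, "CREATE TABLE t (...);",
               {"t": "1\t2\n", "u": "3\t4\n"}, "CREATE INDEX ...;")
    print(sorted(restore_dump(path, jobs=2)))  # → ['t', 'u']
```

And, as noted above, the resulting file is inspectable with any standard zip tool ("unzip -l dump.zip"), no pg_restore required.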

Another thought is that doing things this way would allow us to add
extra metadata to the dump in later versions without giving the user
yet another command-line switch for an extra file. Or even, thinking a
bit further outside the box, it would allow us to store data in binary
format if that's what the user wants at some point (thinking of the
output from binary I/O rather than the on-disk representation,
obviously). Exposing all the internals of this stuff via N command-line
args is pretty constraining - it would be nicer if pg_dump just
produced the most efficient dump, and if we decide at a later date that
that means doing things a bit differently, we bump the dump file
version and just do it.

Just a thought...

Cheers

Tom

