Re: pg_dump additional options for performance - Mailing list pgsql-hackers

From Tom Dunstan
Subject Re: pg_dump additional options for performance
Msg-id ca33c0a30802250106w5c9d2b8ey56ea358c6c950f43@mail.gmail.com
In response to Re: pg_dump additional options for performance  ("Jochem van Dieten" <jochemd@gmail.com>)
List pgsql-hackers
On Sun, Feb 24, 2008 at 6:52 PM, Jochem van Dieten <jochemd@gmail.com> wrote:
>  Or we could have a switch that specifies a directory and have pg_dump
>  split the dump not just in pre-schema, data and post-schema, but also
>  split the data in a file for each table. That would greatly facilitate
>  a parallel restore of the data through multiple connections.

<delurk>

I'll admit to thinking something similar while reading this thread,
mostly because having to specify multiple filenames just to do a dump,
and then specify them all again on the way back in, seemed horrible. My
idea was to stick the multiple streams into a structured container file
rather than a directory, though - a zip file a la JAR/ODF leapt to mind.
That has the nice property of being a single dump file with optional
built-in compression that could store all the data as separate streams,
and it would allow a smart restore program to do as much in parallel as
makes sense. Mucking around with directories or three different
filenames or whatever is a pain. I'll bet most users want to say
"pg_dump --dump-file=foo.zip foo", back up foo.zip as appropriate, and
when restoring say "pg_restore --dump-file=foo.zip -j 4" or whatever,
and have pg_restore do the rest. The other nice thing about using a zip
file as a container is that you can inspect it with standard tools if
you need to.
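To make the idea concrete, here is a minimal sketch of such a container. The layout (a "manifest.json", "pre-schema.sql"/"post-schema.sql" members, and one "data/<table>.copy" stream per table) is entirely hypothetical - it is not a real pg_dump format - but it shows that a plain zip file gives you named, individually compressed streams that a restore tool can pull back in parallel:

```python
# Hypothetical zip-container dump: one stream per table plus a manifest.
# The member names and manifest schema are made up for illustration.
import json
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def write_dump(path, pre_schema, tables, post_schema):
    """Write a single container file holding schema and per-table data."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps({"tables": sorted(tables)}))
        zf.writestr("pre-schema.sql", pre_schema)
        for name, rows in tables.items():
            zf.writestr(f"data/{name}.copy", rows)
        zf.writestr("post-schema.sql", post_schema)


def restore_dump(path, jobs=4):
    """Read the per-table streams with a worker pool, a la 'pg_restore -j'."""
    with zipfile.ZipFile(path) as zf:
        manifest = json.loads(zf.read("manifest.json"))

        def load(table):
            # A real restore would feed this stream to a COPY on its
            # own connection; here we just decode it.
            return table, zf.read(f"data/{table}.copy").decode()

        with ThreadPoolExecutor(max_workers=jobs) as pool:
            return dict(pool.map(load, manifest["tables"]))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dump.zip")
    write_dump(path, "CREATE TABLE t (...);",
               {"t": "1\t2\n", "u": "3\t4\n"}, "CREATE INDEX ...;")
    print(sorted(restore_dump(path, jobs=2)))  # → ['t', 'u']
```

And, as noted above, the resulting file is inspectable with any standard zip tool ("unzip -l dump.zip"), no pg_restore required.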

Another thought is that doing things this way would allow us to add
extra metadata to the dump in later versions without giving the user
yet another command-line switch for an extra file. Or even, thinking a
bit further outside the box, it would allow us to store data in binary
format if that's what the user wants at some point (thinking of the
output from binary I/O rather than the on-disk representation,
obviously). Exposing all the internals of this stuff via N command-line
args is pretty constraining - it would be nicer if pg_dump just
produced the most efficient dump, and if we decide at a later date that
that means doing things a bit differently, we bump the dump file
version and just do it.

Just a thought...

Cheers

Tom

