Re: pg_dump additional options for performance - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: pg_dump additional options for performance |
Date | |
Msg-id | 1204035333.4252.285.camel@ebony.site |
In response to | Re: pg_dump additional options for performance (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: pg_dump additional options for performance (Greg Smith <gsmith@gregsmith.com>) |
List | pgsql-hackers |
On Tue, 2008-02-26 at 08:28 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> > Separate files seems much simpler...
>
> Yes, we need to stick to the KISS principle.
>
> ISTM that we could simply invent a new archive format of "d" for
> directory.

...and then dump each table to a separate file? If we dumped all the tables to separate files, we would then be able to have a COPY statement *without* inline data: we would just give a relative path to the data file (I know we can't do that currently). So, yes, I like it (see the unload sketch at the end of this message).

> BTW, parallel dumping might be important, but is really much less so
> than parallel restoring in my book.

Yeah, but the unload creates a long-running transaction, which is a PITA in itself, so unload time matters for two reasons. Splitting up the dump is the enabler for splitting up the load.

If we had concurrent psql *and* out-of-line data, then we would be able to load many tables at once. But the single psql process would then become the bottleneck for reading data, so we really need multiple client processes, each with its own session to the database. Moving the bottleneck isn't solving the problem longer term. Sure, psql could spawn lots of additional processes, but that's a lot more work than just splitting up the files (see the reload sketch at the end of this message).

My feeling is that trying to do everything automagically will turn into a long project with restrictions in the final result. I would very much like to KISS: just split the files up at dump time, then allow them to be executed concurrently at load time. It's not really a problem to put the various files in a directory manually. Most use cases for this involve a few really big tables and lots of small ones, so manually coding an unload/reload is not a big issue in the scheme of things. It will be easy to use, performant, and reliable, and we can work on other backend software in the time saved by doing this.

There's no real problem with allowing split-file unloads *and* later having an automagical solution. So I'm saying:

* let's go for the 3 split-file options (called whatever...)
* we might find a really cool automagical solution *later*. That would probably look like this:
  - parallel dumping from spawned sessions in pg_dump
  - dumps to multiple files in the same directory
  - parallel reload using pg_restore

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
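
A minimal sketch of the manual per-table unload described above, with each table's data kept out of line in its own file. This is not an existing pg_dump mode; the database name, table names, and directory layout are invented for the example:

```sh
#!/bin/sh
# Manual per-table unload (sketch). Produces, per table:
#   dump_dir/<table>.sql  -- schema DDL for that table only
#   dump_dir/<table>.dat  -- raw COPY data, out of line
# The database name and table list are assumptions for the example.
DB=mydb
OUTDIR=dump_dir
mkdir -p "$OUTDIR"

for t in orders customers lineitem; do
    # Schema only, restricted to this one table.
    pg_dump --schema-only -t "$t" -f "$OUTDIR/$t.sql" "$DB"
    # Data out of line: psql's client-side \copy writes the file
    # relative to the current directory.
    psql -d "$DB" -c "\\copy $t TO '$OUTDIR/$t.dat'"
done
```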
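
And a matching sketch of a manual parallel reload: one client session per table, so the loads run concurrently instead of funnelling through a single psql process (again, all names are assumptions):

```sh
#!/bin/sh
# Manual parallel reload (sketch): each table gets its own psql
# sessions, so the per-table loads proceed in parallel.
DB=mydb
OUTDIR=dump_dir

for t in orders customers lineitem; do
    (
        psql -d "$DB" -f "$OUTDIR/$t.sql"                   # recreate this table's schema
        psql -d "$DB" -c "\\copy $t FROM '$OUTDIR/$t.dat'"  # load its data
    ) &
done
wait   # don't exit until every per-table load has finished
```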