Re: pg_dump additional options for performance - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: pg_dump additional options for performance |
Date | |
Msg-id | 1204035333.4252.285.camel@ebony.site |
In response to | Re: pg_dump additional options for performance (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: pg_dump additional options for performance (Greg Smith <gsmith@gregsmith.com>) |
List | pgsql-hackers |
On Tue, 2008-02-26 at 08:28 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> > Separate files seems much simpler...
>
> Yes, we need to stick to the KISS principle.
>
> ISTM that we could simply invent a new archive format of "d" for
> directory.

...and then dump each table to a separate file? If we dumped all the tables to separate files, we would then be able to have a COPY statement *without* inline data: we would just give a relative path to the data file (I know we can't do that currently). So, yes, I like it (see the unload sketch at the end of this message).

> BTW, parallel dumping might be important, but is really much less so
> than parallel restoring in my book.

Yeah, but the unload creates a long-running transaction, which is a PITA in itself, so unload time matters for two reasons. Splitting up the dump is the enabler for splitting up the load.

If we had concurrent psql *and* out-of-line data, then we would be able to load many tables at once. But the single psql process would then become the bottleneck for reading data, so we really need multiple client processes, each with its own session to the database. Moving the bottleneck isn't solving the problem longer term. Sure, psql could spawn lots of additional processes, but that's a lot more work than just splitting up the files (see the reload sketch at the end of this message).

My feeling is that trying to do everything automagically will turn into a long project with restrictions in the final result. I would very much like to KISS: just split the files up at dump time, then allow them to be executed concurrently at load time. It's not really a problem to put the various files in a directory manually. Most use cases for this involve a few really big tables and lots of small ones, so manually coding an unload/reload is not a big issue in the scheme of things. It will be easy to use, performant, and reliable, and we can work on other backend software in the time saved by doing this.

There's no real problem with allowing split-file unloads *and* later having an automagical solution. So I'm saying:

* let's go for the 3 split-file options (called whatever...)
* we might find a really cool automagical solution *later*. That would probably look like this:
  - parallel dumping from spawned sessions in pg_dump
  - dumps to multiple files in the same directory
  - parallel reload using pg_restore

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
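
A minimal sketch of the manual per-table unload described above, with each table's data kept out of line in its own file. This is not an existing pg_dump mode; the database name, table names, and directory layout are invented for the example:

```sh
#!/bin/sh
# Manual per-table unload (sketch). Produces, per table:
#   dump_dir/<table>.sql  -- schema DDL for that table only
#   dump_dir/<table>.dat  -- raw COPY data, out of line
# The database name and table list are assumptions for the example.
DB=mydb
OUTDIR=dump_dir
mkdir -p "$OUTDIR"

for t in orders customers lineitem; do
    # Schema only, restricted to this one table.
    pg_dump --schema-only -t "$t" -f "$OUTDIR/$t.sql" "$DB"
    # Data out of line: psql's client-side \copy writes the file
    # relative to the current directory.
    psql -d "$DB" -c "\\copy $t TO '$OUTDIR/$t.dat'"
done
```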
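
And a matching sketch of a manual parallel reload: one client session per table, so the loads run concurrently instead of funnelling through a single psql process (again, all names are assumptions):

```sh
#!/bin/sh
# Manual parallel reload (sketch): each table gets its own psql
# sessions, so the per-table loads proceed in parallel.
DB=mydb
OUTDIR=dump_dir

for t in orders customers lineitem; do
    (
        psql -d "$DB" -f "$OUTDIR/$t.sql"                   # recreate this table's schema
        psql -d "$DB" -c "\\copy $t FROM '$OUTDIR/$t.dat'"  # load its data
    ) &
done
wait   # don't exit until every per-table load has finished
```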