On Tue, 2008-02-26 at 12:27 -0500, Greg Smith wrote:
> On Tue, 26 Feb 2008, Simon Riggs wrote:
>
> > Splitting up the dump is the enabler for splitting up the load.
>
> While the pg_dump split train seems to be leaving the station, I feel
> compelled to point out that focus does nothing to help people who are
> bulk-loading data that came from somewhere else. If my data is already in
> PostgreSQL, and I'm doing a dump/load, I can usually split the data easily
> enough with existing tools to handle that right now via COPY (SELECT...)
> TO. Some tools within pg_dump would be nice, but I don't need them that
> much. It's gigantic files that came from some other DB I don't even have
> access to that I struggle with loading efficiently.
Don't write them as gigantic files... :-)
> The work Dimitri is doing is wandering in that direction and that may be
> enough. I note that something that addresses loading big files regardless
> of source could also work on PostgreSQL dumps, while a pg_dump focused
> effort helps nothing but that specific workflow. I wonder if doing too
> much work on the pg_dump path is the best use of someone's time when the
> more general case will need to be addressed one day anyway.
I take your points. I've spent as much time discussing this as coding
it, which is OK cos coding the wrong thing is also a waste of time.
If we have individual load files then we will be able to use a parallel
load utility on them, whether they come from pg_dump or otherwise.
Neither of the current pg_dump output formats would allow that.
--data-only does allow that but there's no way to dump a sensible schema
to go with that, which is what this thread is about.
Dimitri is working on a parallel loader that allows error handling. I
don't think its worth duplicating that effort by attempting to make COPY
work in parallel. Even if we did, the single client could easily become
a bottleneck. So an external utility that spawns multiple processes and
runs multiple concurrent COPY statements sounds quite good to me.
He's asked for help, if anybody can assist.
-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com