Re: pg_dump additional options for performance - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: pg_dump additional options for performance
Date
Msg-id 1204099767.4252.466.camel@ebony.site
In response to Re: pg_dump additional options for performance  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Tue, 2008-02-26 at 20:14 +0000, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >> I've not been advocating improving pg_restore, which is where the -Fc
> >> tricks come in.
> >> ...
> >> I see you thought I meant pg_restore. I don't think extending
> >> pg_restore in that way is of sufficiently generic use to make it
> >> worthwhile. Extending psql would be worth it, since not all psql scripts
> >> come from pg_dump.
> >
> > OK, the reason I didn't grasp what you are proposing is that it's insane.
> >
> > We can easily, and backwards-compatibly, improve pg_restore to do
> > concurrent restores.  Trying to make psql do something like this will
> > require a complete rewrite, and there is no prospect that it will work
> > for any input that didn't come from (an updated version of) pg_dump
> > anyway.  
> 
> I didn't read everything in the thread previously so I'm not sure if this is
> what Simon had in mind. But I think one thing that could be done in parallel
> even in psql scripts is index builds. That doesn't help speed up COPY but it
> does speed up a case where we currently are limited by only occupying a single
> cpu. And I would expect it to play well with synchronized scans too.
> 
> The "complete rewrite" in this case would be the "concurrent psql" patch I
> submitted a while back. I think it's a bit of a mess right now because I was
> trying to chase down some bugs with sigint handling so I've been thinking of
> rewriting it.
> 
> I think this is low-hanging fruit which would help a lot of users. The
> ability to load multiple COPY dumps would be the other piece of the puzzle but
> personally I'm not sure how to tackle that.

The current design for concurrent psql includes commands that say which
session a command should be run on. Switches between sessions are
explicit. That is good, but prevents us from easily saying "use N
sessions to make it go faster" because we already hardwired the commands
to the sessions.

If we were able to express dependency info then we would be able to alter
the amount of parallelism. That would require us to
* identify each command
* identify its dependents

possibly like this

<psql id="5" absolute-dependents="3,4">
Some SQL...
</psql>

The current default behaviour, with each command waiting on the one
before it, is this

<psql id="5" relative-dependents="-1"> ...

That's a leap ahead of concurrent psql.

I'd rather we had concurrent psql as it is now than attempt to leap
ahead too far, but the dependent definition approach seems likely to
yield benefits in the long run.
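
To make the dependency idea concrete, here is a rough Python sketch of
how a scheduler might use that information. It is only an illustration,
not part of the concurrent psql patch. It assumes that the "dependents"
attribute lists the ids of commands that must finish before the tagged
command may start, which is how relative-dependents="-1" reads for the
serial default. The names schedule() and chain() are hypothetical, and
running commands in fixed rounds is a simplification: a real
implementation would presumably start a command as soon as a session is
free and its dependencies are done.

```python
def schedule(deps, n_sessions):
    """Group command ids into rounds of at most n_sessions commands.

    deps maps each command id to the set of ids that must finish
    before it can start (the assumed meaning of "dependents").
    """
    remaining = {cid: set(d) for cid, d in deps.items()}
    unknown = {d for ds in remaining.values() for d in ds} - remaining.keys()
    if unknown:
        raise ValueError(f"unknown command ids: {sorted(unknown)}")

    done = set()
    rounds = []
    while remaining:
        # A command is ready once everything it waits on has finished.
        ready = sorted(cid for cid, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle")
        batch = ready[:n_sessions]  # never use more than N sessions
        rounds.append(batch)
        for cid in batch:
            del remaining[cid]
        done.update(batch)
    return rounds


def chain(ids):
    """Dependencies for the serial default: each command waits for
    the one immediately before it (relative offset -1)."""
    return {cid: ({ids[i - 1]} if i > 0 else set())
            for i, cid in enumerate(ids)}


# Hypothetical script: 1 creates a table, 2 loads it, 3 and 4 build
# indexes on it, 5 needs both indexes (absolute-dependents="3,4").
deps = {1: set(), 2: {1}, 3: {2}, 4: {2}, 5: {3, 4}}

print(schedule(deps, 2))                    # the index builds share a round
print(schedule(deps, 1))                    # one session: fully serial
print(schedule(chain([1, 2, 3, 4, 5]), 4))  # serial default: extra sessions idle
```

The point of the sketch is that the same script can be run with any
number of sessions, because nothing in it names a session; only the
dependency info limits how far the work can be spread out.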

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 


