Re: parallel pg_restore - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: parallel pg_restore
Msg-id 48D7BC0A.406@dunslane.net
In response to Re: parallel pg_restore  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: parallel pg_restore  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers

Simon Riggs wrote:
> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>
>   
>>> My intention is to have single-thread restore remain the default, at
>>> least for this go round, and have the user be able to choose
>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>>       
>> What about the make famous -j option?
>>
>>        -j [jobs], --jobs[=jobs]
>>             Specifies the number of jobs (commands) to run simultaneously.  If
>>             there  is  more than one -j option, the last one is effective.  If
>>             the -j option is given without an argument, make  will  not  limit
>>             the number of jobs that can run simultaneously.
>>     
>
> +1
>
>   

If that's the preferred name I have no problem. I'm not sure about the 
default argument part, though.

First, I'm not sure our getopt infrastructure actually provides for 
optional arguments, and I am not going to replace it in pg_restore to get 
around such a problem, at least not now.
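To make the "optional argument" semantics concrete, here is a minimal sketch of make-style -j behaviour. It uses Python's argparse (nargs="?") purely as an illustration, not pg_restore's C getopt code; the program name "restore-sketch" and the function make_style_parser are hypothetical. Under these assumptions, omitting the option means one job, a bare -j means "no limit" (None), and -j N means N jobs.

```python
import argparse

def make_style_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mimicking make's -j semantics; not pg_restore's code.
    parser = argparse.ArgumentParser(prog="restore-sketch")
    parser.add_argument(
        "-j", "--jobs",
        nargs="?",   # the number after -j/--jobs is optional
        const=None,  # bare "-j": no limit, represented as None
        default=1,   # option absent: single-threaded restore
        type=int,    # convert a supplied value to an integer
    )
    return parser

p = make_style_parser()
print(p.parse_args([]).jobs)            # 1
print(p.parse_args(["-j"]).jobs)        # None
print(p.parse_args(["-j", "4"]).jobs)   # 4
print(p.parse_args(["--jobs=8"]).jobs)  # 8
```

One awkwardness of optional arguments shows up even here: in this argparse sketch, a bare -j followed by a positional argument such as a dump file name would try to consume that name as the job count.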

More importantly, I'm not convinced it's a good idea. It seems more like 
a footgun that will potentially try to launch thousands of simultaneous 
restore connections. I should have thought that optimal performance 
would be reached at some small multiple (say maybe 2?) of the number of 
CPUs on the server. You could achieve unlimited parallelism by saying 
something like --jobs=99999, but I'd rather that were done very 
explicitly instead of as the default value of the parameter.
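The alternative argued for here, where the job count must always be spelled out, the default stays single-threaded, and very large values are allowed only when typed explicitly, can be sketched the same way. Again this is Python argparse for illustration only; positive_int, explicit_parser and suggested_jobs are hypothetical names, and the factor of 2 in suggested_jobs simply encodes the "maybe 2?" guess above, not a measured optimum.

```python
import argparse

def positive_int(text: str) -> int:
    """Parse a job count, rejecting zero and negative values."""
    value = int(text)  # raises ValueError on non-numeric input
    if value < 1:
        raise argparse.ArgumentTypeError(f"job count must be >= 1, got {value}")
    return value

def explicit_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: -j/--jobs always requires a number; there is no
    # "unlimited" default, so huge values must be requested explicitly.
    parser = argparse.ArgumentParser(prog="restore-sketch")
    parser.add_argument("-j", "--jobs", type=positive_int, default=1)
    return parser

def suggested_jobs(server_cpus: int, factor: int = 2) -> int:
    # Rule of thumb only: "some small multiple" of the server's CPU count.
    return max(1, server_cpus * factor)

p = explicit_parser()
print(p.parse_args([]).jobs)               # 1
print(p.parse_args(["--jobs=4"]).jobs)     # 4
print(p.parse_args(["-j", "99999"]).jobs)  # 99999
print(suggested_jobs(8))                   # 16
```

With this design a bare -j or -j 0 is rejected with a usage error rather than silently launching an unbounded number of connections.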

cheers

andrew

