Re: parallel pg_restore - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: parallel pg_restore
Date
Msg-id 48D6CB8D.2030502@dunslane.net
In response to parallel pg_restore  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: parallel pg_restore  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I am working on getting parallel pg_restore working. I'm currently 
>> getting all the scaffolding working, and hope to have a naive prototype 
>> posted within about a week.
>>     
>
>   
>> The major question is how to choose the restoration order so as to 
>> maximize efficiency both on the server and in reading the archive.
>>     
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism.  ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure).  We can refine it after
> that.
>   


Indeed, that's exactly what I'm doing. However, given that time for the 
8.4 window is short, I thought it would be sensible to get people 
thinking about what the policy might be, while I get on with the mechanism.



>   
>> Another question is what we should do if the user supplies an explicit 
>> order with --use-list. I'm inclined to say we should stick strictly with 
>> the supplied order. Or maybe that should be an option.
>>     
>
> Hmm.  I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order.  Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working.  So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, ie, when you don't
> want the restore trying to hog the machine.  Maybe the right form for
> the extra option is just a limit on how many connections to use.  Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.
>   

My intention is to keep single-threaded restore as the default, at 
least for this go-round, and to let the user choose 
--multi-thread=nn to specify the number of concurrent connections to use.
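
To make the connection-limit idea concrete, here is a minimal sketch, not taken from any patch, of a dependency-driven scheduler with a cap on concurrent workers and a pluggable "dumb" policy kept separate from the mechanism. Everything named here (parallel_restore, dumb_policy, restore_item, the connections parameter) is hypothetical, and Python threads stand in for database connections.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def dumb_policy(ready):
    # Placeholder policy (an assumption): take ready items in name order.
    return sorted(ready)


def parallel_restore(deps, restore_item, connections=1, policy=dumb_policy):
    """Run restore_item(item) for every item in deps, honoring dependencies.

    deps maps each item to the set of items it depends on.
    connections caps how many items are in flight at once.
    Returns the items in the order they finished.
    """
    if connections < 1:
        raise ValueError("connections must be at least 1")
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    ready = []     # items whose dependencies have all completed
    running = {}   # future -> item currently being restored
    finished = []
    with ThreadPoolExecutor(max_workers=connections) as pool:
        while sorter.is_active():
            ready.extend(sorter.get_ready())
            ready = policy(ready)
            # Mechanism: fill free slots with whatever the policy ranks first.
            while ready and len(running) < connections:
                item = ready.pop(0)
                running[pool.submit(restore_item, item)] = item
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the worker
                item = running.pop(future)
                sorter.done(item)  # unblocks items that depend on this one
                finished.append(item)
    return finished
```

With connections=1 the items run strictly one at a time; larger values only bound how much runs concurrently. Swapping in a different policy function changes which ready item is picked next without touching the scheduling loop.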

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored.  The ordering of the rest of the chain still ought to honor
> the dependencies I think.
>
>         
>   

Right. I think we'd need to simulate a full restore and skip the actual 
restore step for items not on the passed-in list. That should be simple enough.
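
As a rough illustration of that idea, here is a minimal sketch, assuming the archive's dependencies are available as a simple mapping: walk the full dependency order, but only act on items the user selected. The function name restore_order and the item names are hypothetical.

```python
from graphlib import TopologicalSorter


def restore_order(deps, selected):
    """Return the selected items in an order that honors all dependencies.

    deps maps each item to the set of items it depends on, including
    items that will not be restored. selected is the set of items the
    user asked for (for example, via --use-list).
    """
    # Order the complete graph so chains passing through omitted items
    # still constrain the relative order of the items that remain.
    full_order = TopologicalSorter(deps).static_order()
    return [item for item in full_order if item in selected]


# view_c depends on table_a only through view_b, which is not restored.
deps = {"table_a": set(), "view_b": {"table_a"}, "view_c": {"view_b"}}
print(restore_order(deps, {"view_c", "table_a"}))
```

Because the ordering is computed over the whole graph, table_a still comes before view_c even though the link between them, view_b, is skipped.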

cheers

andrew

