Thread: parallel pg_restore
I am working on getting parallel pg_restore working. I'm currently getting all the scaffolding working, and hope to have a naive prototype posted within about a week.

The major question is how to choose the restoration order so as to maximize efficiency both on the server and in reading the archive. My thoughts are currently running something like this:

* when an item is completed, reduce the dependency count for each item that depends on it by 1.
* when an item has a dependency count of 0 it is available for execution, and gets moved to the head of the queue.
* when a new worker spot becomes available, if there is not currently a data load running then pick the first available data load, otherwise pick the first available item.

This would mean that loading a table would probably be immediately followed by creation of its indexes, including PK and UNIQUE constraints, thus taking possible advantage of synchronised scans, data in file system buffers, etc.

Another question is what we should do if the user supplies an explicit order with --use-list. I'm inclined to say we should stick strictly with the supplied order. Or maybe that should be an option.

Thoughts and comments welcome.

cheers

andrew
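A minimal sketch of the scheduling idea described above. The structures and ready-queue helpers here are hypothetical, not actual pg_restore code: each TOC item carries a count of unsatisfied dependencies, joins the ready queue when that count hits zero, and an idle worker prefers a data load whenever no data load is currently running.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct TocItem
    {
        int     ndeps_remaining;        /* dependencies not yet restored */
        bool    is_data_load;           /* table data, vs. index/constraint/etc. */
        int     ndependents;
        struct TocItem **dependents;    /* items that depend on this one */
    } TocItem;

    /* hypothetical ready-queue helpers */
    extern void     ready_queue_push_head(TocItem *item);
    extern TocItem *ready_queue_pop_head(void);
    extern TocItem *ready_queue_take_first_data_load(void);  /* NULL if none pending */

    static void
    item_completed(TocItem *item)
    {
        /* reduce the dependency count of everything that depends on this item */
        for (int i = 0; i < item->ndependents; i++)
        {
            TocItem *dep = item->dependents[i];

            if (--dep->ndeps_remaining == 0)
                ready_queue_push_head(dep);     /* now available for execution */
        }
    }

    static TocItem *
    next_item_for_worker(bool data_load_running)
    {
        /*
         * If no data load is running, prefer the first available data load so
         * that a table's indexes and constraints tend to follow its load
         * immediately; otherwise just take the first available item.
         */
        if (!data_load_running)
        {
            TocItem *data = ready_queue_take_first_data_load();

            if (data != NULL)
                return data;
        }
        return ready_queue_pop_head();
    }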
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.

Indeed, that's exactly what I'm doing. However, given that time for the 8.4 window is short, I thought it would be sensible to get people thinking about what the policy might be, while I get on with the mechanism.

>> Another question is what we should do if the user supplies an explicit
>> order with --use-list. I'm inclined to say we should stick strictly with
>> the supplied order. Or maybe that should be an option.
>
> Hmm. I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order. Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working. So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, ie, when you don't
> want the restore trying to hog the machine. Maybe the right form for
> the extra option is just a limit on how many connections to use. Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.

My intention is to have single-thread restore remain the default, at least for this go round, and have the user be able to choose --multi-thread=nn to specify the number of concurrent connections to use.

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored. The ordering of the rest of the chain still ought to honor
> the dependencies I think.

Right. I think we'd need to fake doing a full restore and omit actually restoring items not on the passed in list. That should be simple enough.

cheers

andrew
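A small sketch (hypothetical names again) of that "fake a full restore" idea: every TOC entry is walked in dependency order, but only entries selected by --use-list are actually executed. Skipped entries still release their dependents, so a chain passing through an omitted object keeps its ordering.

    #include <stdbool.h>

    typedef struct TocItem TocItem;             /* as in the earlier sketch */

    extern void restore_item(TocItem *item);    /* hypothetical: run the SQL/COPY */
    extern void item_completed(TocItem *item);  /* decrement dependents, as above */

    static void
    process_entry(TocItem *item, bool selected_by_use_list)
    {
        if (selected_by_use_list)
            restore_item(item);
        /* else: restore nothing, but still count the item as done for ordering */

        item_completed(item);
    }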
Andrew Dunstan <andrew@dunslane.net> writes:
> I am working on getting parallel pg_restore working. I'm currently
> getting all the scaffolding working, and hope to have a naive prototype
> posted within about a week.

> The major question is how to choose the restoration order so as to
> maximize efficiency both on the server and in reading the archive.

One of the first software design principles I ever learned was to separate policy from mechanism. ISTM in this first cut you ought to concentrate on mechanism and let the policy just be something dumb (but coded separately from the infrastructure). We can refine it after that.

> Another question is what we should do if the user supplies an explicit
> order with --use-list. I'm inclined to say we should stick strictly with
> the supplied order. Or maybe that should be an option.

Hmm. I think --use-list is used more for selecting a subset of items to restore than for forcing a nondefault restore order. Forcing the order used to be a major purpose, but that was years ago before we had the dependency-driven-restore-order code working. So I'd vote that the default behavior is to still allow parallel restore when this option is used, and we should provide an orthogonal option that disables use of parallel restore.

You'd really want the latter anyway for some cases, ie, when you don't want the restore trying to hog the machine. Maybe the right form for the extra option is just a limit on how many connections to use. Set it to one to force the exact restore order, and to other values to throttle how much of the machine the restore tries to eat.

One problem here though is that you'd need to be sure you behave sanely when there is a dependency chain passing through an object that's not to be restored. The ordering of the rest of the chain still ought to honor the dependencies I think.

regards, tom lane
On Monday 22 September 2008, Andrew Dunstan wrote:
>> You'd really want the latter anyway for some cases, ie, when you don't
>> want the restore trying to hog the machine. Maybe the right form for
>> the extra option is just a limit on how many connections to use. Set it
>> to one to force the exact restore order, and to other values to throttle
>> how much of the machine the restore tries to eat.
>
> My intention is to have single-thread restore remain the default, at
> least for this go round, and have the user be able to choose
> --multi-thread=nn to specify the number of concurrent connections to use.

What about make's famous -j option?

  -j [jobs], --jobs[=jobs]
       Specifies the number of jobs (commands) to run simultaneously. If
       there is more than one -j option, the last one is effective. If
       the -j option is given without an argument, make will not limit
       the number of jobs that can run simultaneously.

Regards,
-- 
dim
On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>> My intention is to have single-thread restore remain the default, at
>> least for this go round, and have the user be able to choose
>> --multi-thread=nn to specify the number of concurrent connections to use.
>
> What about make's famous -j option?
>
>   -j [jobs], --jobs[=jobs]
>        Specifies the number of jobs (commands) to run simultaneously. If
>        there is more than one -j option, the last one is effective. If
>        the -j option is given without an argument, make will not limit
>        the number of jobs that can run simultaneously.

+1

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, 2008-09-21 at 18:15 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.

Agreed. We mustn't make too many built-in assumptions about the best way to parallelise the restore. For example, running all CREATE INDEX at the same time may help I/O on the scan, but it may also swamp memory and force additional I/O as a result.

We might need a setting for total memory available, so pg_restore can try not to run a set of tasks that together would exceed it. Preferably this wouldn't be just a pg_restore setting.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>>> My intention is to have single-thread restore remain the default, at
>>> least for this go round, and have the user be able to choose
>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>
>> What about make's famous -j option?
>>
>>   -j [jobs], --jobs[=jobs]
>>        Specifies the number of jobs (commands) to run simultaneously. If
>>        there is more than one -j option, the last one is effective. If
>>        the -j option is given without an argument, make will not limit
>>        the number of jobs that can run simultaneously.
>
> +1

If that's the preferred name I have no problem. I'm not sure about the default argument part, though.

First, I'm not sure our getopt infrastructure actually provides for optional arguments, and I am not going to rework it just to get around such a problem in pg_restore, at least for now.

More importantly, I'm not convinced it's a good idea. It seems more like a footgun that will potentially try to launch thousands of simultaneous restore connections. I should have thought that optimal performance would be reached at some small multiple (say maybe 2?) of the number of CPUs on the server. You could achieve unlimited parallelism by saying something like --jobs=99999, but I'd rather that were done very explicitly instead of as the default value of the parameter.

cheers

andrew
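For reference, GNU-style getopt_long does express this with optional_argument, roughly as below; whether PostgreSQL's bundled getopt_long handles it the same way is exactly the open question raised above. The --jobs semantics here (no argument meaning "unlimited") are purely illustrative.

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        /* Hypothetical --jobs/-j with an optional argument, GNU-getopt style. */
        static struct option long_options[] = {
            {"jobs", optional_argument, NULL, 'j'},
            {NULL, 0, NULL, 0}
        };
        int c;
        int njobs = 1;          /* default: single-threaded restore */

        while ((c = getopt_long(argc, argv, "j::", long_options, NULL)) != -1)
        {
            if (c == 'j')
                /* no argument given: treat as "unlimited", as make does (0 here) */
                njobs = optarg ? atoi(optarg) : 0;
        }
        printf("jobs = %d\n", njobs);
        return 0;
    }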
On Mon, 2008-09-22 at 11:38 -0400, Andrew Dunstan wrote:
> Simon Riggs wrote:
>> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>>>> My intention is to have single-thread restore remain the default, at
>>>> least for this go round, and have the user be able to choose
>>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>>
>>> What about make's famous -j option?
>>>
>>>   -j [jobs], --jobs[=jobs]
>>>        Specifies the number of jobs (commands) to run simultaneously. If
>>>        there is more than one -j option, the last one is effective. If
>>>        the -j option is given without an argument, make will not limit
>>>        the number of jobs that can run simultaneously.
>>
>> +1
>
> If that's the preferred name I have no problem. I'm not sure about the
> default argument part, though.
>
> First, I'm not sure our getopt infrastructure actually provides for
> optional arguments, and I am not going to rework it just to get around
> such a problem in pg_restore, at least for now.
>
> More importantly, I'm not convinced it's a good idea. It seems more like
> a footgun that will potentially try to launch thousands of simultaneous
> restore connections. I should have thought that optimal performance
> would be reached at some small multiple (say maybe 2?) of the number of
> CPUs on the server. You could achieve unlimited parallelism by saying
> something like --jobs=99999, but I'd rather that were done very
> explicitly instead of as the default value of the parameter.

OK, sounds best.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 22 Sep 2008 17:24:28 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> More importantly, I'm not convinced it's a good idea. It seems more
>> like a footgun that will potentially try to launch thousands of
>> simultaneous restore connections. I should have thought that
>> optimal performance would be reached at some small multiple (say
>> maybe 2?) of the number of CPUs on the server. You could achieve
>> unlimited parallelism by saying something like --jobs=99999, but
>> I'd rather that were done very explicitly instead of as the default
>> value of the parameter.
>
> OK, sounds best.

I will not argue vehemently here but I will say that "jobs" doesn't seem correct. The term "workers" seems more appropriate.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Monday 22 September 2008, Joshua Drake wrote:
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Mmmm, it sounds like it depends on the implementation (and on whether all workers will share the same serializable transaction or just be independent jobs), but my point here is more about giving the user a name they are used to. Like in "oh, pg_restore -j, I see, thanks".

Now, if your argument is that the make concept of a job does not match the parallel pg_restore concept of workers, I'll simply bow to your choice: barring other "limits", English not being my native language makes it hard for me to follow there ;)

Regards,
-- 
dim
On Mon, 2008-09-22 at 09:30 -0700, Joshua Drake wrote:
> On Mon, 22 Sep 2008 17:24:28 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> More importantly, I'm not convinced it's a good idea. It seems more
>>> like a footgun that will potentially try to launch thousands of
>>> simultaneous restore connections. I should have thought that
>>> optimal performance would be reached at some small multiple (say
>>> maybe 2?) of the number of CPUs on the server. You could achieve
>>> unlimited parallelism by saying something like --jobs=99999, but
>>> I'd rather that were done very explicitly instead of as the default
>>> value of the parameter.
>>
>> OK, sounds best.
>
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Agreed, but most utilities have "j" free but not w, p, t or other letters that might be synonyms.

j is at least used for exactly this purpose in other tools.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
>> I will not argue vehemently here but I will say that "jobs" doesn't
>> seem correct. The term "workers" seems more appropriate.
>
> Agreed, but most utilities have "j" free but not w, p, t or other
> letters that might be synonyms.
>
> j is at least used for exactly this purpose in other tools.

There are in fact very few letters available, as we've been fairly profligate in our use of option letters in the pg_dump suite.

j and m happen to be two of those that are available.

I honestly don't have a terribly strong opinion about what it should be called. I can live with jobs or multi-threads.

cheers

andrew
Andrew Dunstan wrote:
> There are in fact very few letters available, as we've been fairly
> profligate in our use of option letters in the pg_dump suite.
>
> j and m happen to be two of those that are available.

--max-workers

Max makes sense because the number of workers won't be constant; a worker may not have a job to do. It is also consistent with autovacuum_max_workers.

Sincerely,

Joshua D. Drake
Joshua D. Drake wrote:
> Andrew Dunstan wrote:
>> There are in fact very few letters available, as we've been fairly
>> profligate in our use of option letters in the pg_dump suite.
>> j and m happen to be two of those that are available.
>
> --max-workers

Perhaps, but please do not use that as justification for using -m. That would be equally silly as abbreviating "number of workers" to -n.
-- 
Sincerely,
Stephen R. van den Berg.

Experience is something you don't get until just after you need it.
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
> j and m happen to be two of those that are available.
>
> I honestly don't have a terribly strong opinion about what it should be
> called. I can live with jobs or multi-threads.

Perhaps we can use -j for jobs and -m for memory, so we can set memory available across all threads with a single total value.

I can live with jobs or multi-threads also, whichever we decide. Neither one is confusing to explain.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 23 Sep 2008 09:14:33 +0200 "Stephen R. van den Berg" <srb@cuci.nl> wrote:
> Joshua D. Drake wrote:
>> Andrew Dunstan wrote:
>>> There are in fact very few letters available, as we've been fairly
>>> profligate in our use of option letters in the pg_dump suite.
>>> j and m happen to be two of those that are available.
>
>> --max-workers
>
> Perhaps, but please do not use that as justification for using -m.
> That would be equally silly as abbreviating "number of workers" to -n.

Actually I came up with it because it coincides with existing terminology. Autovacuum has the concept of max_workers.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>> j and m happen to be two of those that are available.
>>
>> I honestly don't have a terribly strong opinion about what it
>> should be called. I can live with jobs or multi-threads.
>
> Perhaps we can use -j for jobs and -m for memory, so we can set memory
> available across all threads with a single total value.
>
> I can live with jobs or multi-threads also, whichever we decide.
> Neither one is confusing to explain.

Memory? Where did that come from. Andrew is that in your spec?

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>> j and m happen to be two of those that are available.
>>>
>>> I honestly don't have a terribly strong opinion about what it
>>> should be called. I can live with jobs or multi-threads.
>>
>> Perhaps we can use -j for jobs and -m for memory, so we can set memory
>> available across all threads with a single total value.
>>
>> I can live with jobs or multi-threads also, whichever we decide.
>> Neither one is confusing to explain.
>
> Memory? Where did that come from. Andrew is that in your spec?

No, but it's in mine. As I said upthread, no point in making it more parallel than memory allows. Different operations need more/less memory than others, so we must think about that also. We can quickly work out how big a table is, so we can work out how much memory it will need to perform sorts for index builds and thus how many parallel builds can sensibly take place.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
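Purely to illustrate the kind of throttle being described here (names and the -m/--memory option are invented, not part of any patch): only admit a new task if its estimated working memory still fits in what remains of a user-supplied budget.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Task
    {
        size_t  est_mem_bytes;  /* e.g. derived from relation size for an index build */
    } Task;

    static size_t mem_budget = 0;   /* total allowed across all workers; 0 = no limit */
    static size_t mem_in_use = 0;   /* sum of estimates for currently running tasks */

    static bool
    can_start_task(const Task *t)
    {
        if (mem_budget == 0)
            return true;
        return mem_in_use + t->est_mem_bytes <= mem_budget;
    }

    static void
    task_started(const Task *t)  { mem_in_use += t->est_mem_bytes; }

    static void
    task_finished(const Task *t) { mem_in_use -= t->est_mem_bytes; }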
Simon Riggs wrote:
> On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
>> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>>> j and m happen to be two of those that are available.
>>>>
>>>> I honestly don't have a terribly strong opinion about what it
>>>> should be called. I can live with jobs or multi-threads.
>>>
>>> Perhaps we can use -j for jobs and -m for memory, so we can set memory
>>> available across all threads with a single total value.
>>>
>>> I can live with jobs or multi-threads also, whichever we decide.
>>> Neither one is confusing to explain.
>>
>> Memory? Where did that come from. Andrew is that in your spec?
>
> No, but it's in mine. As I said upthread, no point in making it more
> parallel than memory allows. Different operations need more/less memory
> than others, so we must think about that also. We can quickly work out
> how big a table is, so we can work out how much memory it will need to
> perform sorts for index builds and thus how many parallel builds can
> sensibly take place.

If that ever happens it will certainly not be in this go round.

In fact, we have some anecdotal evidence that the point of diminishing returns is not reached until a fairly high degree of parallelism is used (Joshua's and my client has been using 24 threads, I believe).

In any case, my agenda goes something like this:

* get it working with a basic selection algorithm on Unix (nearly done - keep your eyes open for a patch soon)
* start testing
* get it working on Windows
* improve the selection algorithm
* harden code

If we get all that done by November we'll have done well. And we know that in some cases just this much can lead to reductions in restore time of the order of 80%.

cheers

andrew
On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
> If we get all that done by November we'll have done well. And we know
> that in some cases just this much can lead to reductions in restore time
> of the order of 80%.

Agreed. Go for it.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 23 Sep 2008 16:50:43 -0400 Andrew Dunstan <andrew@dunslane.net> wrote:
> Simon Riggs wrote:
>> On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
>>> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>>>> j and m happen to be two of those that are available.
>>>>>
>>>>> I honestly don't have a terribly strong opinion about what it
>>>>> should be called. I can live with jobs or multi-threads.
>>>>
>>>> Perhaps we can use -j for jobs and -m for memory, so we can set
>>>> memory available across all threads with a single total value.
>>>>
>>>> I can live with jobs or multi-threads also, whichever we decide.
>>>> Neither one is confusing to explain.
>>>
>>> Memory? Where did that come from. Andrew is that in your spec?
>>
>> No, but it's in mine. As I said upthread, no point in making it more
>> parallel than memory allows. Different operations need more/less
>> memory than others, so we must think about that also. We can
>> quickly work out how big a table is, so we can work out how much
>> memory it will need to perform sorts for index builds and thus how
>> many parallel builds can sensibly take place.
>
> If that ever happens it will certainly not be in this go round.
>
> In fact, we have some anecdotal evidence that the point of diminishing
> returns is not reached until a fairly high degree of parallelism is
> used (Joshua's and my client has been using 24 threads, I believe).

Against 8 cores, but yes.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Simon Riggs wrote:
> On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
>> If we get all that done by November we'll have done well. And we know
>> that in some cases just this much can lead to reductions in restore time
>> of the order of 80%.
>
> Agreed. Go for it.

Just as an FYI, by far the number one bottleneck on the multiple worker restores I was doing was CPU. RAM and IO were never the problem.

Sincerely,

Joshua D. Drake
On Tue, 2008-09-23 at 22:17 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
>> On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
>>> If we get all that done by November we'll have done well. And we know
>>> that in some cases just this much can lead to reductions in restore time
>>> of the order of 80%.
>>
>> Agreed. Go for it.
>
> Just as an FYI, by far the number one bottleneck on the multiple worker
> restores I was doing was CPU. RAM and IO were never the problem.

It would be useful to see a full breakdown of those results. There's always a bottleneck on something for any particular task, and we shouldn't presume the problem is only on CPU, for all data on all systems.

CPU parallelism is the most pressing problem, I agree, but I think we will quickly hit problems without memory limits. But I agree with Andrew that this will be a nice problem to have and not everything is possible by Nov 1.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Tuesday 23 September 2008, Andrew Dunstan wrote:
> In any case, my agenda goes something like this:
>
> * get it working with a basic selection algorithm on Unix (nearly
>   done - keep your eyes open for a patch soon)
> * start testing
> * get it working on Windows
> * improve the selection algorithm
> * harden code

I'm not sure whether your work will feature single-table restore splitting, but if it's the case, you could consider having a look at what I've done in pgloader. The parallel loading work there was asked for by Simon Riggs and Greg Smith, and you could test two different parallel algorithms there. The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose what to implement in pg_restore, so I still hope it'll prove useful someday :)

Regards,
-- 
dim
Dimitri Fontaine wrote:
> Hi,
>
> On Tuesday 23 September 2008, Andrew Dunstan wrote:
>> In any case, my agenda goes something like this:
>>
>> * get it working with a basic selection algorithm on Unix (nearly
>>   done - keep your eyes open for a patch soon)
>> * start testing
>> * get it working on Windows
>> * improve the selection algorithm
>> * harden code
>
> I'm not sure whether your work will feature single-table restore splitting,
> but if it's the case, you could consider having a look at what I've done in
> pgloader. The parallel loading work there was asked for by Simon Riggs and
> Greg Smith, and you could test two different parallel algorithms there.
> The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose
> what to implement in pg_restore, so I still hope it'll prove useful someday :)

No. The proposal will perform exactly the same set of steps as single-threaded pg_restore, but in parallel. The individual steps won't be broken up.

Quite apart from anything else, parallel data loading of individual tables will defeat clustering, as well as making it impossible to avoid WAL logging of the load (which I have made provision for).

The fact that custom archives are compressed by default would in fact make parallel loading of individual tables' data difficult with the present format. We'd have to do something like expanding it on the client (which might not even have enough disk space) and then split it before loading it to the server. That's pretty yucky. Alternatively, each loader thread would need to start decompressing the data from the start and throw away data until it got to the point it wanted to start restoring from. Also pretty yucky.

Far better would be to provide for multiple data members in the archive and teach pg_dump to split large tables as it writes the archive. Then pg_restore would need comparatively little adjustment.

Also, of course, you can split tables yourself by partitioning them. That would buy you parallel data load with what I am doing now, with no extra work.

In any case, data loading is very far from being the only problem. One of my clients has long running restores where the data load takes about 20% or so of the time - the rest is in index creation and the like. No amount of table splitting will make a huge difference to them, but parallel processing will. As against that, if your problem is in loading one huge table, this won't help you much. However, this is not a pattern I see much - most of my clients seem to have several large tables plus a boatload of indexes. They will benefit a lot.

cheers

andrew
On Wed, 24 Sep 2008 07:52:52 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> Just as an FYI, by far the number one bottleneck on the multiple worker
>> restores I was doing was CPU. RAM and IO were never the problem.
>
> It would be useful to see a full breakdown of those results.

It's in the archives. We had a ginormous discussion about it about 6 months back (-hackers).

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Wednesday 24 September 2008, Andrew Dunstan wrote:
> No. The proposal will perform exactly the same set of steps as
> single-threaded pg_restore, but in parallel. The individual steps won't
> be broken up.

Ok, good for a solid, trustworthy parallel restore, which is exactly what we want. Just out of curiosity, do you plan to use the Postgres-R helper backends infrastructure?

> Quite apart from anything else, parallel data loading of individual
> tables will defeat clustering, as well as making it impossible to avoid
> WAL logging of the load (which I have made provision for).

That depends on whether the different workers are able to work from the same transaction or not, I imagine. Some work has been done to allow multiple backends to be working in the exact same transaction (Simon's snapclone and the Postgres-R helper backend infrastructure), so one of them could TRUNCATE the table and give a go signal to the workers to fill the table, all in the same transaction. Ok, I need to wake up now... :)

> The fact that custom archives are compressed by default would in fact
> make parallel loading of individual tables' data difficult with the
> present format. We'd have to do something like expanding it on the
> client (which might not even have enough disk space) and then split it
> before loading it to the server. That's pretty yucky. Alternatively,
> each loader thread would need to start decompressing the data from the
> start and throw away data until it got to the point it wanted to start
> restoring from. Also pretty yucky.

Another alternative is the round-robin reader implemented in pgloader, where all the archive reading is done by a single worker, which then splits what it reads among any number of co-workers, filling the next queue(s) while the previous one(s) are busy COPYing to the server.

> Far better would be to provide for multiple data members in the archive
> and teach pg_dump to split large tables as it writes the archive. Then
> pg_restore would need comparatively little adjustment.

Well, that's another possibility, but I tend to prefer having the parallelism mechanics on the restore side of things. It may be only an illusion, but this way I have far more trust in my backups.

> Also, of course, you can split tables yourself by partitioning them.
> That would buy you parallel data load with what I am doing now, with no
> extra work.

And that's excellent :)

> In any case, data loading is very far from being the only problem. One
> of my clients has long running restores where the data load takes about
> 20% or so of the time - the rest is in index creation and the like. No
> amount of table splitting will make a huge difference to them, but
> parallel processing will.

Oh yes, I'm running into this too (not on the same level, but still). Parallel seqscan should help with creating indexes in parallel without having the disks go crazy with read - write - read - write sequences, and posix_fadvise should help a lot here too.

Does the dependency tracker in pg_restore allow considering FK creation as dependent on the matching PK already being there?

> As against that, if your problem is in loading
> one huge table, this won't help you much. However, this is not a pattern
> I see much - most of my clients seem to have several large tables plus a
> boatload of indexes. They will benefit a lot.

The use case given by Greg Smith at the time was loading a multi-terabyte table on a RAID array with a lot of spindles. It then becomes impossible for a single CPU to take full advantage of the available write bandwidth. No idea how common this situation is in the field, though.

Regards,
-- 
dim
Dimitri Fontaine wrote:
> On Wednesday 24 September 2008, Andrew Dunstan wrote:
>> No. The proposal will perform exactly the same set of steps as
>> single-threaded pg_restore, but in parallel. The individual steps won't
>> be broken up.
>
> Ok, good for a solid, trustworthy parallel restore, which is exactly what
> we want. Just out of curiosity, do you plan to use the Postgres-R helper
> backends infrastructure?

This is purely a patch to pg_restore. No backend changes at all (and if I did make any they would not use anything that isn't in core anyway).

>> Quite apart from anything else, parallel data loading of individual
>> tables will defeat clustering, as well as making it impossible to avoid
>> WAL logging of the load (which I have made provision for).
>
> That depends on whether the different workers are able to work from the
> same transaction or not, I imagine. Some work has been done to allow
> multiple backends to be working in the exact same transaction (Simon's
> snapclone and the Postgres-R helper backend infrastructure), so one of
> them could TRUNCATE the table and give a go signal to the workers to fill
> the table, all in the same transaction. Ok, I need to wake up now... :)

Again, I am not doing anything on the backend. I am following Tom's original suggestion of simply having pg_restore run steps in parallel, with no backend changes.

Also, you ignored the point about clustered data. Maybe that doesn't matter to some people, but it does to others. This is designed to provide the same result as a single-threaded pg_restore. Splitting data will break that.

>> The fact that custom archives are compressed by default would in fact
>> make parallel loading of individual tables' data difficult with the
>> present format. We'd have to do something like expanding it on the
>> client (which might not even have enough disk space) and then split it
>> before loading it to the server. That's pretty yucky. Alternatively,
>> each loader thread would need to start decompressing the data from the
>> start and throw away data until it got to the point it wanted to start
>> restoring from. Also pretty yucky.
>
> Another alternative is the round-robin reader implemented in pgloader,
> where all the archive reading is done by a single worker, which then
> splits what it reads among any number of co-workers, filling the next
> queue(s) while the previous one(s) are busy COPYing to the server.
>
>> Far better would be to provide for multiple data members in the archive
>> and teach pg_dump to split large tables as it writes the archive. Then
>> pg_restore would need comparatively little adjustment.
>
> Well, that's another possibility, but I tend to prefer having the
> parallelism mechanics on the restore side of things. It may be only an
> illusion, but this way I have far more trust in my backups.

Having pg_dump do the split would mean you get it for free, pretty much. Rejecting that for a solution that could well be a bottleneck at restore time would require lots more than just a feeling. I don't see how it would give you any less reason to trust your backups.

>> Also, of course, you can split tables yourself by partitioning them.
>> That would buy you parallel data load with what I am doing now, with no
>> extra work.
>
> And that's excellent :)
>
>> In any case, data loading is very far from being the only problem. One
>> of my clients has long running restores where the data load takes about
>> 20% or so of the time - the rest is in index creation and the like. No
>> amount of table splitting will make a huge difference to them, but
>> parallel processing will.
>
> Oh yes, I'm running into this too (not on the same level, but still).
> Parallel seqscan should help with creating indexes in parallel without
> having the disks go crazy with read - write - read - write sequences, and
> posix_fadvise should help a lot here too.
> Does the dependency tracker in pg_restore allow considering FK creation
> as dependent on the matching PK already being there?

I believe so. If not, that's a bug and we should fix it IMNSHO.

>> As against that, if your problem is in loading
>> one huge table, this won't help you much. However, this is not a pattern
>> I see much - most of my clients seem to have several large tables plus a
>> boatload of indexes. They will benefit a lot.
>
> The use case given by Greg Smith at the time was loading a multi-terabyte
> table on a RAID array with a lot of spindles. It then becomes impossible
> for a single CPU to take full advantage of the available write bandwidth.
> No idea how common this situation is in the field, though.

I still think the multiple data members of the archive approach would be best here - one that allowed you to tell pg_dump to split every nn rows, or every nn megabytes. Quite apart from any parallelism issues, that could help enormously when there is a data problem, as happens from time to time, and can get quite annoying if it's in the middle of a humungous data load.

cheers

andrew
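A rough sketch of what that dump-side splitting might look like, with the archive-writing functions stubbed out as placeholders (they are not pg_dump's real API): start a new data member for the same table every nn rows or nn bytes.

    #include <stddef.h>
    #include <string.h>

    #define SPLIT_EVERY_ROWS   1000000
    #define SPLIT_EVERY_BYTES  (256 * 1024 * 1024)

    /* placeholders standing in for the archive format's write routines */
    extern void archive_begin_data_member(const char *table, int segno);
    extern void archive_write(const void *buf, size_t len);
    extern void archive_end_data_member(void);
    extern const char *next_copy_row(void);    /* one COPY text row, or NULL at end */

    void
    dump_table_data_split(const char *table)
    {
        int         segno = 0;
        long        rows = 0;
        size_t      bytes = 0;
        const char *row;

        archive_begin_data_member(table, segno);
        while ((row = next_copy_row()) != NULL)
        {
            size_t len = strlen(row);

            if (rows >= SPLIT_EVERY_ROWS || bytes >= SPLIT_EVERY_BYTES)
            {
                /* close the current member and start the next segment */
                archive_end_data_member();
                archive_begin_data_member(table, ++segno);
                rows = 0;
                bytes = 0;
            }
            archive_write(row, len);
            rows++;
            bytes += len;
        }
        archive_end_data_member();
    }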
On 24 Sep 2008, at 18:56, Andrew Dunstan wrote:
> This is purely a patch to pg_restore. No backend changes at all (and if
> I did make any they would not use anything that isn't in core anyway).

Ok, good. I'm eager to see what -core hackers will want to do with the Postgres-R patches, but that shouldn't be a reason to distract them, sorry...

> Also, you ignored the point about clustered data. Maybe that doesn't
> matter to some people, but it does to others. This is designed to
> provide the same result as a single-threaded pg_restore. Splitting
> data will break that.

I'm not sure I understand what you mean by "clustered data" here, in fact...

> Having pg_dump do the split would mean you get it for free, pretty
> much. Rejecting that for a solution that could well be a bottleneck
> at restore time would require lots more than just a feeling. I don't
> see how it would give you any less reason to trust your backups.

Well, when pg_restore's COPY fails, the table is not loaded and you get an ERROR, and if you're running with the -1 option, the restore stops there and you get a nice ROLLBACK. With the latter option, even if pg_dump did split your tables, the ROLLBACK still happens.

Now, what happens when only one part of the data cannot be restored but you didn't run pg_restore -1? I guess you're simply left with a partially restored table. How will you know which part contains the error? How will you replay the restore of this part only? If the answer is to play with the restore catalogue, ok; if that's not it, I feel the dumps are less trustworthy with the split option than they were before.

Of course all this remains hypothetical, as your work does not include such a feature, which as we see is yet to be designed.

> I still think the multiple data members of the archive approach
> would be best here - one that allowed you to tell pg_dump to split
> every nn rows, or every nn megabytes. Quite apart from any
> parallelism issues, that could help enormously when there is a data
> problem, as happens from time to time, and can get quite annoying if
> it's in the middle of a humungous data load.

Agreed, but it depends a lot on the ways to control which part failed, IMVHO. And I think we'd prefer to have a version of COPY FROM with the capability to continue loading after a failure...

Regards,
-- 
dim