Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Dilip kumar
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id 4205E661176A124FAF891E0A6BA91352663439D9@szxeml509-mbs.china.huawei.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Magnus Hagander <magnus@hagander.net>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers

On 16 July 2014 12:13, Magnus Hagander wrote:

>>Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...

>>(and as an optimization on Alvaro's suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...)

Oh, I get your point. I will update my patch and send it.

Now we can completely remove the vac_parallel.h file, and there is no need for refactoring either :)

Thanks & Regards,

Dilip Kumar

 

 

From: Magnus Hagander [mailto:magnus@hagander.net]
Sent: 16 July 2014 12:13
To: Alvaro Herrera
Cc: Dilip kumar; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

 


On Jul 16, 2014 7:05 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>
> Tom Lane wrote:
> > Dilip kumar <dilip.kumar@huawei.com> writes:
> > > On 15 July 2014 19:01, Magnus Hagander Wrote,
> > >> I am late to this game, but the first thing to my mind was - do we
> > >> really need the whole forking/threading thing on the client at all?
> >
> > > Thanks for the review. I understand your point, but I think if we have to do this directly with independent connections,
> > > it's difficult to divide the jobs equally between multiple independent connections.
> >
> > That argument seems like complete nonsense.  You're confusing work
> > allocation strategy with the implementation technology for the multiple
> > working threads.  I see no reason why a good allocation strategy couldn't
> > work with either approach; indeed, I think it would likely be easier to
> > do some things *without* client-side physical parallelism, because that
> > makes it much simpler to handle feedback between the results of different
> > operational threads.
>
> So you would have one initial connection, which generates a task list;
> then open N libpq connections.  Launch one vacuum on each, and then
> sleep on select() on the N sockets.  Whenever one returns
> read-ready, the vacuuming is done and we send another item from the task
> list.  Repeat until tasklist is empty.  No need to fork anything.
>

Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...

(and as an optimization on Alvaro's suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...)
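
The loop Alvaro describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch: it is written in Python rather than C, and socket pairs with helper threads stand in for real server connections. The names fake_backend, open_connection and parallel_vacuum are hypothetical. A real vacuumdb would presumably use libpq's asynchronous calls (PQsendQuery, PQsocket, PQconsumeInput, PQgetResult) in the same places. Reusing the initial connection as one of the workers is not modelled here.

```python
import select
import socket
import threading
import time


def fake_backend(sock):
    """Hypothetical stand-in for a server backend: one reply per request line."""
    with sock, sock.makefile("rb") as requests:
        for line in requests:
            time.sleep(0.01)               # pretend the vacuum takes a while
            sock.sendall(b"done " + line)  # reply marks the command as finished


def open_connection():
    """Stand-in for opening one more connection to the server (assumption)."""
    client, server = socket.socketpair()
    threading.Thread(target=fake_backend, args=(server,), daemon=True).start()
    return client


def parallel_vacuum(tables, n_conns):
    """Run one task per connection and hand out the next as each one finishes."""
    pending = list(tables)  # full task list, obtained up front
    conns = [open_connection() for _ in range(min(n_conns, len(pending)))]
    readers = {c: c.makefile("rb") for c in conns}
    busy = {}               # connection -> table it is currently vacuuming
    finished = []

    def dispatch(conn):
        table = pending.pop(0)
        conn.sendall(table.encode() + b"\n")
        busy[conn] = table

    for conn in conns:      # launch one vacuum on each connection
        dispatch(conn)

    while busy:
        # Sleep until at least one connection has a reply to read.
        ready, _, _ = select.select(list(busy), [], [])
        for conn in ready:
            readers[conn].readline()         # consume the completion reply
            finished.append(busy.pop(conn))
            if pending:
                dispatch(conn)               # idle again: send the next task

    for conn in conns:
        readers[conn].close()
        conn.close()
    return finished
```

Because a task is only handed out when a connection becomes idle, the work balances itself across connections without any forking or threading on the client side; sorting the task list beforehand (for example, largest tables first) would only change the order of `pending`.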

/Magnus
