Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id CABUevEzXwDLtR_totdjbKrHgi55GXfgpVFk_N8CqT+zJ+DW-3A@mail.gmail.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
List pgsql-hackers
On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 01 July 2014 03:48, Alvaro wrote:
>
>> > In particular, pgpipe is almost an exact duplicate between them,
>> > except the copy in vac_parallel.c has fallen behind changes made to
>> > parallel.c.  (Those changes would have fixed the Windows warnings).  I
>> > think that this function (and perhaps other parts as
>> > well--"exit_horribly" for example) need to be refactored into a common
>> > file that both files can include.  I don't know where the best place
>> > for that would be, though.  (I haven't done this type of refactoring
>> > myself.)
>>
>> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
>> Maybe we should move pgpipe back to src/port and have pg_dump and this
>> new thing use that.  I'm not sure about the rest of duplication in
>> vac_parallel.c; there might be a lot in common with what
>> pg_dump/parallel.c does too.  Having two copies of code is frowned upon
>> for good reasons.  This patch introduces 1200 lines of new code in
>> vac_parallel.c, ugh.
>>
>> If we really require 1200 lines to get parallel vacuum working for
>> vacuumdb, I would question the wisdom of this effort.  To me, it seems
>> better spent improving autovacuum to cover whatever it is that this
>> patch is supposed to be good for --- or maybe just enable having a
>> shell script that launches multiple vacuumdb instances in parallel ...
>
> Thanks for looking into the patch,
>
> I think if we use a shell script to launch parallel vacuumdb instances, we cannot get complete control of dividing the
> task: if we statically divide the tables between multiple processes, it may happen that some process gets the very big
> tables, and then it is as good as a single process doing the whole operation.
>
> In this patch we assign only one table to each process at a time, and whichever process finishes first is assigned a
> new table; this way all processes get an equal share of the task.
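(To make that assignment policy concrete, here is a minimal sketch; the
table names are hypothetical, and the patch's pipe-based master/worker
handoff is reduced to a plain shared counter:)

#include <stddef.h>

/*
 * Dynamic ("self-scheduling") assignment: whichever worker finishes
 * first asks for the next table, so a few huge tables cannot leave the
 * remaining workers idle.
 */
static const char *tables[] = {"t_big", "t_small", "t_medium"};
static int  next_idx = 0;

static const char *
next_table(void)
{
    if (next_idx >= (int) (sizeof(tables) / sizeof(tables[0])))
        return NULL;            /* queue drained; worker can exit */
    return tables[next_idx++];
}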


I am late to this game, but the first thing that came to my mind was: do
we really need the whole forking/threading thing on the client at all?
We need it for things like pg_dump/pg_restore because they can
themselves benefit from parallelism at the client level, but for
something like this, might the code become a lot simpler if we just use
multiple database connections and async queries? That would also bring
the benefit of less platform-dependent code, less cleanup to worry
about, etc.
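For illustration, here is a minimal single-process sketch of that idea
using libpq's async API (PQsendQuery/PQconsumeInput/PQisBusy plus
select() on PQsocket). The conninfo string and table list are
placeholders, and error handling is elided:

#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

#define NCONNS 3

static const char *tables[] = {"public.t1", "public.t2", "public.t3",
                               "public.t4", "public.t5"};
static const int ntables = 5;

/* Hand the next table to an idle connection. */
static void
send_next(PGconn *conn, int *next)
{
    char    sql[256];

    snprintf(sql, sizeof(sql), "VACUUM %s", tables[(*next)++]);
    PQsendQuery(conn, sql);     /* real code must check the return value */
}

int
main(void)
{
    PGconn *conns[NCONNS];
    int     next = 0;
    int     i;

    for (i = 0; i < NCONNS; i++)
    {
        conns[i] = PQconnectdb("dbname=postgres");  /* placeholder conninfo */
        if (PQstatus(conns[i]) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conns[i]));
            return 1;
        }
    }

    /* Prime every connection with one table each. */
    for (i = 0; i < NCONNS && next < ntables; i++)
        send_next(conns[i], &next);

    /* Whichever connection finishes first is handed the next table. */
    for (;;)
    {
        fd_set  rfds;
        int     maxfd = -1;
        int     busy = 0;

        FD_ZERO(&rfds);
        for (i = 0; i < NCONNS; i++)
        {
            if (PQisBusy(conns[i]))
            {
                int     sock = PQsocket(conns[i]);

                FD_SET(sock, &rfds);
                if (sock > maxfd)
                    maxfd = sock;
                busy++;
            }
        }
        if (busy == 0 && next >= ntables)
            break;              /* every table has been vacuumed */

        select(maxfd + 1, &rfds, NULL, NULL, NULL);

        for (i = 0; i < NCONNS; i++)
        {
            PGresult   *res;

            if (!PQisBusy(conns[i]) || !FD_ISSET(PQsocket(conns[i]), &rfds))
                continue;
            PQconsumeInput(conns[i]);
            if (PQisBusy(conns[i]))
                continue;       /* result not fully received yet */
            while ((res = PQgetResult(conns[i])) != NULL)
                PQclear(res);   /* real code must check the result status */
            if (next < ntables)
                send_next(conns[i], &next);
        }
    }

    for (i = 0; i < NCONNS; i++)
        PQfinish(conns[i]);
    return 0;
}

No fork()/threads and no pgpipe duplication; the only platform-specific
piece is select() on the connection sockets.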

(Oh, and one more thing I noticed during my quick review: you added
quoting of the table name, but forgot to do it for the schema name.
You should probably also look at using something like
quote_identifier(); that'll make things easier.)
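On the client side, libpq's PQescapeIdentifier() is one way to get that
effect (quote_identifier() itself is a backend function). A sketch,
where build_vacuum_sql() is a name made up for illustration:

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Build "VACUUM schema.table" with *both* identifiers quoted.
 * PQescapeIdentifier() returns a malloc'd, double-quoted string that
 * must be released with PQfreemem().
 */
static int
build_vacuum_sql(PGconn *conn, const char *schema, const char *table,
                 char *buf, size_t buflen)
{
    char   *qschema = PQescapeIdentifier(conn, schema, strlen(schema));
    char   *qtable = PQescapeIdentifier(conn, table, strlen(table));
    int     ok = (qschema != NULL && qtable != NULL);

    if (ok)
        snprintf(buf, buflen, "VACUUM %s.%s", qschema, qtable);
    if (qschema)
        PQfreemem(qschema);
    if (qtable)
        PQfreemem(qtable);
    return ok;
}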

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


