Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers
From: Magnus Hagander
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date:
Msg-id: CABUevEzXwDLtR_totdjbKrHgi55GXfgpVFk_N8CqT+zJ+DW-3A@mail.gmail.com
In response to: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Dilip kumar <dilip.kumar@huawei.com>)
Responses: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
List: pgsql-hackers
On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:

> On 01 July 2014 03:48, Alvaro wrote:
>
>>> In particular, pgpipe is almost an exact duplicate between them,
>>> except the copy in vac_parallel.c has fallen behind changes made to
>>> parallel.c. (Those changes would have fixed the Windows warnings.) I
>>> think that this function (and perhaps other parts as
>>> well--"exit_horribly" for example) need to be refactored into a common
>>> file that both files can include. I don't know where the best place
>>> for that would be, though. (I haven't done this type of refactoring
>>> myself.)
>>
>> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
>> Maybe we should move pgpipe back to src/port and have pg_dump and this
>> new thing use that. I'm not sure about the rest of the duplication in
>> vac_parallel.c; there might be a lot in common with what
>> pg_dump/parallel.c does too. Having two copies of code is frowned upon
>> for good reasons. This patch introduces 1200 lines of new code in
>> vac_parallel.c, ugh.
>>
>> If we really require 1200 lines to get parallel vacuum working for
>> vacuumdb, I would question the wisdom of this effort. To me, it seems
>> better spent improving autovacuum to cover whatever it is that this
>> patch is supposed to be good for --- or maybe just enable having a
>> shell script that launches multiple vacuumdb instances in parallel ...
>
> Thanks for looking into the patch.
>
> I think if we use a shell script to launch parallel vacuumdb instances, we cannot get complete control over dividing the task. If we divide the tables among the processes up front, some process may get all the very big tables, and then it will be as good as one process doing the whole operation.
>
> In this patch we assign only one table to each process at a time, and whichever process finishes first is assigned a new table; this way all processes get an equal share of the task.

I am late to this game, but the first thing that came to my mind was: do we really need the whole forking/threading thing on the client at all? We need it for things like pg_dump/pg_restore because they can themselves benefit from parallelism at the client level, but for something like this, might the code become a lot simpler if we just used multiple database connections and async queries? That would also bring the benefit of less platform-dependent code, less cleanup needed, etc.

(Oh, and during my quick review I also noticed: you added quoting of the table name, but forgot to do it for the schema name. You should probably also look at using something like quote_identifier(); that'll make things easier.)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
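A minimal sketch of the async-query approach Magnus describes, using only plain libpq; this is not code from the patch or from vacuumdb. The connection string, table list, and fixed connection count are placeholder assumptions. It uses PQescapeIdentifier() for the identifier quoting he raises, since quote_identifier() itself is a server-side SQL function and client code needs the libpq equivalent.

```c
/*
 * Sketch: parallel VACUUM driver using async libpq queries only: a
 * select() loop multiplexes N plain connections and hands the next
 * table to whichever connection goes idle first. No fork, no threads.
 */
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <libpq-fe.h>

#define NCONN 4					/* placeholder: number of parallel connections */

int
main(void)
{
	/* placeholder table list; real code would fetch it from pg_class */
	const char *tables[] = {"t1", "t2", "t3", "t4", "t5", "t6"};
	int			ntables = sizeof(tables) / sizeof(tables[0]);
	int			next = 0;		/* next table to hand out */
	int			active = 0;		/* connections with a VACUUM in flight */
	PGconn	   *conns[NCONN];
	int			busy[NCONN] = {0};

	for (int i = 0; i < NCONN; i++)
	{
		conns[i] = PQconnectdb("dbname=postgres");	/* placeholder conninfo */
		if (PQstatus(conns[i]) != CONNECTION_OK)
			return 1;			/* real code would report PQerrorMessage() */
	}

	while (next < ntables || active > 0)
	{
		fd_set		rfds;
		int			maxfd = -1;

		/* Hand one table to every idle connection. */
		for (int i = 0; i < NCONN && next < ntables; i++)
		{
			char		sql[256];
			char	   *ident;

			if (busy[i])
				continue;
			/* quote client-side; schema names would need the same treatment */
			ident = PQescapeIdentifier(conns[i], tables[next],
									   strlen(tables[next]));
			snprintf(sql, sizeof(sql), "VACUUM %s", ident);
			PQfreemem(ident);
			if (!PQsendQuery(conns[i], sql))
				return 1;		/* real code would report PQerrorMessage() */
			busy[i] = 1;
			active++;
			next++;
		}

		/* Block until at least one busy connection has data to read. */
		FD_ZERO(&rfds);
		for (int i = 0; i < NCONN; i++)
		{
			if (busy[i])
			{
				FD_SET(PQsocket(conns[i]), &rfds);
				if (PQsocket(conns[i]) > maxfd)
					maxfd = PQsocket(conns[i]);
			}
		}
		if (maxfd >= 0 && select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
			return 1;

		/* Drain results; a NULL result means that connection's VACUUM is done. */
		for (int i = 0; i < NCONN; i++)
		{
			if (!busy[i] || !FD_ISSET(PQsocket(conns[i]), &rfds))
				continue;
			PQconsumeInput(conns[i]);
			while (!PQisBusy(conns[i]))
			{
				PGresult   *res = PQgetResult(conns[i]);

				if (res == NULL)	/* query complete; connection idle again */
				{
					busy[i] = 0;
					active--;
					break;
				}
				if (PQresultStatus(res) != PGRES_COMMAND_OK)
					fprintf(stderr, "%s", PQresultErrorMessage(res));
				PQclear(res);
			}
		}
	}

	for (int i = 0; i < NCONN; i++)
		PQfinish(conns[i]);
	return 0;
}
```

Whichever connection's VACUUM returns first gets the next table, so the one-table-at-a-time load balancing Dilip describes falls out of the select() loop for free, with no pgpipe and none of the process-management code in vac_parallel.c.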