Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Add parallelism and glibc dependent only options to reindexdb |
Date | |
Msg-id | CAOBaU_bVFgyt17J4+BFXeAxJfHZQShixruEs_gdJ4A=XRvjxJQ@mail.gmail.com Whole thread Raw |
In response to | Re: Add parallelism and glibc dependent only options to reindexdb (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Add parallelism and glibc dependent only options to reindexdb
|
List | pgsql-hackers |
On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote: > > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> Looking good! > > > > Thanks! > > Confirmed. The last set is much easier to go through. > > >> I'm not sure about the "Consume" word in ConsumeIdleSlot; > >> maybe "Reserve"? "Obtain"? "Get"? > > > > Yes, Consume is maybe a little bit weird. I wanted to point out the > > make it clear that this function is actually removing a slot from the > > free list, especially since there's a (historical) get_idle_slot(). I > > like Reserve, but Obtain and Get are probably too ambiguous. > > The refactoring patch is getting in shape. Now reindex_one_database() > is the only place setting and manipulating the slots. I am wondering > if we should have a wrapper which disconnects all the slots (doing > conn = NULL after the disconnectDatabase() call does not matter). You already mentioned that in a previous mail. I was afraid it'd be overkill, but it'll make caller code easier, so let's do it. > Get* would be my choice, because we look at the set of parallel slots, > and get an idle one. That's what the former GetIdleSlot (that I renamed to get_idle_slot as it's not static) is doing. ConsumeIdleSlot() actually get an idle slot and mark it as being used. That's probably some leakage of internal implementation, but having a GetIdleParallelSlot (or ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and I don't have a better idea on how to rename get_idle_slot. If you really prefer Get* and you're fine with a static get_idle_slot, that's fine by me. > It would be nice to have more consistency in the > names for the routines, say: > - ParallelSlotInit() instead of SetupParallelSlots (still my > suggestion is not perfect either as that sounds like one single slot, > but we have a set of these). > - ParallelSlotGetIdle() instead of ConsumeIdleSlot(). Still that's > more a wait-then-get behavior. > - ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion() > - ParallelSlotDisconnect, as a wrapper for the calls to > DisconnectDatabase(). I don't have an opinion on whether to use parallel slot as prefix or postfix, so I'm fine with postfixing. > + /* > + * Database-wide parallel reindex requires special processing. If > + * multiple jobs were asked, we have to reindex system catalogs first, > + * as they can't be processed in parallel. > + */ > + if (process_type == REINDEX_DATABASE) > > In patch 0002, a parallel database REINDEX first processes the catalog > relations in a serializable fashion, and then all the other relations > in parallel, which is right Could we have schema-level reindexes also > process things in parallel with all the relations from all the schemas > listed? This would be profitable in particular for callers listing > multiple schemas with an unbalanced number of tables in each It would also be beneficial for a parallel reindex of a single schema. > and we'd > need to be careful of the same where pg_catalog is listed. Actually, > your patch breaks if we do a parallel run with pg_catalog and another > schema, no? It can definitely cause problems if you ask for pg_catalog and other schema, same as if you ask a parallel reindex of some catalog tables (possibly with other tables). There's a --system switch for that need, so I don't know if documenting the limitation to avoid some extra code to deal with it is a good enough solution?
pgsql-hackers by date: