Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id CAOBaU_bVFgyt17J4+BFXeAxJfHZQShixruEs_gdJ4A=XRvjxJQ@mail.gmail.com
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote:
> > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >> Looking good!
> >
> > Thanks!
>
> Confirmed.  The last set is much easier to go through.
>
> >>  I'm not sure about the "Consume" word in ConsumeIdleSlot;
> >> maybe "Reserve"? "Obtain"? "Get"?
> >
> > Yes, Consume is maybe a little bit weird.  I wanted to point out the
> > make it clear that this function is actually removing a slot from the
> > free list, especially since there's a (historical) get_idle_slot().  I
> > like Reserve, but Obtain and Get are probably too ambiguous.
>
> The refactoring patch is getting in shape.  Now reindex_one_database()
> is the only place setting and manipulating the slots.  I am wondering
> if we should have a wrapper which disconnects all the slots (doing
> conn = NULL after the disconnectDatabase() call does not matter).

You already mentioned that in a previous mail.  I was afraid it'd be
overkill, but it'll make caller code easier, so let's do it.

> Get* would be my choice, because we look at the set of parallel slots,
> and get an idle one.

That's what the former GetIdleSlot (that I renamed to get_idle_slot as
it's not static) is doing.  ConsumeIdleSlot() actually get an idle
slot and mark it as being used.  That's probably some leakage of
internal implementation, but having a GetIdleParallelSlot (or
ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and
I don't have a better idea on how to rename get_idle_slot.  If you
really prefer Get* and you're fine with a static get_idle_slot, that's
fine by me.

>  It would be nice to have more consistency in the
> names for the routines, say:
> - ParallelSlotInit() instead of SetupParallelSlots (still my
> suggestion is not perfect either as that sounds like one single slot,
> but we have a set of these).
> - ParallelSlotGetIdle() instead of ConsumeIdleSlot().  Still that's
> more a wait-then-get behavior.
> - ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion()
> - ParallelSlotDisconnect, as a wrapper for the calls to
> DisconnectDatabase().

I don't have an opinion on whether to use parallel slot as prefix or
postfix, so I'm fine with postfixing.

> +    /*
> +     * Database-wide parallel reindex requires special processing.  If
> +     * multiple jobs were asked, we have to reindex system catalogs first,
> +     * as they can't be processed in parallel.
> +     */
> +               if (process_type == REINDEX_DATABASE)
>
> In patch 0002, a parallel database REINDEX first processes the catalog
> relations in a serializable fashion, and then all the other relations
> in parallel, which is right  Could we have schema-level reindexes also
> process things in parallel with all the relations from all the schemas
> listed?  This would be profitable in particular for callers listing
> multiple schemas with an unbalanced number of tables in each

It would also be beneficial for a parallel reindex of a single schema.

> and we'd
> need to be careful of the same where pg_catalog is listed.  Actually,
> your patch breaks if we do a parallel run with pg_catalog and another
> schema, no?

It can definitely cause problems if you ask for pg_catalog and other
schema, same as if you ask a parallel reindex of some catalog tables
(possibly with other tables).  There's a --system switch for that
need, so I don't know if documenting the limitation to avoid some
extra code to deal with it is a good enough solution?



pgsql-hackers by date:

Previous
From: Adrien Nayrat
Date:
Subject: Re: Detailed questions about pg_xact_commit_timestamp
Next
From: Adrien Nayrat
Date:
Subject: Re: Feature: Add Greek language fulltext search