Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id 20190711133417.GK4500@paquier.xyz
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote:
> On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Get* would be my choice, because we look at the set of parallel slots,
>> and get an idle one.
>
> That's what the former GetIdleSlot (that I renamed to get_idle_slot as
> it's not static) is doing.  ConsumeIdleSlot() actually get an idle
> slot and mark it as being used.  That's probably some leakage of
> internal implementation, but having a GetIdleParallelSlot (or
> ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and
> I don't have a better idea on how to rename get_idle_slot.  If you
> really prefer Get* and you're fine with a static get_idle_slot, that's
> fine by me.

Do we actually need get_idle_slot?  ConsumeIdleSlot is its only
caller.

>> and we'd
>> need to be careful of the same where pg_catalog is listed.  Actually,
>> your patch breaks if we do a parallel run with pg_catalog and another
>> schema, no?
>
> It can definitely cause problems if you ask for pg_catalog and other
> schema, same as if you ask a parallel reindex of some catalog tables
> (possibly with other tables).  There's a --system switch for that
> need, so I don't know if documenting the limitation to avoid some
> extra code to deal with it is a good enough solution?

vacuumdb --full still has limitations in this area and we had some
reports on the matter about this behavior being annoying.  Its
documentation also mentions that mixing catalog relations with  --full
can cause deadlocks.

Documenting it may be fine at the end, but my take is that it would be
nice to make sure that we don't have deadlocks if we can avoid them
easily.  It is also a matter of balance.  If for example the patch
gets 3 times bigger in size because of that we may have an argument
for not doing it and keep the code simple.  What do people think about
that?  I would be nice to get more opinions here.

And while scanning the code...

+ * getQuerySucess
Typo here.

- * Pump the conn till it's dry of results; return false if any are errors.
This comment could be improved on the way, like "Go through all the
connections and make sure to consume any remaining results.  If any
error is found, false is returned after processing all the parallel
slots."
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [RFC] [PATCH] Flexible "partition pruning" hook
Next
From: Michael Paquier
Date:
Subject: Re: complier warnings from ecpg tests