Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id 20190725100245.GA10546@paquier.xyz
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Sergei Kornilov <sk@zsrv.org>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb  (Sergei Kornilov <sk@zsrv.org>)
Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <julien.rouhaud@free.fr>)
List pgsql-hackers
On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote:
> Thank you, v9 code and behavior looks good for me. Builds cleanly,
> works with older releases. I'll mark Ready to Commiter.

The restriction with --jobs and --system is not documented, and that's
good to have from the start.  This actually makes the parallel job
handling with --index inconsistent as we mask parallelism in this case
by enforcing the use of one connection.  I think that we should
revisit the interactions with --index and --jobs actually, because,
assuming that users never read the documentation, they may actually be
surprised to see that something like --index idx1 .. --index idxN
--jobs=N does not lead to any improvements at all, until they find out
the reason why.  It is also much easier to have an error as starting
point because it can be lifted later one.  There is an argument that
we may actually not have this restriction at all on --index as long as
the user knows what it is doing and does not define indexes from the
same relation, still I would keep an error.

Thinking deeper about it, there is also a point of gathering first all
the relations if one associates --schemas and --tables in the same
call of reindexdb and then pass down a list of decomposed relations
which are processed in parallel.  The code as currently presented is
rather straight-forward, and I don't think that this is worth the
extra complication, but this was not mentioned until now on this
thread :)

For the non-parallel case in reindex_one_database(), I would add an
Assert on REINDEX_DATABASE and REINDEX_SYSTEM with a comment to
mention that a NULL list of objects can just be passed down only in
those two cases when the single-object list with the database name is
built.

>> I don't have a strong opinion here. If we change for >=, it'd be
>> better to also adapt vacuumdb for consistency. I didn't change it for
>> now, to stay consistent with vacuumdb.
>
> Yep, no strong opinion from me too.

My opinion tends towards consistency.  Consistency sounds good.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Sergei Kornilov
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Next
From: Sergei Kornilov
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb