Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Msg-id 20190702025507.GD1388@paquier.xyz
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Mon, Jul 01, 2019 at 06:14:20PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > Please don't reuse a file name as generic as "parallel.c" -- it's
> > annoying when navigating source.  Maybe conn_parallel.c multiconn.c
> > connscripts.c admconnection.c ...?
>
> I could use scripts_parallel.[ch] as I've already used it in the
> #define part?

multiconn.c sounds rather good, but I have a poor ear for any kind of
naming.

>> If your server crashes or is stopped midway during the reindex, you
>> would have to start again from scratch, and it's tedious (if it's
>> possible at all) to determine which indexes were missed.  I think it
>> would be useful to have a two-phase mode: in the initial phase reindexdb
>> computes the list of indexes to be reindexed and saves them into a work
>> table somewhere.  In the second phase, it reads indexes from that table
>> and processes them, marking them as done in the work table.  If the
>> second phase crashes or is stopped, it can be restarted and consults the
>> work table.  I would keep the work table, as it provides a bit of an
>> audit trail.  It may be important to be able to run even if unable to
>> create such a work table (because of the <ironic>numerous</> users that
>> DROP DATABASE postgres).
>
> Or we could create a table locally in each database, that would fix
> this problem and probably make the code simpler?
>
> It also raises some additional concerns about data expiration.  I
> guess that someone could launch the tool by mistake, kill reindexdb,
> and run it again 2 months later while a lot of new objects have been
> added for instance.

These look like fancy additions, but they are not the core of the
problem, no?  If you begin to play in this area you would need more
control options: basically a "continue" mode to be able to restart a
previously failed attempt, a "reinit" mode able to restart the
operation completely from scratch, and perhaps even a "reset" mode
which cleans up any data already present.  That's not really complex,
but this has to be maintained at the database level.
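
To make the three modes a bit more concrete, here is a rough sketch of
how they could interact with a work table.  This is only an
illustration under assumptions: a plain Python dict stands in for the
per-database work table, and run(), compute_index_list() and the
reindex callback are hypothetical names, not anything that exists in
reindexdb or in the proposed patch.

```python
# Hypothetical sketch: none of these names exist in reindexdb.
# A plain dict (index name -> done flag) stands in for the work table.

def compute_index_list(catalog):
    """Phase one: decide which indexes need a rebuild (here, all of them)."""
    return sorted(catalog)


def run(mode, work_table, catalog, reindex):
    """Drive one invocation in 'continue', 'reinit' or 'reset' mode.

    Returns the list of indexes processed by this invocation.
    """
    if mode == "reset":
        # Clean up any data already present, and do nothing else.
        work_table.clear()
        return []
    if mode == "reinit":
        # Restart completely from scratch: drop old state first.
        work_table.clear()
    elif mode != "continue":
        raise ValueError(f"unknown mode: {mode}")

    if not work_table:
        # No previous attempt recorded (or just reinitialized): phase one.
        for name in compute_index_list(catalog):
            work_table[name] = False

    processed = []
    # Phase two: process only the entries not yet marked as done.
    for name in sorted(work_table):
        if work_table[name]:
            continue
        reindex(name)            # may raise if the server goes away
        work_table[name] = True  # mark as done only after success
        processed.append(name)
    return processed
```

Note that in this sketch "continue" only looks at what was recorded by
the earlier attempt, so indexes created in the meantime are not picked
up until a "reinit".  That is the data expiration concern mentioned
upthread.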

>>  The --glibc-dependent
>> switch seems too ad-hoc.  Maybe "--exclude-rule=glibc"?  That way we can
>> add other rules later.  (Not "--exclude=foo" because we'll want to add
>> the possibility to ignore specific indexes by name.)
>
> That's a good point, I like the --exclude-rule switch.

Sounds kind of nice.
--
Michael
