Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id CAOBaU_YDTym=Er78KCrf4HK_DKC3OgeSwiCqdBdP=h1nWD1=7Q@mail.gmail.com
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Sergei Kornilov <sk@zsrv.org>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb  (Sergei Kornilov <sk@zsrv.org>)
List pgsql-hackers
Thanks for the review!

On Thu, Jul 25, 2019 at 10:17 AM Sergei Kornilov <sk@zsrv.org> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, failed
> Spec compliant:           not tested
> Documentation:            tested, passed
>
> Hi
>
> I did some review and have few notes about behavior.
>
> reindex database does not work with concurrently option:
>
> > ./inst/bin/reindexdb --echo -d postgres -j 8 --concurrently
> > SELECT pg_catalog.set_config('search_path', '', false);
> > REINDEX SYSTEM CONCURRENTLY postgres;
> > reindexdb: error: reindexing of system catalogs on database "postgres" failed: ERROR:  cannot reindex system
catalogsconcurrently
 
>
> I think we need print message and skip system catalogs for concurrently reindex.
> Or we can disallow concurrently database reindex with multiple jobs. I prefer first option.

Good point.  I agree with 1st option, as that's already what would
happen without the --jobs switch:

$ reindexdb -d postgres --concurrently
WARNING:  cannot reindex system catalogs concurrently, skipping all

(although this is emitted by the backend)
I modified the client code to behave the same and added a regression test.

> > +                     reindex_one_database(dbname, REINDEX_SCHEMA, &schemas, host,
> > +                                                              port, username, prompt_password, progname,
> > +                                                              echo, verbose, concurrently,
> > +                                                              Min(concurrentCons, nsp_count));
>
> Should be just concurrentCons instead of Min(concurrentCons, nsp_count)

Indeed, that changed with v8 and I forgot to update it, fixed.

> reindex_one_database for REINDEX_SCHEMA will build tables list and then processing by available workers. So:
> -j 8 -S public -S public -S public -S poblic -S public -S public - will work with 6 jobs (and without multiple
processingfor same table)
 
> -j 8 -S public - will have only one worker regardless tables count
>
> > if (concurrentCons > FD_SETSIZE - 1)
>
> "if (concurrentCons >= FD_SETSIZE)" would not cleaner? Well, pgbench uses >= condition, vacuumdb uses > FD_SETSIZE -
1.No more FD_SETSIZE in conditions =)
 

I don't have a strong opinion here.  If we change for >=, it'd be
better to also adapt vacuumdb for consistency.  I didn't change it for
now, to stay consistent with vacuumdb.

Attachment

pgsql-hackers by date:

Previous
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: pg_receivewal documentation
Next
From: Pavel Stehule
Date:
Subject: Re: dropdb --force