Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id 20190709072446.GC17321@paquier.xyz
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb
List pgsql-hackers
On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> I'll resubmit the parallel patch using per-table only
>> approach
>
> Attached.

I have done a lookup of this patch set with a focus on the refactoring
part, and the split is a bit confusing.

+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+   char        errbuf[256];
common.c has already an API to connect to a database.  It would be
more natural to move the disconnection part also to common.c and have
the caller of DisconnectDatabase reset the slot connection by itself?
disconnectDatabase() (lower case for the first character) would make
the set more consistent.  We could also have a wrapper like say
DiscardSlot() which does this work, but that seems like an overkill
for a single slot if one API could do the cleanup of the full set.

$ git grep select_loop
scripts_parallel.c:     /* We must reconstruct the fd_set for each
call to select_loop */
scripts_parallel.c:     i = select_loop(maxFd, &slotset, &aborting);
scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool
*aborting)
scripts_parallel.h:extern int   select_loop(int maxFd, fd_set
*workerset, bool *aborting);

select_loop is only present in scripts_parallel.c, so it can remain
static.

+       slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) *
concurrentCons);
+       init_slot(slots, conn);
+       if (parallel)
+       {
+               for (i = 1; i < concurrentCons; i++)
+               {
+                       conn = connectDatabase(dbname, host, port,
username, prompt_password,
+
progname, echo, false, true);
+                       init_slot(slots + i, conn);
+               }
+       }

This comes from 0002 and could be more refactored as vacuumdb does the
same thing.  Based on 0001, init_slot() is called now in vacuumdb.c
and initializes a set of slots while connecting to a given database.
In short, in input we have a set of parameters and the ask to open
connections with N slots, and the return result is an pg_malloc'd
array of slots ready to be used.  We could just call that
ParallelSlotInit() (if you have a better name feel free).

+    /*
+     * Get the connection slot to use.  If in parallel mode, here we wait
+     * for one connection to become available if none already is. In
+     * non-parallel mode we simply use the only slot we have, which we
+     * know to be free.
+     */
+    if (parallel)
This business also is duplicated in both reindexdb.c and vacuumdb.c.

+bool
+GetQueryResult(PGconn *conn, const char *progname)
+{
This also does not stick with the parallel stuff, as that's basically
only getting a query result.  We could stick that into common.c.

Patch 2 has no documentation.  The option as well as the restrictions
in place need to be documented properly.

Here is a small idea about the set of routines we could have for the
parallel stuff, with only three of them needed to work on the parallel
slots and get free connections:
- Initialization of the full slot set.
- Cleanup and disconnection of the slots.
- Fetch an idle connection and wait for one until available.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Michael Paquier
Date:
Subject: Re: Extra quote_all_identifiers in _dumpOptions