Re: Add parallelism and glibc dependent only options to reindexdb - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add parallelism and glibc dependent only options to reindexdb
Date
Msg-id CAOBaU_YZWa4dF+14_Q0BTepQtXgwmfJ+90CgQnS8w-7+bzAxWA@mail.gmail.com
Whole thread Raw
In response to Re: Add parallelism and glibc dependent only options to reindexdb  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add parallelism and glibc dependent only options to reindexdb  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
Thanks for the review.

On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >> I'll resubmit the parallel patch using per-table only
> >> approach
> >
> > Attached.
>
> I have done a lookup of this patch set with a focus on the refactoring
> part, and the split is a bit confusing.

Yes, that wasn't a smart split  :(

> +void
> +DisconnectDatabase(ParallelSlot *slot)
> +{
> +   char        errbuf[256];
> common.c has already an API to connect to a database.  It would be
> more natural to move the disconnection part also to common.c and have
> the caller of DisconnectDatabase reset the slot connection by itself?

Ok.

> $ git grep select_loop
> scripts_parallel.c:     /* We must reconstruct the fd_set for each
> call to select_loop */
> scripts_parallel.c:     i = select_loop(maxFd, &slotset, &aborting);
> scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool
> *aborting)
> scripts_parallel.h:extern int   select_loop(int maxFd, fd_set
> *workerset, bool *aborting);
>
> select_loop is only present in scripts_parallel.c, so it can remain
> static.

Good point.

> +       slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) *
> concurrentCons);
> +       init_slot(slots, conn);
> +       if (parallel)
> +       {
> +               for (i = 1; i < concurrentCons; i++)
> +               {
> +                       conn = connectDatabase(dbname, host, port,
> username, prompt_password,
> +
> progname, echo, false, true);
> +                       init_slot(slots + i, conn);
> +               }
> +       }
>
> This comes from 0002 and could be more refactored as vacuumdb does the
> same thing.  Based on 0001, init_slot() is called now in vacuumdb.c
> and initializes a set of slots while connecting to a given database.
> In short, in input we have a set of parameters and the ask to open
> connections with N slots, and the return result is an pg_malloc'd
> array of slots ready to be used.  We could just call that
> ParallelSlotInit() (if you have a better name feel free).

Given how the rest of the functions are named, I'll probably use
InitParallelSlots().

>
> +    /*
> +     * Get the connection slot to use.  If in parallel mode, here we wait
> +     * for one connection to become available if none already is. In
> +     * non-parallel mode we simply use the only slot we have, which we
> +     * know to be free.
> +     */
> +    if (parallel)
> This business also is duplicated in both reindexdb.c and vacuumdb.c.
>
> +bool
> +GetQueryResult(PGconn *conn, const char *progname)
> +{
> This also does not stick with the parallel stuff, as that's basically
> only getting a query result.  We could stick that into common.c.

This function also has a bad name, as it's discarding the result via
ProcessQueryResult.  Maybe we should rename them to GetQuerySuccess()
and ConsumeAndTrashQueryResult()?

> Patch 2 has no documentation.  The option as well as the restrictions
> in place need to be documented properly.

I forgot that I had forgotten to add documentation :( will fix this time.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Extra quote_all_identifiers in _dumpOptions
Next
From: Arthur Zakirov
Date:
Subject: Re: Extra quote_all_identifiers in _dumpOptions