Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Msg-id CA+TgmoZTfC7WeYNfXHKXjwZGMd7zKYEbf93J+OQJrdQeRH2u+g@mail.gmail.com
In response to Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Jan 17, 2018 at 7:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> There seems to be some yak shaving involved in getting the barrier
> abstraction to do exactly what is required, as Thomas went into at the
> time. How should that prerequisite work be structured? For example,
> should a patch be spun off for that part?
>
> I may not be the most qualified person for this job, since Thomas
> considered two alternative approaches (to making the static barrier
> abstraction forget about never-launched participants) without ever
> settling on one of them.

I had forgotten about the previous discussion.  The sketch in my
previous email supposed that we would use dynamic barriers since the
whole point, after all, is to handle the fact that we don't know how
many participants will really show up.  Thomas's idea seems to be that
the leader will initialize the barrier based on the anticipated number
of participants and then tell it to forget about the participants that
don't materialize.  Of course, that would require that the leader
somehow figure out how many participants didn't show up so that it can
deduct them from the counter in the barrier.  And how is it going to
do that?

It's true that the leader will know the value of nworkers_launched,
but as the comment in LaunchParallelWorkers() says: "The caller must
be able to tolerate ending up with fewer workers than expected, so
there is no need to throw an error here if registration fails.  It
wouldn't help much anyway, because registering the worker in no way
guarantees that it will start up and initialize successfully."  So it
seems to me that a much better plan than having the leader try to
figure out how many workers failed to launch would be to just keep a
count of how many workers did in fact launch.  The count can be stored
in shared memory, and each worker that comes along can increment it.
Then we don't have to worry about whether we accurately detect failure
to launch.  We can argue about whether it's possible to detect all
cases of failure to launch unerringly, but what's for sure is that if
a worker increments a counter in shared memory, it launched.  Now,
where should this counter be located?  There are of course multiple
possibilities, but in my sketch it goes in
some_barrier_variable->nparticipants, i.e., we just use a dynamic
barrier.
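
A minimal standalone sketch of the counting idea above, for illustration
only.  It uses pthreads and C11 atomics rather than PostgreSQL's shared
memory and Barrier API, and the names SharedSortState, worker_main and
count_launched are hypothetical.  The point it shows is that the count is
built up by the workers that actually start, so nobody has to detect the
ones that never did.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for state that would live in shared memory. */
typedef struct SharedSortState
{
    atomic_int  nparticipants;  /* workers that actually started */
} SharedSortState;

static void *
worker_main(void *arg)
{
    SharedSortState *shared = (SharedSortState *) arg;

    /* First thing a worker does: record that it really launched. */
    atomic_fetch_add(&shared->nparticipants, 1);

    /* ... the worker's share of the sort would go here ... */
    return NULL;
}

/*
 * Plan for nrequested workers, but let only nstarted of them run.  This
 * simulates workers that were registered but never started up.  Returns
 * the participant count found in the shared state afterwards.
 */
int
count_launched(int nrequested, int nstarted)
{
    SharedSortState shared;
    pthread_t   threads[64];
    int         ncreated = 0;

    assert(nstarted >= 0 && nstarted <= nrequested && nrequested <= 64);
    atomic_init(&shared.nparticipants, 0);

    for (int i = 0; i < nstarted; i++)
    {
        if (pthread_create(&threads[ncreated], NULL, worker_main, &shared) == 0)
            ncreated++;
    }

    /* Wait for the workers that exist; the missing ones need no handling. */
    for (int i = 0; i < ncreated; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&shared.nparticipants);
}
```

Here the leader never computes "requested minus failed".  It only reads a
count that each running worker contributed to, which is the property the
nparticipants field of a dynamic barrier is meant to provide in the sketch.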

So my position (at least until Thomas or Andres shows up and tells me
why I'm wrong) is that you can use the Barrier API just as it is
without any yak-shaving, just by following the sketch I set out
before.  The additional API I proposed in that sketch isn't really
required, although it might be more efficient.  But it doesn't really
matter: if that comes along later, it will be trivial to adjust the
code to take advantage of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

