On Wed, Jan 24, 2018 at 5:31 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a version that works, and a minimal repro test module thing.
> Without 0003 applied, it hangs.

I can confirm that this version does in fact fix the problem with
parallel CREATE INDEX hanging in the event of (simulated) worker
fork() failure. It also seems to have at least one small advantage,
which you didn't mention, over the other approaches I was talking
about: we never have to wait until the leader stops participating as
a worker before an error is raised. IOW, either the whole parallel
CREATE INDEX operation throws an error at an early point, or the
CREATE INDEX succeeds completely.
Obviously, the other, stated advantage is more relevant: with this
approach, *nobody* has to worry about nworkers_launched being
inaccurate. That includes code that gets away with it today only
because it uses a tuple queue, such as nodeGather.c, but that may not
always get away with it in the future.
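
To make the failure mode concrete, here is a toy model of it. This is
a sketch in Python, not PostgreSQL code: Worker, WorkerStartupError,
and both wait functions are names made up for illustration, and the
model assumes (as I understand the problem discussed in this thread)
that the launched count can include a worker whose fork() later
failed. The naive leader waits for a count that can never be reached;
the checked leader notices the failed start and raises an error early.

```python
# Toy model only. None of these names are PostgreSQL APIs.
from dataclasses import dataclass


@dataclass
class Worker:
    started: bool  # False models a fork() failure after registration


class WorkerStartupError(Exception):
    """Raised when the leader learns that a worker never started."""


def wait_for_attach_naive(workers, max_polls=1000):
    """Leader trusts the launched count and polls until all attach.

    Returns True if every worker attached. Returns False once the
    poll budget runs out; a leader without a budget would hang here.
    """
    # Assumption: the count includes workers that never started.
    nworkers_launched = len(workers)
    for _ in range(max_polls):
        attached = sum(1 for w in workers if w.started)
        if attached == nworkers_launched:
            return True
    return False


def wait_for_attach_checked(workers):
    """Leader checks each worker's startup status and errors out early."""
    for i, w in enumerate(workers):
        if not w.started:
            raise WorkerStartupError(f"worker {i} failed to start")
    return True
```

The poll budget in the naive version exists only so that the sketch
terminates; it stands in for the hang.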

I've run out of time to assess what you've done here in any real
depth. For now, I will say that this approach seems interesting to me.
I'll take a closer look tomorrow.

--
Peter Geoghegan