Re: [HACKERS] parallel.c oblivion of worker-startup failures - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date
Msg-id CAH2-Wz=TnL4xP4cuEqQy-bCzQJoyKR3Ec3dsMbHPvagroY=isw@mail.gmail.com
In response to Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] parallel.c oblivion of worker-startup failures
List pgsql-hackers
On Wed, Jan 24, 2018 at 12:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> In Thomas's test case, he's using 4 workers, and if even one of those
> manages to start, then it'll probably do so after any fork failures
> have already occurred, and the error will be noticed when that process
> sends its first message to the leader through the error queue, because
> it'll send PROCSIG_PARALLEL_MESSAGE via all the queues.  If all of the
> workers fail to start, then that doesn't help.  But it still manages
> to detect the failure in that case because it reaches
> WaitForParallelWorkersToFinish, which we just patched.
>
> But how does that happen, anyway?  Shouldn't it get stuck waiting for
> the tuple queues to which nobody's attached yet?  The reason it
> doesn't is because
> ExecParallelCreateReaders() calls shm_mq_set_handle() for each queue,
> which causes the tuple queues to be regarded as detached if the worker
> fails to start.  A detached tuple queue, in general, is not an error
> condition: it just means that worker has no more tuples.

This explains all the confusion. Amit told me that using a tuple queue
made all the difference here. Even still, it seemed surprising that
we'd rely on that behavior from such a distance away, within nodeGather.c.
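
To make the mechanism concrete, here's a rough sketch of what that
leader-side behavior amounts to (condensed and hypothetical -- the
function name and structure are mine, not the actual nodeGather.c /
tqueue.c code):

/*
 * Hypothetical, condensed sketch.  The leader associates each tuple
 * queue with its worker's BackgroundWorkerHandle, so a worker that
 * never starts makes shm_mq_receive() return SHM_MQ_DETACHED instead
 * of blocking forever.
 */
#include "postgres.h"

#include "postmaster/bgworker.h"
#include "storage/shm_mq.h"

static void
drain_one_tuple_queue(shm_mq_handle *mqh, BackgroundWorkerHandle *handle)
{
	/* What ExecParallelCreateReaders() does for each queue. */
	shm_mq_set_handle(mqh, handle);

	for (;;)
	{
		shm_mq_result res;
		Size		nbytes;
		void	   *data;

		/* Blocking receive; with nowait = false, SHM_MQ_WOULD_BLOCK can't occur. */
		res = shm_mq_receive(mqh, &nbytes, &data, false);

		if (res == SHM_MQ_DETACHED)
		{
			/*
			 * Treated simply as "no more tuples" -- whether the worker
			 * ran to completion or never launched at all.
			 */
			break;
		}

		/* SHM_MQ_SUCCESS: data/nbytes describe one tuple; hand it to Gather. */
	}
}

The point is that SHM_MQ_DETACHED does double duty: it's both the
normal "worker sent everything and exited" signal and the only trace
left behind by a worker that never forked.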

> I guess that works, but it seems more like blind luck than good
> design.  Parallel CREATE INDEX fails to be as "lucky" as Gather
> because there's nothing in parallel CREATE INDEX that lets it skip
> waiting for a worker which doesn't start -- and in general it seems
> unacceptable to impose a coding requirement that all future parallel
> operations must fail to notice the difference between a worker that
> completed successfully and one that never ran at all.

+1.

> If we made the Gather node wait only for workers that actually reached
> the Gather -- either by using a Barrier or by some other technique --
> then this would be a lot less fragile.  And the same kind of technique
> would work for parallel CREATE INDEX.
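
For reference, here is one possible shape of the Barrier idea, using
the existing Barrier API from storage/barrier.h. This is purely a
hypothetical sketch: the shared struct and function names are invented
for illustration, and it glosses over the question of when every
would-be participant has had its chance to attach.

#include "postgres.h"

#include "storage/barrier.h"

typedef struct GatherStartupShared
{
	Barrier		startup_barrier;	/* lives in the DSM segment */
} GatherStartupShared;

/* Leader, before LaunchParallelWorkers(): */
static void
startup_barrier_init(GatherStartupShared *shared)
{
	BarrierInit(&shared->startup_barrier, 0);
}

/* Worker, as its first action after attaching to the DSM segment: */
static void
startup_barrier_announce(GatherStartupShared *shared)
{
	BarrierAttach(&shared->startup_barrier);
}

/* Leader: how many workers have actually shown up so far? */
static int
startup_barrier_participants(GatherStartupShared *shared)
{
	/* Workers that failed to fork never attach, so they don't count. */
	return BarrierParticipants(&shared->startup_barrier);
}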

The use of a barrier has problems of its own [1], though, of which one
is unique to the parallel_leader_participation=off case. I thought
that you yourself agreed with this [2] -- do you?

Another argument against using a barrier in this way is that it seems
like way too much mechanism to solve a simple problem. Why should a
client of parallel.h not be able to rely on nworkers_launched (perhaps
only after "verifying it can be trusted")? That seems like a pretty
reasonable requirement for clients of any framework for parallel
imperative programming.

I think that we should implement "some other technique", instead of
using a barrier. As I've said, Amit's WaitForParallelWorkersToAttach()
idea seems like a promising "other technique".
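
To show what I mean, here's how a parallel.h client such as parallel
CREATE INDEX might use it. WaitForParallelWorkersToAttach() doesn't
exist yet, so the signature below is an assumption, modeled on
WaitForParallelWorkersToFinish():

/*
 * Sketch of how a parallel.h client could use the proposed function.
 */
#include "postgres.h"

#include "access/parallel.h"

static int
launch_and_verify_workers(ParallelContext *pcxt)
{
	LaunchParallelWorkers(pcxt);

	/*
	 * Proposed: return (or error out) only once every worker counted in
	 * nworkers_launched has either attached to its error queue or is
	 * known to have failed to start.  After this point the client can
	 * trust nworkers_launched, and wait for exactly that many
	 * participants to finish their part of the work.
	 */
	WaitForParallelWorkersToAttach(pcxt);

	return pcxt->nworkers_launched;
}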

[1] https://www.postgresql.org/message-id/CAA4eK1+a0OF4M231vBgPr_0Ygg_BNmRGZLiB7WQDE-FYBSyrGg@mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoaF8UA8v8hP=CcoqUc50pucPC8ABj-_yyC++yGggjWFsw@mail.gmail.com
-- 
Peter Geoghegan

