On Wed, Jan 24, 2018 at 1:57 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jan 24, 2018 at 5:25 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> If there were some way for the postmaster to cause reason
>> PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
>> just notification via kill(SIGUSR1) when it fails to fork a parallel
>> worker, we'd get (1) for free in any latch/CFI loop code. But I
>> understand that we can't do that by project edict.
>
> Based on the above observation, here is a terrible idea you'll all
> hate. It is pessimistic and expensive: it thinks that every latch
> wake might be the postmaster telling us it's failed to fork() a
> parallel worker, until we've seen a sign of life on every worker's
> error queue. Untested illustration code only. This is the only way
> I've come up with to discover fork failure in any latch/CFI loop (ie
> without requiring client code to explicitly try to read either error
> or tuple queues).
The question, I suppose, is how expensive this is in the real world.
If it's actually not a cost that anybody is likely to notice, then I
think we should pursue this approach. I wouldn't put too much weight
on keeping this simple for users of the parallel infrastructure,
though, because something like Amit's WaitForParallelWorkersToAttach()
idea still seems acceptable. "Call this function before trusting the
finality of nworkers_launched" isn't too onerous a rule to have to
follow.
--
Peter Geoghegan