Re: [HACKERS] parallel.c oblivion of worker-startup failures - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date
Msg-id CAA4eK1L0QoS0VSG=guyFiTM0TgoSAoLewSkj0XP-D9tGJ-nDLA@mail.gmail.com
In response to Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Dec 21, 2017 at 6:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 21, 2017 at 6:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> What if we don't allow such slots to be reused until the backend/session
>> that registered them performs an unregister?  Currently, we don't seem
>> to have an API corresponding to Register*BackgroundWorker() which can
>> be used to unregister, but maybe we can provide such an API.
>
> Well, then we could have slots pinned down for a long time, if the
> backend never gets around to calling unregister.  Furthermore, that's
> absolutely not back-patchable, because we can't put a requirement like
> that on code running in the back branches.  Also, what if the code
> path that would have done the unregister eventually errors out?  We'd
> need TRY/CATCH blocks everywhere that registers the worker.  In short,
> this seems terrible for multiple reasons.
>
>>> Furthermore, it doesn't help in the case where the worker starts and
>>> immediately exits without attaching to the DSM.
>>
>> Yeah, but can't we detect that case?  After the worker exits, we
>> know its exit status, since it is passed to CleanupBackgroundWorker; we
>> can use that to mark the worker state as BGWH_ERROR_STOPPED (or something
>> like BGWH_IMMEDIATE_STOPPED).
>>
>> I think the above approach sounds invasive, but it seems to me that it
>> can be used by other users of background workers as well.
>
> The exit status doesn't tell us whether the worker attached to the DSM.
>
> I'm relatively puzzled as to why you're rejecting a relatively
> low-impact way of handling a corner case that was missed in the
> original design in favor of major architectural changes.
>

I am not against the parallel-context-specific approach that you
described above.  However, I was trying to see whether there is a
general-purpose solution, since the low-impact way is not very
straightforward.  I think you can go ahead with the approach you have
described to fix the hole I was pointing out; I can review it, or I
can give it a try myself if you want.
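
To illustrate the corner case discussed in this thread (a worker that
starts and exits without ever attaching to the DSM), here is a minimal,
self-contained C sketch.  It is not PostgreSQL code: the WorkerStatus
enum, the WorkerSlot struct, and count_lost_workers() are hypothetical
names used only for illustration, and the "attached" flag is assumed to
be set by the worker once it attaches.  The point is only that the
stopped state (or exit status) alone does not say whether the worker
attached, so some separate record of attachment is needed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a background worker's lifecycle state. */
typedef enum
{
    WORKER_NOT_YET_STARTED,
    WORKER_STARTED,
    WORKER_STOPPED
} WorkerStatus;

/* Hypothetical per-worker bookkeeping kept by the leader. */
typedef struct
{
    WorkerStatus status;    /* lifecycle state reported for the worker */
    bool         attached;  /* assumed: set once the worker attaches */
} WorkerSlot;

/*
 * Count workers that have stopped without ever attaching.  A stopped
 * worker that did attach is not counted; a worker that is still
 * starting or running is not counted either.
 */
static int
count_lost_workers(const WorkerSlot *slots, int nworkers)
{
    int lost = 0;

    for (int i = 0; i < nworkers; i++)
    {
        if (slots[i].status == WORKER_STOPPED && !slots[i].attached)
            lost++;
    }
    return lost;
}
```

In this sketch, a leader that finds a nonzero count would raise an
error instead of waiting indefinitely for those workers.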

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

