On Fri, Jan 26, 2018 at 6:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Thanks for looking into this. Yeah. I think you're right that it
> could add a bit of overhead in some cases (ie if you receive a lot of
> signals that AREN'T caused by fork failure, then you'll enter
> HandleParallelMessage() every time unnecessarily), and it does feel a
> bit kludgy. The best idea I have to fix that so far is like this: (1)
> add a member fork_failure_count to struct BackgroundWorkerArray, (2)
> in do_start_bgworker() whenever fork fails, do
> ++BackgroundWorkerData->fork_failure_count (ie before a signal is sent
> to the leader), (3) in procsignal_sigusr1_handler where we normally do
> a bunch of CheckProcSignal(PROCSIG_XXX) stuff, if
> (BackgroundWorkerData->fork_failure_count !=
> last_observed_fork_failure_count) HandleParallelMessageInterrupt().
> As far as I know, as long as fork_failure_count is (say) int32 (ie not
> prone to tearing) then no locking is required due to the barriers
> implicit in the syscalls involved there. This is still slightly more
> pessimistic than it needs to be (the failed fork may be for someone
> else's ParallelContext), but only in rare cases so it would be
> practically as good as precise PROCSIG delivery. It's just that we
> aren't allowed to deliver PROCSIGs from the postmaster. We are
> allowed to communicate through BackgroundWorkerData, and there is a
> precedent for cluster-visible event counters in there already.
I could sign on to that plan, but I don't think we should hold this
patch up for it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company