removing set_latch_on_sigusr1 - Mailing list pgsql-hackers

From Robert Haas
Subject removing set_latch_on_sigusr1
Date
Msg-id CA+TgmoY6QSwGj1Uhu-SqXExdMmCX=Mk6S0pOFn-VSF_2EWDwCQ@mail.gmail.com
Whole thread Raw
Responses Re: removing set_latch_on_sigusr1  (Andres Freund <andres@anarazel.de>)
Re: removing set_latch_on_sigusr1  (Haribabu Kommi <kommi.haribabu@gmail.com>)
List pgsql-hackers
As part of the dynamic background worker mechanism, I introduced a
flag set_latch_on_sigusr1.  When set, a process sets its own latch any
time it receives SIGUSR1.  The original motivation for this was that
the postmaster sends SIGUSR1 to a process to indicate the death of a
background worker, but cannot directly set the process latch since we
want to minimize the postmaster's contact surface with the main shared
memory segment.  The reason I introduced a new flag instead of just
*always* setting the latch was because I thought somebody might
complain about the cost of setting the latch every time.  But now I
think that's exactly what we should do: forget the flag, always set
the latch.  If you get a SIGUSR1, somebody is trying to get your
attention.  At worst, setting the latch will cause whatever latch-wait
loop got interrupted to re-execute without making forward progress.
Most of the time, though, you'll do something like
CHECK_FOR_INTERRUPTS() in that loop, and you'll probably find that
you've got one.

The code we're talking about here is in procsignal_sigusr1_handler.
That function can potentially call HandleCatchupInterrupt,
HandleNotifyInterrupt, HandleParallelMessageInterrupt, or
RecoveryConflictInterrupt.   And all of those functions set the
process latch.  So the only thing we're gaining by having
set_latch_on_sigusr1 is the possibility that we might avoid setting
the latch in response to either (1) a postmaster signal related to a
background worker that we happen not to care about at the moment or
(2) a totally spurious SIGUSR1.  But neither of those events should
happen very often, so what's the point?  The cost of setting the latch
when we don't need to is typically small, but NOT setting it when we
should have done can lead to an indefinite hang.

As it happens, the TupleQueueFunnelNext function I recently committed
has such a hazard, which I failed to spot during review and testing.
If people don't like this, I can instead cause that function to set
the flag.  But every place that sets the flag has to use a
PG_TRY()/PG_CATCH() block to make sure the old value of the flag gets
restored.  I'm pretty sure that's going to burn more cycles than the
flag can ever hope to save, not to mention the risk of bugs due to
people forgetting to add necessary volatile qualifiers.  We've already
got four PG_TRY() blocks in the code to cater to this stupid flag, and
if we keep it around I'm sure we'll accumulate at least a few more.

Patch attached.  Objections?  Suggestions?  Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: WIP: Covering + unique indexes.
Next
From: Andres Freund
Date:
Subject: Re: removing set_latch_on_sigusr1