Home > mailing lists

removing set_latch_on_sigusr1 - Mailing list pgsql-hackers

From	Robert Haas
Subject	removing set_latch_on_sigusr1
Date	October 8, 2015 15:41:43
Msg-id	CA+TgmoY6QSwGj1Uhu-SqXExdMmCX=Mk6S0pOFn-VSF_2EWDwCQ@mail.gmail.com Whole thread Raw
Responses	Re: removing set_latch_on_sigusr1 Re: removing set_latch_on_sigusr1
List	pgsql-hackers

Tree view

As part of the dynamic background worker mechanism, I introduced a
flag set_latch_on_sigusr1.  When set, a process sets its own latch any
time it receives SIGUSR1.  The original motivation for this was that
the postmaster sends SIGUSR1 to a process to indicate the death of a
background worker, but cannot directly set the process latch since we
want to minimize the postmaster's contact surface with the main shared
memory segment.  The reason I introduced a new flag instead of just
*always* setting the latch was because I thought somebody might
complain about the cost of setting the latch every time.  But now I
think that's exactly what we should do: forget the flag, always set
the latch.  If you get a SIGUSR1, somebody is trying to get your
attention.  At worst, setting the latch will cause whatever latch-wait
loop got interrupted to re-execute without making forward progress.
Most of the time, though, you'll do something like
CHECK_FOR_INTERRUPTS() in that loop, and you'll probably find that
you've got one.

The code we're talking about here is in procsignal_sigusr1_handler.
That function can potentially call HandleCatchupInterrupt,
HandleNotifyInterrupt, HandleParallelMessageInterrupt, or
RecoveryConflictInterrupt.   And all of those functions set the
process latch.  So the only thing we're gaining by having
set_latch_on_sigusr1 is the possibility that we might avoid setting
the latch in response to either (1) a postmaster signal related to a
background worker that we happen not to care about at the moment or
(2) a totally spurious SIGUSR1.  But neither of those events should
happen very often, so what's the point?  The cost of setting the latch
when we don't need to is typically small, but NOT setting it when we
should have done can lead to an indefinite hang.

As it happens, the TupleQueueFunnelNext function I recently committed
has such a hazard, which I failed to spot during review and testing.
If people don't like this, I can instead cause that function to set
the flag.  But every place that sets the flag has to use a
PG_TRY()/PG_CATCH() block to make sure the old value of the flag gets
restored.  I'm pretty sure that's going to burn more cycles than the
flag can ever hope to save, not to mention the risk of bugs due to
people forgetting to add necessary volatile qualifiers.  We've already
got four PG_TRY() blocks in the code to cater to this stupid flag, and
if we keep it around I'm sure we'll accumulate at least a few more.

Patch attached.  Objections?  Suggestions?  Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

remove-set-latch-on-sigusr1.patch

pgsql-hackers by date:

From: Anastasia Lubennikova
Date: 08 October 2015, 15:18:48
Subject: WIP: Covering + unique indexes.

From: Andres Freund
Date: 08 October 2015, 16:05:56
Subject: Re: removing set_latch_on_sigusr1

removing set_latch_on_sigusr1 - Mailing list pgsql-hackers

Attachment

Previous

Next