Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
Date
Msg-id cd0cd76d-8057-529c-831b-903b1474435b@2ndquadrant.com
In response to Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 06/06/17 23:42, Andres Freund wrote:
> On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
>> On 06/06/17 23:17, Andres Freund wrote:
>>> Right.  I found a couple more instances of similarly iffy, although not
>>> quite as broken, patterns in launcher.c.  It's easy to get this wrong,
>>> but it's a lot easier if you do it differently everywhere you use a
>>> latch.  It's not good if code in the same file, by the same author(s),
>>> has different ways of using latches.
>>
>> Huh? I see same pattern everywhere in launcher.c, what am I missing?
> 
> WaitForReplicationWorkerAttach:
> while (...)
>     CHECK_FOR_INTERRUPTS();
>     /* other stuff including returns */
>     WaitLatch()
>     WL_POSTMASTER_DEATH
>     ResetLatch()
> 
> logicalrep_worker_stop loop 1:
> while (...)
>     /* other stuff */
>     CHECK_FOR_INTERRUPTS()
>     WaitLatch()
>     WL_POSTMASTER_DEATH
>     ResetLatch()
>     /* other stuff including returns */
> 
> logicalrep_worker_stop loop 2:
> while (...)
>     /* other stuff including returns */
>     CHECK_FOR_INTERRUPTS();
>     WaitLatch()
>     WL_POSTMASTER_DEATH
>     ResetLatch()
> 
> ApplyLauncherMain:
> while (!got_SIGTERM)
>     /* lots of other stuff */
>     WaitLatch()
>     WL_POSTMASTER_DEATH
>     /* some other stuff */
>     ResetLatch()
> (note no CFI)
> 
> they're not hugely different, but there are subtle differences.
> Sometimes you're guaranteed to check for interrupts after resetting the
> latch, in other cases not. Sometimes expensive-ish things happen before
> a CFI...
> 

Ah, that's because signals in the launcher are broken; see
https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
which also includes a patch to fix it.

We originally had custom signal handling everywhere. Then I realized it
was a mistake for workers because of the locking interaction, but I missed
the same issue with the launcher (the CFI in the current launcher doesn't work).
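
To make the ordering concern concrete, here is a toy, single-threaded
sketch of why the position of the latch reset relative to the "is there
work?" check matters. It is not PostgreSQL code: ToyLatch, ToyState and
every function below are made-up stand-ins, and the "event" is injected by
hand at a chosen point instead of arriving from a real signal handler. The
first function follows the reset / check / wait order that the comments in
latch.h recommend; the second checks for work between the wait and the
reset, which is the shape that can lose a wake-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a latch: just a flag. NOT PostgreSQL's Latch. */
typedef struct ToyLatch
{
    bool        is_set;
} ToyLatch;

typedef struct ToyState
{
    ToyLatch    latch;
    bool        work_pending;   /* published by the simulated event */
    int         handled;        /* number of work items processed */
} ToyState;

/* What a signal handler would do: publish the work, then set the latch. */
static void
toy_event(ToyState *s)
{
    assert(s != NULL);
    s->work_pending = true;
    s->latch.is_set = true;
}

/* True if a wait would return immediately, false if it would go to sleep. */
static bool
toy_wait_would_return(const ToyLatch *l)
{
    return l->is_set;
}

/* Process pending work, if any. */
static void
toy_do_work(ToyState *s)
{
    if (s->work_pending)
    {
        s->work_pending = false;
        s->handled++;
    }
}

/*
 * Order: reset, check for work, wait.  The event arrives after the work
 * check but before the wait.  Returns true if the event gets handled
 * without the loop ever going to sleep on it.
 */
bool
reset_first_handles_event(void)
{
    ToyState    s = {.latch = {false}, .work_pending = false, .handled = 0};

    /* iteration 1 */
    s.latch.is_set = false;     /* reset */
    toy_do_work(&s);            /* nothing pending yet */
    toy_event(&s);              /* event sneaks in before the wait */
    if (!toy_wait_would_return(&s.latch))
        return false;           /* would sleep with work pending */

    /* iteration 2 */
    s.latch.is_set = false;     /* reset */
    toy_do_work(&s);            /* sees and handles the work */
    return s.handled == 1;
}

/*
 * Order: wait, check for work, reset.  The event arrives after the work
 * check but before the reset, so the reset wipes out the wake-up.
 */
bool
reset_after_check_handles_event(void)
{
    ToyState    s = {.latch = {true}, .work_pending = false, .handled = 0};

    /* iteration 1: latch already set by an earlier, unrelated wake-up */
    if (!toy_wait_would_return(&s.latch))
        return false;
    toy_do_work(&s);            /* nothing pending yet */
    toy_event(&s);              /* event sneaks in before the reset */
    s.latch.is_set = false;     /* reset clears the new wake-up too */

    /* iteration 2: work is pending, but the wait would now block */
    if (!toy_wait_would_return(&s.latch))
        return false;
    toy_do_work(&s);
    return s.handled == 1;
}
```

In this toy model reset_first_handles_event() returns true and
reset_after_check_handles_event() returns false. Real code has a timeout
or a later wake-up to fall back on, so the second shape usually shows up
as a delay, not a hang.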

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


