On Wed, Oct 4, 2017 at 7:32 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 03/10/17 19:44, Tom Lane wrote:
>> I wrote:
>>>> So that's trouble waiting to happen, for sure. At the very least we
>>>> need to do a single fetch of WalRcv->latch, not two. I wonder whether
>>>> even that is sufficient, though: this coding requires an atomic fetch of
>>>> a pointer, which is something we generally don't assume to be safe.
>>
>> BTW, I had supposed that this bug was of long standing, but actually it's
>> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555. Before
>> that walreceiver start/stop just changed the owner of a long-lived shared
>> latch, and there was no question of stale pointers.
>>
>> I considered reverting that decision, but the reason for it seems to have
>> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
>> having to know about a custom latch. That's probably a sufficient reason
>> to justify some squishiness in the wakeup logic. Still, we might want to
>> revisit it if we find any other problems here.
> That's correct, and because the other users of that library don't have
> special latch it seemed feasible to standardize on the procLatch. If we
> indeed wanted to change the decision about this I think we can simply
> give latch as a parameter to libpqrcv_connect and store it in the
> WalReceiverConn struct. The connection does not need to live past the
> latch (although it currently does, but that's just a matter of
> reordering the code in WalRcvDie() a little AFAICS).
I wonder if the new ConditionVariable mechanism would be worth
considering in future cases like this, where the signalling process
doesn't know the identity of the process to be woken up (or even how
many waiting processes there are), but instead any interested waiters
block on a particular ConditionVariable that models a specific
interesting condition. In the end it's just latches anyway, but it
may be a better abstraction. On the other hand I'm not sure how waits
on a ConditionVariable would be multiplexed with IO (a distinct wait
event, or leaky abstraction where the caller relies on the fact that
it's built on MyProc->latch, something else?).
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers