Re: Race conditions in shm_mq.c - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Race conditions in shm_mq.c
Date
Msg-id CA+TgmobzmqU2jir=ahLrCDHOWDDnbtpoHf5Tt=KEaBhF4q4g-Q@mail.gmail.com
Whole thread Raw
In response to Re: Race conditions in shm_mq.c  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On Thu, Aug 6, 2015 at 5:59 PM, Antonin Houska <ah@cybertec.at> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska <ah@cybertec.at> wrote:
>> >> During my experiments with parallel workers I sometimes saw the "master" and
>> >> worker process blocked. The master uses shm queue to send data to the worker,
>> >> both sides nowait==false. I concluded that the following happened:
>> >>
>> >> The worker process set itself as a receiver on the queue after
>> >> shm_mq_wait_internal() has completed its first check of "ptr", so this
>> >> function left sender's procLatch in reset state. But before the procLatch was
>> >> reset, the receiver still managed to read some data and set sender's procLatch
>> >> to signal the reading, and eventually called its (receiver's) WaitLatch().
>> >>
>> >> So sender has effectively missed the receiver's notification and called
>> >> WaitLatch() too (if the receiver already waits on its latch, it does not help
>> >> for sender to call shm_mq_notify_receiver(): receiver won't do anything
>> >> because there's no new data in the queue).
>> >>
>> >> Below is my patch proposal.
>> >
>> > Another good catch.  However, I would prefer to fix this without
>> > introducing a "continue" as I think that will make the control flow
>> > clearer.  Therefore, I propose the attached variant of your idea.
>>
>> Err, that doesn't work at all.  Have a look at this version instead.
>
> This makes sense to me.
>
> One advantage of "continue" was that I could apply the patch to my test code
> (containing the appropriate sleep() calls, to simulate the race conditions)
> with no conflicts and see the difference. The restructuring you do does not
> allow for such a "mechanical" testing, but it's clear enough.

OK, I've committed that and back-patched it to 9.5 and 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Raising our compiler requirements for 9.6
Next
From: Alvaro Herrera
Date:
Subject: Re: Raising our compiler requirements for 9.6