
Re: Race conditions in shm_mq.c - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: Race conditions in shm_mq.c
Date	August 6, 2015 21:58:51
Msg-id	13923.1438898365@localhost
In response to	Re: Race conditions in shm_mq.c (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Race conditions in shm_mq.c
List	pgsql-hackers


Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska <ah@cybertec.at> wrote:
> >> During my experiments with parallel workers I sometimes saw the "master" and
> >> worker process blocked. The master uses shm queue to send data to the worker,
> >> both sides nowait==false. I concluded that the following happened:
> >>
> >> The worker process set itself as a receiver on the queue after
> >> shm_mq_wait_internal() has completed its first check of "ptr", so this
> >> function left sender's procLatch in reset state. But before the procLatch was
> >> reset, the receiver still managed to read some data and set sender's procLatch
> >> to signal the reading, and eventually called its (receiver's) WaitLatch().
> >>
> >> So sender has effectively missed the receiver's notification and called
> >> WaitLatch() too (if the receiver already waits on its latch, it does not help
> >> for sender to call shm_mq_notify_receiver(): receiver won't do anything
> >> because there's no new data in the queue).
> >>
> >> Below is my patch proposal.
> >
> > Another good catch.  However, I would prefer to fix this without
> > introducing a "continue" as I think that will make the control flow
> > clearer.  Therefore, I propose the attached variant of your idea.
>
> Err, that doesn't work at all.  Have a look at this version instead.

This makes sense to me.

One advantage of "continue" was that I could apply the patch to my test code
(containing the appropriate sleep() calls, to simulate the race conditions)
with no conflicts and see the difference. Your restructuring does not
allow for such "mechanical" testing, but it's clear enough.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
