Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)
Date
Msg-id CA+TgmobHOisrdUk5EtqAt4ogihfOtGnso1Ekozazvterbq9vDA@mail.gmail.com
Whole thread Raw
In response to Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)  (Michael Tautschnig <mt@debian.org>)
Responses Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)  (Michael Tautschnig <mt@debian.org>)
Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)  (Michael Tautschnig <mt@debian.org>)
List pgsql-hackers
On Wed, Feb 29, 2012 at 10:18 AM, Michael Tautschnig <mt@debian.org> wrote:
> In [3] it was suggested to fix the problem by placing a barrier in ResetLatch,
> which corresponds to placing it between lines 11 and 12 in the code above.  This
> amounts to placing a barrier between the two reads (lines 7/19 and 12, i.e.,
> between WaitLatch and the if(flag[1]) ) of Worker 1.
>
> Placing a sync (i.e., the strongest Power barrier) accordingly would, however,
> still be insufficient for the second problem, as it would only fix the
> reordering of read-read pairs by Worker 1 and the store atomicity issue from
> Worker 0. But the writes on Worker 0 could still be reordered (problem number
> 2). One possible fix consists of placing a sync between the two writes on Worker
> 0, and an address dependency between the two reads on Worker 1. Clearly,
> however, these are changes that cannot any longer be hidden behind the
> ResetLatch/WaitLatch interface, but rather go in the code using these.

Well, part of my skepticism about Tom's proposal to include memory
barrier instructions in the latch primitives was the fear that
something like what you're suggesting here might be true: namely, that
it might create the illusion of safety for people using the
primitives, when reality thought and possibly additional barrier
instructions might still be needed.

However, your example is enough unlike the actual code that the
conclusion you state following the word "clearly" isn't actually clear
to me.  According to latch.h, the correct method of using a latch is
like this:
* for (;;)* {*         ResetLatch();*         if (work to do)*                 Do Stuff();*         WaitLatch();* }

Meanwhile, anyone who is creating additional work to do should add the
work to the queue and then set the latch.

So it seems to me that we could potentially fix this by inserting
barriers at the end of ResetLatch and at the beginning of SetLatch and
WaitLatch.  Then the latch has to get reset before we check whether
there's work to do; and we've got to finish checking for work before
we again try to wait for the latch.  Similarly, any work that was in
progress before SetLatch was called will be forced to be committed to
memory before SetLatch does anything else.  Adding that many barriers
might not be very good for performance but it seems OK from a
correctness point of view, unless I am missing something, which is
definitely possible.  I'd appreciate any thoughts you have on this, as
this is clearly subtle and tricky to get exactly right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Gsoc2012 Idea --- Social Network database schema
Next
From: Robert Haas
Date:
Subject: Re: Command Triggers