On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch <noah@2ndquadrant.com> wrote:
> On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote:
>> On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch <noah@2ndquadrant.com> wrote:
>> > This is attractive, and I don't see any problems with it. (In theory, you could
>> > hit a case where the load of resetState gives an ancient "false" just as the
>> > counters wrap to match. Given that the wrap interval is 1000000x as long as the
>> > reset interval, I'm not worried about problems on actual silicon.)
>>
>> It's actually 262,144 times as long - see MSGNUMWRAPAROUND.
>
> Ah, so it is.
>
>> It would be pretty easy to eliminate even the theoretical possibility
>> of a race by getting rid of resetState altogether and using nextMsgNum
>> = -1 to mean that. Maybe I should go ahead and do that.
>
> Seems like a nice simplification.

On further reflection, I don't see that this helps: it just moves the
problem around. With resetState as a separate variable, nextMsgNum is
never changed by anyone other than the owner, so we can never have a
stale load. But if we overload nextMsgNum to also indicate whether
our state has been reset, then there's a race between when we load
nextMsgNum and when we load maxMsgNum (instead of the code I posted
previously, which has a race between when we load resetState and when
we load maxMsgNum). Now, as you say, it seems really, really
difficult to hit that in practice, but I don't see a way of getting
rid of the theoretical possibility without either (1) a spinlock or
(2) a fence. (Of course, on x86, the fence could be optimized down to
a compiler barrier.) I guess the question is "should we worry about
that?".
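
To make option (2) concrete, here is a rough sketch of what a fenced
check might look like. It is only an illustration, not the patch: it
uses C11 <stdatomic.h> rather than anything in the tree, and
SharedQueue, ReaderState, nothing_to_read() and publish_reset() are
made-up names. The load order (maxMsgNum first, then resetState) is an
assumption about one ordering that would close the window, provided
whoever resets a backend stores resetState before advancing maxMsgNum.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the shared sinval state; not the real structs. */
typedef struct
{
    atomic_int  maxMsgNum;      /* advanced by writers */
} SharedQueue;

typedef struct
{
    int         nextMsgNum;     /* changed only by the owning backend */
    atomic_bool resetState;     /* set by others, cleared by the owner */
} ReaderState;

/*
 * Writer side (assumed): mark the victim as reset, then advance the
 * queue.  The release fence keeps the resetState store from being
 * reordered after the maxMsgNum store.
 */
void
publish_reset(ReaderState *victim, SharedQueue *q, int newMax)
{
    atomic_store_explicit(&victim->resetState, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->maxMsgNum, newMax, memory_order_relaxed);
}

/*
 * Reader side: lock-free "is there anything to do?" check.  Returns
 * true only if we have not been reset and our counter matches the
 * queue's.  The acquire fence keeps the resetState load from being
 * satisfied before the maxMsgNum load, so a maxMsgNum that has wrapped
 * around to match is not paired with an older resetState value.
 */
bool
nothing_to_read(ReaderState *me, SharedQueue *q)
{
    int         max;
    bool        reset;

    max = atomic_load_explicit(&q->maxMsgNum, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    reset = atomic_load_explicit(&me->resetState, memory_order_relaxed);

    return !reset && me->nextMsgNum == max;
}
```

On x86 both fences should compile down to compiler barriers, so the
cost there would be close to nothing; weaker memory models would pay
for a real barrier instruction.
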
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company