Re: sinval synchronization considered harmful - Mailing list pgsql-hackers
From:           Noah Misch
Subject:        Re: sinval synchronization considered harmful
Msg-id:         20110726201647.GA17857@tornado.leadboat.com
In response to: Re: sinval synchronization considered harmful (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
> Noah Misch <noah@2ndQuadrant.com> writes:
> > On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
> >> I think I have a simpler idea, though: before acquiring any locks,
> >> just have SIGetDataEntries() do this:
> >>
> >> +    if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
> >> +        return 0;
> >>
> >> Patch (with comment explaining why I think this is OK) attached.  If
> >> the message numbers happen to be equal only because the counter has
> >> wrapped, then stateP->resetState will be true, so we'll still realize
> >> we need to do some work.
>
> > This is attractive, and I don't see any problems with it.  (In theory,
> > you could hit a case where the load of resetState gives an ancient
> > "false" just as the counters wrap to match.  Given that the wrap
> > interval is 1000000x as long as the reset interval, I'm not worried
> > about problems on actual silicon.)
>
> > +1 for doing this and moving on.
>
> After some further reflection I believe this patch actually is pretty
> safe, although Noah's explanation of why seems a bit confused.  The key
> points are that (1) each of the reads is atomic, and (2) we should not
> be able to see a value that is older than our last acquisition/release
> of a shared memory lock.  These being granted, we will make a decision
> that is correct, or at least was correct as of some time after that last
> lock action.  As stated in the patch comments, we are not required to
> promptly assimilate sinval actions on objects we don't hold any lock on,
> so this seems sufficient.  In essence, we are relying on an assumed
> prior visit to the lock manager to provide memory access synchronization
> for the sinval no-work-to-do test.
>
> The case Noah speculates about above amounts to supposing that this
> reasoning fails to hold for the resetState value, and I don't see why
> that would be true.
> Even if the above scenario did manage to happen, it would not mean that
> we missed the reset, just that we hadn't noticed it *yet*.  And by
> hypothesis, none of the as-yet-not-seen catalog updates are a problem
> for us.

Here's the way it can fail:

1. Backend enters SIGetDataEntries() with main memory bearing
   stateP->resetState = false, stateP->nextMsgNum = 500,
   segP->maxMsgNum = 505.  The CPU has those latest stateP values in
   cache, but segP->maxMsgNum is *not* in cache.

2. Backend stalls for <long time>.  Meanwhile, other backends issue
   MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
   stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
   segP->maxMsgNum = 500.

3. Backend wakes up, uses its cached stateP values and reads
   segP->maxMsgNum = 500 from main memory.  The patch's test finds no
   need to reset or process invalidation messages.

That's the theoretical risk I wished to illustrate.  Though this appears
possible on an abstract x86_64 system, I think it's unrealistic to
suppose that a dirty cache line could persist *throughout* the issuance
of more than 10^9 invalidation messages on a concrete implementation.

A way to think about the problem is that our read of segP->maxMsgNum is
too new.  If we had used a snapshot of values as of the most recent lock
acquisition/release, there would be no problem.  It's the mix of a
new-enough stateP with an all-too-new segP that yields the anomaly.

--
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services