Thread: Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
I wrote: > OK, I think I see it. The problem is that the code in slru.c is careful > about not modifying state when it doesn't hold the proper lock, but not > so careful about not *inspecting* state without the proper lock. > ... > I'm still thinking about how to make a real fix without introducing > another lock/unlock cycle --- we can do that if we have to, but I think > maybe it's possible to fix it without. Attached is a proposed patch to fix up slru.c so that it's not playing any games by either reading or writing shared state without holding the ControlLock for the SLRU set. The main problem with the existing code, as I now see it, was a poor choice of representation of page state: we had states CLEAN, DIRTY, and WRITE_IN_PROGRESS, which meant that re-dirtying a page while a write was in progress required setting the state back to DIRTY, which hid the fact that indeed a write was in progress. So the I/O code was designed to cope with not knowing whether another write was already in progress. We can make it a whole lot cleaner by changing the state representation so that we can tell the difference --- then, we can know before releasing the ControlLock whether we need to write the page or just wait for someone else to finish writing it. And that means we can do all the state-manipulation before releasing the lock. I could have just added an additional state WRITE_IN_PROGRESS_REDIRTIED, or some such, but it seemed notationally cleaner to create a separate "page_dirty" boolean, and reduce the number of states to four (empty, read-in-progress, valid, write-in-progress). This lets outside code such as clog.c just set "page_dirty = true" rather than doing a complex bit of logic to change the state value properly. The patch is a bit bulky because of the representation change, but the key changes are localized in SimpleLruReadPage and SimpleLruWritePage. I think this code is a whole lot cleaner than it was before, but it's a bit of a large change to be making so late in the 8.1 cycle (not to mention that we really ought to back-patch similar changes all the way back, because the bug exists as far back as 7.2). I am going to take another look to see if there is a less invasive change that will fix the problem at some performance cost; if so, we might want to do it that way in 8.1 and back branches. Any comments on the patch, or thoughts on how to proceed? regards, tom lane
Attachment
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
From
Bruce Momjian
Date:
Good analysis. I guess the question is what patch would we put into a subrelease? If you go for a new state code, rather than a separate boolean, does it reduce the size of the patch? --------------------------------------------------------------------------- Tom Lane wrote: > I wrote: > > OK, I think I see it. The problem is that the code in slru.c is careful > > about not modifying state when it doesn't hold the proper lock, but not > > so careful about not *inspecting* state without the proper lock. > > ... > > I'm still thinking about how to make a real fix without introducing > > another lock/unlock cycle --- we can do that if we have to, but I think > > maybe it's possible to fix it without. > > Attached is a proposed patch to fix up slru.c so that it's not playing > any games by either reading or writing shared state without holding > the ControlLock for the SLRU set. > > The main problem with the existing code, as I now see it, was a poor > choice of representation of page state: we had states CLEAN, DIRTY, and > WRITE_IN_PROGRESS, which meant that re-dirtying a page while a write was > in progress required setting the state back to DIRTY, which hid the fact > that indeed a write was in progress. So the I/O code was designed to > cope with not knowing whether another write was already in progress. We > can make it a whole lot cleaner by changing the state representation so > that we can tell the difference --- then, we can know before releasing > the ControlLock whether we need to write the page or just wait for > someone else to finish writing it. And that means we can do all the > state-manipulation before releasing the lock. > > I could have just added an additional state WRITE_IN_PROGRESS_REDIRTIED, > or some such, but it seemed notationally cleaner to create a separate > "page_dirty" boolean, and reduce the number of states to four (empty, > read-in-progress, valid, write-in-progress). This lets outside code > such as clog.c just set "page_dirty = true" rather than doing a complex > bit of logic to change the state value properly. > > The patch is a bit bulky because of the representation change, but the > key changes are localized in SimpleLruReadPage and SimpleLruWritePage. > > I think this code is a whole lot cleaner than it was before, but it's > a bit of a large change to be making so late in the 8.1 cycle (not to > mention that we really ought to back-patch similar changes all the way > back, because the bug exists as far back as 7.2). I am going to take > another look to see if there is a less invasive change that will fix > the problem at some performance cost; if so, we might want to do it that > way in 8.1 and back branches. > > Any comments on the patch, or thoughts on how to proceed? > > regards, tom lane > > Content-Description: slru-race-1.patch [ Type application/octet-stream treated as attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes: > If you go for a new state code, rather than a separate > boolean, does it reduce the size of the patch? No, and it certainly wouldn't improve my level of confidence in it ... regards, tom lane
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
From
Bruce Momjian
Date:
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > If you go for a new state code, rather than a separate > > boolean, does it reduce the size of the patch? > > No, and it certainly wouldn't improve my level of confidence in it ... Well, then what real options do we have? It seems the patch is just required for all branches. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Well, then what real options do we have? It seems the patch is just > required for all branches. I think it would be possible to fix it in a less invasive way by taking and releasing the ControlLock an extra time in SimpleLruReadPage and SimpleLruWritePage. What's indeterminate about that is the performance cost. In situations where there's not a lot of SLRU I/O traffic it's presumably negligible, but in a case like Jim's where there's evidently a *whole* lot of traffic, it might be a killer. regards, tom lane
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
From
Bruce Momjian
Date:
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Well, then what real options do we have? It seems the patch is just > > required for all branches. > > I think it would be possible to fix it in a less invasive way by taking > and releasing the ControlLock an extra time in SimpleLruReadPage and > SimpleLruWritePage. What's indeterminate about that is the performance > cost. In situations where there's not a lot of SLRU I/O traffic it's > presumably negligible, but in a case like Jim's where there's evidently > a *whole* lot of traffic, it might be a killer. To me a performance problem is much harder get reports on and to locate than a real fix to the problem. I think if a few people eyeball the patch, it is OK for application. Are backpatches significantly different? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes: > To me a performance problem is much harder get reports on and to locate > than a real fix to the problem. I think if a few people eyeball the > patch, it is OK for application. Are backpatches significantly > different? Well, the logic is the same all the way back, but the code has changed textually quite a bit since 7.4 and even more since 7.3. I think the patch would apply reasonably cleanly to 8.0, but adjusting it for 7.x will take a bit of work, which would mean those versions would probably need to be reviewed separately. One possible compromise is to use this patch in 8.x and a simpler patch in 7.x --- people who are very concerned about performance ought to be running 8.x anyway ;-) regards, tom lane
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
I wrote: > I think it would be possible to fix it in a less invasive way by taking > and releasing the ControlLock an extra time in SimpleLruReadPage and > SimpleLruWritePage. What's indeterminate about that is the performance > cost. Attached is an alternative patch that does it this way. I realized that we could use LWLockConditionalAcquire to usually avoid any extra lock traffic, so the performance cost may be negligible except under the very heaviest of loads. I still like the bigger patch for 8.2 and forward, because it's a lot cleaner, but this seems like a credible alternative for 8.1 and back branches. Comments? regards, tom lane
Attachment
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
From
Bruce Momjian
Date:
OK, this is the way to fix for 8.0 and earlier. It is up to you about 8.1. I think we can handle the larger patch if we do another RC. --------------------------------------------------------------------------- Tom Lane wrote: > I wrote: > > I think it would be possible to fix it in a less invasive way by taking > > and releasing the ControlLock an extra time in SimpleLruReadPage and > > SimpleLruWritePage. What's indeterminate about that is the performance > > cost. > > Attached is an alternative patch that does it this way. I realized that > we could use LWLockConditionalAcquire to usually avoid any extra lock > traffic, so the performance cost may be negligible except under the very > heaviest of loads. I still like the bigger patch for 8.2 and forward, > because it's a lot cleaner, but this seems like a credible alternative > for 8.1 and back branches. > > Comments? > > regards, tom lane > Content-Description: slru-race-2.patch [ Type application/octet-stream treated as attachment, skipping... ] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, this is the way to fix for 8.0 and earlier. It is up to you about > 8.1. I think we can handle the larger patch if we do another RC. Well, I'd like not to do another RC, so I'll hold the larger patch for 8.2. We still need a test to confirm it fixes Jim's problem though. Jim, if you like you can test the second proposed patch instead of that off-the-cuff line swapping. However, either one will need to be run with Asserts on in order to have any confidence that the problem is fixed. If performance is an issue, most of the performance hit is probably coming from memory context checking, so what I'd suggest you do is comment out these two #defines in src/include/pg_config_manual.h: #define CLOBBER_FREED_MEMORY #define MEMORY_CONTEXT_CHECKING That should let you build with --enable-cassert and not take quite so much speed hit. regards, tom lane