Thread: pgsql: Avoid improbable PANIC during heap_update.
Avoid improbable PANIC during heap_update.

heap_update needs to clear any existing "all visible" flag on the old tuple's page (and on the new page too, if different). Per coding rules, to do this it must acquire pin on the appropriate visibility-map page while not holding exclusive buffer lock; which creates a race condition since someone else could set the flag whenever we're not holding the buffer lock. The code is supposed to handle that by re-checking the flag after acquiring buffer lock and retrying if it became set. However, one code path through heap_update itself, as well as one in its subroutine RelationGetBufferForTuple, failed to do this. The end result, in the unlikely event that a concurrent VACUUM did set the flag while we're transiently not holding lock, is a non-recurring "PANIC: wrong buffer passed to visibilitymap_clear" failure.

This has been seen a few times in the buildfarm since recent VACUUM changes that added code paths that could set the all-visible flag while holding only exclusive buffer lock. Previously, the flag was (usually?) set only after doing LockBufferForCleanup, which would insist on buffer pin count zero, thus preventing the flag from becoming set partway through heap_update. However, it's clear that it's heap_update not VACUUM that's at fault here.

What's less clear is whether there is any hazard from these bugs in released branches. heap_update is certainly violating API expectations, but if there is no code path that can set all-visible without a cleanup lock then it's only a latent bug. That's not 100% certain though, besides which we should worry about extensions or future back-patch fixes that could introduce such code paths.

I chose to back-patch to v12. Fixing RelationGetBufferForTuple before that would require also back-patching portions of older fixes (notably 0d1fe9f74), which is more code churn than seems prudent to fix a hypothetical issue.
Discussion: https://postgr.es/m/2247102.1618008027@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/34f581c39e97e2ea237255cf75cccebccc02d477

Modified Files
--------------
src/backend/access/heap/heapam.c | 44 ++++++++++++++++++++++++----------------
src/backend/access/heap/hio.c    | 24 ++++++++++++++++------
2 files changed, 45 insertions(+), 23 deletions(-)
On Tue, Apr 13, 2021 at 04:17:39PM +0000, Tom Lane wrote:
> Avoid improbable PANIC during heap_update.
>
> heap_update needs to clear any existing "all visible" flag on
> the old tuple's page (and on the new page too, if different).
> Per coding rules, to do this it must acquire pin on the appropriate
> visibility-map page while not holding exclusive buffer lock;
> which creates a race condition since someone else could set the
> flag whenever we're not holding the buffer lock.  The code is
> supposed to handle that by re-checking the flag after acquiring
> buffer lock and retrying if it became set.  However, one code
> path through heap_update itself, as well as one in its subroutine
> RelationGetBufferForTuple, failed to do this.  The end result,
> in the unlikely event that a concurrent VACUUM did set the flag
> while we're transiently not holding lock, is a non-recurring
> "PANIC: wrong buffer passed to visibilitymap_clear" failure.
>

Hi,

This doesn't look so improbable, because I saw it at least 3 times with v15beta4.

The first time I thought it was my fault; then I tried with a commit from September 25 (I don't remember which exactly, but that doesn't seem too relevant). Finally I saw it again in a build with TRACE_VISIBILITYMAP defined (the same commit).

But I haven't seen it anymore on rc1. Anyway, I'm attaching the backtrace (this is from the build with TRACE_VISIBILITYMAP); the query that was running at the time was the following (no changes were made to the quad_poly_tbl table, nor were any indexes added to it):

"""
update public.quad_poly_tbl set
  id = public.quad_poly_tbl.id
returning
  public.quad_poly_tbl.id as c0
"""

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On Thu, Sep 29, 2022 at 02:55:40AM -0500, Jaime Casanova wrote:
> On Tue, Apr 13, 2021 at 04:17:39PM +0000, Tom Lane wrote:
> > Avoid improbable PANIC during heap_update.
> >
> > heap_update needs to clear any existing "all visible" flag on
> > the old tuple's page (and on the new page too, if different).
> > Per coding rules, to do this it must acquire pin on the appropriate
> > visibility-map page while not holding exclusive buffer lock;
> > which creates a race condition since someone else could set the
> > flag whenever we're not holding the buffer lock.  The code is
> > supposed to handle that by re-checking the flag after acquiring
> > buffer lock and retrying if it became set.  However, one code
> > path through heap_update itself, as well as one in its subroutine
> > RelationGetBufferForTuple, failed to do this.  The end result,
> > in the unlikely event that a concurrent VACUUM did set the flag
> > while we're transiently not holding lock, is a non-recurring
> > "PANIC: wrong buffer passed to visibilitymap_clear" failure.
> >
>
> Hi,
>
> This doesn't look so improbable, because I saw it at least 3 times
> with v15beta4.
>
> The first time I thought it was my fault; then I tried with a commit
> from September 25 (I don't remember which exactly, but that doesn't
> seem too relevant).  Finally I saw it again in a build with
> TRACE_VISIBILITYMAP defined (the same commit).
>
> But I haven't seen it anymore on rc1.  Anyway, I'm attaching the
> backtrace (this is from the build with TRACE_VISIBILITYMAP); the
> query that was running at the time was the following (no changes were
> made to the quad_poly_tbl table, nor were any indexes added to it):
>
> """
> update public.quad_poly_tbl set
>   id = public.quad_poly_tbl.id
> returning
>   public.quad_poly_tbl.id as c0
> """
>

Just to confirm, I saw this on RC1.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> Just to confirm I saw this on RC1

What test case are you using?

			regards, tom lane
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> Just to confirm I saw this on RC1

Ugh ... I think I see the problem.  There's still one path through RelationGetBufferForTuple that fails to guarantee that it's acquired a vmbuffer pin if the all-visible flag becomes set in the otherBuffer.  Namely, if we're forced to extend the relation, then we deal with vm pins when ConditionalLockBuffer(otherBuffer) fails ... but not when it succeeds.

I think the fix is just to move the last GetVisibilityMapPins call out of the "if (unlikely(!ConditionalLockBuffer(otherBuffer)))" stanza.

It'd still be good to have a test case for this ...

			regards, tom lane
On Fri, Sep 30, 2022 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> > Just to confirm I saw this on RC1
>
> Ugh ... I think I see the problem.  There's still one path through
> RelationGetBufferForTuple that fails to guarantee that it's acquired
> a vmbuffer pin if the all-visible flag becomes set in the otherBuffer.
>
> It'd still be good to have a test case for this ...

FWIW it seems possible that the Postgres 15 vacuumlazy.c work that added lazy_scan_noprune() made this scenario more likely in practice -- even compared to Postgres 14.

Note that VACUUM will collect preexisting LP_DEAD items in heap pages that cannot be cleanup locked during VACUUM's first heap pass in Postgres 15 (in lazy_scan_noprune). There is no need for a cleanup lock in the second heap pass, either (that detail is the same as 14, but not earlier versions). So 15 is the first version that doesn't need a cleanup lock in either the first heap pass or the second heap pass to be able to set the heap page all-visible. That difference seems like it could be "protective" on 14, especially when vacuuming smaller tables.

--
Peter Geoghegan
I wrote:
> Ugh ... I think I see the problem.  There's still one path through
> RelationGetBufferForTuple that fails to guarantee that it's acquired
> a vmbuffer pin if the all-visible flag becomes set in the otherBuffer.
> Namely, if we're forced to extend the relation, then we deal with
> vm pins when ConditionalLockBuffer(otherBuffer) fails ... but not
> when it succeeds.  I think the fix is just to move the last
> GetVisibilityMapPins call out of the "if
> (unlikely(!ConditionalLockBuffer(otherBuffer)))" stanza.

Concretely, about like this.

			regards, tom lane

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index ae2e2ce37a..b0ece66629 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -678,29 +678,34 @@ loop:
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
-			/*
-			 * Because the buffers were unlocked for a while, it's possible,
-			 * although unlikely, that an all-visible flag became set or that
-			 * somebody used up the available space in the new page.  We can
-			 * use GetVisibilityMapPins to deal with the first case.  In the
-			 * second case, just retry from start.
-			 */
-			GetVisibilityMapPins(relation, otherBuffer, buffer,
-								 otherBlock, targetBlock, vmbuffer_other,
-								 vmbuffer);
+		/*
+		 * Because the buffers were unlocked for a while, it's possible,
+		 * although unlikely, that an all-visible flag became set or that
+		 * somebody used up the available space in the new page.  We can use
+		 * GetVisibilityMapPins to deal with the first case.  In the second
+		 * case, just retry from start.
+		 */
+		GetVisibilityMapPins(relation, otherBuffer, buffer,
+							 otherBlock, targetBlock, vmbuffer_other,
+							 vmbuffer);
 
-			if (len > PageGetHeapFreeSpace(page))
-			{
-				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
-				UnlockReleaseBuffer(buffer);
+		/*
+		 * Note that we have to check the available space even if our
+		 * conditional lock succeeded, because GetVisibilityMapPins might've
+		 * transiently released lock on the target buffer to acquire a VM pin
+		 * for the otherBuffer.
+		 */
+		if (len > PageGetHeapFreeSpace(page))
+		{
+			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+			UnlockReleaseBuffer(buffer);
 
-				goto loop;
-			}
+			goto loop;
 		}
 	}
-
-	if (len > PageGetHeapFreeSpace(page))
+	else if (len > PageGetHeapFreeSpace(page))
 	{
 		/* We should not get here given the test at the top */
 		elog(PANIC, "tuple is too big: size %zu", len);
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Sep 30, 2022 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ugh ... I think I see the problem.  There's still one path through
>> RelationGetBufferForTuple that fails to guarantee that it's acquired
>> a vmbuffer pin if the all-visible flag becomes set in the otherBuffer.

> FWIW it seems possible that the Postgres 15 vacuumlazy.c work that
> added lazy_scan_noprune() made this scenario more likely in practice
> -- even compared to Postgres 14.

Could be, because we haven't seen field reports of this in v14 yet. And there's still no hard evidence of a bug pre-14.  Nonetheless, I'm inclined to backpatch to v12 as 34f581c39 was.

			regards, tom lane
On Fri, Sep 30, 2022 at 2:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > FWIW it seems possible that the Postgres 15 vacuumlazy.c work that
> > added lazy_scan_noprune() made this scenario more likely in practice
> > -- even compared to Postgres 14.
>
> Could be, because we haven't seen field reports of this in v14 yet.

I would be more confident here were it not for the recent heap_delete() issue reported by one of my AWS colleagues (and fixed by another, Jeff Davis). See commit 163b0993 if you missed it before now.

> And there's still no hard evidence of a bug pre-14.  Nonetheless,
> I'm inclined to backpatch to v12 as 34f581c39 was.

+1

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Sep 30, 2022 at 2:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Could be, because we haven't seen field reports of this in v14 yet.

> I would be more confident here were it not for the recent
> heap_delete() issue reported by one of my AWS colleagues (and fixed by
> another, Jeff Davis). See commit 163b0993 if you missed it before now.

Hmm, okay, though that's really a distinct bug of the same ilk. You're right that I'd not paid close attention to that thread after Jeff diagnosed the problem.

It does seem like Robins' report shows that there's some way that v13 will set the AV bit without a cleanup lock ... does that constitute a bug in itself?

			regards, tom lane
On Fri, Sep 30, 2022 at 4:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I would be more confident here were it not for the recent
> > heap_delete() issue reported by one of my AWS colleagues (and fixed by
> > another, Jeff Davis). See commit 163b0993 if you missed it before now.
>
> Hmm, okay, though that's really a distinct bug of the same ilk.
> You're right that I'd not paid close attention to that thread after
> Jeff diagnosed the problem.

I just meant that I don't feel particularly confident about what might be possible or likely in Postgres 14 with this new issue in heap_update() on point releases without today's bugfix. My theory about lazy_scan_noprune() might be correct, but take it with a grain of salt.

> It does seem like Robins' report
> shows that there's some way that v13 will set the AV bit without
> a cleanup lock ... does that constitute a bug in itself?

We never got to the bottom of that part, strangely enough. I can ask again.

In any case we cannot really treat the information that we have about that as a bug report -- not as things stand. Why should the question of whether or not we ever set a page PD_ALL_VISIBLE without a cleanup lock on v13 be a mystery at all? Why wouldn't a simple grep get to the bottom of it? I have to imagine that the true explanation is very simple and boring.

--
Peter Geoghegan
On Fri, Sep 30, 2022 at 5:09 PM Peter Geoghegan <pg@bowt.ie> wrote:
> In any case we cannot really treat the information that we have about
> that as a bug report -- not as things stand. Why should the question
> of whether or not we ever set a page PD_ALL_VISIBLE without a cleanup
> lock on v13 be a mystery at all? Why wouldn't a simple grep get to the
> bottom of it? I have to imagine that the true explanation is very
> simple and boring.

I talked to Robins about this privately. I was wrong; there isn't a simple or boring explanation.

Robins set out to find bugs like this in Postgres via stress-testing. He used a lab environment for this, and was quite methodical. So there is no reason to doubt that a PANIC happened on v13 at least once. There must be some relatively complicated explanation for that, but right now I can only speculate.

--
Peter Geoghegan
On Fri, Sep 30, 2022 at 6:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I talked to Robins about this privately. I was wrong; there isn't a
> simple or boring explanation.

I think that I figured it out. With or without bugfix commit 163b0993, we do these steps early in heap_delete() (this is 13 code as of today):

2490     page = BufferGetPage(buffer);
2491
2492     /*
2493      * Before locking the buffer, pin the visibility map page if it appears to
2494      * be necessary.  Since we haven't got the lock yet, someone else might be
2495      * in the middle of changing this, so we'll need to recheck after we have
2496      * the lock.
2497      */
2498     if (PageIsAllVisible(page))
2499         visibilitymap_pin(relation, block, &vmbuffer);

So we're calling visibilitymap_pin() having just acquired a buffer pin on a heap page buffer for the first time, and without acquiring a buffer lock on the same heap page (we don't hold one now, and we've never held one at some earlier point).

Without Jeff's bugfix, nothing stops heap_delete() from getting a pin on a heap page that happens to have already been cleanup locked by another session running VACUUM. The same session could then (correctly) observe that the page does not have PD_ALL_VISIBLE set -- not yet. VACUUM might then set PD_ALL_VISIBLE, without having to acquire any kind of interlock that heap_delete() will reliably notice. After all, VACUUM had a cleanup lock before the other session's call to heap_delete() even began -- so the responsibility has to lie with heap_delete().

Jeff's bugfix will fix the bug on 13 too. The bugfix doesn't take the aggressive/conservative approach of simply getting an exclusive lock to check PageIsAllVisible() at the same point, on performance grounds (no change there). The bugfix does make this old heap_delete() no-buffer-lock behavior safe by teaching heap_delete() to not assume that a page that didn't have PD_ALL_VISIBLE initially set cannot have it set concurrently.
So 13 is only different to 14 in that there are fewer ways for essentially the same race to happen. This is probably only true for the heap_delete() issue, not either of the similar heap_update() issues.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> ... nothing stops heap_delete() from getting a pin
> on a heap page that happens to have already been cleanup locked by
> another session running VACUUM. The same session could then
> (correctly) observe that the page does not have PD_ALL_VISIBLE set --
> not yet. VACUUM might then set PD_ALL_VISIBLE, without having to
> acquire any kind of interlock that heap_delete() will reliably notice.

I'm too tired to think this through completely clearly, but this sounds right, and what it seems to imply is that this race condition exists in all PG versions.  Which would imply that we need to do the work to back-patch these three fixes into v11/v10.

I would rather not do that, because then we'd have to also back-patch some other more-invasive changes, and the net risk of introducing new bugs seems uncomfortably high.  (Especially for v10, where there will be no second chance after the November releases.)

So what is bothering me about this line of thought is: how come there have not been reports of these failures in older branches? Is there some aspect we're not thinking about that masks the bug?

			regards, tom lane
On Fri, Sep 30, 2022 at 9:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm too tired to think this through completely clearly, but this
> sounds right, and what it seems to imply is that this race condition
> exists in all PG versions.

I think that the heap_delete() issue is probably in all PG versions.

> Which would imply that we need to do the
> work to back-patch these three fixes into v11/v10.

I am not aware of any reason why we should need the heap_update() fixes to be backpatched any further. Though I will need to think about it some more.

> So what is bothering me about this line of thought is: how come
> there have not been reports of these failures in older branches?
> Is there some aspect we're not thinking about that masks the bug?

The likely explanation is that Robins was able to find the heap_delete() bug by throwing lots of resources (human effort and machine time) into it. It literally took weeks of adversarial stress-testing to find the bug. It's entirely possible and perhaps likely that this isn't representative of real world conditions in some crucial way.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> I think that the heap_delete() issue is probably in all PG versions.

Yeah, that's what I'm afraid of ...

> I am not aware of any reason why we should need the heap_update()
> fixes to be backpatched any further.

How so?  AFAICS these are exactly the same oversight, ie failure to deal with the all-visible bit getting set partway through the operation.  You've explained how that can happen.

			regards, tom lane
On Fri, Sep 30, 2022 at 10:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How so?  AFAICS these are exactly the same oversight, ie failure
> to deal with the all-visible bit getting set partway through the
> operation.  You've explained how that can happen.

I thought that there might have been something protective about how the loop would work in heap_update(), but perhaps that's not true. It might just be that heap_update() does lots of stuff in between, so it's less likely to be affected by this particular race (the race which seems to be present in all versions).

--
Peter Geoghegan
On Fri, 2022-09-30 at 21:23 -0700, Peter Geoghegan wrote:
> 2490     page = BufferGetPage(buffer);
> 2491
> 2492     /*
> 2493      * Before locking the buffer, pin the visibility map page if it appears to
> 2494      * be necessary.  Since we haven't got the lock yet, someone else might be
> 2495      * in the middle of changing this, so we'll need to recheck after we have
> 2496      * the lock.
> 2497      */
> 2498     if (PageIsAllVisible(page))
> 2499         visibilitymap_pin(relation, block, &vmbuffer);
>
> So we're calling visibilitymap_pin() having just acquired a buffer pin
> on a heap page buffer for the first time, and without acquiring a
> buffer lock on the same heap page (we don't hold one now, and we've
> never held one at some earlier point).
>
> Without Jeff's bugfix, nothing stops heap_delete() from getting a pin
> on a heap page that happens to have already been cleanup locked by
> another session running VACUUM. The same session could then
> (correctly) observe that the page does not have PD_ALL_VISIBLE set --
> not yet.

With you so far; I had considered this code path and was still unable to repro.

> VACUUM might then set PD_ALL_VISIBLE, without having to
> acquire any kind of interlock that heap_delete() will reliably
> notice.
> After all, VACUUM had a cleanup lock before the other session's call
> to heap_delete() even began -- so the responsibility has to lie with
> heap_delete().

Directly after the code you reference above, there is (in 5f9dda4c06, right before my patch):

2501     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
2502
2503     /*
2504      * If we didn't pin the visibility map page and the page has become all
2505      * visible while we were busy locking the buffer, we'll have to unlock and
2506      * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
2507      * unfortunate, but hopefully shouldn't happen often.
2508      */
2509     if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
2510     {
2511         LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
2512         visibilitymap_pin(relation, block, &vmbuffer);
2513         LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
2514     }

Doesn't that deal with the case you brought up directly? heap_delete() can't get the exclusive lock until VACUUM releases its cleanup lock, at which point all-visible will be set. Then heap_delete() should notice in line 2509 and then pin the VM buffer. Right?

And I don't think a similar issue exists when the locks are briefly released on lines 2563 or 2606. The pin is held until after the VM bit is cleared (aside from an error path and an early return):

2489     buffer = ReadBuffer(relation, block);
...
2717     if (PageIsAllVisible(page))
2718     {
2719         all_visible_cleared = true;
2720         PageClearAllVisible(page);
...
2835     ReleaseBuffer(buffer);

If VACUUM hasn't acquired the cleanup lock before 2489, it can't get one until heap_delete() is done looking at the all-visible bit anyway. So I don't see how my patch would have fixed it even if that was the problem.

Of course, I must be wrong somewhere, because the bug seems to exist.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
On Sat, Oct 1, 2022 at 9:35 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Doesn't that deal with the case you brought up directly? heap_delete()
> can't get the exclusive lock until VACUUM releases its cleanup lock, at
> which point all-visible will be set. Then heap_delete() should notice
> in line 2509 and then pin the VM buffer. Right?

I now believe that you're right. I don't think that this code was ever designed to rely on cleanup locks in any way; that was kind of an accident all along. Even still, I'm not sure how I missed such an obvious thing. Sorry for the misdirection.

Still, there has to be *some* reason why the bug could repro on 13.

--
Peter Geoghegan
On Fri, Sep 30, 2022 at 04:51:20PM -0400, Tom Lane wrote:
> Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> > Just to confirm I saw this on RC1
>
> What test case are you using?

Hi,

Currently, the way I have to reproduce it is:

- install the regression database
- drop all tables but: hash_i4_heap, hash_name_heap, hash_txt_heap, quad_poly_tbl, road
- run 10 sqlsmith processes... normally the problem appears in an hour or less

I have the logs from the last time it happened, so maybe I can trace the exact pattern to reproduce it at will... at least to keep a test somewhere.

BTW, so far so good with your last fix (about 12 hours now)...

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
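Scripted, the recipe above might look roughly like this. This is only a sketch of what was described, not a script from the thread: the connection settings, the catalog query used to drop the other tables, and the sqlsmith invocation are all assumptions to adapt, and it needs a throwaway cluster (with the 'regression' database from `make installcheck`) plus sqlsmith installed:

```shell
#!/bin/sh
# Hypothetical repro harness for the recipe above -- run against a
# scratch server only, since it drops tables and aims to trigger a PANIC.
DB=regression
KEEP="'hash_i4_heap','hash_name_heap','hash_txt_heap','quad_poly_tbl','road'"

# Drop every public-schema table except the handful the crash was seen with.
psql -d "$DB" -At -c "
  SELECT format('DROP TABLE public.%I CASCADE;', tablename)
  FROM pg_tables
  WHERE schemaname = 'public' AND tablename NOT IN ($KEEP);
" | psql -d "$DB"

# Hammer the database with 10 concurrent sqlsmith processes; per the
# report, the PANIC normally shows up within an hour or so.
for i in 1 2 3 4 5 6 7 8 9 10; do
    sqlsmith --target="dbname=$DB" &
done
wait
```

Watching the server log for "PANIC: wrong buffer passed to visibilitymap_clear" (and keeping core dumps enabled) would then confirm a hit.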