Re: pgsql: Avoid improbable PANIC during heap_update. - Mailing list pgsql-committers

From Peter Geoghegan
Subject Re: pgsql: Avoid improbable PANIC during heap_update.
Date
Msg-id CAH2-WzkGUXW5+7Ry6r3iVN2vhU0cO74YrH07cXW8ykxjZUeWMA@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Avoid improbable PANIC during heap_update.  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: pgsql: Avoid improbable PANIC during heap_update.
Re: pgsql: Avoid improbable PANIC during heap_update.
List pgsql-committers
On Fri, Sep 30, 2022 at 6:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I talked to Robins about this privately. I was wrong; there isn't a
> simple or boring explanation.

I think that I figured it out. With or without bugfix commit 163b0993,
we do these steps early in heap_delete() (this is 13 code as of
today):

2490     page = BufferGetPage(buffer);
2491
2492     /*
2493      * Before locking the buffer, pin the visibility map page if
it appears to
2494      * be necessary.  Since we haven't got the lock yet, someone
else might be
2495      * in the middle of changing this, so we'll need to recheck
after we have
2496      * the lock.
2497      */
2498     if (PageIsAllVisible(page))
2499         visibilitymap_pin(relation, block, &vmbuffer);

So we're calling visibilitymap_pin() having just acquired a buffer pin
on a heap page buffer for the first time, and without acquiring a
buffer lock on the same heap page (we don't hold one now, and we've
never held one at some earlier point).

Without Jeff's bugfix, nothing stops heap_delete() from getting a pin
on a heap page that happens to have already been cleanup locked by
another session running VACUUM. The same session could then
(correctly) observe that the page does not have PD_ALL_VISIBLE set --
not yet. VACUUM might then set PD_ALL_VISIBLE, without having to
acquire any kind of interlock that heap_delete() will reliably notice.
After all, VACUUM had a cleanup lock before the other session's call
to heap_delete() even began -- so the responsibility has to lie with
heap_delete().

Jeff's bugfix will fix the bug on 13 too. The bugfix doesn't take the
aggressive/conservative approach of simply getting an exclusive lock
to check PageIsAllVisible() at the same point, on performance grounds
(no change there). The bugfix does make this old heap_delete()
no-buffer-lock behavior safe by teaching heap_delete() to not assume
that a page that didn't have PD_ALL_VISIBLE initially set cannot have
it set concurrently.

So 13 is only different to 14 in that there are fewer ways for
essentially the same race to happen. This is probably only true for
the heap_delete() issue, not either of the similar heap_update()
issues.

--
Peter Geoghegan



pgsql-committers by date:

Previous
From: Andres Freund
Date:
Subject: pgsql: meson: mingw: Add -Wl,--disable-auto-import, enable when linking
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Avoid improbable PANIC during heap_update.