Re: pgsql: Avoid improbable PANIC during heap_update. - Mailing list pgsql-committers

From Jeff Davis
Subject Re: pgsql: Avoid improbable PANIC during heap_update.
Date
Msg-id baa25a6fc7c725d28ec5d0b7787230180b45a275.camel@j-davis.com
Whole thread Raw
In response to Re: pgsql: Avoid improbable PANIC during heap_update.  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: pgsql: Avoid improbable PANIC during heap_update.
List pgsql-committers
On Fri, 2022-09-30 at 21:23 -0700, Peter Geoghegan wrote:
> 2490     page = BufferGetPage(buffer);
> 2491
> 2492     /*
> 2493      * Before locking the buffer, pin the visibility map page if
> it appears to
> 2494      * be necessary.  Since we haven't got the lock yet, someone
> else might be
> 2495      * in the middle of changing this, so we'll need to recheck
> after we have
> 2496      * the lock.
> 2497      */
> 2498     if (PageIsAllVisible(page))
> 2499         visibilitymap_pin(relation, block, &vmbuffer);
>
> So we're calling visibilitymap_pin() having just acquired a buffer
> pin
> on a heap page buffer for the first time, and without acquiring a
> buffer lock on the same heap page (we don't hold one now, and we've
> never held one at some earlier point).
>
> Without Jeff's bugfix, nothing stops heap_delete() from getting a pin
> on a heap page that happens to have already been cleanup locked by
> another session running VACUUM. The same session could then
> (correctly) observe that the page does not have PD_ALL_VISIBLE set --
> not yet. 

With you so far; I had considered this code path and was still unable
to repro.

> VACUUM might then set PD_ALL_VISIBLE, without having to
> acquire any kind of interlock that heap_delete() will reliably
> notice.
> After all, VACUUM had a cleanup lock before the other session's call
> to heap_delete() even began -- so the responsibility has to lie with
> heap_delete().

Directly after the code you reference above, there is (in 5f9dda4c06,
right before my patch):

 2501     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 2502
 2503     /*
 2504      * If we didn't pin the visibility map page and the page has
become all
 2505      * visible while we were busy locking the buffer, we'll have
to unlock and
 2506      * re-lock, to avoid holding the buffer lock across an I/O.
That's a bit
 2507      * unfortunate, but hopefully shouldn't happen often.
 2508      */
 2509     if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 2510     {
 2511         LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 2512         visibilitymap_pin(relation, block, &vmbuffer);
 2513         LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 2514     }

Doesn't that deal with the case you brought up directly? heap_delete()
can't get the exclusive lock until VACUUM releases its cleanup lock, at
which point all-visible will be set. Then heap_delete() should notice
in line 2509 and then pin the VM buffer. Right?

And I don't think a similar issue exists when the locks are briefly
released on lines 2563 or 2606. The pin is held until after the VM bit
is cleared (aside from an error path and an early return):

 2489     buffer = ReadBuffer(relation, block);
 ...
 2717     if (PageIsAllVisible(page))
 2718     {
 2719         all_visible_cleared = true;
 2720         PageClearAllVisible(page);
 ...
 2835     ReleaseBuffer(buffer);

If VACCUM hasn't acquired the cleanup lock before 2489, it can't get
one until heap_delete() is done looking at the all-visible bit anyway.
So I don't see how my patch would have fixed it even if that was the
problem.

Of course, I must be wrong somewhere, because the bug seems to exist.

--
Jeff Davis
PostgreSQL Contributor Team - AWS





pgsql-committers by date:

Previous
From: Peter Eisentraut
Date:
Subject: pgsql: Fix tiny memory leaks
Next
From: Peter Geoghegan
Date:
Subject: Re: pgsql: Avoid improbable PANIC during heap_update.