Thread: Allow cancelling VACUUM of nbtrees with corrupted right links

Allow cancelling VACUUM of nbtrees with corrupted right links

From
Andres Freund
Date:
Hi,

A couple times, one very recently, I've encountered btrees that somehow
had corrupted right links. The links formed a cycle, involving a number
of pages.  As of yet it's unclear to me where the corruption is
originating from - could be a storage issue or a postgres issue.


What makes that kind of corruption annoying is not so much lookups or
insertsion not working, but that it can lead to VACUUM being stuck. Page
deletion codepaths have to follow right links, and if there's a cycle
they'll do so forever. That'd be bad enough, but there's no
CHECK_FOR_INTERRUPTS() in those codepaths, which means autovacuum can't
be cancelled. And thus the index can't easily be dropped / reindexed.

In an older case that lead to significant difficulty for the user to
ever get out of the situation, because even after a shutdown autovacuum
quickly latched onto the table, IIRC due to an impeding wraparound.

I think it'd be a good minimal fix if we added a bunch of CFIs to the
likely instances of such loops. I've done that in the attached patch.
Unfortunately it's entirely trivial, because CFI will not trigger when
an lwlock is held (as LWLockAcquire() does a HOLD_INTERRUPTS()).  Any
comments about the patch?


I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop,
because we always hold a lock. But given that that loop appears to
mostly be dead code, that doesn't seem too bad?


I think we should backpatch those checks - it's a fairly nasty situation
for users to not be able to even drop an index without going to single
user mode.


I wonder if we additionally should put a CFI() in _bt_relandgetbuf(), as
it's otherwise impossible to check interrupts at the callsites.


Alternatively we could also invent a CFI version that allows
cancellation even if locks held - but that seems nontrivial.

Greetings,

Andres Freund

Attachment

Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Andres Freund
Date:
Hi,

On 2018-06-27 12:16:29 -0700, Andres Freund wrote:
> I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop,
> because we always hold a lock. But given that that loop appears to
> mostly be dead code, that doesn't seem too bad?

It's possibly wrong that it's unreachable - I've just not managed to get
there. If somebody has an idea how to build a reproducible case to reach
it...

- Andres


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Peter Geoghegan
Date:
On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund <andres@anarazel.de> wrote:
> It's possibly wrong that it's unreachable - I've just not managed to get
> there. If somebody has an idea how to build a reproducible case to reach
> it...

Set a breakpoint in _bt_unlink_halfdead_page() after the initial
"LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page
split in the left sibling by carefully inserting values that belong
within its part of the key space? I would probably attempt this with
an index on text, so that I could contrive as many key values that
belong on the left sibling as needed.

You're probably looking at a test-case that doesn't involve a
multi-level deletion, where the leaf and target page are the same. Not
sure if you want to reproduce the multi-level deletion case
independently.


-- 
Peter Geoghegan


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Andres Freund
Date:
Hi,

On 2018-06-27 12:49:56 -0700, Peter Geoghegan wrote:
> On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund <andres@anarazel.de> wrote:
> > It's possibly wrong that it's unreachable - I've just not managed to get
> > there. If somebody has an idea how to build a reproducible case to reach
> > it...
> 
> Set a breakpoint in _bt_unlink_halfdead_page() after the initial
> "LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page
> split in the left sibling by carefully inserting values that belong
> within its part of the key space? I would probably attempt this with
> an index on text, so that I could contrive as many key values that
> belong on the left sibling as needed.

A related question is if it matters - without complicating the code I
don't see how we could release all the locks in that loop. Therefore no
interrupts can be accepted.  I hope I'm missing something?

Greetings,

Andres Freund


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Peter Geoghegan
Date:
On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:
> A related question is if it matters - without complicating the code I
> don't see how we could release all the locks in that loop. Therefore no
> interrupts can be accepted.  I hope I'm missing something?

I agree.

In general, page deletion is the most complicated part of nbtree
concurrency, by far (if we just had the basic L&Y, the concurrency
aspects would be far easier to grasp). Doing better in
_bt_unlink_halfdead_page() seems extremely difficult, and very
unlikely to be worthwhile.

-- 
Peter Geoghegan


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Andres Freund
Date:
On 2018-06-27 13:02:25 -0700, Peter Geoghegan wrote:
> On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:
> > A related question is if it matters - without complicating the code I
> > don't see how we could release all the locks in that loop. Therefore no
> > interrupts can be accepted.  I hope I'm missing something?
> 
> I agree.
> 
> In general, page deletion is the most complicated part of nbtree
> concurrency, by far (if we just had the basic L&Y, the concurrency
> aspects would be far easier to grasp). Doing better in
> _bt_unlink_halfdead_page() seems extremely difficult, and very
> unlikely to be worthwhile.

Well, I don't really want to generally do better. Just be able to check
for interrupts ;)

Greetings,

Andres Freund


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Peter Geoghegan
Date:
On Wed, Jun 27, 2018 at 1:11 PM, Andres Freund <andres@anarazel.de> wrote:
> Well, I don't really want to generally do better. Just be able to check
> for interrupts ;)

That's what I meant. :-)


-- 
Peter Geoghegan


Re: Allow cancelling VACUUM of nbtrees with corrupted right links

From
Andres Freund
Date:
Hi,

On 2018-06-27 12:16:29 -0700, Andres Freund wrote:
> I think we should backpatch those checks - it's a fairly nasty situation
> for users to not be able to even drop an index without going to single
> user mode.

Did that back to 9.4 - before that page deletion and splits worked
differently enough that it seems higher risk to add the checks there.

Greetings,

Andres Freund