Thread: Allow cancelling VACUUM of nbtrees with corrupted right links
Hi, A couple times, one very recently, I've encountered btrees that somehow had corrupted right links. The links formed a cycle, involving a number of pages. As of yet it's unclear to me where the corruption is originating from - could be a storage issue or a postgres issue. What makes that kind of corruption annoying is not so much lookups or insertsion not working, but that it can lead to VACUUM being stuck. Page deletion codepaths have to follow right links, and if there's a cycle they'll do so forever. That'd be bad enough, but there's no CHECK_FOR_INTERRUPTS() in those codepaths, which means autovacuum can't be cancelled. And thus the index can't easily be dropped / reindexed. In an older case that lead to significant difficulty for the user to ever get out of the situation, because even after a shutdown autovacuum quickly latched onto the table, IIRC due to an impeding wraparound. I think it'd be a good minimal fix if we added a bunch of CFIs to the likely instances of such loops. I've done that in the attached patch. Unfortunately it's entirely trivial, because CFI will not trigger when an lwlock is held (as LWLockAcquire() does a HOLD_INTERRUPTS()). Any comments about the patch? I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop, because we always hold a lock. But given that that loop appears to mostly be dead code, that doesn't seem too bad? I think we should backpatch those checks - it's a fairly nasty situation for users to not be able to even drop an index without going to single user mode. I wonder if we additionally should put a CFI() in _bt_relandgetbuf(), as it's otherwise impossible to check interrupts at the callsites. Alternatively we could also invent a CFI version that allows cancellation even if locks held - but that seems nontrivial. Greetings, Andres Freund
Attachment
Hi, On 2018-06-27 12:16:29 -0700, Andres Freund wrote: > I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop, > because we always hold a lock. But given that that loop appears to > mostly be dead code, that doesn't seem too bad? It's possibly wrong that it's unreachable - I've just not managed to get there. If somebody has an idea how to build a reproducible case to reach it... - Andres
On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund <andres@anarazel.de> wrote: > It's possibly wrong that it's unreachable - I've just not managed to get > there. If somebody has an idea how to build a reproducible case to reach > it... Set a breakpoint in _bt_unlink_halfdead_page() after the initial "LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page split in the left sibling by carefully inserting values that belong within its part of the key space? I would probably attempt this with an index on text, so that I could contrive as many key values that belong on the left sibling as needed. You're probably looking at a test-case that doesn't involve a multi-level deletion, where the leaf and target page are the same. Not sure if you want to reproduce the multi-level deletion case independently. -- Peter Geoghegan
Hi, On 2018-06-27 12:49:56 -0700, Peter Geoghegan wrote: > On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund <andres@anarazel.de> wrote: > > It's possibly wrong that it's unreachable - I've just not managed to get > > there. If somebody has an idea how to build a reproducible case to reach > > it... > > Set a breakpoint in _bt_unlink_halfdead_page() after the initial > "LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page > split in the left sibling by carefully inserting values that belong > within its part of the key space? I would probably attempt this with > an index on text, so that I could contrive as many key values that > belong on the left sibling as needed. A related question is if it matters - without complicating the code I don't see how we could release all the locks in that loop. Therefore no interrupts can be accepted. I hope I'm missing something? Greetings, Andres Freund
On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote: > A related question is if it matters - without complicating the code I > don't see how we could release all the locks in that loop. Therefore no > interrupts can be accepted. I hope I'm missing something? I agree. In general, page deletion is the most complicated part of nbtree concurrency, by far (if we just had the basic L&Y, the concurrency aspects would be far easier to grasp). Doing better in _bt_unlink_halfdead_page() seems extremely difficult, and very unlikely to be worthwhile. -- Peter Geoghegan
On 2018-06-27 13:02:25 -0700, Peter Geoghegan wrote: > On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote: > > A related question is if it matters - without complicating the code I > > don't see how we could release all the locks in that loop. Therefore no > > interrupts can be accepted. I hope I'm missing something? > > I agree. > > In general, page deletion is the most complicated part of nbtree > concurrency, by far (if we just had the basic L&Y, the concurrency > aspects would be far easier to grasp). Doing better in > _bt_unlink_halfdead_page() seems extremely difficult, and very > unlikely to be worthwhile. Well, I don't really want to generally do better. Just be able to check for interrupts ;) Greetings, Andres Freund
On Wed, Jun 27, 2018 at 1:11 PM, Andres Freund <andres@anarazel.de> wrote: > Well, I don't really want to generally do better. Just be able to check > for interrupts ;) That's what I meant. :-) -- Peter Geoghegan
Hi, On 2018-06-27 12:16:29 -0700, Andres Freund wrote: > I think we should backpatch those checks - it's a fairly nasty situation > for users to not be able to even drop an index without going to single > user mode. Did that back to 9.4 - before that page deletion and splits worked differently enough that it seems higher risk to add the checks there. Greetings, Andres Freund