Thread: Backpatching nbtree VACUUM (page deletion) hardening
Postgres 14 commit 5b861baa55 added hardening to nbtree page deletion. This had the effect of making nbtree VACUUM robust against misbehaving operator classes -- we just LOG the problem and move on, without throwing an error. In practice a "misbehaving operator class" is often a problem with collation versioning. I think that this should be backpatched now, to protect users from particularly nasty problems that hitting the error eventually leads to. An error ends the whole VACUUM operation. If VACUUM cannot delete the page the first time, there is no reason to think that it'll be any different on the second or the tenth attempt. The eventual result (absent user/DBA intervention) is that no antiwraparound autovacuum will ever complete, leading to an outage when the system hits xidStopLimit. (Actually this scenario won't result in the system hitting xidStopLimit where the failsafe is available, but that's another thing that is only in 14, so that's not any help.) This seems low risk. The commit in question is very simple. It just downgrades an old 9.4-era ereport() from ERROR to LOG, and adds a "return false;" immediately after that. The function in question is fundamentally structured in a way that allows it to back out of page deletion because of problems that are far removed from where the caller starts from. When and why we back out of page deletion is already opaque to the caller, so it's very hard to imagine a new problem caused by backpatching. Besides all this, 14 has been out for a while now. -- Peter Geoghegan
On Fri, Sep 02, 2022 at 02:13:15PM -0700, Peter Geoghegan wrote: > I think that this should be backpatched now, to protect users from > particularly nasty problems that hitting the error eventually leads > to. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Sep 02, 2022 at 02:13:15PM -0700, Peter Geoghegan wrote: > Postgres 14 commit 5b861baa55 added hardening to nbtree page deletion. > This had the effect of making nbtree VACUUM robust against misbehaving > operator classes -- we just LOG the problem and move on, without > throwing an error. In practice a "misbehaving operator class" is often > a problem with collation versioning. This has been a problem for years, and still for years to come with libc updates. I am not much into this stuff, but does running VACUUM in this case help with the state of the index that used a past, now-invalid, collation (be it libc or ICU) to get a bit cleaned up? > An error ends the whole VACUUM operation. If VACUUM cannot delete the > page the first time, there is no reason to think that it'll be any > different on the second or the tenth attempt. The eventual result > (absent user/DBA intervention) is that no antiwraparound autovacuum > will ever complete, leading to an outage when the system hits > xidStopLimit. (Actually this scenario won't result in the system > hitting xidStopLimit where the failsafe is available, but that's > another thing that is only in 14, so that's not any help.) When written like that, this surely sounds extremely bad and this would need more complex chirurgy (or just running with a build that includes this patch?). > This seems low risk. The commit in question is very simple. It just > downgrades an old 9.4-era ereport() from ERROR to LOG, and adds a > "return false;" immediately after that. The function in question is > fundamentally structured in a way that allows it to back out of page > deletion because of problems that are far removed from where the > caller starts from. When and why we back out of page deletion is > already opaque to the caller, so it's very hard to imagine a new > problem caused by backpatching. Besides all this, 14 has been out for > a while now. Yeah, I can take it that we would have seen reports if this was an issue, and I don't recall seeing one on the community lists, at least. -- Michael
Attachment
On Fri, Sep 2, 2022 at 6:14 PM Michael Paquier <michael@paquier.xyz> wrote: > This has been a problem for years, and still for years to come with > libc updates. I am not much into this stuff, but does running VACUUM > in this case help with the state of the index that used a past, > now-invalid, collation (be it libc or ICU) to get a bit cleaned up? Yes -- nbtree VACUUM generally can cope quite well, even when the index is corrupt. It should mostly manage to do what is expected here, even with a misbehaving opclass, because it relies as little as possible on user-defined opclass code. Even without the hardening in place, nbtree VACUUM will still do a *surprisingly* good job of recovering when the opclass is broken in some way: VACUUM just needs the insertion scankey operator class code to initially determine roughly where to look for the to-be-deleted page's downlink, one level up in the tree. Even when an operator class is wildly broken (e.g. the comparator gives a result that it determines at random), we still won't see problems in nbtree VACUUM most of the time -- because even being roughly correct is good enough in practice! You have to be quite unlucky to hit this, even when the opclass is wildly broken (which is probably much less common than "moderately broken"). > When written like that, this surely sounds extremely bad and this > would need more complex chirurgy (or just running with a build that > includes this patch?). The patch will fix the case in question, which I have seen internal AWS reports about -- though the initial fix that went into 14 wasn't driven by any complaint from any user. I just happened to notice that we were throwing an ERROR in nbtree VACUUM for no good reason, which is something that should be avoided on general principle. In theory there could be other ways in which you'd run into the same basic problem (in any index AM). The important point is that we're better off not throwing any errors in the first place, but if we must then they had better not be errors that will be repeated again and again, without any chance of the problem going away naturally. (Not that it never makes sense to just throw an error; there are meaningful gradations of "totally unacceptable problem".) -- Peter Geoghegan
On Fri, Sep 2, 2022 at 6:51 PM Peter Geoghegan <pg@bowt.ie> wrote: > Yes -- nbtree VACUUM generally can cope quite well, even when the > index is corrupt. It should mostly manage to do what is expected here, > even with a misbehaving opclass, because it relies as little as > possible on user-defined opclass code. I just backpatched the hardening commit from 14 to every supported branch. -- Peter Geoghegan