From: Peter Geoghegan
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date:
Msg-id: CAH2-WzmvNqh=wdwikT8zrH6WjC7HNxEBUvLProNF9-cHE7aHvg@mail.gmail.com
In response to: Re: xid wraparound danger due to INDEX_CLEANUP false (Andres Freund <andres@anarazel.de>)
Responses: Re: xid wraparound danger due to INDEX_CLEANUP false
List: pgsql-hackers

On Thu, Apr 16, 2020 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> Sure, there is some pre-existing wraparound danger for individual
> pages. But it's a pretty narrow corner case before INDEX_CLEANUP
> off.

It's a matter of degree. Hard to judge something like that.

> And, what's worse, in the INDEX_CLEANUP off case, future VACUUMs with
> INDEX_CLEANUP on might not even visit the index. As there very well
> might not be many dead heap tuples around anymore (previous vacuums
> with cleanup off will have removed them), the
> vacuum_cleanup_index_scale_factor logic may prevent index vacuums. In
> contrast to the normal situations where the btm_oldest_btpo_xact check
> will prevent that from becoming a problem.

I guess that they should visit the metapage to see if they need to do
that much. That would allow us to fix the problem while mostly honoring
INDEX_CLEANUP off, I think.

> Peter, as far as I can tell, with INDEX_CLEANUP off, nbtree will never
> be able to recycle half-dead pages? And thus would effectively never
> recycle any dead space? Is that correct?

I agree. The fact that btm_oldest_btpo_xact is an all-or-nothing thing
(with wraparound hazards) is bad in itself, and introduced new risk to
v11 compared to previous versions (without the INDEX_CLEANUP = off
feature entering into it). The simple fact that we don't even check it
with INDEX_CLEANUP = off is a bigger problem, though, and one that now
seems unrelated.

BTW, a lot of people get confused about what half-dead pages are. I
would like to make something clear that may not be obvious: while it's
bad that the implementation leaks pages that should go in the FSM, it's
not the end of the world. They should get evicted from shared_buffers
pretty quickly if there is any pressure, and they impose no real cost
on index scans.

There are (roughly) 3 types of pages that we're concerned about here,
in the common case where we're just deleting a leaf page:

* A half-dead page -- no downlink in its parent, marked dead.

* A deleted page -- now no sidelinks, either. Not initially safe to
recycle.

* A deleted page in the FSM -- this is what we have the interlock for.

Half-dead pages are pretty rare, because VACUUM really has to have a
hard crash for that to happen (that might not be 100% true, but it's at
least 99% true). That's always been the case, and we don't really need
to talk about them here at all. We're just concerned with deleted pages
in the context of this discussion (and whether or not they can be
recycled without confusing in-flight index scans). These are the only
pages that are marked with an XID at all.

Another thing that's worth pointing out is that this whole
RecentGlobalXmin business is how we opted to implement what Lanin &
Shasha call "the drain technique". It is rather different from the
usual ways in which we use RecentGlobalXmin. We're only using it as a
proxy (an absurdly conservative proxy) for whether or not there might
be an in-flight index scan that lands on a concurrently recycled index
page and gets completely confused. So it is purely about the integrity
of the data structure itself. It is a consequence of doing so little
locking when descending the tree -- our index scans don't need to
couple buffer locks on the way down the tree at all. So we make VACUUM
worry about that, rather than making index scans worry about VACUUM
(though the latter design is a reasonable and common one). There is
absolutely no reason why we have to delay recycling for very long, even
in cases with long-running transactions or whatever.
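To make that concrete: the interlock boils down to a single XID
comparison against the deleted page's header. A rough sketch of the
test, after _bt_page_recyclable() in nbtpage.c as it stands on
v11-v13 (abridged, assuming the usual nbtree.h/bufmgr.h environment;
not a verbatim copy):

bool
page_is_recyclable(Page page)
{
    BTPageOpaque opaque;

    /*
     * An all-zeroes page can always be recycled -- e.g. a backend may
     * have extended the relation, then crashed before initializing
     * the new page.
     */
    if (PageIsNew(page))
        return true;

    /*
     * Otherwise, recycle only if the page was deleted, and was deleted
     * long enough ago that no in-flight scan can still hold a link to
     * it.  btpo.xact is the XID stamped on the page at deletion time.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;

    return false;
}

That same btpo.xact value is what feeds btm_oldest_btpo_xact in the
metapage, which is why skipping the metapage check entirely under
INDEX_CLEANUP = off is the part that worries me.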
I agree that it's just an accident that it works that way. VACUUM
could probably remember deleted pages, and then revisit those pages at
the end of index vacuuming -- that might make a big difference in a
lot of workloads. Or it could chain them together as a linked list,
which could be accessed much more eagerly in some cases.
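To sketch what I mean (purely hypothetical -- the PendingRecycle
struct and the function names here are invented for illustration):
VACUUM would accumulate the block numbers of the pages it deletes,
then make one extra pass over just those blocks at the end of index
vacuuming, putting any that have since become recyclable into the FSM:

/* Hypothetical bookkeeping -- nothing like this exists today */
typedef struct PendingRecycle
{
    BlockNumber *pages;     /* pages deleted during this VACUUM */
    int          npages;
} PendingRecycle;

/*
 * At the end of index vacuuming, revisit each page deleted earlier.
 * Some may have become recyclable in the meantime, so they can go in
 * the FSM now rather than waiting for a future VACUUM.
 */
static void
drain_pending_recycle(Relation rel, PendingRecycle *pending)
{
    for (int i = 0; i < pending->npages; i++)
    {
        Buffer      buf = ReadBuffer(rel, pending->pages[i]);

        LockBuffer(buf, BT_READ);
        if (_bt_page_recyclable(BufferGetPage(buf)))
            RecordFreeIndexPage(rel, pending->pages[i]);
        UnlockReleaseBuffer(buf);
    }
}

The linked list variant would just chain the deleted pages together on
disk instead, so that even a later VACUUM could find them without
scanning the whole index.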
--
Peter Geoghegan