Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers
From:           Andres Freund
Subject:        Re: xid wraparound danger due to INDEX_CLEANUP false
Date:
Msg-id:         20200416224901.b24n2vio42lypocs@alap3.anarazel.de
In response to: Re: xid wraparound danger due to INDEX_CLEANUP false (Peter Geoghegan <pg@bowt.ie>)
Responses:      Re: xid wraparound danger due to INDEX_CLEANUP false
                Re: xid wraparound danger due to INDEX_CLEANUP false
List:           pgsql-hackers
Hi,

On 2020-04-16 13:28:00 -0700, Peter Geoghegan wrote:
> > And, what's worse, in the INDEX_CLEANUP off case, future VACUUMs with
> > INDEX_CLEANUP on might not even visit the index. As there very well
> > might not be many dead heap tuples around anymore (previous vacuums with
> > cleanup off will have removed them), the
> > vacuum_cleanup_index_scale_factor logic may prevent index vacuums. In
> > contrast to the normal situations where the btm_oldest_btpo_xact check
> > will prevent that from becoming a problem.
>
> I guess that they should visit the metapage to see if they need to do
> that much. That would allow us to fix the problem while mostly
> honoring INDEX_CLEANUP off, I think.

Yea. _bt_vacuum_needs_cleanup() needs to check whether
metad->btm_oldest_btpo_xact is older than the FreezeLimit computed by
vacuum_set_xid_limits(), and vacuum the index if so, even if
INDEX_CLEANUP is false.

> BTW, a lot of people get confused about what half-dead pages are. I
> would like to make something clear that may not be obvious: While it's
> bad that the implementation leaks pages that should go in the FSM,
> it's not the end of the world. They should get evicted from
> shared_buffers pretty quickly if there is any pressure, and impose no
> real cost on index scans.

Yea, half-dead pages aren't the main problem. It's pages that contain
only dead tuples, but aren't unlinked from the tree. Without a vacuum
scan we'll never reuse them - even if we know they're all dead.

Note that a page being in the FSM is not protection against wraparound
:(. We recheck whether a page is recyclable when getting it from the
FSM (probably required, due to the FSM not being crashsafe). It's of
course much less likely to happen at that stage, because the pages can
get reused.

I think we should really just stop being miserly and update the xid to
be FrozenTransactionId or InvalidTransactionId when vacuum encounters
one that's from before the xid cutoff used by vacuum (i.e. what could
become the new relfrozenxid). That seems like it'd be a few lines, not
more.

> Another thing that's worth pointing out is that this whole
> RecentGlobalXmin business is how we opted to implement what Lanin &
> Shasha call "the drain technique". It is rather different to the usual
> ways in which we use RecentGlobalXmin. We're only using it as a proxy
> (an absurdly conservative proxy) for whether or not there might be an
> in-flight index scan that lands on a concurrently recycled index page
> and gets completely confused. So it is purely about the integrity of
> the data structure itself. It is a consequence of doing so little
> locking when descending the tree -- our index scans don't need to
> couple buffer locks on the way down the tree at all. So we make VACUUM
> worry about that, rather than making index scans worry about VACUUM
> (though the latter design is a reasonable and common one).
>
> There is absolutely no reason why we have to delay recycling for very
> long, even in cases with long running transactions or whatever. I
> agree that it's just an accident that it works that way. VACUUM could
> probably remember deleted pages, and then revisit those pages at the
> end of the index vacuuming -- that might make a big difference in a
> lot of workloads. Or it could chain them together as a linked list
> which can be accessed much more eagerly in some cases.

I think it doesn't really help meaningfully for vacuum to be a bit
smarter about when to recognize pages as being recyclable. IMO the big
issue is that vacuum won't be very frequent, so we'll grow the index
until that time, even if there are many "effectively empty" pages.
I.e. even if the killtuples logic allows us to recognize that all
actual index tuples are fully dead, we'll not benefit from that unless
there's a new insertion that belongs onto the "empty" page.
That's fine for indexes that are updated roughly evenly across the
value range, but terrible for indexes that "grow" mostly on one side
and "shrink" on the other. I'd bet it'd be beneficial if we were to
either have scans unlink such pages directly, or have them just enter
the page into the FSM and let _bt_getbuf() do the unlinking. I'm not
sure whether the current locking model assumes anywhere that there is
only one process (vacuum) unlinking pages, though?

Greetings,

Andres Freund