Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Msg-id: 20200416224901.b24n2vio42lypocs@alap3.anarazel.de
In response to: Re: xid wraparound danger due to INDEX_CLEANUP false (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: xid wraparound danger due to INDEX_CLEANUP false (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: xid wraparound danger due to INDEX_CLEANUP false (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2020-04-16 13:28:00 -0700, Peter Geoghegan wrote:
> > And, what's worse, in the INDEX_CLEANUP off case, future VACUUMs with
> > INDEX_CLEANUP on might not even visit the index. As there very well
> > might not be many dead heap tuples around anymore (previous vacuums with
> > cleanup off will have removed them), the
> > vacuum_cleanup_index_scale_factor logic may prevent index vacuums. In
> > contrast to the normal situations where the btm_oldest_btpo_xact check
> > will prevent that from becoming a problem.
> 
> I guess that they should visit the metapage to see if they need to do
> that much. That would allow us to fix the problem while mostly
> honoring INDEX_CLEANUP off, I think.

Yea. _bt_vacuum_needs_cleanup() needs to check whether
metad->btm_oldest_btpo_xact is older than the FreezeLimit computed by
vacuum_set_xid_limits(), and vacuum the index if so, even if INDEX_CLEANUP
is false.
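To make the comparison concrete, here's a self-contained sketch of the
check being proposed. The function names and signature are illustrative,
not the actual nbtree code or a real patch; the point is that "older than"
for xids is signed modular arithmetic, as in TransactionIdPrecedes():

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware comparison, modeled on PostgreSQL's
 * TransactionIdPrecedes(): xids live on a circle, so "older" is decided
 * by signed modular difference, not plain '<'.  (The real function also
 * special-cases permanent xids; omitted here for brevity.) */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/* Hypothetical shape of the proposed check: force an index pass when the
 * metapage's oldest btpo_xact has fallen behind vacuum's FreezeLimit,
 * even when INDEX_CLEANUP is off. */
static bool
needs_cleanup_for_wraparound(TransactionId btm_oldest_btpo_xact,
                             TransactionId freeze_limit)
{
    return xid_precedes(btm_oldest_btpo_xact, freeze_limit);
}
```

Note that the modular comparison also does the right thing across
wraparound: e.g. xid 4294967000 precedes xid 500, because their unsigned
difference interpreted as a signed 32-bit value is negative.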


> BTW, a lot of people get confused about what half-dead pages are. I
> would like to make something clear that may not be obvious: While it's
> bad that the implementation leaks pages that should go in the FSM,
> it's not the end of the world. They should get evicted from
> shared_buffers pretty quickly if there is any pressure, and impose no
> real cost on index scans.

Yea, half-dead pages aren't the main problem. It's pages that contain
only dead tuples, but aren't unlinked from the tree. Without a vacuum
scan we'll never reuse them - even if we know they're all dead.

Note that the page being in the FSM is not protection against wraparound
:(. We recheck whether a page is recyclable when getting it from the FSM
(probably required, due to the FSM not being crashsafe). It's of course
much less likely to happen at that stage, because the pages can get
reused.

I think we should really just stop being miserly and update the xid to be
FrozenTransactionId or InvalidTransactionId when vacuum encounters one
that's from before the xid cutoff used by vacuum (i.e. what could
become the new relfrozenxid).  That seems like it'd be a few lines, not
more.
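Roughly those few lines, as a self-contained sketch. DeletedPageState is a
made-up stand-in for the page metadata vacuum would see (the real field is
btpo.xact in BTPageOpaqueData), and the function name is invented:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId ((TransactionId) 2)

/* Illustrative stand-in for a deleted page's opaque data; the real
 * field is btpo.xact in BTPageOpaqueData. */
typedef struct DeletedPageState
{
    TransactionId xact;
} DeletedPageState;

/* Wraparound-aware "older than", as in TransactionIdPrecedes(). */
static int
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/* The proposed "few lines": when vacuum visits a deleted page whose
 * xact is older than the cutoff that could become the new relfrozenxid,
 * overwrite it with FrozenTransactionId, so a stale value can never be
 * mistaken for a recent one after the xid counter wraps. */
static void
maybe_freeze_deleted_page(DeletedPageState *page, TransactionId cutoff)
{
    if (page->xact != FrozenTransactionId &&
        xid_precedes(page->xact, cutoff))
        page->xact = FrozenTransactionId;
}
```

A page whose xact is still newer than the cutoff is left alone, since it
cannot hold back the new relfrozenxid anyway.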


> Another thing that's worth pointing out is that this whole
> RecentGlobalXmin business is how we opted to implement what Lanin &
> Sasha call "the drain technique". It is rather different to the usual
> ways in which we use RecentGlobalXmin. We're only using it as a proxy
> (an absurdly conservative proxy) for whether or not there might be an
> in-flight index scan that lands on a concurrently recycled index page
> and gets completely confused. So it is purely about the integrity of
> the data structure itself. It is a consequence of doing so little
> locking when descending the tree -- our index scans don't need to
> couple buffer locks on the way down the tree at all. So we make VACUUM
> worry about that, rather than making index scans worry about VACUUM
> (though the latter design is a reasonable and common one).
>
> There is absolutely no reason why we have to delay recycling for very
> long, even in cases with long running transactions or whatever. I
> agree that it's just an accident that it works that way. VACUUM could
> probably remember deleted pages, and then revisited those pages at the
> end of the index vacuuming -- that might make a big difference in a
> lot of workloads. Or it could chain them together as a linked list
> which can be accessed much more eagerly in some cases.

I think it doesn't really help meaningfully for vacuum to be a bit
smarter about when to recognize pages as being recyclable. IMO the big
issue is that vacuum won't be very frequent, so we'll grow the index
until that time, even if there's many "effectively empty" pages.

I.e. even if the killtuples logic allows us to recognize that all actual
index tuples are fully dead, we'll not benefit from that unless there's
a new insertion that belongs onto the "empty" page. That's fine for
indexes that are updated roughly evenly across the value range, but
terrible for indexes that "grow" mostly on one side, and "shrink" on the
other.

I'd bet it'd be beneficial if we were to either have scans unlink such
pages directly, or if they just entered the page into the FSM and have
_bt_getbuf() do the unlinking.  I'm not sure if the current locking
model assumes anywhere that there is only one process (vacuum) unlinking
pages though?

Greetings,

Andres Freund


