Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: xid wraparound danger due to INDEX_CLEANUP false
Msg-id CAH2-WzmvNqh=wdwikT8zrH6WjC7HNxEBUvLProNF9-cHE7aHvg@mail.gmail.com
In response to Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
Responses Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Apr 16, 2020 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> Sure, there is some pre-existing wraparound danger for individual
> pages. But it's a pretty narrow corner case before INDEX_CLEANUP
> off.

It's a matter of degree. Hard to judge something like that.

> And, what's worse, in the INDEX_CLEANUP off case, future VACUUMs with
> INDEX_CLEANUP on might not even visit the index. As there very well
> might not be many dead heap tuples around anymore (previous vacuums with
> cleanup off will have removed them), the
> vacuum_cleanup_index_scale_factor logic may prevent index vacuums. In
> contrast to the normal situations where the btm_oldest_btpo_xact check
> will prevent that from becoming a problem.

I guess that VACUUMs with INDEX_CLEANUP off should still visit the
metapage, to see whether they need to do that much. That would allow us
to fix the problem while mostly honoring INDEX_CLEANUP off, I think.

> Peter, as far as I can tell, with INDEX_CLEANUP off, nbtree will never
> be able to recycle half-dead pages? And thus would effectively never
> recycle any dead space? Is that correct?

I agree. The fact that btm_oldest_btpo_xact is an all-or-nothing thing
(with wraparound hazards) is bad in itself, and introduced new risk in
v11 compared to previous versions (even before the INDEX_CLEANUP = off
feature enters into it).  The simple fact that we don't even check it
with INDEX_CLEANUP = off is a bigger problem, though, and one that now
seems unrelated to that.

BTW, a lot of people get confused about what half-dead pages are. I
would like to make something clear that may not be obvious: While it's
bad that the implementation leaks pages that should go in the FSM,
it's not the end of the world. They should get evicted from
shared_buffers pretty quickly if there is any pressure, and impose no
real cost on index scans.

There are (roughly) 3 types of pages that we're concerned about here
in the common case where we're just deleting a leaf page:

* A half-dead page -- no downlink in its parent, marked half-dead.

* A deleted page -- now no sidelinks, either. Not initially safe to recycle.

* A deleted page in the FSM -- this is what we have the interlock for.

Half-dead pages are pretty rare, because VACUUM really has to have a
hard crash for that to happen (that might not be 100% true, but it's
at least 99% true). That's always been the case, and we don't really
need to talk about them here at all. We're just concerned with deleted
pages in the context of this discussion (and whether or not they can
be recycled without confusing in-flight index scans). These are the
only pages that are marked with an XID at all.
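To be concrete about what gates recycling: the check is (roughly) the
one below, paraphrasing _bt_page_recyclable() in nbtpage.c from memory,
so treat the details as approximate:

/*
 * A deleted page can only be put in the FSM (and so recycled) once no
 * in-flight index scan could still land on it.  The page is stamped
 * with an XID at deletion time; comparing that against RecentGlobalXmin
 * is our (very conservative) proxy for "no such scan can exist anymore".
 */
bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /* An all-zeroes page (e.g. from a crashed relation extension) is fine */
    if (PageIsNew(page))
        return true;

    /* Otherwise, recycle only if deleted and old enough */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;

    return false;
}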

Another thing that's worth pointing out is that this whole
RecentGlobalXmin business is how we opted to implement what Lanin &
Shasha call "the drain technique". It is rather different from the usual
ways in which we use RecentGlobalXmin. We're only using it as a proxy
(an absurdly conservative proxy) for whether or not there might be an
in-flight index scan that lands on a concurrently recycled index page
and gets completely confused. So it is purely about the integrity of
the data structure itself. It is a consequence of doing so little
locking when descending the tree -- our index scans don't need to
couple buffer locks on the way down the tree at all. So we make VACUUM
worry about that, rather than making index scans worry about VACUUM
(though the latter design is a reasonable and common one).

There is absolutely no reason why we have to delay recycling for very
long, even in cases with long-running transactions or whatever. I
agree that it's just an accident that it works that way. VACUUM could
probably remember deleted pages, and then revisit those pages at the
end of the index vacuuming -- that might make a big difference in a
lot of workloads. Or it could chain them together as a linked list
which can be accessed much more eagerly in some cases.
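Purely as a sketch of the first idea (nothing like this exists today;
all of the names below are made up): VACUUM could keep an array of the
pages it deleted during the index pass, and then recheck them against
RecentGlobalXmin in one final pass before finishing, putting anything
that has become recyclable into the FSM right away:

#include "postgres.h"
#include "access/transam.h"
#include "storage/indexfsm.h"
#include "utils/snapmgr.h"

/* Hypothetical bookkeeping for a page deleted earlier in this VACUUM */
typedef struct PendingRecycle
{
    BlockNumber     blkno;      /* block of the deleted page */
    TransactionId   safexid;    /* XID stamped on it at deletion */
} PendingRecycle;

/*
 * Hypothetical final pass: anything whose stamped XID already precedes
 * RecentGlobalXmin is safe to recycle now, rather than waiting for some
 * future VACUUM to notice it.
 */
static void
bt_recycle_pending_pages(Relation rel, PendingRecycle *pending, int npending)
{
    for (int i = 0; i < npending; i++)
    {
        if (TransactionIdPrecedes(pending[i].safexid, RecentGlobalXmin))
            RecordFreeIndexPage(rel, pending[i].blkno);
    }
}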

--
Peter Geoghegan


