Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Noah Misch
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date
Msg-id 20240522235901.83@rfd.leadboat.com
Whole thread Raw
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-bugs
On Wed, May 22, 2024 at 12:57:26PM -0400, Melanie Plageman wrote:
> On Mon, May 20, 2024 at 4:48 PM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, May 20, 2024 at 11:58:23AM -0400, Melanie Plageman wrote:
> > > On Sat, May 18, 2024 at 6:23 PM Noah Misch <noah@leadboat.com> wrote:
> > > > I looked at each mention of oldestxmin in vacuumlazy.c and vacuum.c.
> > > > Does the following heap_vacuum_rel() comment need an update?
> > > >
> > > >         /*
> > > >          * Get cutoffs that determine which deleted tuples are considered DEAD,
> > > >          * not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze.  Then determine
> >
> > Since 1ccc1e05ae, the cutoffs (values acquired in vacuum_get_cutoffs()) don't
> > decide DEAD.
> 
> Thinking about it more, this is misleading either way. The values
> acquired in vacuum_get_cutoffs() don't determine which tuples are
> removed because they are considered HEAPTUPLE_DEAD. If they are
> considered HEAPTUPLE_RECENTLY_DEAD by VisTest and HEAPTUPLE_DEAD when
> compared to OldestXmin, there will be a hang.

Agreed.

> > > >          * the extent of the blocks that we'll scan in lazy_scan_heap.  It has to
> >
> > Still accurate.
> 
> I thought about this more, and I actually don't understand what "Then
> determine the extent of the blocks that we'll scan in lazy_scan_heap"
> means. I assume it refers to RelationGetNumberOfBlocks() which is
> called after vacuum_get_cutoffs().

I think it does refer to that.

> But, if so, I have no idea how that
> relates to the following sentence which says that it "has to happen in
> this order to ensure that the OldestXmin cutoff field works as an
> upper bound"...

When the comment says "pages we'll actually scan", it means "pages that
existed at the start of VACUUM" or equivalently "pages of the rel except those
pages extended after this point in time".  I bet that phrase of the comment
predates the other ways VACUUM can now opt not to visit specific pages.

The lines where order is important:

    vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
    vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);

Suppose the first line sets vacrel->cutoffs.OldestXmin=1234 and the second
line sets orig_rel_pages=10.  Suppose, by the end of the VACUUM, the rel size
has grown to 15 pages.  The last 5 pages can contain only XIDs >= 1234.  We
can assume that without scanning those pages because of the order of those two
lines.  If we swapped the order of those two lines, XID=1233 could extend the
rel and commit between the two lines, without any cutoff reflecting that.

> I don't think we can say anything about the XIDs of tuples on pages we
> don't scan. We skip pages if they are all visible. We don't know that
> tuples on those pages are newer than OldestXmin. That's why we can't
> update relfrozenxid when we skip pages that are not all frozen.

Agreed.  We can infer bounds only for newly-extended pages, not for pages not
scanned for some other reason.

> And even if we skip scanning pages because they are all frozen, their
> xids will be older than OldestXmin, so it wouldn't be a lower bound on
> those either.
> 
> I don't understand "snapshots don't affect this bound, so the oldest
> running XID is also a lower bound".

Assume OldestXmin=1M and OldestXid=2M, because there's a read-only transaction
with an old snapshot.  If something extends the rel after this code runs,
newly-extended pages won't add any transaction older than xid=2M.

> > > >          * expect vistest will always make heap_page_prune_and_freeze() remove any
> > > >          * deleted tuple whose xmax is < OldestXmin.
> >
> > Since 1ccc1e05ae, that sentence does not apply.
> 
> That's true. I wonder how we should modify it. Some options:
> 
> 1) change it from "always" to "usually"
> 2) remove the sentence because it is not worth mentioning OldestXmin
> in relation to removing tuples
> 3) explain the situation with GlobalVisTest moving the horizon
> backwards to before OldestXmin
> 4) change it to say vistest will always cause pruning to remove
> deleted tuples visible as deleted to everyone

Any of (2),(3), or (4) works for me.

> I also think we should change this part:
> 
>      *  lazy_scan_prune must never
>      * become confused about whether a tuple should be frozen or removed.  (In
>      * the future we might want to teach lazy_scan_prune to recompute vistest
>      * from time to time, to increase the number of dead tuples it can prune
>      * away.)
> 
> lazy_scan_prune() can't really become confused about whether a tuple
> should be frozen or removed. It may choose to freeze a tuple that was
> ruled RECENTLY_DEAD by vistest and not removed. But that is okay. I
> don't think there is any confusion.

Agreed.

> The second part I'm not sure about either. Recomputing vistest in
> lazy_scan_prune() would not cause more tuples to be removed. Doing so
> in heap_page_prune_and_freeze() for reasons other than the current
> logic in GlobalVisTestShouldUpdate() would have this effect. But, I
> can't tell if that is even what this comment is suggesting.

I read it as the following general idea.  If other backends update their xmin
values and/or end their transactions while a VACUUM is running, VACUUM could
opt to prune more as that news arrives.  VACUUM could check for such news
before starting on each page or whatever.

> > (Long-term, I favor removal of the cutoffs->OldestXmin field.  I'm not aware
> > of any calculation for which it remains the optimal value.)

For avoidance of doubt, I don't think this thread should make such a change.

> I'm trying to think if there is any complication with using
> GlobalVisState for freezing given that it could be updated during the
> course of vacuuming the relation. If we used GlobalVisState when
> determining what to freeze and maybe_needed moves backwards, perhaps
> we could end up incorrectly updating relfrozenxid?

I think it suffices to calculate relfrozenxid=min(OldestXid at VACUUM start,
all XIDs actually present in pages).  OldestXid might still insert a tuple on
any page of the rel.  OldestXid might not even start until after VACUUM
finishes.  Hence, we can't set any relfrozenxid lower than OldestXid.  But a
transaction with an old snapshot and no XID doesn't constrain relfrozenxid.
If that transaction does assign an XID, it will assign a value >= OldestXid.



pgsql-bugs by date:

Previous
From: Clemens Eisserer
Date:
Subject: Re: Re: BUG #18471: Possible JIT memory leak resulting in signal 11:Segmentation fault on ARM
Next
From: Michael Paquier
Date:
Subject: Re: BUG #18362: unaccent rules and Old Greek text