Thread: Re: Correcting freeze conflict horizon calculation

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

29 May, 18:37:04

On Thu, May 29, 2025 at 10:57 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> However, we calculate the snapshot conflict horizon for the
> prune/freeze record in master/17 and the freeze record in 16 before
> updating all-frozen and all-visible. That means the horizon may be too
> aggressive if the page cannot actually be set all-frozen. This isn't a
> correctness issue, but it may lead to transactions on the standby
> being unnecessarily canceled for pages which have some tuples frozen
> but not all due to remaining LP_DEAD items.

I don't follow.

Your concern is that the horizon might be overly aggressive/too
conservative. But your patch (for 16) makes us take the
don't-use-snapshotConflictHorizon-twice block *less* frequently (and
the "use OldestXmin conservatively" block *more* frequently):

- if (prunestate->all_visible && prunestate->all_frozen)
+ if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
{
    /* Using same cutoff when setting VM is now unnecessary */
    snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
    prunestate->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
    /* Avoids false conflicts when hot_standby_feedback in use */
    snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
    TransactionIdRetreat(snapshotConflictHorizon);
}

How can taking the "Avoids false conflicts when hot_standby_feedback
in use" path more often result in fewer unnecessary conflicts on
standbys? Isn't it the other way around?

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Melanie Plageman

Date:

31 May, 00:16:07

On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> Your concern is that the horizon might be overly aggressive/too
> conservative. But your patch (for 16) makes us take the
> don't-use-snapshotConflictHorizon-twice block *less* frequently (and
> the "use OldestXmin conservatively" block *more* frequently):
>
> - if (prunestate->all_visible && prunestate->all_frozen)
> + if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
> {
>     /* Using same cutoff when setting VM is now unnecessary */
>     snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
>     prunestate->visibility_cutoff_xid = InvalidTransactionId;
> }
> else
> {
>     /* Avoids false conflicts when hot_standby_feedback in use */
>     snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
>     TransactionIdRetreat(snapshotConflictHorizon);
> }
>
> How can taking the "Avoids false conflicts when hot_standby_feedback
> in use" path more often result in fewer unnecessary conflicts on
> standbys? Isn't it the other way around?

You're right. I forgot that the visibility_cutoff_xid will be older
than OldestXmin when all the tuples are going to be frozen. I
associate the visibility_cutoff_xid with being younger/newer than
OldestXmin because it often will be when there are newer live tuples
we don't freeze.

And, in the case where the page is not actually all-frozen because of
LP_DEAD items we haven't accounted for yet in the value of all_frozen,
they wouldn't affect the freeze record's snapshot conflict horizon in
16 because they won't be frozen and thus unaffected by the WAL record
and in the case of the prune/freeze WAL record in 17/master, I
calculate the newer of the latest_xid_removed and the snapshot
conflict horizon calculated for freezing, so it would also be fine.

- Melanie

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

31 May, 00:34:27

On Fri, May 30, 2025 at 5:16 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > How can taking the "Avoids false conflicts when hot_standby_feedback
> > in use" path more often result in fewer unnecessary conflicts on
> > standbys? Isn't it the other way around?
>
> You're right. I forgot that the visibility_cutoff_xid will be older
> than OldestXmin when all the tuples are going to be frozen.

I added an assertion in a number of places that
"Assert(!TransactionIdIsValid(visibility_cutoff_xid))" when we go to
set a page all-frozen in the VM -- it's irrelvant, and recovery cannot
possibly need a conflict when it replays the record. This seemed
logical, since an all-frozen page must already be visible to every
possible MVCC snapshot (it might also have some small performance
benefit, though that's not really why I did it).

> I associate the visibility_cutoff_xid with being younger/newer than
> OldestXmin because it often will be when there are newer live tuples
> we don't freeze.

I'm not sure what you mean by this. visibility_cutoff_xid can only be
set using tuples that HTSV says are HEAPTUPLE_LIVE according to
VACUUM's OldestXmin cutoff. Personally I find it natural to think of
visibility_cutoff_xid as meaningless unless we're actually able to
apply it in some way.

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

31 May, 01:10:11

On Fri, May 30, 2025 at 5:57 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I don't see how OldestXmin comes into play with the visibility_cutoff_xid.

Code in heap_page_is_all_visible() (and other place, I guess the other
one is in pruneheap.c now) have a separate OldestXmin test:

/*
 * The inserter definitely committed. But is it old enough
 * that everyone sees it as committed?
 */
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
                           vacrel->cutoffs.OldestXmin))
{
    all_visible = false;
    *all_frozen = false;
    break;
}

Once we "break" here, it doesn't matter what visibility_cutoff_xid has
been set to. It cannot be used for any purpose.

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Melanie Plageman

Date:

02 June, 21:50:06

On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> Your concern is that the horizon might be overly aggressive/too
> conservative. But your patch (for 16) makes us take the
> don't-use-snapshotConflictHorizon-twice block *less* frequently (and
> the "use OldestXmin conservatively" block *more* frequently):
>
> - if (prunestate->all_visible && prunestate->all_frozen)
> + if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
> {
>     /* Using same cutoff when setting VM is now unnecessary */
>     snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
>     prunestate->visibility_cutoff_xid = InvalidTransactionId;
> }
> else
> {
>     /* Avoids false conflicts when hot_standby_feedback in use */
>     snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
>     TransactionIdRetreat(snapshotConflictHorizon);
> }
>
> How can taking the "Avoids false conflicts when hot_standby_feedback
> in use" path more often result in fewer unnecessary conflicts on
> standbys? Isn't it the other way around?

Having discussed this and updated my understanding, now I realize I
don't understand when it would be unsafe to use
prunestate->visibility_cutoff_xid as the snapshot conflict horizon for
the freeze record.

As you've explained, it will always be <= OldestXmin. And, if the
record only freezes tuples (meaning it makes no other changes to the
page) and all of those tuples' xmins were considered when calculating
prunestate->visibility_cutoff_xid, then, even if the page isn't
all-frozen, how could it be incorrect to use the
prunestate->visibility_cutoff_xid as the horizon? Why do we use
OldestXmin when the page wouldn't be all-frozen?

- Melanie

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

02 June, 22:05:12

On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> As you've explained, it will always be <= OldestXmin. And, if the
> record only freezes tuples (meaning it makes no other changes to the
> page) and all of those tuples' xmins were considered when calculating
> prunestate->visibility_cutoff_xid, then, even if the page isn't
> all-frozen, how could it be incorrect to use the
> prunestate->visibility_cutoff_xid as the horizon?

Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
unless it has actually been maintained using all of the tuples on the
page. It is primarily intended for use by the VM record (though it
doesn't have to continue to work that way, of course).

> Why do we use
> OldestXmin when the page wouldn't be all-frozen?

It has always worked that way (except that it was FreezeLimit, not
OldestXmin, since we could only ever freeze precisely those tuples <
FreezeLimit before Postgres 16). Using
prunestate->visibility_cutoff_xid instead (when that was possible) was
a big improvement.

You're right that in principle we could safely use a conflict horizon
XID before OldestXmin in cases where we still give up and just use
OldestXmin (cases where we just use "OldestXmin - 1", I should say).
I'm not sure how much this matters in practice; I imagine that using
prunestate->visibility_cutoff_xid is very much the common case with
page-level freezing.

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Melanie Plageman

Date:

02 June, 22:30:42

On Mon, Jun 2, 2025 at 3:05 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > As you've explained, it will always be <= OldestXmin. And, if the
> > record only freezes tuples (meaning it makes no other changes to the
> > page) and all of those tuples' xmins were considered when calculating
> > prunestate->visibility_cutoff_xid, then, even if the page isn't
> > all-frozen, how could it be incorrect to use the
> > prunestate->visibility_cutoff_xid as the horizon?
>
> Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
> unless it has actually been maintained using all of the tuples on the
> page. It is primarily intended for use by the VM record (though it
> doesn't have to continue to work that way, of course).

Right, if the page isn't all-visible, we don't continue to keep
visibility_cutoff_xid up to date. But I didn't understand why we
didn't just keep it up to date regardless of whether the page is
all-visible and use it in the freeze record. It doesn't seem like all
that much extra computation.

> > Why do we use
> > OldestXmin when the page wouldn't be all-frozen?
>
> It has always worked that way (except that it was FreezeLimit, not
> OldestXmin, since we could only ever freeze precisely those tuples <
> FreezeLimit before Postgres 16). Using
> prunestate->visibility_cutoff_xid instead (when that was possible) was
> a big improvement.
>
> You're right that in principle we could safely use a conflict horizon
> XID before OldestXmin in cases where we still give up and just use
> OldestXmin (cases where we just use "OldestXmin - 1", I should say).
> I'm not sure how much this matters in practice; I imagine that using
> prunestate->visibility_cutoff_xid is very much the common case with
> page-level freezing.

Yes, I guess what I mean is that, like pruning keeps track of the
newest removed xmax, what would make the most sense to me is for
freezing to keep track of and use the newest frozen xmin as the
conflict horizon.

I understand that if I wanted to actually use the newest frozen xmin
as the conflict horizon when the page is not all-frozen, I'd need a
new counter since visibility_cutoff_xid is calculated across all live
tuples older than OldestXmin -- not just ones we'll freeze. But, for
cases when a few tuples are frozen on the page, it seems like it could
be worth it?

- Melanie

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

02 June, 22:40:00

On Mon, Jun 2, 2025 at 3:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I understand that if I wanted to actually use the newest frozen xmin
> as the conflict horizon when the page is not all-frozen, I'd need a
> new counter since visibility_cutoff_xid is calculated across all live
> tuples older than OldestXmin -- not just ones we'll freeze.

The idea of always calculating precisely the oldest possible conflict
horizon XID that is still safe when performing freezing seems
reasonable to me. I'm certainly not opposed.

> But, for cases when a few tuples are frozen on the page, it seems like it could
> be worth it?

In general, I don't expect that we're all that likely to freeze some
tuples on the page, without being able to subsequently mark the whole
page as all-frozen in the VM. Obviously it happens more often when
VACUUM FREEZE is used -- though that works best as an argument against
VACUUM FREEZE.

A scheme like the one you're thinking of might be worth the
implementation effort if it ended up being simpler. As you pointed
out, we're already doing almost the same thing for pruning. Now that
pruning and freezing both happen in the same place, it might make
sense to do it just to make things more consistent.

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Melanie Plageman

Date:

02 June, 22:40:05

On Mon, Jun 2, 2025 at 3:05 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > As you've explained, it will always be <= OldestXmin. And, if the
> > record only freezes tuples (meaning it makes no other changes to the
> > page) and all of those tuples' xmins were considered when calculating
> > prunestate->visibility_cutoff_xid, then, even if the page isn't
> > all-frozen, how could it be incorrect to use the
> > prunestate->visibility_cutoff_xid as the horizon?
>
> Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
> unless it has actually been maintained using all of the tuples on the
> page. It is primarily intended for use by the VM record (though it
> doesn't have to continue to work that way, of course).

Oh, and, more specifically, in my previous email, I was wondering if,
and why, in 16 this diff wouldn't be correct

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index 9fa88960ada..f1831bba95c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1833,7 +1833,7 @@ retry:
             * once we're done with it.  Otherwise we generate a conservative
             * cutoff by stepping back from OldestXmin.
             */
-           if (prunestate->all_visible && prunestate->all_frozen)
+           if (prunestate->all_visible)
            {
                /* Using same cutoff when setting VM is now unnecessary */
                snapshotConflictHorizon = prunestate->visibility_cutoff_xid;

- Melanie

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

02 June, 22:49:06

On Mon, Jun 2, 2025 at 3:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Oh, and, more specifically, in my previous email, I was wondering if,
> and why, in 16 this diff wouldn't be correct

I *think* that it would be correct.

Again, it is certainly possible to make the conflict horizon precisely
the oldest safe value in all cases. How much that actually buys you
seems much less clear.

--
Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

From

Peter Geoghegan

Date:

02 June, 23:05:51

On Mon, Jun 2, 2025 at 4:00 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I think, however, that I can't avoid keeping a separate counter for
> the horizon for the VM record. Pruning and freezing horizon is the
> newest "modified" (pruned or frozen) tuple xid, whereas the VM record
> needs the newest live committed tuple's xmin <= OldestXmin. So,
> perhaps maintaining multiple counters is unavoidable.

Unlike pruning of dead tuples, freezing doesn't always happen to the
maximum possible extent that OldestXmin will allow. Presumably you'll
want to keep the ability to delay the decision to freeze or not freeze
until the last possible moment. I think that supporting that
requirement necessitates using multiple conflict horizon counters (you
don't know which one you'll end up using until the last possible
moment).

> So, I wasn't actually planning (originally) to write a patch to try
> and change the horizon to make it older in more cases when it's
> correct. I'm trying to figure out the most straightforward code to
> calculate the combined snapshot conflict horizon for a prune/freeze/vm
> update record.

I figured that that was what this was really about.

--
Peter Geoghegan