Thread: Why latestRemovedXid|cutoff_xid are always sent?

Why latestRemovedXid|cutoff_xid are always sent?

From
Michail Nikolaev
Date:
Hello, hackers.

Working on some stuff, I realized I do not understand why
latestRemovedXid|cutoff_xid (in different types of WAL records) are
sent every time they appear on the primary side.

latestRemovedXid|cutoff_xid is used to call
ResolveRecoveryConflictWithSnapshot and cancel conflicting backends on
the standby. In some cases, resolving snapshot conflicts is the only
work REDO does (heap_xlog_cleanup_info or btree_xlog_reuse_page, for
example).

Could we try to somehow optimistically advance the latest sent
latestRemovedXid value in shared memory on the primary and skip
sending it if a newer xid was already sent? That way we could
reduce the number of ResolveRecoveryConflictWithSnapshot calls on
the standby and even skip some WAL records.
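
To make the primary-side idea concrete, here is a minimal sketch of
"optimistically advancing" a shared value with a compare-and-swap loop.
It is a toy model using C11 atomics, not PostgreSQL code: the names
advance_latest_sent_xid and xid_follows are hypothetical, XIDs are
modeled as plain 32-bit counters compared modulo 2^32, and special XIDs
are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if XID a is logically newer than XID b,
 * using a modulo-2^32 comparison so that wraparound is tolerated. */
static bool
xid_follows(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) > 0;
}

/*
 * Try to advance the shared "latest sent" XID to xid.
 *
 * Returns true if this caller advanced the value (so its XID would
 * have to be sent), or false if an equal or newer XID was already
 * recorded (so, under the proposal, sending could be skipped).
 */
bool
advance_latest_sent_xid(_Atomic uint32_t *latest_sent, uint32_t xid)
{
    assert(latest_sent != NULL);

    uint32_t seen = atomic_load(latest_sent);

    while (xid_follows(xid, seen))
    {
        /* On failure, seen is refreshed with the current shared value. */
        if (atomic_compare_exchange_weak(latest_sent, &seen, xid))
            return true;
    }
    return false;
}
```

One thing the sketch deliberately leaves out is ordering against WAL
insertion: the shared value is advanced before the record carrying the
newer xid is actually in the WAL, so a record that skipped its xid could
end up ahead of it in the stream. That may well be part of the real
complexity here.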

At least we could do the same optimization on the standby side
(skipping ResolveRecoveryConflictWithSnapshot if it was already called
with a newer xid).

Is it a sane idea, or have I missed something huge?

Thanks,
Michail.



Re: Why latestRemovedXid|cutoff_xid are always sent?

From
Peter Geoghegan
Date:
On Sat, Jan 2, 2021 at 8:00 AM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
> Working on some stuff, I realized I do not understand why
> latestRemovedXid|cutoff_xid (in different types of WAL records) are
> sent every time they appear on the primary side.
>
> latestRemovedXid|cutoff_xid is used to call
> ResolveRecoveryConflictWithSnapshot and cancel conflicting backends on
> the standby. In some cases, resolving snapshot conflicts is the only
> work REDO does (heap_xlog_cleanup_info or btree_xlog_reuse_page, for
> example).

But you can say the same thing about fields from certain WAL record
types, too. It's not that uncommon for code to make a conceptually
optional piece of information into a normal WAL record struct field,
even though that approach has unnecessary space overhead in the cases
that don't need the information. Often this makes hardly any
difference due to factors like alignment and the simple fact that we
don't expect very many WAL records (with or without the optional
information) in practice.
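
The alignment point can be illustrated with a small sketch. The two
structs below are hypothetical (they are not the real WAL record
definitions), and the sketch assumes that record data is padded up to an
8-byte boundary. Under that assumption, a 4-byte XID field added to a
4-byte struct costs nothing once padding is taken into account.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record body without the conceptually optional XID. */
struct demo_delete_without_xid
{
    uint16_t ndeleted;
    uint16_t nupdated;
};

/* The same hypothetical record body with the XID always present. */
struct demo_delete_with_xid
{
    uint32_t latest_removed_xid;
    uint16_t ndeleted;
    uint16_t nupdated;
};

/* Round len up to the next multiple of 8 (assumed padding rule). */
size_t
padded_size(size_t len)
{
    return (len + 7) & ~(size_t) 7;
}

/* Returns 1 if both variants occupy the same space after padding. */
int
optional_xid_is_free(void)
{
    return padded_size(sizeof(struct demo_delete_without_xid)) ==
           padded_size(sizeof(struct demo_delete_with_xid));
}
```

On common platforms the first struct is 4 bytes and the second is 8, and
both pad to 8. A real record also carries a header and block data, so
this only shows why a small fixed field can disappear into padding.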

Of course, it's possible that the question of whether or not it's
worth it has been misjudged for any given case. And maybe these
particular WAL records are one such case where somebody got it wrong,
affecting a real workload (I am ignoring the complexity of making it
work for latestRemovedXid in particular for now). But I tend to doubt
that the space saving would be noticeable, from what I've seen with
pg_waldump.

> Could we try to somehow optimistically advance the latest sent
> latestRemovedXid value in shared memory on the primary and skip
> sending it if a newer xid was already sent? That way we could
> reduce the number of ResolveRecoveryConflictWithSnapshot calls on
> the standby and even skip some WAL records.
>
> At least we could do the same optimization on the standby side
> (skipping ResolveRecoveryConflictWithSnapshot if it was already called
> with a newer xid).

Maybe it makes sense to add a fast path to
ResolveRecoveryConflictWithSnapshot() so that it falls out early
without scanning the proc array (in cases where it will still do so
today, since of course ResolveRecoveryConflictWithSnapshot() has the
obvious InvalidTransactionId fast path already).
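
A rough sketch of what such a fast path could look like follows. It is a
toy model, not the actual function: every name in it is hypothetical,
the proc array scan is replaced by a counter, and it tracks a single
"newest XID already resolved" value as if there were only one database.
Whether remembering that value is actually safe in PostgreSQL is exactly
the open question, so treat this as an illustration of the control flow
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;

#define INVALID_XID ((xid_t) 0)

/* Newest XID for which conflicts were already resolved (hypothetical). */
xid_t last_resolved_xid = INVALID_XID;

/* Counts how often the expensive path ran; stands in for the scan. */
int proc_array_scans = 0;

/* True if a is logically newer than b (modulo-2^32 comparison). */
static bool
xid_follows(xid_t a, xid_t b)
{
    return (int32_t) (a - b) > 0;
}

/* Stand-in for scanning the proc array and cancelling backends. */
static void
scan_and_cancel_conflicts(xid_t xid)
{
    assert(xid != INVALID_XID);
    proc_array_scans++;
}

void
resolve_conflict_with_snapshot(xid_t latest_removed)
{
    /* Existing fast path: no conflict is possible at all. */
    if (latest_removed == INVALID_XID)
        return;

    /* Proposed fast path: an equal or newer XID was already handled. */
    if (last_resolved_xid != INVALID_XID &&
        !xid_follows(latest_removed, last_resolved_xid))
        return;

    scan_and_cancel_conflicts(latest_removed);
    last_resolved_xid = latest_removed;
}
```

With calls for XIDs 100, 90, 100, 0 and 110 in that order, only the
calls for 100 (the first one) and 110 reach the scan in this model.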

> Is it a sane idea, or have I missed something huge?

It seems like two almost distinct ideas to me. Though the important
thing in both cases is the savings in real world conditions.

-- 
Peter Geoghegan



Re: Why latestRemovedXid|cutoff_xid are always sent?

From
Peter Geoghegan
Date:
On Sat, Jan 2, 2021 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Of course, it's possible that the question of whether or not it's
> worth it has been misjudged for any given case. And maybe these
> particular WAL records are one such case where somebody got it wrong,
> affecting a real workload (I am ignoring the complexity of making it
> work for latestRemovedXid in particular for now).

BTW, what I notice with xl_btree_delete records on the master branch
is that the latestRemovedXid value in the WAL record is almost always
InvalidTransactionId ("conflict definitely unnecessary"). And even
when it isn't, the actual xid is usually much older than what we see
for nearby pruning records.

However, with the bottom-up deletion patch that I plan on committing
soon, the situation changes quite a bit. We're now regularly in a
position to delete index tuples that became dead-to-all just moments
earlier, which in practice means that there is a very high chance that
there hasn't been a heap prune for at least one or two affected heap
tuples. Now the latestRemovedXid field in xl_btree_delete can be a
relatively recent XID, which is very similar to what we see in nearby
xl_heap_clean/XLOG_HEAP2_CLEAN records. In fact there are *hardly any*
InvalidTransactionId/0 values for the xl_btree_delete latestRemovedXid
field. They become very rare, having been very common.

In short: Your intuition that the xl_btree_delete record's
latestRemovedXid value is usually not needed anyway seems correct to
me. However, that won't be true for much longer, and ISTM that this
factor eliminates any opportunity for WAL space optimizations of the
kind you were contemplating.

-- 
Peter Geoghegan



Re: Why latestRemovedXid|cutoff_xid are always sent?

From
Michail Nikolaev
Date:
Hello, Peter.

Thanks for your explanation. One of the reasons I was asking is that I
had the idea of using the same technique in the "LP_DEAD index hint
bits on standby" WIP patch to reduce the amount of additional WAL.

Now I am sure such an optimization should work correctly.

Thanks,
Michail.