Re: Thoughts on "killed tuples" index hint bits support on standby - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Thoughts on "killed tuples" index hint bits support on standby
Date
Msg-id CAH2-Wzn_h0Lm9fmY=E-_z07UrAk-7hDJLkMx8wV=_N3n=Bz9Pw@mail.gmail.com
In response to Re: Thoughts on "killed tuples" index hint bits support on standby  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Thoughts on "killed tuples" index hint bits support on standby
List pgsql-hackers
On Sat, Jan 30, 2021 at 5:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If you invent some entirely new category of standby-only hint bit at a
> level below the access method code, then you can use it inside access
> method code such as nbtree. Maybe you don't have to play games with
> minRecoveryPoint in code like the "if (RecoveryInProgress())" path
> from the XLogNeedsFlush() function. Maybe you can do some kind of
> rudimentary "masking" for the in recovery case at the point that we
> *write out* a buffer (*not* at the point hint bits are initially set)
> -- maybe this could happen near to or inside FlushBuffer(), and maybe
> only when checksums are enabled? I'm unsure.

I should point out that hint bits in heap pages are really not like
LP_DEAD bits in indexes -- if they're similar at all then the
similarity is only superficial/mechanistic. In fact, the term "hint
bits in indexes" does not seem useful at all to me, for this reason.

Heap hint bits indicate whether or not the xmin or xmax in a heap
tuple header committed or aborted. We cache the commit or abort status
of one particular XID in the heap tuple header. Notably, this
information alone tells us nothing about whether or not the tuple
should be visible to any given MVCC snapshot. Except perhaps in cases
involving aborted transactions -- but that "exception" is just a
limited special case (and less than 1% of transactions are aborted in
almost all environments anyway).
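
To make the distinction concrete, here is a minimal toy sketch in C. It is not PostgreSQL code. The two flag values mirror HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID from htup_details.h, but every type and function name (ToyHeapTuple, ToySnapshot, toy_set_xmin_hint, toy_xmin_visible) is hypothetical, and the snapshot is reduced to a single XID cutoff, with no in-progress XID list, no subtransactions, and no XID wraparound handling. It only looks at xmin. The point is that the hint records the fate of one transaction, while the visibility answer still depends on which snapshot is asking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Values mirror HEAP_XMIN_COMMITTED / HEAP_XMIN_INVALID; all else is a toy. */
#define TOY_XMIN_COMMITTED 0x0100
#define TOY_XMIN_INVALID   0x0200

/* Hypothetical, heavily simplified heap tuple header. */
typedef struct ToyHeapTuple {
    ToyXid   xmin;      /* inserting transaction */
    uint16_t infomask;  /* holds the cached commit/abort status */
} ToyHeapTuple;

/* Hypothetical snapshot: XIDs >= xmax had not committed when it was taken. */
typedef struct ToySnapshot {
    ToyXid xmax;
} ToySnapshot;

/*
 * Cache the outcome of xmin in the tuple header. This depends only on the
 * fate of that one transaction; no snapshot is consulted.
 */
void toy_set_xmin_hint(ToyHeapTuple *tup, bool committed)
{
    /* Assumption of this sketch: the hint is set at most once per tuple. */
    assert((tup->infomask & (TOY_XMIN_COMMITTED | TOY_XMIN_INVALID)) == 0);
    tup->infomask |= committed ? TOY_XMIN_COMMITTED : TOY_XMIN_INVALID;
}

/*
 * Is the insertion visible to this snapshot? An "aborted" hint settles the
 * question for every snapshot, but a "committed" hint does not: the answer
 * still depends on where xmin falls relative to the snapshot.
 */
bool toy_xmin_visible(const ToyHeapTuple *tup, const ToySnapshot *snap)
{
    if (tup->infomask & TOY_XMIN_INVALID)
        return false;   /* aborted inserter: invisible to everyone */
    if (!(tup->infomask & TOY_XMIN_COMMITTED))
        return false;   /* toy shortcut: real code would look up the status */
    return tup->xmin < snap->xmax;
}
```

With xmin = 100 hinted as committed, a toy snapshot with xmax = 90 still cannot see the tuple, while one with xmax = 200 can. The same hint bit yields two different visibility answers.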

In contrast, a set LP_DEAD bit in an index is all the information we
need to know that the tuple is dead, and can be ignored completely
(except during hot standby, where at least today we assume nothing
about the status of the tuple, since that would be unsafe). Generally
speaking, the index page LP_DEAD bit is "direct" visibility
information about the tuple, not information about XIDs that are
stored in the tuple header. So a set LP_DEAD bit in an index is
actually like an LP_DEAD-set line pointer in the heap (that's the
closest equivalent in the heap, by far). It's also like a frozen heap
tuple (except it's dead-to-all, not live-to-all).

The difference may be subtle, but it's important here -- it justifies
inventing a whole new type of LP_DEAD-style status bit that gets set
only on standbys. Even today, heap tuples can have hint bits
"independently" set on standbys, subject to certain limitations needed
to avoid breaking things like data page checksums. Hint bits are
ultimately just a thing that remembers the status of transactions that
are known committed or aborted, and so can be set immediately after
the relevant xact commits/aborts (at least on the primary, during
original execution). A long-held MVCC snapshot is never a reason to
not set a hint bit in a heap tuple header (during original execution
or during hot standby/recovery). Of course, a long-held MVCC snapshot
*is* often a reason why we cannot set an LP_DEAD bit in an index.
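
The asymmetry in the last two paragraphs can be sketched as a pair of toy predicates in C. Again, this is an illustration under stated assumptions, not PostgreSQL code: all names (ToyDeletedTuple, toy_can_set_heap_hint, toy_can_set_index_lp_dead, toy_scan_skips_item) are hypothetical, only the "deleted by a committed transaction" case is modeled, "oldest snapshot xmin" stands in for the real removal horizon, and XID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Hypothetical summary of a heap tuple that some transaction deleted. */
typedef struct ToyDeletedTuple {
    ToyXid xmax;            /* deleting transaction */
    bool   xmax_committed;  /* deleter is known to have committed */
} ToyDeletedTuple;

/*
 * Heap hint bit: may be set as soon as the outcome of the transaction is
 * known. The oldest snapshot is accepted only to show that it is ignored.
 */
bool toy_can_set_heap_hint(bool outcome_known, ToyXid oldest_snapshot_xmin)
{
    (void) oldest_snapshot_xmin;    /* a long-held snapshot never matters */
    return outcome_known;
}

/*
 * Index LP_DEAD bit: the tuple must be dead to *every* snapshot, so the
 * deleter must have committed before the oldest snapshot still in use.
 */
bool toy_can_set_index_lp_dead(const ToyDeletedTuple *tup,
                               ToyXid oldest_snapshot_xmin)
{
    assert(oldest_snapshot_xmin != 0);  /* 0 is not a usable XID here */
    return tup->xmax_committed && tup->xmax < oldest_snapshot_xmin;
}

/*
 * Reader side: a set LP_DEAD bit lets a scan skip the item outright,
 * except during hot standby, where the bit is ignored today.
 */
bool toy_scan_skips_item(bool lp_dead, bool in_hot_standby)
{
    return lp_dead && !in_hot_standby;
}
```

For a tuple deleted by committed XID 500, the heap hint can be set regardless of any snapshot, but with an old snapshot whose xmin is 400 the toy LP_DEAD predicate returns false. Once the oldest snapshot xmin advances to 501 it returns true.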

Conclusion: The whole minRecoveryPoint question that you're trying to
answer to improve things for your patch is just the wrong question,
because LP_DEAD bits in indexes are not *true* "hint bits". Maybe it
would be useful to set "true hint bits" on standbys earlier, and maybe
thinking about minRecoveryPoint would help with that problem, but that
doesn't help your index-related patch -- because indexes simply don't
have true hint bits.

-- 
Peter Geoghegan


