Re: Thoughts on "killed tuples" index hint bits support on standby - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Thoughts on "killed tuples" index hint bits support on standby |
Date | |
Msg-id | CAH2-Wzn_h0Lm9fmY=E-_z07UrAk-7hDJLkMx8wV=_N3n=Bz9Pw@mail.gmail.com |
In response to | Re: Thoughts on "killed tuples" index hint bits support on standby (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: Thoughts on "killed tuples" index hint bits support on standby |
List | pgsql-hackers |

On Sat, Jan 30, 2021 at 5:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If you invent some entirely new category of standby-only hint bit at a
> level below the access method code, then you can use it inside access
> method code such as nbtree. Maybe you don't have to play games with
> minRecoveryPoint in code like the "if (RecoveryInProgress())" path
> from the XLogNeedsFlush() function. Maybe you can do some kind of
> rudimentary "masking" for the in recovery case at the point that we
> *write out* a buffer (*not* at the point hint bits are initially set)
> -- maybe this could happen near to or inside FlushBuffer(), and maybe
> only when checksums are enabled? I'm unsure.

I should point out that hint bits in heap pages are really not like LP_DEAD bits in indexes -- if they're similar at all then the similarity is only superficial/mechanistic. In fact, the term "hint bits in indexes" does not seem useful at all to me, for this reason.

Heap hint bits indicate whether or not the xmin or xmax in a heap tuple header committed or aborted. We cache the commit or abort status of one particular XID in the heap tuple header. Notably, this information alone tells us nothing about whether or not the tuple should be visible to any given MVCC snapshot. Except perhaps in cases involving aborted transactions -- but that "exception" is just a limited special case (and less than 1% of transactions are aborted in almost all environments anyway).

In contrast, a set LP_DEAD bit in an index is all the information we need to know that the tuple is dead, and can be ignored completely (except during hot standby, where at least today we assume nothing about the status of the tuple, since that would be unsafe). Generally speaking, the index page LP_DEAD bit is "direct" visibility information about the tuple, not information about XIDs that are stored in the tuple header.
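
To make the distinction concrete, here is a toy sketch in C. It is not PostgreSQL code: every type, flag, and function name below is hypothetical. XIDs are compared as plain integers (no wraparound handling), only xmin is modeled (no deleting XID), and an unhinted tuple is simply treated as invisible rather than looking up its transaction status. It only illustrates that a heap hint bit settles the commit/abort status of one XID and still leaves the visibility decision to the snapshot, whereas an index LP_DEAD bit is by itself enough to skip the item (outside of recovery).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;

enum {
    TOY_XMIN_COMMITTED = 1 << 0,    /* cached fact: inserting xact committed */
    TOY_XMIN_ABORTED   = 1 << 1     /* cached fact: inserting xact aborted */
};

typedef struct ToyHeapTuple {
    ToyXid   xmin;                  /* inserting transaction */
    unsigned hints;                 /* cached status of xmin, nothing more */
} ToyHeapTuple;

typedef struct ToySnapshot {
    ToyXid xmax;                    /* XIDs >= xmax are invisible to this snapshot */
} ToySnapshot;

typedef struct ToyIndexItem {
    bool lp_dead;                   /* "direct" dead-to-all status of the item */
} ToyIndexItem;

/*
 * Heap side: the hint bit only answers "did xmin commit or abort?".
 * A committed xmin must still be tested against the snapshot.
 */
static bool
toy_heap_visible(const ToyHeapTuple *tup, const ToySnapshot *snap)
{
    assert(tup != NULL && snap != NULL);

    if (tup->hints & TOY_XMIN_ABORTED)
        return false;               /* the limited special case: invisible to all */
    if (!(tup->hints & TOY_XMIN_COMMITTED))
        return false;               /* toy simplification: no status lookup */
    return tup->xmin < snap->xmax;  /* the snapshot makes the real decision */
}

/*
 * Index side: a set LP_DEAD bit is all that is needed to skip the item,
 * except in recovery, where the bit is not trusted.
 */
static bool
toy_index_item_can_skip(const ToyIndexItem *item, bool in_recovery)
{
    assert(item != NULL);

    if (in_recovery)
        return false;               /* assume nothing about the item */
    return item->lp_dead;
}
```

In this model, a tuple with xmin = 100 and TOY_XMIN_COMMITTED set is invisible to a snapshot whose xmax is 50 but visible to one whose xmax is 200, even though the hint bit is identical in both cases. An index item with lp_dead set can be skipped with no snapshot consulted at all, unless in_recovery is true.
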

So a set LP_DEAD bit in an index is actually like an LP_DEAD-set line pointer in the heap (that's the closest equivalent in the heap, by far). It's also like a frozen heap tuple (except it's dead-to-all, not live-to-all). The difference may be subtle, but it's important here -- it justifies inventing a whole new type of LP_DEAD-style status bit that gets set only on standbys.

Even today, heap tuples can have hint bits "independently" set on standbys, subject to certain limitations needed to avoid breaking things like data page checksums. Hint bits are ultimately just a thing that remembers the status of transactions that are known committed or aborted, and so can be set immediately after the relevant xact commits/aborts (at least on the primary, during original execution). A long-held MVCC snapshot is never a reason to not set a hint bit in a heap tuple header (during original execution or during hot standby/recovery). Of course, a long-held MVCC snapshot *is* often a reason why we cannot set an LP_DEAD bit in an index.

Conclusion: The whole minRecoveryPoint question that you're trying to answer to improve things for your patch is just the wrong question. Because LP_DEAD bits in indexes are not *true* "hint bits". Maybe it would be useful to set "true hint bits" on standbys earlier, and maybe thinking about minRecoveryPoint would help with that problem, but that doesn't help your index-related patch -- because indexes simply don't have true hint bits.

--
Peter Geoghegan