Re: [PATCH] Full support for index LP_DEAD hint bits on standby - Mailing list pgsql-hackers

From Michail Nikolaev
Subject Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Date
Msg-id CANtu0oh28mX5gy5jburH+n1mcczK5_dCQnhbBnCM=Pfqh-A26Q@mail.gmail.com
In response to Re: [PATCH] Full support for index LP_DEAD hint bits on standby  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Responses Re: [PATCH] Full support for index LP_DEAD hint bits on standby  (Michail Nikolaev <michail.nikolaev@gmail.com>)
List pgsql-hackers
Hello, hackers.

I think I was able to fix the issue related to minRecoveryPoint and crash recovery. To make sure the standby is consistent after crash recovery, we need to take the current value of minRecoveryPoint into account when setting LP_DEAD hints (almost the same way as is already done for *heap* hint bits).

I have introduced a new structure, IndexHintBitsData:
-------
    /* guaranteed not visible to any backend */
    bool all_dead;

    /* latest removed xid, if known */
    TransactionId latest_removed_xid;

    /* LSN of the heap page where the dead tuple is located */
    XLogRecPtr page_lsn;
-------

This structure is filled in by the `heap_hot_search_buffer` function. Afterwards, we decide whether to set `kill_prior_tuple` based on its contents (by calling `IsMarkBufferDirtyIndexHintAllowed`).

On the primary, it is always safe to set LP_DEAD in the index if `all_dead` is true.

On a standby, we need to check `latest_removed_xid` (if available) first. If the commit LSN of the latest removed xid is already lower than minRecoveryPoint (`XLogNeedsFlush`), it is safe to set `kill_prior_tuple`.

Sometimes we are not sure about the latest removed xid: the heap record could have been marked dead by an XLOG_HEAP2_CLEAN record, for example. In that case we check the LSN of the *heap* page containing the tuple (the LSN may already have been advanced by other transactions, but that does not matter here). If the page LSN is lower than minRecoveryPoint, it is safe to set LP_DEAD in the index too. Otherwise, we just leave the index tuple alive.
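
The decision described above can be sketched roughly as follows. This is a simplified, self-contained model and not the code from the patch. The types are stand-ins for the PostgreSQL ones, and `hint_allowed` is a hypothetical name for the role that `IsMarkBufferDirtyIndexHintAllowed` plays. The commit LSN of the removing transaction is simply passed in as an argument, since how it is looked up is not shown here. The plain comparison against `min_recovery_point` stands in for the `XLogNeedsFlush` check; the strict `<` follows the wording "lower than" and the exact boundary behaviour is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types; assumptions for this sketch only. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct IndexHintBitsData
{
    bool          all_dead;            /* not visible to any backend */
    TransactionId latest_removed_xid;  /* InvalidTransactionId if unknown */
    XLogRecPtr    page_lsn;            /* LSN of the heap page */
} IndexHintBitsData;

/*
 * Hypothetical helper: may the scan set kill_prior_tuple?
 * removed_xid_commit_lsn is consulted only when latest_removed_xid is known.
 */
static bool
hint_allowed(const IndexHintBitsData *hint, bool in_recovery,
             XLogRecPtr removed_xid_commit_lsn,
             XLogRecPtr min_recovery_point)
{
    assert(hint != NULL);

    /* Nothing to mark if the tuple may still be visible to someone. */
    if (!hint->all_dead)
        return false;

    /* On the primary, all_dead alone is sufficient. */
    if (!in_recovery)
        return true;

    /* Standby, removing xid known: compare its commit LSN. */
    if (hint->latest_removed_xid != InvalidTransactionId)
        return removed_xid_commit_lsn < min_recovery_point;

    /* Standby, removing xid unknown: fall back to the heap page LSN. */
    return hint->page_lsn < min_recovery_point;
}
```

In this model the fallback to the heap page LSN is conservative: the page LSN can only be equal to or later than the record that killed the tuple, so the check may refuse a hint that would have been safe, but it never allows an unsafe one.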


So, to bring it all together:

* Normal operation, proc->indexIgnoreKilledTuples is true:
      It is safe for the standby to use hint bits from the primary FPI because of XLOG_INDEX_HINT_BITS_HORIZON conflict resolution.
      It is safe for the standby to set its own index hint bits because `ComputeXidHorizons` honors the xmin of other read-only procs and the lowest xid on the primary (`KnownAssignedXidsGetOldestXmin`).

* Normal operation, proc->indexIgnoreKilledTuples is false:
      Index hint bits are never set or taken into account.

* Crash recovery, proc->indexIgnoreKilledTuples is true:
      It is safe for the standby to use hint bits from the primary FPI because XLOG_INDEX_HINT_BITS_HORIZON is always logged before the FPI, and the commit record of the transaction that removed the tuple is logged before XLOG_INDEX_HINT_BITS_HORIZON. So, if an FPI with hints was flushed (and taken into account by minRecoveryPoint), both the removing transaction's commit record and the horizon record are replayed before queries start reading.
      It is safe for the standby to use its own hint bits because they can be set only if the commit record of the removing transaction is lower than minRecoveryPoint, or the LSN of the heap page with the removed tuples is lower than minRecoveryPoint.

* Crash recovery, proc->indexIgnoreKilledTuples is false:
      Index hint bits are never set or taken into account.
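
The crash-recovery argument for hint bits that arrive in a primary FPI rests on WAL ordering. Below is a toy model of that ordering, with made-up LSN values in the usage and hypothetical helper names (`replayed`, `prerequisites_replayed`); none of it is taken from the patch. It assumes that recovery replays every record up to minRecoveryPoint before queries run, so an FPI covered by minRecoveryPoint implies that the earlier commit and horizon records were replayed as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for the PostgreSQL type */

/* The three records involved, in the order the primary logs them. */
typedef struct HintWalOrder
{
    XLogRecPtr commit_lsn;   /* commit of the transaction that removed the tuple */
    XLogRecPtr horizon_lsn;  /* XLOG_INDEX_HINT_BITS_HORIZON record */
    XLogRecPtr fpi_lsn;      /* full-page image carrying the hint bits */
} HintWalOrder;

/* Hypothetical: a record is replayed once minRecoveryPoint has reached it. */
static bool
replayed(XLogRecPtr record_lsn, XLogRecPtr min_recovery_point)
{
    return record_lsn <= min_recovery_point;
}

/* Are both records that the FPI hints depend on already replayed? */
static bool
prerequisites_replayed(const HintWalOrder *w, XLogRecPtr min_recovery_point)
{
    /* Ordering invariant stated in the text: commit < horizon < FPI. */
    assert(w->commit_lsn < w->horizon_lsn && w->horizon_lsn < w->fpi_lsn);

    return replayed(w->commit_lsn, min_recovery_point) &&
           replayed(w->horizon_lsn, min_recovery_point);
}
```

For example, with records at LSNs 100, 200 and 300, a minRecoveryPoint of 300 covers the FPI and therefore both prerequisites; there is no value of minRecoveryPoint that covers the FPI but misses the commit or horizon record.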

So, now it seems correct to me.

Another interesting point: the position of minRecoveryPoint now affects performance a lot. This already happens (because of *heap* hint bits), but after the patch it is even more noticeable. Does it make sense to keep minRecoveryPoint at a low value?

A rebased and updated patch is attached.

I would be happy if someone could recheck my ideas, or even the code :)

Thanks a lot,
Michail.
