Hi,
When analyzing some issues in another thread[1], I found a bug where fast-forward
decoding can lead to premature advancement of catalog_xmin, allowing vacuum to
remove catalog data that is still required.
The issue arises because we do not build a base snapshot when decoding changes
in fast-forward mode. Without a base snapshot, there is no way to refer to the
minimum transaction ID that must remain visible when determining the candidate
for catalog_xmin. As a result, catalog_xmin is advanced directly to the oldest
running transaction ID found in the latest running_xacts record.
At the code level, the SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
path cannot be reached during fast-forward decoding, so
rb->txns_by_base_snapshot_lsn stays empty. When advancing catalog_xmin, the
system consults rb->txns_by_base_snapshot_lsn via ReorderBufferGetOldestXmin();
since the list is empty, it falls back to using running->oldestRunningXid
directly as the candidate for catalog_xmin. For details, see
SnapBuildProcessRunningXacts.
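For reference, the fallback in question looks like this (excerpts from
reorderbuffer.c and snapbuild.c, lightly trimmed, with a comment added to mark
the problematic path):

	/* reorderbuffer.c */
	TransactionId
	ReorderBufferGetOldestXmin(ReorderBuffer *rb)
	{
		ReorderBufferTXN *txn;

		AssertTXNLsnOrder(rb);

		/* empty in fast-forward mode, since no base snapshot is ever set */
		if (dlist_is_empty(&rb->txns_by_base_snapshot_lsn))
			return InvalidTransactionId;

		txn = dlist_head_element(ReorderBufferTXN, base_snapshot_node,
								 &rb->txns_by_base_snapshot_lsn);
		return txn->base_snapshot->xmin;
	}

	/* snapbuild.c, SnapBuildProcessRunningXacts() */
	xmin = ReorderBufferGetOldestXmin(builder->reorder);
	if (xmin == InvalidTransactionId)
		xmin = running->oldestRunningXid;	/* the premature advancement */
	LogicalIncreaseXminForSlot(lsn, xmin);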
See the attached test (0002) for a demonstration that catalog data that is
still required gets removed after the premature catalog_xmin advancement during
fast-forward decoding.
To fix this, I think we can allow the base snapshot to be built during
fast-forward decoding, as implemented in patch 0001. (We already build the base
snapshot in fast-forward mode for logical messages in logicalmsg_decode().)
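To illustrate the direction (a sketch only -- see 0001 for the actual change):
heap_decode() currently returns early in fast-forward mode before
SnapBuildProcessChange() can run, so we can instead keep calling
SnapBuildProcessChange() and skip only the change decoding itself:

	/* decode.c, heap_decode() -- sketch, not the exact patch */
	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
		return;		/* no longer bail out for ctx->fast_forward here */

	switch (info)
	{
		case XLOG_HEAP_INSERT:

			/*
			 * Build the base snapshot even when fast-forwarding, so that
			 * ReorderBufferGetOldestXmin() has something to report, but
			 * still skip the actual decoding work in that mode.
			 */
			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
				!ctx->fast_forward)
				DecodeInsert(ctx, buf);
			break;

			/* ... and similarly for the other change types ... */
	}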
I also explored further optimizing the fix to reduce the overhead of building a
snapshot in fast-forward mode, e.g., maintaining a single base_snapshot_xmin
rather than generating a full snapshot for each transaction, and using this
base_snapshot_xmin when determining the candidate catalog_xmin (see the sketch
below). However, I do not feel confident about maintaining fast-forward
dedicated fields in common structures and perhaps employing different handling
for catalog_xmin.
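For the record, that alternative would have looked roughly like the following;
base_snapshot_xmin is a hypothetical field used here only to illustrate the
shape of the idea:

	/* Hypothetical fast-forward-only field in ReorderBuffer */
	TransactionId	base_snapshot_xmin;	/* InvalidTransactionId if unset */

	/*
	 * In the fast-forward path, instead of building a full base snapshot,
	 * only remember the xmin that snapshot would have had ...
	 */
	if (!TransactionIdIsValid(rb->base_snapshot_xmin))
		rb->base_snapshot_xmin = builder->xmin;

	/*
	 * ... and have ReorderBufferGetOldestXmin() return that value when no
	 * base snapshots exist, instead of InvalidTransactionId.
	 */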
Moreover, I ran a basic test[2] to measure the patch's impact: advancing the
slot takes roughly 4% longer after applying the patch, which appears to be
acceptable. Additionally, the cost of building the snapshot via
SnapBuildBuildSnapshot() did not show up in the profile. Therefore, I think it
is reasonable to refrain from further optimization at this stage.
[1]
https://www.postgresql.org/message-id/OS3PR01MB5718ED3F0123737C7380BBAD94BB2%40OS3PR01MB5718.jpnprd01.prod.outlook.com
[2]
CREATE TABLE test(a int);
Create a logical replication slot A.
pgbench postgres -T 60 -j 100 -c 100 -f simple-insert

simple-insert:
BEGIN;
INSERT INTO test VALUES(1);
COMMIT;

Use pg_replication_slot_advance to advance slot A to the latest position and
measure the time.
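For example:

	SELECT pg_replication_slot_advance('A', pg_current_wal_lsn());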
HEAD: Time: 4189.257 ms
PATCH: Time: 4371.358 ms
Best Regards,
Hou zj