RE: Newly created replication slot may be invalidated by checkpoint - Mailing list pgsql-hackers

From: Hayato Kuroda (Fujitsu)
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Msg-id: OS7PR01MB14968585820C4C31575BFA5FDF5F8A@OS7PR01MB14968.jpnprd01.prod.outlook.com
In response to: RE: Newly created replication slot may be invalidated by checkpoint ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
List: pgsql-hackers

Dear Vitaly,

Thanks for sharing the reproducer. Agreed this is a real but minor issue.

First, I considered an ad-hoc approach: publish the candidate restart_lsn as
replicationSlotMinLSN before storing it in slot->data.restart_lsn. Please see
the attached patch for the idea. It fixes your reproducer.

However, it still has a corner case. Suppose the checkpointer finishes computing
which WAL segments to remove before the backend publishes the candidate
restart_lsn as the system-wide minimum, and then tries to invalidate slots after
the slot's restart_lsn has been set. In this case the checkpointer finds the
slot being created, sees that its restart_lsn is older than the oldest LSN to be
kept, terminates the backend, and invalidates the slot.

This can be reproduced by moving 1) checkpoint-before-old-wal-removal to
between KeepLogSeg() and InvalidateObsoleteReplicationSlots(), and
2) physical-slot-reserve-wal-get-redo to before XLogMaybeSetReplicationSlotMinimumLSN().
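
The interleaving described above can be illustrated with a small standalone
model. This is only a sketch, not PostgreSQL code: the types Lsn, Slot and
Shared, the functions compute_cutoff(), reserve_wal(), invalidate_obsolete()
and slot_invalidated(), and the LSN values 100 and 200 are all hypothetical
stand-ins. It also assumes that the candidate restart_lsn is the redo pointer
the backend read before the checkpoint advanced it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not PostgreSQL's real definitions. */
typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)

typedef struct
{
    Lsn  restart_lsn;   /* models slot->data.restart_lsn */
    bool invalidated;
} Slot;

typedef struct
{
    Lsn  slot_min_lsn;  /* models the system-wide replicationSlotMinLSN */
} Shared;

/* Checkpointer, step 1: compute the oldest LSN that must be kept. */
static Lsn
compute_cutoff(const Shared *shared, Lsn new_redo)
{
    Lsn cutoff = new_redo;

    if (shared->slot_min_lsn != INVALID_LSN && shared->slot_min_lsn < cutoff)
        cutoff = shared->slot_min_lsn;
    return cutoff;
}

/* Backend with the ad-hoc fix: publish the minimum first, then set the slot. */
static void
reserve_wal(Shared *shared, Slot *slot, Lsn candidate)
{
    assert(candidate != INVALID_LSN);
    if (shared->slot_min_lsn == INVALID_LSN || candidate < shared->slot_min_lsn)
        shared->slot_min_lsn = candidate;
    slot->restart_lsn = candidate;
}

/* Checkpointer, step 2: invalidate a slot that is behind the cutoff. */
static void
invalidate_obsolete(Slot *slot, Lsn cutoff)
{
    if (slot->restart_lsn != INVALID_LSN && slot->restart_lsn < cutoff)
        slot->invalidated = true;
}

/* Run one interleaving and report whether the new slot was invalidated. */
bool
slot_invalidated(bool cutoff_computed_first)
{
    Shared    shared = {INVALID_LSN};
    Slot      slot = {INVALID_LSN, false};
    const Lsn old_redo = 100;   /* redo pointer the backend read (assumed) */
    const Lsn new_redo = 200;   /* redo pointer of the running checkpoint */
    Lsn       cutoff;

    if (cutoff_computed_first)
    {
        /* Corner case: the cutoff is fixed before the minimum is published. */
        cutoff = compute_cutoff(&shared, new_redo);
        reserve_wal(&shared, &slot, old_redo);
    }
    else
    {
        /* The order that the ad-hoc fix handles. */
        reserve_wal(&shared, &slot, old_redo);
        cutoff = compute_cutoff(&shared, new_redo);
    }
    invalidate_obsolete(&slot, cutoff);
    return slot.invalidated;
}
```

In this model, slot_invalidated(false) leaves the slot valid because the cutoff
takes the published minimum into account, while slot_invalidated(true) ends
with the slot invalidated, since the cutoff was computed before the backend
published anything.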

For now, I cannot come up with a good fix. Does anyone else have an idea?

BTW, can you update meson.build as well when you add a .pl test file? Otherwise,
the test cannot be run in meson builds.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

