Re: Fix the race condition for updating slot minimum LSN - Mailing list pgsql-hackers

From surya poondla
Subject Re: Fix the race condition for updating slot minimum LSN
Date
Msg-id CAOVWO5oSyr_Ucgz75z0CMzWxv78ncOxjZh8FtPBQdcv+0K1xmA@mail.gmail.com
In response to Fix the race condition for updating slot minimum LSN  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
Hi Hou zj,

Thanks for the patch. The fix looks correct and the approach mirrors the one taken in commit 2a5225b for the xmin race.

I have a question about copy_replication_slot() in slotfuncs.c. There, restart_lsn is written while holding only the spinlock, and the write is followed by a call to ReplicationSlotsComputeRequiredLSN(), which is the same pattern that this patch fixes. Could this path be affected by the same race?

Looking at the code, I think it is safe. create_logical_replication_slot() is called with src_restart_lsn, which is always a valid LSN (if it is not valid, an error is thrown). Inside CreateInitDecodingContext(), since restart_lsn is valid, ReplicationSlotReserveWal() is skipped and the slot's restart_lsn is set directly to src_restart_lsn. So by the time the slot is written, the destination slot already has a valid, non-zero restart_lsn, and the InvalidXLogRecPtr window never exists. Additionally, the code errors out if copy_restart_lsn < src_restart_lsn, so the write never moves restart_lsn backward. A concurrent scanner will therefore always see a valid LSN and never skip this slot.
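To make the argument concrete, here is a minimal standalone sketch of the two invariants. It is not PostgreSQL code: ToyLsn, ToySlot, toy_compute_required_lsn() and toy_copy_restart_lsn() are hypothetical stand-ins that I made up for illustration, and locking is left out entirely. It only assumes that a scanner skips slots whose restart_lsn is invalid, and that the copy path refuses to move restart_lsn backward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the PostgreSQL types; these are NOT the real definitions. */
typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

typedef struct ToySlot
{
    bool    in_use;
    ToyLsn  restart_lsn;
} ToySlot;

/*
 * Model of a scanner computing the oldest LSN that must be retained.
 * Slots that are unused or have an invalid restart_lsn are skipped.
 * Returns TOY_INVALID_LSN if no slot constrains WAL removal.
 */
static ToyLsn
toy_compute_required_lsn(const ToySlot *slots, size_t nslots)
{
    ToyLsn  min_required = TOY_INVALID_LSN;

    for (size_t i = 0; i < nslots; i++)
    {
        if (!slots[i].in_use || slots[i].restart_lsn == TOY_INVALID_LSN)
            continue;
        if (min_required == TOY_INVALID_LSN ||
            slots[i].restart_lsn < min_required)
            min_required = slots[i].restart_lsn;
    }
    return min_required;
}

/*
 * Model of the final write in the copy path. The destination already
 * holds a valid restart_lsn (assumed precondition), and the write is
 * refused if it would move restart_lsn backward. Returning false
 * stands in for raising an error.
 */
static bool
toy_copy_restart_lsn(ToySlot *dst, ToyLsn copy_restart_lsn)
{
    assert(dst->restart_lsn != TOY_INVALID_LSN);

    if (copy_restart_lsn < dst->restart_lsn)
        return false;
    dst->restart_lsn = copy_restart_lsn;
    return true;
}
```

In this model, the destination slot is visible to toy_compute_required_lsn() both before and after toy_copy_restart_lsn() runs, and the value seen after the write is never older than the value seen before it. That is the property the reasoning above relies on.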

If that reasoning is correct, a comment near the ReplicationSlotsComputeRequiredLSN() call in copy_replication_slot() explaining why it does not need the same protection would help future readers.

Regards,
Surya Poondla
