Hi hackers,
While working on [1], we discovered (thanks Alexander for the testing) that an
conflicting active logical slot on a standby could be "terminated" without
leading to an "obsolete" message (see [2]).
Indeed, in case of an active slot we proceed in 2 steps in
InvalidatePossiblyObsoleteSlot():
- terminate the backend holding the slot
- report the slot as obsolete
This is racy because between the two we release the mutex on the slot, which
means that the slot's effective_xmin and effective_catalog_xmin could advance
during that time (leading to exit the loop).
I think that holding the mutex longer is not an option (given what we'd to do
while holding it) so the attached proposal is to record the effective_xmin and
effective_catalog_xmin instead that was used during the backend termination.
[1]: https://www.postgresql.org/message-id/flat/bf67e076-b163-9ba3-4ade-b9fc51a3a8f6%40gmail.com
[2]: https://www.postgresql.org/message-id/7c025095-5763-17a6-8c80-899b35bd0459%40gmail.com
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com