On Mon, Feb 19, 2024 at 11:44 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > Yeah, comments added in v3.
>
> The contents look rather OK, I may do some word-smithing for both.
Here are some comments on v3:
1.
+ XLogRecPtr initial_effective_xmin = InvalidXLogRecPtr;
+ XLogRecPtr initial_catalog_effective_xmin = InvalidXLogRecPtr;
+ XLogRecPtr initial_restart_lsn = InvalidXLogRecPtr;
Prefix 'initial_' makes the variable names a bit longer, I think we
can just use effective_xmin, catalog_effective_xmin and restart_lsn,
the code updating then only when if (!terminated) tells one that they
aren't updated every time.
2.
+ /*
+ * We'll release the slot's mutex soon, so it's possible that
+ * those values change since the process holding the slot has been
+ * terminated (if any), so record them here to ensure we would
+ * report the slot as obsolete correctly.
+ */
This needs a bit more info as to why and how effective_xmin,
catalog_effective_xmin and restart_lsn can move ahead after signaling
a backend and before the signalled backend reports back.
3.
+ /*
+ * Assert that the conflict cause recorded previously before we
+ * terminate the process did not change now for any reason.
+ */
+ Assert(!(conflict_prev != RS_INVAL_NONE && terminated &&
+ conflict_prev != conflict));
It took a while for me to understand the above condition, can we
simplify it like below using De Morgan's laws for better readability?
Assert((conflict_prev == RS_INVAL_NONE) || !terminated ||
(conflict_prev == conflict));
> > Are we going to fix this on master and 16 stable first and then later on add a
> > test on master with the injection points?
>
> Still, the other patch is likely going to take a couple of weeks
> before getting into the tree, so I have no objection to fix the bug
> first and backpatch, then introduce a test. Things have proved that
> failures could show up in the buildfarm in this area, so more time
> running this code across two branches is not a bad concept, either.
While I couldn't agree more on getting this fix in, it's worth pulling
in the required injection points patch here and writing the test to
reproduce this race issue.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com