Hi Kuroda-san,
On Wed, Apr 02, 2025 at 07:16:25AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Dear Amit, Bertrand,
>
> > You have not added any injection point for the above case. Isn't it
> > possible that if running_xact record is logged concurrently to the
> > pruning record, it should move the active slot on standby, and the
> > same failure should occur in this case as well?
>
> I considered that the timing failure can happen. Reproducer:
>
> ```
> $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]);
> +$node_primary->safe_psql('testdb', 'CHECKPOINT');
> +sleep(20);
> $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]);
> ```
Yeah, I was going to provide the exact same reproducer and then saw your email.
> Based on the fact, I've updated to use injection_points for scenario 5. Of course,
> PG16/17 patches won't use the active slot for that scenario.
Thanks for the updated patch!
As far v4-0001:
=== 1
+# would advance an active replication slot's catalog_xmin
s/would/could/? I mean the system also needs to be "slow" enough (so the
sleep() in the reproducer)
=== 2
+# Injection_point is used to avoid seeing an xl_running_xacts even here. In
+# scenario 5, we verify the case that the backend process detects the page has
+# enough tuples; thus, page pruning happens. If the record is generated just
+# before doing on-pruning, the catalog_xmin of the active slot would be
+# updated; hence, the conflict would not occur.
Not sure we need to explain what scenario 5 does here, but that does not hurt
if you feel the need.
Also maybe mention the last update in the comment and add some nuance (like
proposed in === 1), something like?
"
# Injection_point is used to avoid seeing a xl_running_xacts here. Indeed,
# if it is generated between the last 2 updates then the catalog_xmin of the active
# slot could be updated; hence, the conflict could not occur.
"
Apart from that the tests looks good to me and all the problematic scenarios
covered.
As far PG17-v4-0001:
=== 3
-# seeing a xl_running_xacts that would advance an active replication slot's
+# seeing the xl_running_xacts that would advance an active replication slot's
why?
=== 4
It looks like that check_slots_conflict_reason() is not called with checks_active_slot
as argument.
=== 5
I think that we could remove the need for the drop_active_slot parameter in
drop_logical_slots() and just check if an active slot exists (and if so drop
it). That said I'm not sure it's worth to go that far for backpatching.
As far PG16-v4:
=== 6
Same as === 3 and === 5 (=== 4 does not apply as check_slots_conflict_reason()
does not exist).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com