Re: Fix 035_standby_logical_decoding.pl race conditions - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Fix 035_standby_logical_decoding.pl race conditions
Msg-id Z/OWmslXU1zz4izG@ip-10-97-1-34.eu-west-3.compute.internal
In response to RE: Fix 035_standby_logical_decoding.pl race conditions  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Hi Kuroda-san,

On Mon, Apr 07, 2025 at 06:15:13AM +0000, Hayato Kuroda (Fujitsu) wrote:
> I had been debugging and found a case where VACUUM FULL also has a timing issue.
> This means that we cannot keep the testcase.
> 
> PSA the reproducer for PG17. IIUC this can happen even in PG16.
> I considered what happens here:
> 
> 1. Run a CHECKPOINT and wait for some time in wait_until_vacuum_can_remove().
>    This ensures that a RUNNING_XACTS record can be generated and catalog_xmin can
>    be advanced after the user SQLs.
> 2. Assume that another RUNNING_XACTS record is generated *WHILE* doing a VACUUM
>    FULL. This can be caused by the periodic checkpoint or by the reproducer.
> 3. The logical walsender detects the RUNNING_XACTS record.
>    Note that this must happen before the startup process tries to invalidate the slot.
> 4. After some time, the walsender receives the ack and advances the catalog_xmin.
>    Note again that this must happen before the startup process tries to invalidate the slot.
> 5. The startup process detects the PRUNE_ON_ACCESS record and tries to invalidate the
>    slot. However, the catalog_xmin has already been advanced, so the invalidation
>    cannot happen.
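To make the interleaving above concrete, here is a toy Python model of it (not PostgreSQL code). It assumes the startup process's invalidation check reduces to "the slot's catalog_xmin is valid and <= the conflict horizon carried by the replayed record", compared as plain integers (xid wraparound is ignored). The xids 740/745/750 and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    catalog_xmin: Optional[int]  # None stands for an invalid catalog_xmin
    invalidated: bool = False


def conflicts(slot: Slot, horizon: int) -> bool:
    # Simplified conflict test: plain integer comparison, no wraparound handling.
    return slot.catalog_xmin is not None and slot.catalog_xmin <= horizon


def walsender_handles_running_xacts(slot: Slot, new_xmin: int) -> None:
    # Steps 3-4: the walsender sees RUNNING_XACTS, gets the ack,
    # and moves catalog_xmin forward.
    if slot.catalog_xmin is None or new_xmin > slot.catalog_xmin:
        slot.catalog_xmin = new_xmin


def startup_replays_prune(slot: Slot, horizon: int) -> bool:
    # Step 5: the startup process replays the prune record and
    # invalidates the slot only if it still conflicts.
    if conflicts(slot, horizon):
        slot.invalidated = True
    return slot.invalidated


def run(race: bool) -> bool:
    slot = Slot(catalog_xmin=740)  # hypothetical xid
    horizon = 745                  # hypothetical conflict horizon of the prune record
    if race:
        # Step 2: a RUNNING_XACTS record lands during VACUUM FULL and the
        # walsender processes it before the startup process replays the prune.
        walsender_handles_running_xacts(slot, 750)
    return startup_replays_prune(slot, horizon)


print(run(race=False))  # True: the slot is invalidated as the test expects
print(run(race=True))   # False: catalog_xmin moved past the horizon first
```

With an inactive slot nothing can advance catalog_xmin between steps 1 and 5, which is why not activating the slot avoids the race in this model.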

Thanks for the testing and explanation! I applied your repro and I'm able to
see the test failing (with an active slot). The scenario is less likely
to happen (as compared to the non-VACUUM FULL cases), and that's why it was not
visible in drongo's reports in [1]. So yeah, let's do as you suggested and not
make the slot active for the VACUUM FULL case too.

[1]: https://www.postgresql.org/message-id/386386.1737736935@sss.pgh.pa.us

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


