Re: Fix 035_standby_logical_decoding.pl race conditions - Mailing list pgsql-hackers

From	Bertrand Drouvot
Subject	Re: Fix 035_standby_logical_decoding.pl race conditions
Date	April 7 12:10:50
Msg-id	Z/OWmslXU1zz4izG@ip-10-97-1-34.eu-west-3.compute.internal
In response to	RE: Fix 035_standby_logical_decoding.pl race conditions ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List	pgsql-hackers

Hi Kuroda-san,

On Mon, Apr 07, 2025 at 06:15:13AM +0000, Hayato Kuroda (Fujitsu) wrote:
> I had been debugging and found that VACUUM FULL also has a timing issue.
> This means that we cannot keep the testcase.
> 
> PSA the reproducer for PG17. IIUC this can happen even in PG16.
> Here is what I think happens:
> 
> 1. Run a CHECKPOINT and wait for some time in wait_until_vacuum_can_remove().
>    This ensures that a RUNNING_XACTS record can be generated and catalog_xmin can
>    be advanced after the user SQLs.
> 2. Assume that another RUNNING_XACTS record is generated *WHILE* doing a VACUUM
>    FULL. This can be triggered by the periodic checkpoint or the reproducer.
> 3. The logical walsender detects the RUNNING_XACTS record.
>    Note that this must be done before startup tries to invalidate the slot.
> 4. After some time, the walsender receives the ack and advances the catalog_xmin.
>    Note again that this must be done before startup tries to invalidate the slot.
> 5. The startup process detects the PRUNE_ON_ACCESS record and tries to invalidate
>    the slot. However, the catalog_xmin has been advanced so that the invalidation
>    cannot be done.

Thanks for the testing and explanation! I applied your repro and I'm able to
see the test failing (with an active slot). This scenario is less likely to
happen than the non-VACUUM FULL cases, and that's why it was not visible in
drongo's reports in [1]. So yeah, let's do as you suggested and not make the
slot active for the VACUUM FULL case either.
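
To make the ordering dependence in the quoted steps concrete, here is a toy
Python model. It is only a sketch and not PostgreSQL code: the Slot class, the
function names, and the XID values are made up for illustration, and the
invalidation check is reduced to a plain integer comparison that ignores XID
wraparound.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a logical replication slot on the standby."""
    catalog_xmin: int
    invalidated: bool = False


def walsender_advance(slot: Slot, new_xmin: int) -> None:
    # Steps 3-4: the walsender saw a RUNNING_XACTS record, got the ack,
    # and moves catalog_xmin forward (only if the slot is still valid).
    if not slot.invalidated and new_xmin > slot.catalog_xmin:
        slot.catalog_xmin = new_xmin


def startup_replay_prune(slot: Slot, conflict_horizon: int) -> None:
    # Step 5: the startup process replays the pruning record and invalidates
    # the slot only if the slot may still need rows up to the horizon.
    # Simplified assumption: conflict when catalog_xmin <= conflict_horizon.
    if slot.catalog_xmin <= conflict_horizon:
        slot.invalidated = True


def run(order: list[str]) -> bool:
    """Apply the two actions in the given order; return whether the slot
    ended up invalidated. All XID values are illustrative."""
    slot = Slot(catalog_xmin=100)
    actions = {
        "walsender": lambda: walsender_advance(slot, 110),
        "startup": lambda: startup_replay_prune(slot, 105),
    }
    for name in order:
        actions[name]()
    return slot.invalidated


if __name__ == "__main__":
    # Order the test expects: startup replays first, slot gets invalidated.
    print("startup first:  ", run(["startup", "walsender"]))
    # Racy order: an active walsender advances catalog_xmin first.
    print("walsender first:", run(["walsender", "startup"]))
```

With an inactive slot there is no walsender to advance catalog_xmin, so in this
model only the "startup first" ordering remains.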

[1]: https://www.postgresql.org/message-id/386386.1737736935@sss.pgh.pa.us

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
