Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait - Mailing list pgsql-hackers

From Kevin K Biju
Subject Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date
Msg-id CAM45KeF6+3GtR7dDTAM-kwpGWpnUmAeg=jh0N3cd=NGBgvVkkA@mail.gmail.com
In response to Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Hi Fujii,

Thanks for the review.

> Just an idea: how about calling pgstat_report_wait_start() and _end() around pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?

I was thinking about this as well, but the question is what we'd report the wait event as here. The one that makes sense to me is PG_WAIT_LOCK, but that wouldn't align with what pg_locks displays, since there is no actual lock grabbed on the standby.
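
For concreteness, a minimal sketch of the mechanics, following the pattern in WaitExceedsMaxStandbyDelay() (WAIT_EVENT_STANDBY_XACT_DONE is a placeholder name here, not an existing wait event):

    /* hypothetical event name, for illustration only */
    pgstat_report_wait_start(WAIT_EVENT_STANDBY_XACT_DONE);
    pg_usleep(1000L);
    pgstat_report_wait_end();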

> Also, would waiting only 1ms per loop cycle be too aggressive?

I agree that it's aggressive for this case. But XactLockTableWait/ConditionalXactLockTableWait are used in other places, including in heapam, so I'm hesitant to change this behaviour for all of them. Should we have different "wait logic" for this case?

Kevin

On Mon, May 26, 2025 at 9:02 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2025/05/24 5:41, Kevin K Biju wrote:
> Hi,
>
> While creating a logical replication slot, we wait for older transactions to complete to reach a "consistent point", which can take a while on busy databases. If we're creating a slot on a primary instance, it's pretty clear that we're waiting on a transaction:
>
>
> postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=804;
>   pid | wait_event_type |  wait_event   | state  |                             query
> -----+-----------------+---------------+--------+----------------------------------------------------------------
>   804 | Lock            | transactionid | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
> (1 row)
> postgres=# SELECT locktype,transactionid,pid,mode FROM pg_locks WHERE pid=804 AND granted='f';
>     locktype    | transactionid | pid |   mode
> ---------------+---------------+-----+-----------
>   transactionid |          6077 | 804 | ShareLock
>
> However, creating a slot on a hot standby behaves very differently when blocked on a transaction. Querying pg_stat_activity doesn't give us any indication of what the issue is:
>
>
> postgres=# SELECT pg_is_in_recovery();
>   pg_is_in_recovery
> -------------------
>   t
> (1 row)
> postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=5074;
>   pid  | wait_event_type | wait_event | state  |                             query
> ------+-----------------+------------+--------+----------------------------------------------------------------
>   5074 |                 |            | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
>
> And more importantly, a backend in this state cannot be terminated via the usual methods and sometimes requires a server restart.
>
>
> postgres=# SELECT pg_cancel_backend(5074), pg_terminate_backend(5074);
>   pg_cancel_backend | pg_terminate_backend
> -------------------+----------------------
>   t                 | t
> (1 row)
> postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=5074;
>   pid  | wait_event_type | wait_event | state  |                             query
> ------+-----------------+------------+--------+----------------------------------------------------------------
>   5074 |                 |            | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
>
> I've taken a look at the code that waits on transactions, and the function of interest is lmgr.c/XactLockTableWait. On a primary, it acquires a ShareLock on the xid, and the lock manager makes this wait efficient as well as visible externally. On a standby, however, it cannot wait on the xid since the standby is not running the transaction; it only knows from KnownAssignedXids that the transaction is still running, and ends up on this code path:
>
> *
> * Some uses of this function don't involve tuple visibility -- such
> * as when building snapshots for logical decoding.  It is possible to
> * see a transaction in ProcArray before it registers itself in the
> * locktable.  The topmost transaction in that case is the same xid,
> * so we try again after a short sleep.  (Don't sleep the first time
> * through, to avoid slowing down the normal case.)
> */
> if (!first)
>     pg_usleep(1000L);
>
> The comment above suggests that this sleep was only meant to be hit a few times while we wait for the lock to appear so we can wait on it. In the hot standby case, however, this ends up becoming a polling loop, since the lock will never appear. There is no interrupt processing in this loop either, so the only way out is to wait for the transaction on the primary to complete.

I agree with your analysis. It makes sense to me.


> Attached is a patch that adds CHECK_FOR_INTERRUPTS before the sleep in XactLockTableWait and ConditionalXactLockTableWait. This allows backends waiting on a transaction during slot creation on a standby to be interrupted.

+1
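
For anyone skimming the thread, the gist of the described change (a sketch based on the description above, not the actual attached patch):

    if (!first)
    {
        /* Allow pg_cancel_backend()/pg_terminate_backend() to break
         * out of the polling loop on a standby, where the lock we
         * want to wait on never appears. */
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);
    }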

> It would also be nice if there was a way for users to tell what we're waiting on (maybe a different wait event?) but I'd like input on that.

Just an idea: how about calling pgstat_report_wait_start() and _end() around
pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?
Since this is more of an improvement than a bug fix, I think
it would be better to make it a separate patch from the CFI addition.


Also, would waiting only 1ms per loop cycle be too aggressive? If so, maybe
we could increase the sleep time gradually during recovery (but not beyond
1 second), again similar to WaitExceedsMaxStandbyDelay().
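
Roughly along these lines (a sketch following the backoff pattern in standby.c; the static variable is just for illustration):

    static long standbyWait_us = 1000;      /* start at 1ms */

    pg_usleep(standbyWait_us);

    /* Progressively increase the sleep time, but cap it at 1s,
     * since pg_usleep isn't interruptible on some platforms. */
    standbyWait_us *= 2;
    if (standbyWait_us > 1000000)
        standbyWait_us = 1000000;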

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
