RE: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming - Mailing list pgsql-bugs

From Hayato Kuroda (Fujitsu)
Subject RE: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming
Date
Msg-id OSCPR01MB14966ED7F614AFF9EAD38353DF533A@OSCPR01MB14966.jpnprd01.prod.outlook.com
In response to Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming  (Zane Duffield <duffieldzane@gmail.com>)
List pgsql-bugs
Dear Zane,

While analyzing your post and code, I found that the parallel apply worker cannot
handle the lock timeout. IIUC, that is why the lock timeout is rarely reported and
the parallel apply worker exits automatically.

Lock timeout is implemented by sending SIGINT to the process. Backends set the
SIGINT signal handler to StatementCancelHandler, so the process errors out while
it is waiting for something. See CHECK_FOR_INTERRUPTS->ProcessInterrupts. The error
message would be: "canceling statement due to lock timeout".
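
To illustrate the mechanism described above, here is a minimal stand-alone sketch
of the "handler sets a flag, wait loop checks it" pattern. It is not PostgreSQL
source code: cancel_pending, cancel_handler, check_for_interrupts and
fire_lock_timeout are hypothetical names that only stand in for the roles of
StatementCancelHandler and CHECK_FOR_INTERRUPTS.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's pending-cancel flag. */
static volatile sig_atomic_t cancel_pending = 0;

/* Plays the role of StatementCancelHandler: it only records the request. */
static void
cancel_handler(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

/* Install the handler for SIGINT; returns true on success. */
static bool
install_cancel_handler(void)
{
    return signal(SIGINT, cancel_handler) != SIG_ERR;
}

/*
 * Plays the role of CHECK_FOR_INTERRUPTS(): called at safe points, e.g.
 * inside a lock wait loop. Returns the error message if a cancel was
 * requested, or NULL if there is nothing to do.
 */
static const char *
check_for_interrupts(void)
{
    if (cancel_pending)
    {
        cancel_pending = 0;
        return "canceling statement due to lock timeout";
    }
    return NULL;
}

/* Models the timeout firing: the process sends SIGINT to itself. */
static void
fire_lock_timeout(void)
{
    raise(SIGINT);
}
```

In this model, a waiter that calls check_for_interrupts() after
fire_lock_timeout() gets the error message and can abort the statement. If a
different handler is installed for SIGINT, the flag is never set and the waiter
never notices the timeout.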

The parallel apply worker, however, overrides the signal handler for SIGINT and
uses it to detect a shutdown request from the leader process. When the parallel
apply worker receives SIGINT, it exits once it reaches the main loop. Unlike the
case above, the process does not exit while waiting for the lock; it exits after
it becomes idle or receives the next chunk. The message is the same as in the
normal shutdown case.

IIUC, lock timeout should work for all processes that access and modify database
objects, hence the current state should be fixed.

My idea is to use a different signal to request shutdown of parallel apply workers.
Since checkpointer and walsender use SIGUSR2 for a similar purpose, this patch
also uses it for the parallel apply worker. This issue has existed since PG16.
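
As a rough illustration of this idea (not the attached patch itself), the
stand-alone sketch below keeps SIGINT for statement cancel and uses SIGUSR2 for
the shutdown request, so the two requests can no longer be confused. All names
(pa_cancel_requested, pa_shutdown_requested, pa_install_handlers,
pa_next_action) are hypothetical and exist only for this example.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flags for a model worker; not PostgreSQL's real variables. */
static volatile sig_atomic_t pa_cancel_requested = 0;
static volatile sig_atomic_t pa_shutdown_requested = 0;

typedef enum
{
    PA_CONTINUE,            /* nothing pending */
    PA_CANCEL_STATEMENT,    /* e.g. lock timeout: error out of the wait */
    PA_EXIT                 /* leader asked the worker to shut down */
} PaAction;

/* SIGINT keeps its usual meaning: cancel the current statement. */
static void
pa_on_sigint(int signo)
{
    (void) signo;
    pa_cancel_requested = 1;
}

/* SIGUSR2 carries the shutdown request instead of SIGINT. */
static void
pa_on_sigusr2(int signo)
{
    (void) signo;
    pa_shutdown_requested = 1;
}

/* Install both handlers; returns true if both succeeded. */
static bool
pa_install_handlers(void)
{
    return signal(SIGINT, pa_on_sigint) != SIG_ERR &&
           signal(SIGUSR2, pa_on_sigusr2) != SIG_ERR;
}

/* Called at safe points; shutdown takes priority over cancel. */
static PaAction
pa_next_action(void)
{
    if (pa_shutdown_requested)
        return PA_EXIT;
    if (pa_cancel_requested)
    {
        pa_cancel_requested = 0;
        return PA_CANCEL_STATEMENT;
    }
    return PA_CONTINUE;
}
```

With this split, a SIGINT that arrives while the worker waits for a lock maps to
PA_CANCEL_STATEMENT, and only SIGUSR2 leads to PA_EXIT.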

Note that this does not actually solve the issue that was initially reported; it
only allows the pa worker to report the lock timeout and exit. The replication lag
cannot be resolved by this alone.
Per the documentation [1], setting lock_timeout globally is not recommended.

[1]: https://www.postgresql.org/docs/17/runtime-config-client.html#GUC-LOCK-TIMEOUT

Best regards,
Hayato Kuroda
FUJITSU LIMITED


