Thread: Re: Add “FOR UPDATE NOWAIT” lock details to the log.
On 2024/09/13 20:49, Seino Yuki wrote:
> Hello,
>
> I would like to add the information of the PID that caused the failure
> when acquiring a lock with "FOR UPDATE NOWAIT".
>
> When "FOR UPDATE" is executed and interrupted by lock_timeout,
> xid and PID are output in the logs,

Could you explain how to reproduce this? In my quick test, lock waits
canceled by lock_timeout didn't report either xid or PID, so I might be
missing something.

> but in the case of "FOR UPDATE NOWAIT",
> no information is output, making it impossible to identify the cause
> of the lock failure. Therefore, I would like to output information in
> the logs in the same way as when "FOR UPDATE" is executed and
> interrupted by lock_timeout.

I think NOWAIT is often used for lock waits that can fail frequently.
Logging detailed information for each failed NOWAIT lock wait could lead
to excessive and potentially noisy log messages for some users.

On the other hand, I understand that even with NOWAIT, we might want to
investigate why a lock wait failed. In such cases, detailed information
about the locking process would be useful.

Considering both points, I'm leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed NOWAIT
locks (and possibly for those canceled by lock_timeout). Alternatively,
if adding a new GUC is not ideal, we could extend log_lock_waits to
accept a new value like 'error', which would trigger detailed logging
when a lock wait fails due to NOWAIT, lock_timeout, or cancellation.

> The patch is attached as well.
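For concreteness, the two alternatives would look something like this in postgresql.conf. Both settings are hypothetical at this point in the thread: neither a dedicated GUC nor an 'error' value for log_lock_waits exists yet.

```
# Alternative 1: a new, hypothetical GUC dedicated to failed lock attempts
#log_lock_failure = on

# Alternative 2: extend the existing log_lock_waits to accept an extra value
# that logs details when a wait fails (NOWAIT, lock_timeout, or cancellation)
#log_lock_waits = 'error'
```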
I got the following compiler errors:

proc.c:1240:3: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 1240 |   CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |   ^
proc.c:1549:4: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 1549 |    CollectLockHoldersAndWaiters(NULL, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |    ^
proc.c:1126:8: warning: unused variable 'first_holder' [-Wunused-variable]
 1126 |  bool  first_holder = true,
      |        ^~~~~~~~~~~~
proc.c:1127:5: warning: unused variable 'first_waiter' [-Wunused-variable]
 1127 |   first_waiter = true;
      |   ^~~~~~~~~~~~
proc.c:1982:1: error: conflicting types for 'CollectLockHoldersAndWaiters'
 1982 | CollectLockHoldersAndWaiters(PROCLOCK *waitProcLock, LOCK *lock, StringInfo lock_holders_sbuf, StringInfo lock_waiters_sbuf, int *lockHoldersNum)
      | ^
proc.c:1240:3: note: previous implicit declaration is here
 1240 |   CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |   ^
2 warnings and 3 errors generated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thank you for the review.

> Could you explain how to reproduce this? In my quick test,
> lock waits canceled by lock_timeout didn't report either xid or PID,
> so I might be missing something.

It seems to be outputting on my end, how about you?

===== Console =====
postgres=# SHOW log_lock_waits;
 log_lock_waits
----------------
 on
(1 row)

postgres=# SHOW lock_timeout;
 lock_timeout
--------------
 2s
(1 row)

(tx1)
postgres=# BEGIN;
BEGIN
postgres=*# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR UPDATE;
 aid
-----
   1
(1 row)

(tx2)
postgres=# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR UPDATE;
ERROR:  canceling statement due to lock timeout
CONTEXT:  while locking tuple (0,1) in relation "pgbench_accounts"

===== Log Output =====
# lock start
2024-10-17 17:49:58.157 JST [1443346] LOG:  process 1443346 still waiting for ShareLock on transaction 770 after 1000.096 ms
2024-10-17 17:49:58.157 JST [1443346] DETAIL:  Process holding the lock: 1443241. Wait queue: 1443346.
2024-10-17 17:49:58.157 JST [1443346] CONTEXT:  while locking tuple (0,1) in relation "pgbench_accounts"
2024-10-17 17:49:58.157 JST [1443346] STATEMENT:  SELECT * FROM pgbench_accounts WHERE aid = 1 FOR UPDATE;

# lock timeout
2024-10-17 17:49:59.157 JST [1443346] ERROR:  canceling statement due to lock timeout
2024-10-17 17:49:59.157 JST [1443346] CONTEXT:  while locking tuple (0,1) in relation "pgbench_accounts"
2024-10-17 17:49:59.157 JST [1443346] STATEMENT:  SELECT * FROM pgbench_accounts WHERE aid = 1 FOR UPDATE;

> I think NOWAIT is often used for lock waits that can fail frequently.
> Logging detailed information for each failed NOWAIT lock wait could
> lead to excessive and potentially noisy log messages for some users.
>
> On the other hand, I understand that even with NOWAIT, we might want
> to investigate why a lock wait failed. In such cases,
> detailed information about the locking process would be useful.
> Considering both points, I'm leaning toward adding a new GUC parameter
> to control whether detailed logs should be generated for failed
> NOWAIT locks (and possibly for those canceled by lock_timeout).
> Alternatively, if adding a new GUC is not ideal, we could extend
> log_lock_waits to accept a new value like 'error', which would trigger
> detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
> or cancellation.

That's a good idea. I'll try that.

> I got the following compiler errors;

Thank you. I missed it.

Regards,

--
Yuki Seino
NTT DATA CORPORATION
>> Considering both points, I'm leaning toward adding a new GUC parameter
>> to control whether detailed logs should be generated for failed
>> NOWAIT locks (and possibly for those canceled by lock_timeout).
>> Alternatively, if adding a new GUC is not ideal, we could extend
>> log_lock_waits to accept a new value like 'error', which would trigger
>> detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
>> or cancellation.
> That's a good idea. I'll try that.

Sorry, I misunderstood. The pid and xid are only output once
deadlock_timeout has elapsed.

> LOG: process 1443346 still waiting for ShareLock on transaction 770
> after 1000.096 ms
> DETAIL: Process holding the lock: 1443241. Wait queue: 1443346.

This log output is not triggered by lock_timeout or cancellation.
Therefore, it is difficult to support lock_timeout and cancellation.

I am thinking of supporting only NOWAIT. What do you think?

Regards,

--
Yuki Seino
NTT DATA CORPORATION
On 2024/10/17 22:15, Seino Yuki wrote:
>>> Considering both points, I'm leaning toward adding a new GUC parameter
>>> to control whether detailed logs should be generated for failed
>>> NOWAIT locks (and possibly for those canceled by lock_timeout).
>>> Alternatively, if adding a new GUC is not ideal, we could extend
>>> log_lock_waits to accept a new value like 'error', which would trigger
>>> detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
>>> or cancellation.
>> That's a good idea. I'll try that.
>
> Sorry, I misunderstood.
> The pid and xid are output during deadlock_timeout.
>
>> LOG: process 1443346 still waiting for ShareLock on transaction 770 after 1000.096 ms
>> DETAIL: Process holding the lock: 1443241. Wait queue: 1443346.
> This log output is not triggered by lock_timeout or cancellation.

Yes, these log messages with PID or XID are generated by log_lock_waits,
not lock_timeout.

> Therefore, it is difficult to support lock_timeout and cancellation.
>
> I am thinking of supporting only NOWAIT. What do you think?

I'm not sure why it's challenging to provide detailed log messages for
lock waits canceled by lock_timeout or user cancellation, while it's
considered feasible for the NOWAIT case. However, I think it's reasonable
to implement this feature step by step. We can start by adding support
for the NOWAIT case and consider extending it to handle lock_timeout and
cancellation scenarios later if possible.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024/10/18 9:37, Seino Yuki wrote:
> Currently, we are discussing two improvements:
>
> 1. Log output when NOWAIT fails.
> 2. Adding control via GUC parameters (NOWAIT, lock_timeout, cancellation).
>
>> I'm not sure why it's challenging to provide detailed log messages for lock waits canceled
>> by lock_timeout or user cancellation, while it's considered feasible for the NOWAIT case.
>
> Does this statement mean that, for item 2, NOWAIT is possible but lock_timeout and cancellation are not?
>
> For item 2, lock_timeout and cancellation produce log output only after deadlock_timeout (e.g., 1s) has elapsed (via log_lock_waits).

This log message isn't directly related to lock_timeout or cancellation.
It appears if log_lock_waits is enabled and the lock wait time exceeds
deadlock_timeout, regardless of whether lock_timeout is set or whether
users cancel the operation.

I understood your original proposal as suggesting detailed log output
whenever a NOWAIT lock attempt fails. However, as I mentioned earlier,
NOWAIT failures can happen frequently, so logging every failure in detail
could be too noisy for some users. That's why I proposed adding a new GUC
parameter (or extending log_lock_waits) to control whether such detailed
logs should be generated for NOWAIT failures.

During our discussion, I also thought it would be useful to provide
detailed logs for lock waits canceled by user actions or lock_timeout.
This behavior could be controlled by a similar GUC parameter. However,
this is an additional feature beyond your original proposal, so I think
it's fine to leave it out of the initial patch.

> At the time this log is output, it is unclear whether the lock wait will end in cancellation or lock_timeout.
> This means that the point at which the error is determined and the point at which the log is output do not match.
>
>> However, I think it's reasonable to implement this feature step by step. We can start
>> by adding support for the NOWAIT case and consider extending it to handle lock_timeout and
>> cancellation scenarios later if possible.
>
> +1.
> I will send the version with the GUC parameter added from the previous patch.

The choice between adding a new GUC or extending the existing one
(e.g., log_lock_waits) is debatable, but I prefer the latter. I'm
considering extending log_lock_waits to accept a value like "fail".
If set to "on" (the current behavior), detailed logs are generated when
the lock wait time exceeds deadlock_timeout. If set to "fail", logs are
generated whenever a lock wait fails. If both are specified, logs would
be triggered when the wait time exceeds deadlock_timeout or when a lock
wait fails.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
> During our discussion, I also thought it would be useful to provide detailed logs
> for lock waits canceled by user actions or lock_timeout.

Thank you. I understand now. You want to output the logs based on a
different trigger than log_lock_waits. At first, I found that strange,
but I see now that there are cases where it's necessary to output logs
for errors even when log_lock_waits is disabled.

I have also understood the proposal for improving the GUC parameters.
I will revise the patch.

Regards,

--
Yuki Seino
NTT DATA CORPORATION
On 2024/10/18 19:07, Seino Yuki wrote:
>> The choice between adding a new GUC or extending the existing one
>> (e.g., log_lock_waits)
>> is debatable, but I prefer the latter. I'm considering extending log_lock_waits
>> to accept a value like "fail". If set to "on" (the current behavior),
>> detailed logs are generated when the lock wait time exceeds deadlock_timeout.
>> If set to "fail", logs are generated whenever a lock wait fails. If both are
>> specified, logs would be triggered when the wait time exceeds
>> deadlock_timeout or
>> when a lock wait fails.
>
> Thanks for the idea.
> Changed log_lock_waits to an enum type and added "fail" and "all".
> "off"  : No log message (default).
> "on"   : If over deadlock_timeout (the current behavior).
> "fail" : If the lock failed.
> "all"  : All patterns.

I'm still thinking about how we should handle logging for lock failures
caused by the NOWAIT option. Extending the existing log_lock_waits
parameter has the advantage of avoiding a new GUC, but it might make
configuration more complicated. I'm starting to think adding a new GUC
might be a better option.

Regarding the patch, when I applied it to HEAD, it failed to compile with
the following errors. Could you update the patch to address this?

proc.c:1538:20: error: use of undeclared identifier 'buf'
 1538 |         initStringInfo(&buf);
      |                         ^
proc.c:1539:20: error: use of undeclared identifier 'lock_waiters_sbuf'
 1539 |         initStringInfo(&lock_waiters_sbuf);
      |                         ^
proc.c:1540:20: error: use of undeclared identifier 'lock_holders_sbuf'
 1540 |         initStringInfo(&lock_holders_sbuf);
      |                         ^
....

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
>>> The choice between adding a new GUC or extending the existing one
>>> (e.g., log_lock_waits)
>>> is debatable, but I prefer the latter. I'm considering extending
>>> log_lock_waits
>>> to accept a value like "fail". If set to "on" (the current behavior),
>>> detailed logs are generated when the lock wait time exceeds
>>> deadlock_timeout.
>>> If set to "fail", logs are generated whenever a lock wait fails. If
>>> both are
>>> specified, logs would be triggered when the wait time exceeds
>>> deadlock_timeout or
>>> when a lock wait fails.
>>
>> Thanks for the idea.
>> Changed log_lock_waits to an enum type and added "fail" and "all".
>> "off"  : No log message (default).
>> "on"   : If over deadlock_timeout (the current behavior).
>> "fail" : If the lock failed.
>> "all"  : All patterns.
>
> I'm still thinking about how we should handle logging for lock failures
> caused by the NOWAIT option. Extending the existing log_lock_waits
> parameter has the advantage of avoiding a new GUC, but it might make
> configuration more complicated. I'm starting to think adding a new GUC
> might be a better option.

Yes. It's still simple for now, but reusing an existing GUC could
complicate future expansions. I will design a new GUC while ensuring
consistency with log_lock_waits.

> Regarding the patch, when I applied it to HEAD, it failed to compile with
> the following errors. Could you update the patch to address this?
>
> proc.c:1538:20: error: use of undeclared identifier 'buf'
>  1538 |         initStringInfo(&buf);
>       |                         ^
> proc.c:1539:20: error: use of undeclared identifier 'lock_waiters_sbuf'
>  1539 |         initStringInfo(&lock_waiters_sbuf);
>       |                         ^
> proc.c:1540:20: error: use of undeclared identifier 'lock_holders_sbuf'
>  1540 |         initStringInfo(&lock_holders_sbuf);
>       |                         ^
> ....

Conflicted with another commit [1]. I'll rebase it later.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3
On 2024/12/19 17:21, Yuki Seino wrote:
>> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3
> Rebased and added new GUC log_lock_failure and minor fixes. Currently only NOWAIT errors are supported.

Thanks for updating the patch!

+     Currently, only lock failures due to <literal>NOWAIT</literal> are
+     supported. The default is <literal>off</literal>. Only superusers

This GUC also affects SKIP LOCKED, so that should be documented.

Regarding the GUC name log_lock_failure, I'm not sure it's the best fit,
but I don't have a better suggestion at the moment. I'll keep considering
alternatives.

If CollectLockHoldersAndWaiters() is used only in proc.c, it should be
made a static function.

+/*
+ * we loop over the lock's procLocks to gather a list of all
+ * holders and waiters. Thus we will be able to provide more
+ * detailed information for lock debugging purposes.

For the source comment of CollectLockHoldersAndWaiters(), could you
follow the style of other functions in proc.c?

Why is the logic to report lock holders and waiters implemented within
JoinWaitQueue()? Wouldn't it be better placed right after JoinWaitQueue()
is called in LockAcquireExtended(), similar to DeadLockReport()?

One issue with the current implementation is that partitionLock is held
in exclusive mode while collecting and reporting lock holders and
waiters. If the logic is moved after JoinWaitQueue() in
LockAcquireExtended(), lock holders and waiters could be processed after
releasing partitionLock. Note, however, that partitionLock might still
need to be taken again in shared mode during the collection.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thank you for your comment. Sorry for being late.

> SELECT FOR UPDATE SKIP LOCKED might skip a large number of rows, leading to
> a high volume of log messages from log_lock_failure in your current patch.
> Could this be considered unintended behavior? Would it be better to limit
> the number of log messages per query?

It is necessary to suppress the generation of a large amount of logs due
to SKIP LOCKED. But I think that when using SKIP LOCKED, the locks are
often intentionally bypassed. It seems unnatural to log only the first
failure or to sample a fixed number of them.

What do you think if we simply don't log anything for SKIP LOCKED?

Regards,
On 2025/02/03 21:30, Yuki Seino wrote:
> Thank you for your comment. Sorry for being late.
>
>> SELECT FOR UPDATE SKIP LOCKED might skip a large number of rows, leading to
>> a high volume of log messages from log_lock_failure in your current patch.
>> Could this be considered unintended behavior? Would it be better to limit
>> the number of log messages per query?
> It is necessary to suppress the generation of a large amount of logs due to SKIP LOCKED.
> But I think that when using SKIP LOCKED, the locks are often intentionally bypassed.
> It seems unnatural to log only the first failure or to sample a fixed number of them.

I don't think so. Some users of SKIP LOCKED may want to understand why
locks were skipped and identify the blockers. Even partial information
could still be valuable to them, I think.

> What do you think if we simply don't log anything for SKIP LOCKED?

Implementing both NOWAIT and SKIP LOCKED could take time and make the
patch more complex. I'm fine with focusing on the NOWAIT case first as
an initial patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, 12 Feb 2025 at 12:32, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> What do you think if we simply don't log anything for SKIP LOCKED?
>
> Implementing both NOWAIT and SKIP LOCKED could take time and make the patch
> more complex. I'm fine with focusing on the NOWAIT case first as an initial patch.

I think that makes sense. It's a fairly common pattern to use SKIP LOCKED
to implement a concurrent job queue. Having such a use case suddenly
create lots of logs seems undesirable, especially since it created no
logs at all before. Since NOWAIT already results in an error (and thus a
log), having it add some additional info seems totally reasonable.
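For context, that job-queue pattern typically looks like the following. This is illustrative SQL against a hypothetical jobs table; each worker claims one queued row, and concurrent workers silently skip rows that are already locked, which is exactly why per-skip logging would be so noisy.

```sql
-- Each worker runs this to claim one job; locked rows are skipped, not waited on.
UPDATE jobs
   SET state = 'running'
 WHERE id = (SELECT id
               FROM jobs
              WHERE state = 'queued'
              ORDER BY id
              FOR UPDATE SKIP LOCKED
              LIMIT 1)
RETURNING id;
```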
On 2025/02/13 2:31, Jelte Fennema-Nio wrote:
> On Wed, 12 Feb 2025 at 12:32, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> What do you think if we simply don't log anything for SKIP LOCKED?
>> Implementing both NOWAIT and SKIP LOCKED could take time and make the patch
>> more complex. I'm fine with focusing on the NOWAIT case first as an initial patch.
> I think that makes sense. It's a fairly common pattern to use SKIP
> LOCKED to implement a concurrent job queue. Having such a use case
> suddenly create lots of logs seems undesirable, especially since it
> created no logs at all before. Since NOWAIT already results in an error
> (and thus a log), having it add some additional info seems totally
> reasonable.
Thank you for the advice. For now, my goal is to output only NOWAIT.
Since lock.c cannot reference LockWaitPolicy, I believe we need to extend
the if conditions in LockAcquire, LockAcquireExtended, and their
higher-level functions. This could be a pretty significant modification.
I'll think about whether there's a better approach. I welcome any good
ideas from everyone too.

As an aside, I also noticed that dontWait(=true) is passed not only from
NOWAIT and SKIP LOCKED but also from vacuum and other code paths:

do_autovacuum (autovacuum.c)
  -> ConditionalLockDatabaseObject
    -> LockAcquireExtended

Regards,
On 2025/02/18 18:33, Yuki Seino wrote:
> Thank you for the advice. For now, my goal is to output only NOWAIT.
> Since lock.c cannot reference LockWaitPolicy, I believe we need to
> extend the if conditions in LockAcquire, LockAcquireExtended, and their
> higher-level functions. This could be a pretty significant modification.
> I'll think about whether there's a better approach. I welcome any good
> ideas from everyone too.

Just an idea, but how about updating ConditionalXactLockTableWait() to
do the following?

- Use LockAcquireExtended() instead of LockAcquire() to retrieve the LOCALLOCK value.
- Generate a log message about the lock holders, from the retrieved LOCALLOCK value.
- Return the generated log message to the caller (heap_lock_tuple()), allowing it to log the message.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2025/02/19 1:08, Fujii Masao wrote:
> Just an idea, but how about updating ConditionalXactLockTableWait() to
> do the following?
> - Use LockAcquireExtended() instead of LockAcquire() to retrieve the
> LOCALLOCK value.
> - Generate a log message about the lock holders, from the retrieved
> LOCALLOCK value.
> - Return the generated log message to the caller (heap_lock_tuple()),
> allowing it to log the message.

Thank you for your suggestion. I have two issues to discuss:

### 1. Issue with LockAcquireExtended()

It appears that in the dontWait case, LockAcquireExtended() is removing
the local lock (locallock).

```
if (locallock->nLocks == 0)
    RemoveLocalLock(locallock);
```

Instead, the removal of locallock should be handled by
ConditionalXactLockTableWait().

### 2. Issue with the LOCK TABLE Case

For the LOCK TABLE scenario, RangeVarGetRelidExtended() calls
ConditionalLockRelationOid(), which seems to require a similar
modification.

```
# tx1
BEGIN;
LOCK TABLE pgbench_accounts;

# tx2
BEGIN;
LOCK TABLE pgbench_accounts NOWAIT; -- breakpoint -> LockAcquireExtended

# parts of CALLSTACK
LockAcquireExtended()
ConditionalLockRelationOid()
RangeVarGetRelidExtended()
LockTableCommand()
standard_ProcessUtility()
ProcessUtility()
```

I don't see a problem with it, though. Do you think these are problems?

Regards,
On 2025/02/20 15:27, Yuki Seino wrote:
> On 2025/02/19 1:08, Fujii Masao wrote:
>> Just an idea, but how about updating ConditionalXactLockTableWait() to do the following?
>> - Use LockAcquireExtended() instead of LockAcquire() to retrieve the LOCALLOCK value.
>> - Generate a log message about the lock holders, from the retrieved LOCALLOCK value.
>> - Return the generated log message to the caller (heap_lock_tuple()), allowing it to log the message.
>
> Thank you for your suggestion. I have two issues to discuss:
>
> ### 1. Issue with LockAcquireExtended()
>
> It appears that in the dontWait case, LockAcquireExtended() is removing the local lock (locallock).
>
> ```
> if (locallock->nLocks == 0)
>     RemoveLocalLock(locallock);
> ```
>
> Instead, the removal of locallock should be handled by ConditionalXactLockTableWait().

RemoveLocalLock() doesn't seem to remove the LocalLock information
needed for generating the lock failure log message. Is that correct?

One idea I considered is using the LocalLock returned by
LockAcquireExtended(), but this might complicate the code more than
expected. A simpler approach could be adding a new argument, like
logLockFailure, to LockAcquireExtended(), allowing it to log the failure
message when set to true. This change would affect LockAcquireExtended()
and some ConditionalLockXXX() functions, but it doesn't seem too
invasive.

As for LockAcquire(), I don't think it needs a new argument. If any
ConditionalLockXXX() functions call LockAcquire(), we could modify them
to call LockAcquireExtended() instead, enabling logging when needed.

> ### 2. Issue with the LOCK TABLE Case
>
> For the LOCK TABLE scenario, RangeVarGetRelidExtended() calls
> ConditionalLockRelationOid(), which seems to require a similar
> modification.

Yes, if we want to support LOCK TABLE NOWAIT and other NOWAIT cases,
these additional updates would be required. However, I suggest focusing
on row-level locks with SELECT ... NOWAIT first, then expanding to other
cases later.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
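In pseudocode, a call site using the suggested new argument might look roughly like this. The argument name and position are guesses at this stage of the discussion, not a committed API, and the fragment is not compilable on its own.

```
/* pseudocode sketch: a Conditional* wrapper passing the proposed flag so
 * LockAcquireExtended() logs the failure itself */
res = LockAcquireExtended(&tag, lockmode,
                          false,             /* sessionLock */
                          true,              /* dontWait: NOWAIT semantics */
                          true,              /* reportMemoryError */
                          &locallock,
                          log_lock_failure); /* proposed: log on failure */
if (res == LOCKACQUIRE_NOT_AVAIL)
    return false;                            /* caller sees the NOWAIT failure */
```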
On 2025/02/21 10:28, Fujii Masao wrote:
> On 2025/02/20 15:27, Yuki Seino wrote:
>> On 2025/02/19 1:08, Fujii Masao wrote:
>>> Just an idea, but how about updating ConditionalXactLockTableWait()
>>> to do the following?
>>> - Use LockAcquireExtended() instead of LockAcquire() to retrieve the
>>> LOCALLOCK value.
>>> - Generate a log message about the lock holders, from the retrieved
>>> LOCALLOCK value.
>>> - Return the generated log message to the caller
>>> (heap_lock_tuple()), allowing it to log the message.
>> Thank you for your suggestion. I have two issues to discuss:
>>
>> ### 1. Issue with LockAcquireExtended()
>>
>> It appears that in the dontWait case, LockAcquireExtended() is
>> removing the local lock (locallock).
>>
>> ```
>> if (locallock->nLocks == 0)
>>     RemoveLocalLock(locallock);
>> ```
>>
>> Instead, the removal of locallock should be handled by
>> ConditionalXactLockTableWait().
>
> RemoveLocalLock() doesn't seem to remove the LocalLock information
> needed for generating the lock failure log message. Is that correct?
>
> One idea I considered is using the LocalLock returned by
> LockAcquireExtended(), but this might complicate the code more than
> expected. A simpler approach could be adding a new argument, like
> logLockFailure, to LockAcquireExtended(), allowing it to log the
> failure message when set to true. This change would affect
> LockAcquireExtended() and some ConditionalLockXXX() functions,
> but it doesn't seem too invasive.
>
> As for LockAcquire(), I don't think it needs a new argument. If any
> ConditionalLockXXX() functions call LockAcquire(), we could modify
> them to call LockAcquireExtended() instead, enabling logging when
> needed.
>
>> ### 2. Issue with the LOCK TABLE Case
>>
>> For the LOCK TABLE scenario, RangeVarGetRelidExtended() calls
>> ConditionalLockRelationOid(), which seems to require a similar
>> modification.
>
> Yes, if we want to support LOCK TABLE NOWAIT and other NOWAIT cases,
> these additional updates would be required. However, I suggest focusing
> on row-level locks with SELECT ... NOWAIT first, then expanding to
> other cases later.

Thanks for the advice. I have taken your advice and am working on a patch.

But I have a problem: there are 4 locations where LockWaitError is
determined. 3 of them could be reproduced, but 1 could not. I will check
more in depth.

(1) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L4946
tx1=# begin;
tx1=# select * from pgbench_accounts where aid = '1' for update;
tx2=# select * from pgbench_accounts where aid = '1' for update nowait;

(2) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L4905
tx1=# begin;
tx1=# select * from pgbench_accounts where aid = '1' for share;
tx2=# begin;
tx2=# select * from pgbench_accounts where aid = '1' for share;
tx3=# select * from pgbench_accounts where aid = '1' for update nowait;

(3) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L5211
tx1=# begin;
tx1=# select * from pgbench_accounts where aid = '1' for update;
tx2=# begin;
tx2=# select * from pgbench_accounts where aid = '1' for update;
tx3=# select * from pgbench_accounts where aid = '1' for update nowait;

(4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
...I don't know how to reproduce it.

Regards,
On 2025/02/27 15:44, Yuki Seino wrote:
> (4)
> https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
>
> ...I don't know how to reproduce it.

I have confirmed that (4) can be reproduced using the following procedure.

(4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
tx1=# SELECT pg_advisory_lock(0);
tx2=# SELECT * FROM pgbench_accounts WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT;
tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
tx1=# BEGIN;
tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
tx1=# SELECT pg_advisory_unlock(0);

Send the modified patch.

Regards,
On 2025/02/28 21:08, Yuki Seino wrote:
> On 2025/02/27 15:44, Yuki Seino wrote:
>> (4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
>> ...I don't know how to reproduce it.
> I have confirmed that (4) can be reproduced using the following procedure.
>
> (4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
> tx1=# SELECT pg_advisory_lock(0);
> tx2=# SELECT * FROM pgbench_accounts WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT;
> tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
> tx1=# BEGIN;
> tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
> tx1=# SELECT pg_advisory_unlock(0);
>
> Send the modified patch.

Thanks for updating the patch!

I encountered a compilation error with the patch. You can also see the
error in the patch tester.
https://cirrus-ci.com/task/5070779370438656

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
> I encountered a compilation error with the patch. You can also see the
> error in the patch tester.
> https://cirrus-ci.com/task/5070779370438656

Sorry, removed some errors and made some fixes.

Regards,
On 2025/03/10 22:11, Yuki Seino wrote:
>> I encountered a compilation error with the patch. You can also see the error in the patch tester.
>> https://cirrus-ci.com/task/5070779370438656
>
> Sorry, removed some errors and made some fixes.

Thanks for updating the patch!

You modified heap_lock_tuple() to log lock acquisition failures using
the lock holders and waiters lists returned by LockAcquireExtended().
However, this results in almost identical logging code in both heapam.c
and heapam_handler.c. This approach feels a bit complicated, and if we
extend this feature to other lock failure cases, we'd have to duplicate
the same logging logic elsewhere, which isn't ideal.

Instead, wouldn't it be simpler to update LockAcquireExtended() to take
a new argument, like logLockFailure, to control whether a lock failure
should be logged directly? I've adjusted the patch accordingly and
attached it. Please let me know what you think!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
> Instead, wouldn't it be simpler to update LockAcquireExtended() to
> take a new argument, like logLockFailure, to control whether
> a lock failure should be logged directly? I've adjusted the patch
> accordingly and attached it. Please let me know what you think!
>
> Regards,

Thank you!

It's very simple and nice. It seems it can also handle other lock
failure cases by extending logLockFailure.

I agree with this proposal.

Regards,
On 2025/03/11 16:50, Yuki Seino wrote:
>> Instead, wouldn't it be simpler to update LockAcquireExtended() to
>> take a new argument, like logLockFailure, to control whether
>> a lock failure should be logged directly? I've adjusted the patch
>> accordingly and attached it. Please let me know what you think!
>>
>> Regards,
> Thank you!
>
> It's very simple and nice. It seems it can also handle other lock
> failure cases by extending logLockFailure.
>
> I agree with this proposal.

Thanks for reviewing the patch!

I've made some minor cosmetic adjustments. The updated patch is attached.

Unless there are any objections, I'll proceed with committing it.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2025/03/11 22:24, Fujii Masao wrote:
> On 2025/03/11 16:50, Yuki Seino wrote:
>>> Instead, wouldn't it be simpler to update LockAcquireExtended() to
>>> take a new argument, like logLockFailure, to control whether
>>> a lock failure should be logged directly? I've adjusted the patch
>>> accordingly and attached it. Please let me know what you think!
>>
>> Thank you!
>>
>> It's very simple and nice. It seems it can also handle other lock
>> failure cases by extending logLockFailure.
>>
>> I agree with this proposal.
>
> Thanks for reviewing the patch!
>
> I've made some minor cosmetic adjustments. The updated patch is attached.
>
> Unless there are any objections, I'll proceed with committing it.

Pushed the patch. Thanks!

Using the newly introduced mechanism, we can now easily extend
the log_lock_failure GUC to support additional NOWAIT lock failures,
such as LOCK TABLE ... NOWAIT, ALTER TABLE ... NOWAIT,
ALTER MATERIALIZED VIEW ... NOWAIT, and ALTER INDEX ... NOWAIT.

I've attached a patch implementing this.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
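[Editor's note: for readers following along, a minimal two-session sketch of
how the committed log_lock_failure GUC is exercised. The table name `accounts`
is hypothetical, and the exact server-log wording is not quoted from the patch;
the LOCK TABLE case at the end assumes the follow-up patch discussed in this
message is applied.]

```sql
-- Session 1: hold a row lock.
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Session 2 (superuser): enable failure logging, then fail with NOWAIT.
SET log_lock_failure = on;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- fails immediately; with the GUC on, the server log records details
-- about the process holding the conflicting lock.

-- With the follow-up patch from this message, table-level NOWAIT
-- failures would be logged the same way:
LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE NOWAIT;
```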
> Pushed the patch. Thanks!

Thank you. I'm very happy!!

> Using the newly introduced mechanism, we can now easily extend
> the log_lock_failure GUC to support additional NOWAIT lock failures,
> such as LOCK TABLE ... NOWAIT, ALTER TABLE ... NOWAIT,
> ALTER MATERIALIZED VIEW ... NOWAIT, and ALTER INDEX ... NOWAIT.
>
> I've attached a patch implementing this.

That's a very good suggestion.

There is just one thing that bothers me: ConditionalLockRelationOid()
seems to be used by autovacuum as well. Wouldn't this information be
noise to the user?

Regards,
On 2025/03/21 15:21, Yuki Seino wrote:
>> Pushed the patch. Thanks!
>
> Thank you. I'm very happy!!
>
>> Using the newly introduced mechanism, we can now easily extend
>> the log_lock_failure GUC to support additional NOWAIT lock failures,
>> such as LOCK TABLE ... NOWAIT, ALTER TABLE ... NOWAIT,
>> ALTER MATERIALIZED VIEW ... NOWAIT, and ALTER INDEX ... NOWAIT.
>>
>> I've attached a patch implementing this.
> That's a very good suggestion.
>
> There is just one thing that bothers me: ConditionalLockRelationOid()
> seems to be used by autovacuum as well. Wouldn't this information be
> noise to the user?

Yes, logging every autovacuum lock failure would be too noisy.
I was thinking that log_lock_failure should apply only to
LOCK TABLE ... NOWAIT, but per the current code, restricting it
that way seems difficult.

Anyway, expanding log_lock_failure further requires more discussion
and design work, which isn't feasible within this CommitFest.
So, I'm withdrawing the patch for now.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION