Thread: Re: Add “FOR UPDATE NOWAIT” lock details to the log.
On 2024/09/13 20:49, Seino Yuki wrote: > Hello, > > I would like to add the information of the PID that caused the failure > when acquiring a lock with "FOR UPDATE NOWAIT". > > When "FOR UPDATE" is executed and interrupted by lock_timeout, > xid and PID are output in the logs, Could you explain how to reproduce this? In my quick test, lock waits canceled by lock_timeout didn’t report either xid or PID, so I might be missing something. > but in the case of "FOR UPDATE NOWAIT", > no information is output, making it impossible to identify the cause of the lock failure. > Therefore, I would like to output information in the logs in the same way as > when "FOR UPDATE" is executed and interrupted by lock_timeout. I think NOWAIT is often used for lock waits that can fail frequently. Logging detailed information for each failed NOWAIT lock wait could lead to excessive and potentially noisy log messages for some users. On the other hand, I understand that even with NOWAIT, we might want to investigate why a lock wait failed. In such cases, detailed information about the locking process would be useful. Considering both points, I’m leaning toward adding a new GUC parameter to control whether detailed logs should be generated for failed NOWAIT locks (and possibly for those canceled by lock_timeout). Alternatively, if adding a new GUC is not ideal, we could extend log_lock_waits to accept a new value like 'error,' which would trigger detailed logging when a lock wait fails due to NOWAIT, lock_timeout, or cancellation. > The patch is attached as well. I got the following compiler errors; proc.c:1240:3: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicitfunction declarations [-Wimplicit-function-declaration] 1240 | CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum); | ^ proc.c:1549:4: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicitfunction declarations [-Wimplicit-function-declaration] 1549 | CollectLockHoldersAndWaiters(NULL, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum); | ^ proc.c:1126:8: warning: unused variable 'first_holder' [-Wunused-variable] 1126 | bool first_holder = true, | ^~~~~~~~~~~~ proc.c:1127:5: warning: unused variable 'first_waiter' [-Wunused-variable] 1127 | first_waiter = true; | ^~~~~~~~~~~~ proc.c:1982:1: error: conflicting types for 'CollectLockHoldersAndWaiters' 1982 | CollectLockHoldersAndWaiters(PROCLOCK *waitProcLock, LOCK *lock, StringInfo lock_holders_sbuf, StringInfo lock_waiters_sbuf,int *lockHoldersNum) | ^ proc.c:1240:3: note: previous implicit declaration is here 1240 | CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum); | ^ 2 warnings and 3 errors generated. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Thank you for Review. > Could you explain how to reproduce this? In my quick test, > lock waits canceled by lock_timeout didn’t report either xid or PID, > so I might be missing something. It seems to be outputting on my end, how about you? ===== Console ===== postgres=# SHOW log_lock_waits; log_lock_waits ---------------- on (1 row) postgres=# SHOW lock_timeout; lock_timeout -------------- 2s (1 row) (tx1) postgres=# BEGIN; BEGIN postgres=*# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR UPDATE; aid ----- 1 (1 row) (tx2) postgres=# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR UPDATE; ERROR: canceling statement due to lock timeout CONTEXT: while locking tuple (0,1) in relation "pgbench_accounts" ===== Log Output ===== # lock start 2024-10-17 17:49:58.157 JST [1443346] LOG: process 1443346 still waiting for ShareLock on transaction 770 after 1000.096 ms 2024-10-17 17:49:58.157 JST [1443346] DETAIL: Process holding the lock: 1443241. Wait queue: 1443346. 2024-10-17 17:49:58.157 JST [1443346] CONTEXT: while locking tuple (0,1) in relation "pgbench_accounts" 2024-10-17 17:49:58.157 JST [1443346] STATEMENT: SELECT * FROM pgbench_accounts WHERE aid = 1 FOR UPDATE; # lock timeout 2024-10-17 17:49:59.157 JST [1443346] ERROR: canceling statement due to lock timeout 2024-10-17 17:49:59.157 JST [1443346] CONTEXT: while locking tuple (0,1) in relation "pgbench_accounts" 2024-10-17 17:49:59.157 JST [1443346] STATEMENT: SELECT * FROM pgbench_accounts WHERE aid = 1 FOR UPDATE; > I think NOWAIT is often used for lock waits that can fail frequently. > Logging detailed information for each failed NOWAIT lock wait could > lead to excessive and potentially noisy log messages for some users. > > On the other hand, I understand that even with NOWAIT, we might want > to investigate why a lock wait failed. In such cases, > detailed information about the locking process would be useful. > > Considering both points, I’m leaning toward adding a new GUC parameter > to control whether detailed logs should be generated for failed > NOWAIT locks (and possibly for those canceled by lock_timeout). > Alternatively, if adding a new GUC is not ideal, we could extend > log_lock_waits to accept a new value like 'error,' which would trigger > detailed logging when a lock wait fails due to NOWAIT, lock_timeout, > or cancellation. That's a good idea. I'll try that. > I got the following compiler errors; thank you. I missed it. Regards, -- Yuki Seino NTT DATA CORPORATION
>> Considering both points, I’m leaning toward adding a new GUC parameter >> to control whether detailed logs should be generated for failed >> NOWAIT locks (and possibly for those canceled by lock_timeout). >> Alternatively, if adding a new GUC is not ideal, we could extend >> log_lock_waits to accept a new value like 'error,' which would trigger >> detailed logging when a lock wait fails due to NOWAIT, lock_timeout, >> or cancellation. > That's a good idea. I'll try that. Sorry, I misunderstood. The pid and xid are output during deadlock_timeout. > LOG: process 1443346 still waiting for ShareLock on transaction 770 > after 1000.096 ms > DETAIL: Process holding the lock: 1443241. Wait queue: 1443346. This log output is not triggered by lock_timeout or cancellation. Therefore, it is difficult to support lock_timeout and cancellation. I am thinking of supporting only NOWAIT. What do you think? Regards, -- Yuki Seino NTT DATA CORPORATION
On 2024/10/17 22:15, Seino Yuki wrote: > >>> Considering both points, I’m leaning toward adding a new GUC parameter >>> to control whether detailed logs should be generated for failed >>> NOWAIT locks (and possibly for those canceled by lock_timeout). >>> Alternatively, if adding a new GUC is not ideal, we could extend >>> log_lock_waits to accept a new value like 'error,' which would trigger >>> detailed logging when a lock wait fails due to NOWAIT, lock_timeout, >>> or cancellation. >> That's a good idea. I'll try that. > > Sorry, I misunderstood. > The pid and xid are output during deadlock_timeout. > >> LOG: process 1443346 still waiting for ShareLock on transaction 770 after 1000.096 ms >> DETAIL: Process holding the lock: 1443241. Wait queue: 1443346. > This log output is not triggered by lock_timeout or cancellation. Yes, these log messages with PID or XID are generated by log_lock_waits, not lock_timeout. > Therefore, it is difficult to support lock_timeout and cancellation. > > I am thinking of supporting only NOWAIT. What do you think? I'm not sure why it's challenging to provide detailed log messages for lock waits canceled by lock_timeout or user cancellation, while it's considered feasible for the NOWAIT case. However, I think it's reasonable to implement this feature step by step. We can start by adding support for the NOWAIT case and consider extending it to handle lock_timeout and cancellation scenarios later if possible. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2024/10/18 9:37, Seino Yuki wrote: > Currently, we are discussing two improvements: > > 1. Log output when NOWAIT fails. > 2. Adding control via GUC parameters (NOWAIT, lock_timeout, cancellation). > >> I'm not sure why it's challenging to provide detailed log messages for lock waits canceled >> by lock_timeout or user cancellation, while it's considered feasible for the NOWAIT case. > > Does this statement mean that for 2, why can NOWAIT but lock_timeout,cancellation cannot? > > For item 2, the lock_timeout and cancellation will log outputs after the deadlock_timeout(e.g. 1s) has elapsed (by log_lock_waits). This log message isn't directly related to lock_timeout or cancellations. It appears if log_lock_waits is enabled and the lock wait time exceeds deadlock_timeout, regardless of whether lock_timeout is set or if users cancel the operation. I understood your original proposal as suggesting detailed log output whenever a NOWAIT lock attempt fails. However, as I mentioned earlier, NOWAIT failures can happen frequently, so logging every failure in detail could be too noisy for some users. That’s why I proposed adding a new GUC parameter (or extending log_lock_waits) to control whether such detailed logs should be generated for NOWAIT failures. During our discussion, I also thought it would be useful to provide detailed logs for lock waits canceled by user actions or lock_timeout. This behavior could be controlled by a similar GUC parameter. However, this is an additional feature beyond your original proposal, so I think it's fine to leave it out of the initial patch. > At the time this log is output, it is unclear whether the lock will be cancellation or lock_timeout. > This means that the timing at "error is determined" and "output logged" do not match. > >> However, I think it's reasonable to implement this feature step by step. We can start >> by adding support for the NOWAIT case and consider extending it to handle lock_timeout and >> cancellation scenarios later if possible. > > +1. > I will send the version with the GUC parameter added from the previous patch. The choice between adding a new GUC or extending the existing one (e.g., log_lock_waits) is debatable, but I prefer the latter. I'm considering extending log_lock_waits to accept a value like "fail". If set to "on" (the current behavior), detailed logs are generated when the lock wait time exceeds deadlock_timeout. If set to "fail", logs are generated whenever a lock wait fails. If both are specified, logs would be triggered when the wait time exceeds deadlock_timeout or when a lock wait fails. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
> During our discussion, I also thought it would be useful to provide detailed logs > > for lock waits canceled by user actions or lock_timeout. Thank you. I understand now. You want to output the logs based on a different trigger than log_lock_waits. At first, I found that strange, but I see now that there are cases where it's necessary to output logs for errors even when log_lock_waits is disabled. I have also understood the proposal for improving the GUC parameters. I will revise the patch. Regards, -- Yuki Seino NTT DATA CORPORATION