Thread: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)
PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)
Hi,
We recently encounter one issue about PostgreSQL ODBC client doesn’t receive response from PostgreSQL server in time (client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours). Both PostgreSQL and its client are deployed in VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via below configuration.
log_min_messages = debug5
log_min_error_statement = debug5
log_line_prefix = '[%p-%c-%l] %m <%h %i> '
log_error_verbosity = verbose
log_statement = 'all'
logging_collector = on
log_destination = 'stderr'
log_directory = '/tmp'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_duration = on
log_autovacuum_min_duration = 2
Client log indicate it received response 10 seconds after sending the request:
Nov 13 22:25:58 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:25:58:050678|TEXT:WQTST before call db:1|DATA:
Nov 13 22:25:58 CFPU-0 raktor[2701622]: RU:OMU-0|PNAME:QNRAKTOR|LPID:2701622|PID:0x4EE 0x31 0x0|TIME:2021-11-14 00:26:03:054718|TEXT:RAV:set_wcel_state_r-db response timeout|DATA:
Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012336|TEXT:RAV Exit: set_wcel_state__r:= |DATA:
Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012501|TEXT:WQTST after call db:1|DATA:
While PostgreSQL log recorded for the procedure indicate it executed quickly.
[2701833-618d1b70.293a09-173237] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173238] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173239] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> STATEMENT: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173240] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOG: 00000: statement: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173241] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION: exec_simple_query, postgres.c:1042
[2701833-618d1b70.293a09-173242] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173243] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173244] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> STATEMENT: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173245] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOG: 00000: duration: 0.098 ms
[2701833-618d1b70.293a09-173246] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION: exec_simple_query, postgres.c:1287
[2701833-618d1b70.293a09-173247] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> DEBUG: 00000: parse _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173248] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> LOCATION: exec_parse_message, postgres.c:1342
[2701833-618d1b70.293a09-173249] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173250] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173251] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173252] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173253] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOG: 00000: duration: 0.089 ms
[2701833-618d1b70.293a09-173254] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: exec_parse_message, postgres.c:1550
[2701833-618d1b70.293a09-173255] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173256] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173257] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173258] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173259] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173260] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173261] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: bind <unnamed> to _PLAN0x7f2ec021ab60
[2701833-618d1b70.293a09-173262] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: exec_bind_message, postgres.c:1600
[2701833-618d1b70.293a09-173263] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173264] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173265] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173266] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOG: 00000: duration: 0.107 ms
[2701833-618d1b70.293a09-173267] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION: exec_bind_message, postgres.c:1917
[2701833-618d1b70.293a09-173268] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG: 00000: execute _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173269] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DETAIL: parameters: $1 = '0'
[2701833-618d1b70.293a09-173270] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: exec_execute_message, postgres.c:2055
[2701833-618d1b70.293a09-173271] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG: 00000: duration: 0.080 ms
[2701833-618d1b70.293a09-173272] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: exec_execute_message, postgres.c:2138
[2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1
[2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: ShowTransactionStateRec, xact.c:5333
Finally, the tcpdump indicate the delay happened at PostgreSQL server.
This mail is not to investigate why PostgreSQL server delay sending the response message. We suspect the backend I/O performance issue.
This mail is to ask why PostgreSQL debug log doesn’t really include the response message delay which may cause misleading why troubleshooting. It looks to me the debug log doesn’t record the whole procedure. If there are some developer options include the missing part but we didn’t enable?
Best Regards
It always takes longer than you expect,
even when you take into account
----------------------------------------------------------------------------------------------
Yan-Jack Chen (陈雁)
Tel: +8613957141340
Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053
Attachment
Re: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)
"Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes: > We recently encounter one issue about PostgreSQL ODBC client doesn¡¯t receive response from PostgreSQL server in time(client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours). Both PostgreSQL and its client are deployedin VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via belowconfiguration. We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long. I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table? maybe an event trigger?), and then second about delays in writing/fsyncing WAL (what's the underlying storage? do you have synchronous_commit replication turned on?). > This mail is to ask why PostgreSQL debug log doesn¡¯t really include the response message delay which may cause misleadingwhy troubleshooting. It looks to me the debug log doesn¡¯t record the whole procedure. If there are some developeroptions include the missing part but we didn¡¯t enable? [ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging logs would be even more impossibly voluminous than they are now. I'd say the existing logging gave you plenty of clue where to look. regards, tom lane
RE: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long. I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table? maybe an event trigger?), andthen second about delays in writing/fsyncing WAL (what's the underlying storage? do you have synchronous_commit replicationturned on?). We have the same as your second suspicion. Delays in fsyncing WAL as the backend ceph cluster storage and we see there wassignificant latency increased during that time. Yes. the synchronous_commit is on. fsync | on | Forces synchronizationof updates to disk. synchronous_commit | on | Sets the current transaction's synchronizationlevel. > [ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging logswould be even more impossibly voluminous than they are now. I'd say the existing logging gave you plenty of clue whereto look. Actually, do not need to write log for every line code. The last debug log we can see is for CommitTransaction which stateis INPROGRESS. But we can't see when CommitTransaction state is DONE/COMPLETE from the debug log which I think thiskind of log is valuable which can help to identify where the delay comes, from server or client from the postgres serverdebug log only shall be enough without to monitoring and analyzing the TCP message. [2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG: 00000: CommitTransaction(1) name:unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1 [2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: ShowTransactionStateRec, xact.c:5333 Best Regards It always takes longer than you expect, even when you take into account ---------------------------------------------------------------------------------------------- Yan-Jack Chen (陈雁) Tel: +8613957141340 Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053 -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: 2021年11月15日 23:09 To: Chen, Yan-Jack (NSB - CN/Hangzhou) <yan-jack.chen@nokia-sbell.com> Cc: pgsql-general@postgresql.org Subject: Re: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response) "Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes: > We recently encounter one issue about PostgreSQL ODBC client doesn¡¯t receive response from PostgreSQL server in time(client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours). Both PostgreSQL and its client are deployedin VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via belowconfiguration. We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long. I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table? maybe an event trigger?), andthen second about delays in writing/fsyncing WAL (what's the underlying storage? do you have synchronous_commit replicationturned on?). > This mail is to ask why PostgreSQL debug log doesn¡¯t really include the response message delay which may cause misleadingwhy troubleshooting. It looks to me the debug log doesn¡¯t record the whole procedure. If there are some developeroptions include the missing part but we didn¡¯t enable? [ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging logswould be even more impossibly voluminous than they are now. I'd say the existing logging gave you plenty of clue whereto look. regards, tom lane