Re: Idle processes chewing up CPU? - Mailing list pgsql-general
From | Brendan Hill |
---|---|
Subject | Re: Idle processes chewing up CPU? |
Date | |
Msg-id | 000001ca39de$4f660750$ee3215f0$@net Whole thread Raw |
In response to | Re: Idle processes chewing up CPU? (Craig Ringer <craig@postnewspapers.com.au>) |
Responses |
Re: Idle processes chewing up CPU?
|
List | pgsql-general |
Hi Craig, I've debugged the runaway process, though I'm not sure of the solution yet. My best interpretation is that an SSL client dirty disconnected while running a request. This caused an infinite loop in pq_recvbuf(), calling secure_read(), triggering my_sock_read() over and over. Calling SSL_get_error() in secure_read() returns 10045 (either connection reset, or WSAEOPNOTSUPP, I'm not sure) - after this, pq_recvbuf() appears to think errno=EINTR has occurred, so it immediately tries again. I'd appreciate any further interpretation and advice on this, how to diagnose it further, or whether it's been addressed in a later version. Hope you appreciate I've erred on the side of too much detail in the following analysis. As a side note, I had the postgres.exe process in debug mode for 2 hours, and experienced no server downtime at all, which is good to know. Regards, -Brendan ========================================= Main call stack: ========================================= > postgres.exe!pq_recvbuf() Line 746 + 0x1a bytes postgres.exe!pq_getbyte() Line 795 + 0x5 bytes postgres.exe!SocketBackend(StringInfoData * inBuf=0x00000000) Line 317 + 0x5 bytes postgres.exe!PostgresMain(int argc=4, char * * argv=0x0155db48, const char * username=0x01064e20) Line 3595 + 0x23 bytes postgres.exe!BackendRun(Port * port=0x00bdfd40) Line 3208 postgres.exe!SubPostmasterMain(int argc=3, char * * argv=0x01063f28) Line 3674 + 0x8 bytes postgres.exe!main(int argc=3, char * * argv=0x01063f28) Line 165 + 0x7 bytes postgres.exe!__tmainCRTStartup() Line 597 + 0x17 bytes kernel32.dll!_BaseProcessStart@4() + 0x23 bytes Relevant variables: PqRecvLength: 0 PqRecvPointer: 0 PqRecvBuffer: Q...,SELECT COUNT(*) FROM. (SELECT tgargs from pg_trigger tr. LEFT JOIN p-g_depend dep ON dep.objid=tr.oid AND deptype = 'i'. LEFT JOIN pg_constraint co ON refobjid = co.oid AND contype = 'f'. WHERE tgisconstraint AND co.oid IS NULL. GROUP BY tgargs. HAVING count(1) = 3) AS foo.M pg_namespace nsp. LEFT OUTER JOIN pg_description des ON des.objoid=nsp.oid. WHERE NOT ((nspname = 'pg_catalog' and (SELECT count(*) FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid) > 0) OR.(nspname = 'pgagent' and (SELECT count(*) FROM pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid) > 0) OR.(nspname = 'information_schema' and (SELECT count(*) FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid) >0x00789E10 20 30 29 20 4f 52 0a 28 6e 73 70 6e 61 6d 65 20 0) OR.(nspname = 'dbo' and (SELECT count(*) FROM pg_class WHERE relname = 'systables' AND relnamespace = nsp.oid) > 0) OR.(nspname = 'sys' and (SELECT count(*) FROM pg_class WHERE relname = 'all_tables' AND relnamespace = nsp.oid) > 0)). ORDER BY 1, nspname MyProcPort: 0x00bdfd40 sock 2044 proto 196608 laddr {addr={...} salen=16 } raddr {addr={...} salen=16 } remote_host 0x0106acf0 "<removed>" remote_port 0x010694a0 "3307" canAcceptConnections CAC_OK database_name 0x01064df8 "<removed>" user_name 0x01064e20 "<removed>" cmdline_options 0x00000000 <Bad Ptr> guc_options 0x00000000 {type=??? length=??? head=??? ...} auth_method uaMD5 auth_arg 0x00000000 <Bad Ptr> md5Salt 0x00bdfe7c "<removed>" cryptSalt 0x00bdfe80 "UA" SessionStartTime 303975637670000 default_keepalives_idle 0 default_keepalives_interval 0 default_keepalives_count 0 keepalives_idle 0 keepalives_interval 0 keepalives_count 0 gss 0x010694e0 {outbuf={...} cred=0x00000000 ctx=0x00000000 ...} ssl 0x003fbb78 {version=769 type=8192 method=0x1002b298 ...} peer 0x00000000 {cert_info=??? sig_alg=??? signature=??? ...} peer_dn 0x00bdfeb4 "(anonymous)" peer_cn 0x00bdff35 "(anonymous)" count 5287 unsigned long pq_recvbuf() calls "secure_read(MyProcPort, PqRecvBuffer + PqRecvLength, PQ_BUFFER_SIZE - PqRecvLength);", which returns -1. I've inferred from control flow that errno=EINTR (can't debug the value unfortunately), so it continues the loop. ========================================= Within secure_read() ========================================= SSL_get_error() returns 10054 (connection reset or WSAEOPNOTSUPP, I'm not sure) The switch() command jumps to SSL_ERROR_SYSCALL, then immediately jumps to SSL_ERROR_NONE !?, and port->count NOT incremented - this was strange, suggesting a source code mismatch or debugger problem perhaps. secure_read() returns -1, popping back to pq_recvbuf() ========================================= Within my_sock_read() ========================================= secure_read() causes a call to my_sock_read(), which has a separate stack trace (waiting on single object, I assume): > postgres.exe!my_sock_read(bio_st * h=0x003fbcf0, char * buf=0x0165bc98, int size=5) Line 443 C libeay32.dll!0026a78f() [Frames below may be incorrect and/or missing, no symbols loaded for libeay32.dll] mswsock.dll!_WSPEventSelect@16() + 0x92 bytes ffffffff() Within prepare_for_client_read(): DoingCommandRead = 1 Within EnableNotifyInterrupt(): IsTransactionOrTransactionBlock() = 0, notifyInterruptOccurred = 0 Within EnableCatchupInterrupt(): catchupInterruptOccurred = 0 After "res = std_sock_read(h, buf, size);" (can't debug result, unfortunately), (buf, size, h) hadn't change at all Within client_read_ended(): DoingCommandRead = 1 Exiting my_sock_read() and wandering through some assembly, it finds itself back into socket_read(), then back in pg_recvbuf(), and the loop continues.
pgsql-general by date: