Re: Idle processes chewing up CPU? - Mailing list pgsql-general

From Brendan Hill
Subject Re: Idle processes chewing up CPU?
Date
Msg-id 000001ca39de$4f660750$ee3215f0$@net
Whole thread Raw
In response to Re: Idle processes chewing up CPU?  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: Idle processes chewing up CPU?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi Craig, I've debugged the runaway process, though I'm not sure of the
solution yet.

My best interpretation is that an SSL client dirty disconnected while
running a request. This caused an infinite loop in pq_recvbuf(), calling
secure_read(), triggering my_sock_read() over and over. Calling
SSL_get_error() in secure_read() returns 10045 (either connection reset, or
WSAEOPNOTSUPP, I'm not sure) - after this, pq_recvbuf() appears to think
errno=EINTR has occurred, so it immediately tries again.

I'd appreciate any further interpretation and advice on this, how to
diagnose it further, or whether it's been addressed in a later version. Hope
you appreciate I've erred on the side of too much detail in the following
analysis.

As a side note, I had the postgres.exe process in debug mode for 2 hours,
and experienced no server downtime at all, which is good to know.

Regards,
-Brendan

=========================================
Main call stack:
=========================================

>    postgres.exe!pq_recvbuf()  Line 746 + 0x1a bytes
     postgres.exe!pq_getbyte()  Line 795 + 0x5 bytes
     postgres.exe!SocketBackend(StringInfoData * inBuf=0x00000000)  Line
317 + 0x5 bytes
     postgres.exe!PostgresMain(int argc=4, char * * argv=0x0155db48,
const char * username=0x01064e20)  Line 3595 + 0x23 bytes
     postgres.exe!BackendRun(Port * port=0x00bdfd40)  Line 3208
     postgres.exe!SubPostmasterMain(int argc=3, char * * argv=0x01063f28)
Line 3674 + 0x8 bytes
     postgres.exe!main(int argc=3, char * * argv=0x01063f28)  Line 165 +
0x7 bytes
     postgres.exe!__tmainCRTStartup()  Line 597 + 0x17 bytes
     kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes

Relevant variables:

PqRecvLength: 0

PqRecvPointer: 0

PqRecvBuffer: Q...,SELECT COUNT(*) FROM.   (SELECT tgargs from pg_trigger
tr.      LEFT JOIN p-g_depend dep ON dep.objid=tr.oid AND deptype = 'i'.
LEFT JOIN pg_constraint co ON refobjid = co.oid AND contype = 'f'.     WHERE
tgisconstraint AND co.oid IS NULL.     GROUP BY tgargs.    HAVING count(1) =
3) AS foo.M pg_namespace nsp.  LEFT OUTER JOIN pg_description des ON
des.objoid=nsp.oid. WHERE NOT ((nspname = 'pg_catalog' and (SELECT count(*)
FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'pgagent' and (SELECT count(*) FROM pg_class WHERE relname =
'pga_job' AND relnamespace = nsp.oid) > 0) OR.(nspname =
'information_schema' and (SELECT count(*) FROM pg_class WHERE relname =
'tables' AND relnamespace = nsp.oid) >0x00789E10  20 30 29 20 4f 52 0a 28 6e
73 70 6e 61 6d 65 20   0) OR.(nspname = 'dbo' and (SELECT count(*) FROM
pg_class WHERE relname = 'systables' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'sys' and (SELECT count(*) FROM pg_class WHERE relname =
'all_tables' AND relnamespace = nsp.oid) > 0)). ORDER BY 1, nspname

MyProcPort:    0x00bdfd40
    sock    2044
    proto    196608
    laddr    {addr={...} salen=16 }
    raddr    {addr={...} salen=16 }
    remote_host    0x0106acf0 "<removed>"
    remote_port    0x010694a0 "3307"
    canAcceptConnections    CAC_OK
    database_name    0x01064df8 "<removed>"
    user_name    0x01064e20 "<removed>"
    cmdline_options    0x00000000 <Bad Ptr>
    guc_options    0x00000000 {type=??? length=??? head=??? ...}
    auth_method    uaMD5
    auth_arg    0x00000000 <Bad Ptr>
    md5Salt    0x00bdfe7c "<removed>"
    cryptSalt    0x00bdfe80 "UA"
    SessionStartTime    303975637670000
    default_keepalives_idle    0
    default_keepalives_interval    0
    default_keepalives_count    0
    keepalives_idle    0
    keepalives_interval    0
    keepalives_count    0
    gss    0x010694e0 {outbuf={...} cred=0x00000000 ctx=0x00000000 ...}
    ssl    0x003fbb78 {version=769 type=8192 method=0x1002b298 ...}
    peer    0x00000000 {cert_info=??? sig_alg=??? signature=??? ...}
    peer_dn    0x00bdfeb4 "(anonymous)"
    peer_cn    0x00bdff35 "(anonymous)"
    count    5287    unsigned long

pq_recvbuf() calls "secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
PQ_BUFFER_SIZE - PqRecvLength);", which returns -1. I've inferred from
control flow that errno=EINTR (can't debug the value unfortunately), so it
continues the loop.


=========================================
Within secure_read()
=========================================

SSL_get_error() returns 10054 (connection reset or WSAEOPNOTSUPP, I'm not
sure)

The switch() command jumps to SSL_ERROR_SYSCALL, then immediately jumps to
SSL_ERROR_NONE !?, and port->count NOT incremented - this was strange,
suggesting a source code mismatch or debugger problem perhaps.

secure_read() returns -1, popping back to pq_recvbuf()


=========================================
Within my_sock_read()
=========================================

secure_read() causes a call to my_sock_read(), which has a separate stack
trace (waiting on single object, I assume):

>    postgres.exe!my_sock_read(bio_st * h=0x003fbcf0, char *
buf=0x0165bc98, int size=5)  Line 443    C
     libeay32.dll!0026a78f()
     [Frames below may be incorrect and/or missing, no symbols loaded for
libeay32.dll]
     mswsock.dll!_WSPEventSelect@16()  + 0x92 bytes
     ffffffff()

Within prepare_for_client_read(): DoingCommandRead = 1

Within EnableNotifyInterrupt(): IsTransactionOrTransactionBlock() = 0,
notifyInterruptOccurred = 0

Within EnableCatchupInterrupt(): catchupInterruptOccurred = 0

After "res = std_sock_read(h, buf, size);" (can't debug result,
unfortunately), (buf, size, h) hadn't change at all

Within client_read_ended(): DoingCommandRead = 1

Exiting my_sock_read() and wandering through some assembly, it finds itself
back into socket_read(), then back in pg_recvbuf(), and the loop continues.



pgsql-general by date:

Previous
From: Jerry LeVan
Date:
Subject: How can I make a two arch libpq for snow leopard?
Next
From: Tom Lane
Date:
Subject: Re: Idle processes chewing up CPU?