Thread: server crash => libpq poll() hangs forever (Linux)
Hi,

we had a kernel panic that crashed our DB server today, and all libpq clients (C and Perl) got stuck in poll() for hours, even after the server was back up, i.e. longer than the TCP timeout should allow:

#0 0x00002b2283b31c8f in poll () from /lib/libc.so.6
#1 0x00002b228446f4af in PQmblen () from /usr/lib/libpq.so.4
#2 0x00002b228446f590 in pqWaitTimed () from /usr/lib/libpq.so.4
#3 0x00002b228446ee72 in PQgetResult () from /usr/lib/libpq.so.4
#4 0x00002b228446ef4e in PQgetResult () from /usr/lib/libpq.so.4
#5 0x00002b2284341ffe in pg_st_prepare_statement () from /usr/local/lib/perl/5.8.8/auto/DBD/Pg/Pg.so
#6 0x00002b228434eb25 in pg_st_execute ()
[...]

It seems that poll() never receives a connection closed notification under Linux (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html - very old report, I can't find any newer information), so I am unsure how to handle such a case gracefully. I guess I'm having the same problem as reported in http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg105844.html but there's no real conclusion there.

Any suggestions? Can libpq be configured to use epoll or select, perhaps? Is the libpq (8.1.19-0etch1) too old? Server version is 8.4.4, using TCP (no SSL).

Regards,
Marinos
Marinos Yannikos <mjy@geizhals.at> writes:
> It seems that poll() never receives a connection closed notification under Linux
> (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html -
> very old report,

"very old report" is right. What makes you think that has anything to do with modern kernel versions?

regards, tom lane
On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
> Marinos Yannikos <m...@geizhals.at> writes:
> > It seems that poll() never receives a connection closed notification under Linux
> > (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008...-
> > very old report,
>
> "very old report" is right. What makes you think that has anything to
> do with modern kernel versions?

Interesting. The bug report includes a short code snippet that compiles to a C program and shows that the bug is still present. I'm on

bnl@tova:~$ uname -a
Linux tova 2.6.31-22-generic #60-Ubuntu SMP Thu May 27 00:22:23 UTC 2010 i686 GNU/Linux

Is the bug really still valid, or does the code snippet show something else?

/Björn
Excerpts from björn lundin's message of Wed Jun 09 16:17:57 -0400 2010:
> On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
> > Marinos Yannikos <m...@geizhals.at> writes:
> > > It seems that poll() never receives a connection closed notification under Linux
> > > (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008...-
> > > very old report,
> >
> > "very old report" is right. What makes you think that has anything to
> > do with modern kernel versions?
>
> Interesting. The bug report includes a short code snippet which
> compiles to a c program,
> that shows the bug is still present. I'm on

That test program uses UDP sockets.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
björn lundin <b.f.lundin@gmail.com> writes:
> On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
>> "very old report" is right. What makes you think that has anything to
>> do with modern kernel versions?

> Interesting. The bug report includes a short code snippet which
> compiles to a c program,
> that shows the bug is still present. I'm on

Mph. Reading the bug report and the code snippet more closely, the complaint is totally irrelevant to libpq anyway. What he's complaining about is a case where another thread of a multithreaded application close()s the descriptor that a poll() is using. That is *not* related to the other end of the connection closing the connection, which is the case the OP was concerned about.

regards, tom lane
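The case Tom Lane separates from the 2003 report, the other end of a TCP connection closing it, can be checked with a small experiment. What follows is a minimal sketch and not code from the bug report or from libpq: it assumes Linux, uses Python's select.poll (a thin wrapper around poll()), and lets a loopback listener stand in for the database server. The function name poll_after_peer_close is made up for illustration. It covers only an orderly close by the peer, not a machine that crashes without sending anything.

```python
import select
import socket


def poll_after_peer_close(timeout_ms=2000):
    """Return (events, data) seen by poll() once the peer has closed a TCP connection."""
    # Loopback listener on an ephemeral port stands in for the database server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # Orderly close by the remote end: its kernel sends a FIN to the client.
    server_side.close()
    listener.close()

    poller = select.poll()
    poller.register(client.fileno(), select.POLLIN)
    ready = poller.poll(timeout_ms)  # list of (fd, event mask) pairs

    events = ready[0][1] if ready else 0
    # A readable socket that yields zero bytes signals end-of-stream.
    data = client.recv(1) if events & select.POLLIN else None
    client.close()
    return events, data


if __name__ == "__main__":
    events, data = poll_after_peer_close()
    print("POLLIN set:", bool(events & select.POLLIN), "recv:", data)
```

If poll() wakes up with POLLIN and the following recv() returns an empty byte string, the remote close was reported to the waiting client.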
On 09.06.2010 16:37, Tom Lane wrote:
> Marinos Yannikos<mjy@geizhals.at> writes:
>> It seems that poll() never receives a connection closed notification under Linux
>> (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html -
>> very old report,
>
> "very old report" is right. What makes you think that has anything to
> do with modern kernel versions?

Mainly that I have no other explanation for multiple clients hanging on multiple client machines (all kernel 2.6.18, some were Xen instances, the server uses 2.6.26) in libpq's poll(), for up to 2 days 10 hours after the server crash (until we found/restarted them). 2.6.18 is probably not "modern", but it is reliable. Does anyone have more information regarding poll() changes in more recent kernels? We'll upgrade some boxes and see if anything changes.

Regards,
Marinos
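One general mechanism for noticing a peer that disappeared without closing the connection is TCP keepalive. A host that panics sends neither FIN nor RST, and a client that is only waiting for a reply sends nothing that could provoke an error, so without probes it may wait indefinitely. The sketch below is an illustration under stated assumptions, not a description of what libpq 8.1 does: it assumes Linux option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT), the helper names and the timing values are invented for the example, and a C client would presumably apply the same setsockopt() calls to the descriptor returned by PQsocket().

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive probes for a socket (Linux option names assumed)."""
    # Ask the kernel to probe an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes after which the connection is declared dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def keepalive_settings(sock):
    """Read the keepalive options back as (enabled, idle_s, interval_s, probes)."""
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0,
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
    )


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(keepalive_settings(s))
    s.close()
```

With these illustrative values, an unreachable server would be detected after roughly 60 + 5 * 10 = 110 seconds of silence, at which point a blocked poll() on the socket should return with an error condition instead of waiting for days.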