Thread: No timeout to establish Connection?
Please forgive my ignorance if this has already been covered, but the list archives are down. (If anyone is interested, the archive page returns: DB err: FATAL: No pg_hba.conf entry for host 64.117.224.193, user pgsql, database 186_archives.) Anyway, on to my problem.

I am using pg73jdbc3.jar (dated 11/30/2002) on a production Tomcat 4.1.24 Linux server. The PostgreSQL server is a separate machine running 7.3.1, also on Linux. (Of course I plan on upgrading to 7.3.3 and the latest JDBC driver ASAP.)

The symptom: Tomcat would almost completely crash whenever the Postgres database was accessed. While I did not have time to fully quantify the exact Java problem, it appears that the opening of a new Connection set caused the crash. (Note: the JVM did not crash, just all the webapps.) My guess is that the DriverManager.getConnection(...) statement just runs and runs but never completes. There is no timeout, so the whole JVM is waiting on it to finish.

My workaround is to create a separate thread to get the Connection, then call myThread.join(5000) so I only wait around 5 seconds for the Connection. If it hasn't returned a Connection by then, I resume processing in the main thread and abandon that Thread. Should I be concerned about a memory leak here and try to kill the thread somehow?

The cause: The PostgreSQL server had a SCSI RAID backplane start to go bad, with lots of random disk errors. The drives on the bad backplane contained the OS and /usr/local/pgsql, but /usr/local/pgsql/data is on another backplane that was fine. The server would generally boot fine and operate normally until a disk error occurred. At that point any Linux processes that tried to access drives on the bad backplane would just hang. The kernel still works since it's in memory: it will accept new TCP connections, and all daemons are still running. Postgres itself still runs fine for existing connections, since the postmaster was loaded in memory and the data drives were still accessible on the good backplane. I presume that opening a new Connection would create a new pid, which the kernel would try to write to the bad drives... and hang.

The real question: Is there a connection timeout in the JDBC driver? Should there be? This seems like a reasonable situation to provide handling for.

Thanks for listening to my problem!

Roman Fail
POS Portal, Inc.
916-563-1943
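The workaround described above (a helper thread plus join(5000)) could be sketched roughly as follows. This is only an illustration, not code from the original post: the class name TimedConnect, its open method, and the "timed-connect" thread name are hypothetical, and the connect call is passed in as a Callable so the sketch does not depend on any particular driver. On the memory-leak question, the sketch assumes the stuck thread cannot be safely killed, so it marks the thread as a daemon and closes any Connection that arrives after the caller has given up.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper; not part of JDBC or of the PostgreSQL driver.
final class TimedConnect {

    private final Object lock = new Object();
    private Connection result;
    private Exception failure;
    private boolean finished;
    private boolean abandoned;

    private TimedConnect() { }

    /** Runs opener on a helper thread and waits at most timeoutMillis (must be > 0). */
    static Connection open(final Callable<Connection> opener, long timeoutMillis)
            throws SQLException {
        final TimedConnect state = new TimedConnect();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                state.attempt(opener);
            }
        }, "timed-connect");
        worker.setDaemon(true); // a stuck attempt must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treated like a timeout below
        }
        return state.takeOrAbandon(timeoutMillis);
    }

    // Executed on the helper thread; may block for as long as the opener blocks.
    private void attempt(Callable<Connection> opener) {
        Connection c = null;
        Exception e = null;
        try {
            c = opener.call();
        } catch (Exception ex) {
            e = ex;
        }
        boolean closeLate;
        synchronized (lock) {
            finished = true;
            result = c;
            failure = e;
            closeLate = abandoned;
        }
        // The caller already gave up: close the late connection so it is not leaked.
        if (closeLate && c != null) {
            try {
                c.close();
            } catch (SQLException ignored) {
                // nothing useful can be done here
            }
        }
    }

    // Executed on the calling thread after join() returns.
    private Connection takeOrAbandon(long timeoutMillis) throws SQLException {
        synchronized (lock) {
            if (!finished) {
                abandoned = true;
                throw new SQLException("No connection after " + timeoutMillis + " ms");
            }
            if (failure != null) {
                throw new SQLException("Connection attempt failed", failure);
            }
            return result;
        }
    }
}
```

A call might look like TimedConnect.open(() -> DriverManager.getConnection(url, user, password), 5000), where url, user and password are placeholders. The abandoned thread still occupies memory until the underlying call returns, so repeated timeouts against a hung server can accumulate blocked threads. The standard API also offers DriverManager.setLoginTimeout(int seconds), but whether it has any effect depends on the driver, so the sketch does not rely on it.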
Scot:

That's a great suggestion, and one that I will definitely make use of in the future. However, the hardware has now been replaced and everything is working again. It's not the kind of problem that I can think of a way to replicate, so I'm stuck providing anecdotes and not hard facts. Thanks for your input!

Roman

-----Original Message-----
From: Scot Floess [mailto:floesss@dstm.com]
Sent: Thu 7/3/2003 12:11 PM
To: Roman Fail; pgsql-jdbc@postgresql.org
Subject: RE: [JDBC] No timeout to establish Connection?
Roman:

I really don't have an answer to this...but I would like to make a comment. You "think" that your code is stuck in DriverManager.getConnection(). To confirm, you can issue a kill -QUIT on the pid of the java process. This will give you a thread dump of all threads...which will include what methods are currently executing. If you do this twice over a period of time you might better see where the problem is...for instance it may not be DriverManager.getConnection(). Again, I really don't have an answer but I find issuing the kill -QUIT <PID> to be very useful when debugging java problems...

Scot

-----Original Message-----
From: Roman Fail [mailto:rfail@posportal.com]
Sent: Thursday, July 03, 2003 2:40 PM
To: pgsql-jdbc@postgresql.org
Subject: [JDBC] No timeout to establish Connection?
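Where sending kill -QUIT to the Java process is awkward (for example, without shell access to the Tomcat host), a similar snapshot can be taken from inside the JVM. The following is a minimal sketch, assuming Java 5 or later for Thread.getAllStackTraces(). The class name ThreadDump and its output layout are hypothetical and only approximate what a kill -QUIT dump shows.

```java
import java.util.Map;

// Hypothetical helper; the output format is illustrative, not the JVM's own dump format.
final class ThreadDump {

    private ThreadDump() { }

    /** Returns the name, state and current stack of every live thread as text. */
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Following the suggestion above, ThreadDump.dump() could be logged twice a few seconds apart. A thread that shows the same frames under DriverManager.getConnection() in both snapshots is probably the one that is stuck; if the frames differ, the hang is likely elsewhere.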