Thread: Deadlock detection
As we know from recent versions, the JDBC driver can deadlock. We also
know that the deadlock detector as it exists does not detect all types of
deadlock, and isn't foolproof at that either. Not criticism, just
summarising.

We've recently tried to diagnose some sporadic hang situations and would
love to be able to rule out the JDBC driver. This has not yet been
possible. (Substantial work has already been done server side, so this is
not finger pointing, just working backwards.) Currently we can take stack
traces, but knowing when to take them is key, and since in problem
situations the priority is often on cure rather than prevention, it has
not so far been possible to acquire useful diagnostic information.

It is possible that the cases under investigation are linked to the
occasional reports we get on various lists about idle-in-transaction
sessions. On the server side, I'm expecting to implement SIGINT to cancel
idle-in-transaction sessions. That may just further bury any problems in
the driver.

The best solution would be to have a logging mode that would advise us of
the presence of a deadlock and log details for later analysis. If this
worked in an objective way that could be sure of detecting the general
case, we would be in a much better situation. We may discover some unknown
cases; if not, we can at least rule out the driver from these situations,
and we know how important that can be in complex failure analysis. With me
so far? Perhaps, perhaps not.

I'm thinking of some simple bookkeeping, so that before any session issues
a blockable call it takes note that it is doing so in a shared array, and
clears that state again once released. A single watcher thread can then
regularly scan the state array and see if a session is marked as waiting
on both send and receive. Side threads are already used for login timeout
handling, so that mechanism is already in use. Some loss of performance
would be acceptable.

Would you agree this would be worthwhile? Any practical blockers? If it is
unhelpful in this specific case, would that feature be generally usable?
If not, other suggestions and viewpoints welcome. Thanks.

You might say the best way is to fix the underlying architecture. Maybe,
but we are a long way from that, and even if we did, some second- and
third-order effects would still exist for which we need new log modes to
diagnose correctly.

Is there a deadline for the JDBC driver for 8.4? (When was it?)

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
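[For concreteness, a minimal sketch of the bookkeeping proposed above.
All class and method names here are hypothetical, and note Oliver's
objection below: a single thread is only ever blocked in one call at a
time, so a practical watcher would have to test blocking *duration*
rather than a send+receive combination.]

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch only: shared state written by session threads
    // around blockable calls, scanned by a single watcher thread.
    class BlockingStateRegistry {
        static final int SEND = 1, RECEIVE = 2;

        // session id -> { operation, time when blocking started }
        private final Map<Long, long[]> blocked =
            new ConcurrentHashMap<Long, long[]>();

        void enter(long sessionId, int op) {
            blocked.put(sessionId, new long[] { op, System.currentTimeMillis() });
        }

        void exit(long sessionId) {
            blocked.remove(sessionId);
        }

        // Watcher: report any session blocked longer than the threshold.
        void startWatcher(final long thresholdMs) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        long now = System.currentTimeMillis();
                        for (Map.Entry<Long, long[]> e : blocked.entrySet()) {
                            long[] v = e.getValue();
                            if (now - v[1] > thresholdMs)
                                System.err.println("session " + e.getKey()
                                    + " blocked in "
                                    + (v[0] == SEND ? "send" : "receive")
                                    + " for " + (now - v[1]) + "ms");
                        }
                        try { Thread.sleep(1000); }
                        catch (InterruptedException ie) { return; }
                    }
                }
            }, "deadlock-watcher");
            t.setDaemon(true);
            t.start();
        }
    }

[A session thread would call enter()/exit() in a try/finally around each
blockable socket call.]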
Simon Riggs wrote:
> I'm thinking of some simple bookkeeping, so that before any session
> issues a blockable call it takes note that it is doing so in a shared
> array, and clears that state again once released. A single watcher
> thread can then regularly scan the state array and see if a session is
> marked as waiting on both send and receive. Side threads are already
> used for login timeout handling, so that mechanism is already in use.
> Some loss of performance would be acceptable.

I don't see how this can work. The whole point is that if we're blocked
on send, we're *not* blocked on receive by definition.

-O
Simon Riggs wrote:
> If not, other suggestions and viewpoints welcome. Thanks,

If you want a minimally invasive approach (in terms of rewriting driver
code) and some extra overhead isn't a problem, then I would suggest
something like this:

(1) spawn a separate write thread for each connection; it reads from an
unbounded buffer

(2) instead of writes from the thread running a query going directly to
the real stream, they go into the buffer (without blocking), and the
corresponding write thread picks them up for immediate writing

(3) implement OutputStream.flush() to wait for the buffer to empty, *with
a timeout*. If you hit the timeout while waiting to flush, then you've
detected a potential deadlock case and can dump out some state and then
continue anyway.

The usual behaviour of the driver is (approximately): write some query
data, write a flush or sync message, flush the socket output stream,
start to read results. In the deadlock case, we run out of (socket) write
buffer space at some point before completing the stream flush, and the
socket buffer never empties because the server is blocked trying to write
to us. With the above changes, in that case the write thread will block
and the internal Java-side write buffer will grow. Then
OutputStream.flush() will block and eventually time out, and the deadlock
should be broken.

The obvious costs of this are the extra thread per connection, the
context switches needed to actually do the writes, and however much space
you need for all the write buffers.

Turn OutputStream.flush() into a no-op and you actually have a fix for
the deadlock case entirely - it's just not a very cheap fix.

You may want to put an upper bound on the internal buffer size and throw
IOException if you hit it. I suppose that a connection error is better
than either a silent deadlock or exhausting your heap...

-O
Oliver Jowett wrote:
> If you want a minimally invasive approach (in terms of rewriting driver
> code) and some extra overhead isn't a problem, then I would suggest
> something like this:
>
> (1) spawn a separate write thread for each connection; it reads from an
> unbounded buffer
>
> (2) instead of writes from the thread running a query going directly to
> the real stream, they go into the buffer (without blocking), and the
> corresponding write thread picks them up for immediate writing
>
> (3) implement OutputStream.flush() to wait for the buffer to empty,
> *with a timeout*. If you hit the timeout while waiting to flush, then
> you've detected a potential deadlock case and can dump out some state
> and then continue anyway.

Also, if it wasn't obvious, I'm thinking you do this by a custom
OutputStream implementation that just sits on top of the real stream; you
don't need to change any of the callers, so it's just a bit of setup code
and then the vast majority of the driver guts are unchanged. You could
even make it conditional on a connection parameter...

-O
Simon Riggs wrote:
> As we know from recent versions, the JDBC driver can deadlock. We also
> know that the deadlock detector as it exists does not detect all types
> of deadlock, and isn't foolproof at that either. Not criticism, just
> summarising.

Do you mean the potential read-write socket deadlock (where we're blocked
on write waiting for the server to consume some data, but it's blocked
waiting for us to consume some data), or something else?

-O
Simon Riggs wrote:
> I think your proposal would work, but sounds fairly invasive. I'm
> concerned that if we interfere with the main line of processing we'll
> just introduce a new set of bugs that won't move us forwards. I would
> prefer a "watcher" approach, which doesn't directly interfere with the
> main line of processing and can be turned on/off. Perhaps it isn't
> possible?

It's not really invasive at all: you need an OutputStream implementation
(that is completely new code and separate from the rest of the driver)
and a small amount of setup code when first establishing the connection
to hook it in as the stream that's actually written to. The setup code
can be conditional. The rest of the driver doesn't change at all.

> How about a watcher thread that waits for the main thread to respond;
> if no response within a timeout, it wakes.

That could work. Note that there is no single "main thread", it's just
whatever application threads happen to be running queries. You would need
to arm the watchdog before any write or flush, and disarm it before
reading or returning to the client (via a normal return or an exception).
If the watchdog on a particular thread stays armed for more than some
period, you trigger a warning.

It might be tricky to get useful state out of the driver when that timer
goes off - you'll be in the wrong thread to get a useful stacktrace at
that point. I guess you'll have to go and abruptly close the underlying
socket to unblock the deadlocked query thread - which means you need some
way to correlate threads with sockets. And you kill the connection by
closing down the socket, obviously.

Seems like it would get a bit complicated...

-O
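[A compact sketch of the arm/disarm idea described above, built on the
standard java.util.Timer; the class and method names are invented for
illustration, not taken from the driver:]

    import java.util.Timer;
    import java.util.TimerTask;

    // Hypothetical sketch: arm before a blockable write/flush, disarm
    // once it returns. The timer task only warns, since (as noted above)
    // actually breaking the deadlock would mean closing the underlying
    // socket and killing the connection.
    class BlockingWatchdog {
        private final Timer timer = new Timer("blocking-watchdog", true);
        private TimerTask pending;

        synchronized void arm(final String operation, final long timeoutMs) {
            disarm();   // at most one armed operation at a time
            pending = new TimerTask() {
                public void run() {
                    // Wrong thread for a useful stack trace of the stuck
                    // caller; we can only report that it is still armed.
                    System.err.println("Warning: " + operation
                        + " still blocked after " + timeoutMs
                        + "ms; possible socket deadlock");
                }
            };
            timer.schedule(pending, timeoutMs);
        }

        synchronized void disarm() {
            if (pending != null) {
                pending.cancel();
                pending = null;
            }
        }
    }

    // Usage around a blockable call:
    //     watchdog.arm("flush", 30000);
    //     try { pgOutput.flush(); } finally { watchdog.disarm(); }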
On Wed, 2009-01-21 at 23:38 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:
>
>> If not, other suggestions and viewpoints welcome. Thanks,
>
> If you want a minimally invasive approach (in terms of rewriting driver
> code) and some extra overhead isn't a problem, then I would suggest
> something like this:

...

> The obvious costs of this are the extra thread per connection, the
> context switches needed to actually do the writes, and however much
> space you need for all the write buffers.

Thanks very much for such a detailed design. I'd like to identify
situations where deadlocks occur so we can fix them, or confirm they
haven't happened at all.

I think your proposal would work, but sounds fairly invasive. I'm
concerned that if we interfere with the main line of processing we'll
just introduce a new set of bugs that won't move us forwards. I would
prefer a "watcher" approach, which doesn't directly interfere with the
main line of processing and can be turned on/off. Perhaps it isn't
possible?

How about a watcher thread that waits for the main thread to respond; if
no response within a timeout, it wakes. We then pass information to
another agent which can compare against server-side information and
identify deadlocks. A client/server deadlock detector, rather than within
the driver as I described before.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs schrieb:
> As we know from recent versions, the JDBC driver can deadlock. We also
> know that the deadlock detector as it exists does not detect all types
> of deadlock, and isn't foolproof at that either. Not criticism, just
> summarising.

I don't know if this is of help for you, but one case of deadlocking can
occur because of filled stream buffers on both ends of the connection. I
had a similar situation where I issued a large script with 100,000 INSERT
statements to the database over JDBC. Those statements all returned their
answer bytes after each invocation on the server; those bytes filled the
client receive buffer, after that the server send buffer filled up, and
then the server process blocked. Then the receive buffer of the server
fills, and then the client send buffer.

I "solved" the problem for scripts of our size by drastically increasing
the input buffer size on the client side. I modified PGStream.java to
include the following line in changeConnection():

    connection.setReceiveBufferSize(256*1024);

I don't know if there is a more elegant way to handle this, but for us it
works. Maybe for you, too. As far as I know (which is not much about the
driver internals) this is the only way to make the driver hang.

With best regards,
Daniel Migowski
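[One caveat worth adding - standard java.net.Socket behaviour, not
anything specific to the driver: the requested size is only a hint to the
OS, which may clamp it, so it's worth reading back what was actually
granted:]

    connection.setReceiveBufferSize(256 * 1024);
    // SO_RCVBUF is a hint; check the value the kernel actually applied.
    System.err.println("effective receive buffer: "
        + connection.getReceiveBufferSize() + " bytes");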
On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
> Note that there is no single "main thread"

Yeh, I meant "the thread that executes the code for that session".

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:
>
>> I think your proposal would work, but sounds fairly invasive. I'm
>> concerned that if we interfere with the main line of processing we'll
>> just introduce a new set of bugs that won't move us forwards. I would
>> prefer a "watcher" approach, which doesn't directly interfere with the
>> main line of processing and can be turned on/off. Perhaps it isn't
>> possible?
>
> It's not really invasive at all: you need an OutputStream
> implementation (that is completely new code and separate from the rest
> of the driver) and a small amount of setup code when first establishing
> the connection to hook it in as the stream that's actually written to.
> The setup code can be conditional. The rest of the driver doesn't
> change at all.
>
>> How about a watcher thread that waits for the main thread to respond;
>> if no response within a timeout, it wakes.
>
> That could work. Note that there is no single "main thread", it's just
> whatever application threads happen to be running queries. You would
> need to arm the watchdog before any write or flush, and disarm it
> before reading or returning to the client (via a normal return or an
> exception). If the watchdog on a particular thread stays armed for more
> than some period, you trigger a warning.
>
> It might be tricky to get useful state out of the driver when that
> timer goes off - you'll be in the wrong thread to get a useful
> stacktrace at that point. I guess you'll have to go and abruptly close
> the underlying socket to unblock the deadlocked query thread - which
> means you need some way to correlate threads with sockets. And you kill
> the connection by closing down the socket, obviously.
>
> Seems like it would get a bit complicated...

OK, I'll consider. Thanks for your advice.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Wed, Jan 21, 2009 at 8:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
>> Note that there is no single "main thread"
>
> Yeh, I meant "the thread that executes the code for that session".

Well, that's actually the client's thread. There isn't really a "monitor"
thread.

Dave
On Wed, 2009-01-21 at 09:01 -0500, Dave Cramer wrote:
> On Wed, Jan 21, 2009 at 8:35 AM, Simon Riggs <simon@2ndquadrant.com>
> wrote:
>> On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
>>> Note that there is no single "main thread"
>>
>> Yeh, I meant "the thread that executes the code for that session".
>
> Well, that's actually the client's thread. There isn't really a
> "monitor" thread.

Yup, I know how it works. Just trying to find language to show that the
main code is executed by one thread for each client, hence "main thread",
but yes, that is the client's thread.

I understand there currently is not something that we might call a
monitor thread, but I was interested in creating one to allow us to
independently inspect the state of the client threads. Watcher, monitor
thread, call it whatever, but an objective observer of what is occurring,
to allow us to record information and take action. I would prefer to have
a worker (client) thread and a watcher thread rather than to break the
task of the client thread into two.

Anyway, if you guys can see where I'm coming from then that's good. The
important thing for me is the objective, not any specific design: being
able to confirm, using automated logging suitable for use in a busy
production system, that the client/server interaction is not deadlocked,
so we can argue reasonably that the search for the root cause of problems
can exclude (or not) Postgres-related code. JDBC is the most important
client, IMHO.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Wed, 2009-01-21 at 13:17 +0100, Daniel Migowski wrote:
> I don't know if there is a more elegant way to handle this, but for us
> it works. Maybe for you, too. As far as I know (which is not much about
> the driver internals) this is the only way to make the driver hang.

Thank you. This seems like simple, practical advice.

My "watcher" ideas seem like they would take too long and at best would
only identify a problem, not solve it. Thanks to those that helped out
there also.

If the only known way of making the driver hang is running out of buffers
then the best way to react is to increase the buffer allocation.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Wed, 2009-01-21 at 13:17 +0100, Daniel Migowski wrote:
>
>> I don't know if there is a more elegant way to handle this, but for us
>> it works. Maybe for you, too. As far as I know (which is not much
>> about the driver internals) this is the only way to make the driver
>> hang.
>
> Thank you. This seems like simple, practical advice.
>
> My "watcher" ideas seem like they would take too long and at best would
> only identify a problem, not solve it. Thanks to those that helped out
> there also.
>
> If the only known way of making the driver hang is running out of
> buffers then the best way to react is to increase the buffer allocation.

Well, the problem is that there's no "right" value for the buffer size
with the current strategy. The default works for all but the most extreme
cases (like Daniel's 100,000 INSERTs), but raising the default just means
that you need a larger set of queries to trigger it.

I have vague memories that another way to trigger the deadlock was to
have a relatively small number of queries that generated a lot of NOTIFYs
or, perhaps, server warning messages? Can't remember the details, but it
was an unusual case involving triggers.

I have a bit of time spare today, I might look at putting together that
OutputStream wrapper.

-O
On Thu, 2009-01-22 at 13:19 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:
>> If the only known way of making the driver hang is running out of
>> buffers then the best way to react is to increase the buffer
>> allocation.
>
> Well, the problem is that there's no "right" value for the buffer size
> with the current strategy. The default works for all but the most
> extreme cases (like Daniel's 100,000 INSERTs), but raising the default
> just means that you need a larger set of queries to trigger it.
>
> I have vague memories that another way to trigger the deadlock was to
> have a relatively small number of queries that generated a lot of
> NOTIFYs or, perhaps, server warning messages? Can't remember the
> details, but it was an unusual case involving triggers.
>
> I have a bit of time spare today, I might look at putting together that
> OutputStream wrapper.

That would be good. Thanks.

There was another hang earlier today. One session just went idle in
transaction on the server, right in the middle of a long transaction:
1500 statements in about a second on the session, then nothing, while
holding locks. People are adamant it is not the application; much
checking has been done.

I've got this strange feeling that this happens a lot, but DBAs ignore it
as some quirk the application guys have missed. And that's why we have
calls for an idle-in-transaction timeout and requests to support
cancelling such sessions. (I want both, but I wonder about the reasons
why.)

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Oliver Jowett wrote:
> I have a bit of time spare today, I might look at putting together that
> OutputStream wrapper.

Try this. I have not tested at all - it compiles but that's as far as I
got - but it should give you an idea of what I had in mind.

-O

? org/postgresql/core/AntiDeadlockStream.java
Index: org/postgresql/core/PGStream.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/PGStream.java,v
retrieving revision 1.22
diff -u -r1.22 PGStream.java
--- org/postgresql/core/PGStream.java	8 Jan 2008 06:56:27 -0000	1.22
+++ org/postgresql/core/PGStream.java	22 Jan 2009 01:10:13 -0000
@@ -34,6 +34,7 @@
 {
     private final String host;
     private final int port;
+    private final boolean antiDeadlock;
 
     private final byte[] _int4buf;
     private final byte[] _int2buf;
@@ -52,12 +53,14 @@
      *
      * @param host the hostname to connect to
      * @param port the port number that the postmaster is sitting on
+     * @param antiDeadlock true to insert an anti-deadlock outputstream
      * @exception IOException if an IOException occurs below it.
      */
-    public PGStream(String host, int port) throws IOException
+    public PGStream(String host, int port, boolean antiDeadlock) throws IOException
     {
         this.host = host;
         this.port = port;
+        this.antiDeadlock = antiDeadlock;
 
         changeSocket(new Socket(host, port));
         setEncoding(Encoding.getJVMEncoding("US-ASCII"));
@@ -74,6 +77,10 @@
         return port;
     }
 
+    public boolean getAntiDeadlock() {
+        return antiDeadlock;
+    }
+
     public Socket getSocket() {
         return connection;
     }
@@ -110,6 +117,8 @@
         // Buffer sizes submitted by Sverre H Huseby <sverrehu@online.no>
         pg_input = new VisibleBufferedInputStream(connection.getInputStream(), 8192);
         pg_output = new BufferedOutputStream(connection.getOutputStream(), 8192);
+        if (antiDeadlock)
+            pg_output = new AntiDeadlockStream(pg_output, 8192, 30000);
 
         if (encoding != null)
             setEncoding(encoding);
Index: org/postgresql/core/v2/ConnectionFactoryImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v2/ConnectionFactoryImpl.java,v
retrieving revision 1.17
diff -u -r1.17 ConnectionFactoryImpl.java
--- org/postgresql/core/v2/ConnectionFactoryImpl.java	30 Sep 2008 03:42:48 -0000	1.17
+++ org/postgresql/core/v2/ConnectionFactoryImpl.java	22 Jan 2009 01:10:14 -0000
@@ -59,7 +59,7 @@
         PGStream newStream = null;
         try
         {
-            newStream = new PGStream(host, port);
+            newStream = new PGStream(host, port, Boolean.valueOf(info.getProperty("antiDeadlock")).booleanValue());
 
             // Construct and send an ssl startup packet if requested.
             if (trySSL)
@@ -147,7 +147,7 @@
                 // We have to reconnect to continue.
                 pgStream.close();
-                return new PGStream(pgStream.getHost(), pgStream.getPort());
+                return new PGStream(pgStream.getHost(), pgStream.getPort(), pgStream.getAntiDeadlock());
 
             case 'N':
                 if (logger.logDebug())
Index: org/postgresql/core/v2/ProtocolConnectionImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v2/ProtocolConnectionImpl.java,v
retrieving revision 1.12
diff -u -r1.12 ProtocolConnectionImpl.java
--- org/postgresql/core/v2/ProtocolConnectionImpl.java	1 Apr 2008 07:19:20 -0000	1.12
+++ org/postgresql/core/v2/ProtocolConnectionImpl.java	22 Jan 2009 01:10:14 -0000
@@ -90,7 +90,7 @@
             if (logger.logDebug())
                 logger.debug(" FE=> CancelRequest(pid=" + cancelPid + ",ckey=" + cancelKey + ")");
 
-            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort());
+            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort(), false);
             cancelStream.SendInteger4(16);
             cancelStream.SendInteger2(1234);
             cancelStream.SendInteger2(5678);
Index: org/postgresql/core/v3/ConnectionFactoryImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/ConnectionFactoryImpl.java,v
retrieving revision 1.19
diff -u -r1.19 ConnectionFactoryImpl.java
--- org/postgresql/core/v3/ConnectionFactoryImpl.java	29 Nov 2008 07:40:30 -0000	1.19
+++ org/postgresql/core/v3/ConnectionFactoryImpl.java	22 Jan 2009 01:10:14 -0000
@@ -73,7 +73,7 @@
         PGStream newStream = null;
         try
         {
-            newStream = new PGStream(host, port);
+            newStream = new PGStream(host, port, Boolean.valueOf(info.getProperty("antiDeadlock")).booleanValue());
 
             // Construct and send an ssl startup packet if requested.
             if (trySSL)
@@ -178,7 +178,7 @@
                 // We have to reconnect to continue.
                 pgStream.close();
-                return new PGStream(pgStream.getHost(), pgStream.getPort());
+                return new PGStream(pgStream.getHost(), pgStream.getPort(), pgStream.getAntiDeadlock());
 
             case 'N':
                 if (logger.logDebug())
Index: org/postgresql/core/v3/ProtocolConnectionImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/ProtocolConnectionImpl.java,v
retrieving revision 1.13
diff -u -r1.13 ProtocolConnectionImpl.java
--- org/postgresql/core/v3/ProtocolConnectionImpl.java	1 Apr 2008 07:19:20 -0000	1.13
+++ org/postgresql/core/v3/ProtocolConnectionImpl.java	22 Jan 2009 01:10:14 -0000
@@ -90,7 +90,7 @@
             if (logger.logDebug())
                 logger.debug(" FE=> CancelRequest(pid=" + cancelPid + ",ckey=" + cancelKey + ")");
 
-            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort());
+            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort(), false);
             cancelStream.SendInteger4(16);
             cancelStream.SendInteger2(1234);
             cancelStream.SendInteger2(5678);

package org.postgresql.core;

import java.io.*;

/**
 * Temporary hack to try to detect/avoid socket deadlocks caused
 * by blocking on write while we have lots of pending data to read
 * from the server (i.e. the server is also blocked on write).
 *
 * see the thread at http://archives.postgresql.org/pgsql-jdbc/2009-01/msg00045.php
 *
 * @author Oliver Jowett <oliver@opencloud.com>
 */
class AntiDeadlockStream extends OutputStream implements Runnable {
    private static final class BufferLock {}
    private final BufferLock bufferLock = new BufferLock();

    private final OutputStream wrapped;
    private final long flushTimeout;

    private byte[] buffer;
    private int bufferSize;
    private byte[] swapBuffer;

    private boolean closeRequest;
    private boolean flushRequest;
    private boolean closeComplete;
    private IOException failedException;

    AntiDeadlockStream(OutputStream wrapped, int initialSize, long flushTimeout) {
        this.wrapped = wrapped;
        this.flushTimeout = flushTimeout;
        this.buffer = new byte[initialSize];
        this.swapBuffer = new byte[initialSize];
        new Thread(this, "AntiDeadlock thread").start();
    }

    public void close() throws IOException {
        synchronized (bufferLock) {
            closeRequest = true;
            bufferLock.notifyAll();

            while (!closeComplete) {
                if (failedException != null)
                    throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));

                try {
                    bufferLock.wait();
                } catch (InterruptedException ie) {
                    throw new InterruptedIOException();
                }
            }
        }
    }

    public void flush() throws IOException {
        synchronized (bufferLock) {
            long expiry = -1;

            flushRequest = true;
            bufferLock.notifyAll();

            while (true) {
                if (failedException != null)
                    throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));

                if (closeRequest)
                    throw new IOException("Stream is closed");

                if (bufferSize == 0)
                    return;

                long delay;
                if (expiry == -1) {
                    delay = flushTimeout;
                    expiry = System.currentTimeMillis() + delay;
                } else {
                    delay = expiry - System.currentTimeMillis();
                }

                if (delay <= 0) {
                    System.err.println("Warning: possible socket deadlock detected (timeout=" + flushTimeout + ", remainingbuffer=" + bufferSize);
                    new Throwable("Deadlock call stack").fillInStackTrace().printStackTrace(System.err);
                    return;
                }

                try {
                    bufferLock.wait(delay);
                } catch (InterruptedException ie) {
                    throw new InterruptedIOException();
                }
            }
        }
    }

    public void write(int b) throws IOException {
        write(new byte[] { (byte)b }, 0, 1);
    }

    public void write(byte[] b) throws IOException {
        write(b, 0, b.length);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (b == null)
            throw new NullPointerException();
        if (off < 0 || len < 0 || off+len > b.length)
            throw new IndexOutOfBoundsException();

        synchronized (bufferLock) {
            if (closeRequest)
                throw new IOException("Stream is closed");

            if (failedException != null)
                throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));

            int needs = bufferSize + len;
            int newSize = buffer.length;
            while (newSize < needs)
                newSize *= 2;

            if (newSize != buffer.length) {
                byte[] newBuffer = new byte[newSize];
                System.arraycopy(buffer, 0, newBuffer, 0, bufferSize);
                buffer = newBuffer;
            }

            if (bufferSize == 0)
                bufferLock.notifyAll();

            System.arraycopy(b, off, buffer, bufferSize, len);
            bufferSize += len;
        }
    }

    //
    // Runnable
    //

    public void run() {
        while (true) {
            boolean doFlush;
            boolean doClose;
            int writeLength;

            synchronized (bufferLock) {
                if (bufferSize == 0 && !closeRequest && !flushRequest) {
                    try {
                        bufferLock.wait();
                    } catch (InterruptedException ie) {
                        failedException = new InterruptedIOException("write thread interrupted");
                        bufferLock.notifyAll();
                        return;
                    }
                    continue;
                }

                byte[] oldBuffer = buffer;
                buffer = swapBuffer;
                swapBuffer = buffer;

                writeLength = bufferSize;
                doFlush = flushRequest;
                doClose = closeRequest;
                flushRequest = false;

                bufferLock.notifyAll();
            }

            try {
                if (writeLength > 0)
                    wrapped.write(swapBuffer, 0, writeLength);

                if (flushRequest)
                    wrapped.flush();

                if (closeRequest) {
                    wrapped.close();

                    synchronized (bufferLock) {
                        closeComplete = true;
                        bufferLock.notifyAll();
                    }
                    return;
                }
            } catch (IOException ioe) {
                synchronized (bufferLock) {
                    failedException = ioe;
                    bufferLock.notifyAll();

                    try {
                        wrapped.close();
                    } catch (IOException ioe2) {
                        // Ignore it.
                    }
                    return;
                }
            }
        }
    }
}
Oliver Jowett wrote:
> Oliver Jowett wrote:
>
>> I have a bit of time spare today, I might look at putting together
>> that OutputStream wrapper.
>
> Try this. I have not tested at all - it compiles but that's as far as I
> got - but it should give you an idea of what I had in mind.

And here are the fixes for the deliberate mistakes in the first version ;-)

This passes the regression testsuite with antiDeadlock=true, but I don't
have a deadlock testcase on hand to try unfortunately.

-O

--- crypt/pgjdbc/org/postgresql/core/AntiDeadlockStream.java.orig	2009-01-22 14:47:06.000000000 +1300
+++ crypt/pgjdbc/org/postgresql/core/AntiDeadlockStream.java	2009-01-22 14:44:47.000000000 +1300
@@ -159,23 +159,23 @@
 
                 byte[] oldBuffer = buffer;
                 buffer = swapBuffer;
-                swapBuffer = buffer;
+                swapBuffer = oldBuffer;
 
                 writeLength = bufferSize;
                 doFlush = flushRequest;
                 doClose = closeRequest;
                 flushRequest = false;
-
+                bufferSize = 0;
                 bufferLock.notifyAll();
             }
 
             try {
                 if (writeLength > 0)
                     wrapped.write(swapBuffer, 0, writeLength);
 
-                if (flushRequest)
+                if (doFlush)
                     wrapped.flush();
 
-                if (closeRequest) {
+                if (doClose) {
                     wrapped.close();
 
                     synchronized (bufferLock) {
On Thu, 22 Jan 2009, Oliver Jowett wrote:
> Oliver Jowett wrote:
>
> This passes the regression testsuite with antiDeadlock=true, but I
> don't have a deadlock testcase on hand to try unfortunately.

This has a test case which deadlocks consistently:

http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php

Kris Jurka
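[The archived testcase isn't reproduced here; a reproducer along those
lines looks roughly like the following - the table name, batch size, and
connection details are placeholders, not the archived code. A big enough
batch fills the client's socket send buffer with Bind/Execute messages
while the server's per-statement responses fill its own send buffer, and
neither side can proceed:]

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    // Rough reconstruction of a batch-deadlock reproducer.
    public class BatchDeadLock {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test?antiDeadlock=true",
                "test", "test");

            Statement st = conn.createStatement();
            st.execute("CREATE TEMP TABLE batchdemo (a int, b text)");
            st.close();

            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO batchdemo VALUES (?, ?)");
            for (int i = 0; i < 100000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "filler text to bulk up each bind message");
                ps.addBatch();
            }
            ps.executeBatch();   // hangs here with antiDeadlock=false
            ps.close();
            conn.close();
        }
    }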
Kris Jurka wrote:
> On Thu, 22 Jan 2009, Oliver Jowett wrote:
>
>> This passes the regression testsuite with antiDeadlock=true, but I
>> don't have a deadlock testcase on hand to try unfortunately.
>
> This has a test case which deadlocks consistently:
>
> http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php

Perfect, thanks.

My changes seem to pass BatchDeadLock with antiDeadlock=true (and still
hang with antiDeadlock=false):

> 14:59:45.233 (1)  FE=> Execute(portal=null,limit=1)
> 14:59:45.233 (1)  FE=> Sync
> Warning: possible socket deadlock detected (timeout=30000, remaining buffer=1299975
> java.lang.Throwable: Deadlock call stack
>         at org.postgresql.core.AntiDeadlockStream.flush(AntiDeadlockStream.java:83)
>         at org.postgresql.core.PGStream.flush(PGStream.java:517)
>         at org.postgresql.core.v3.QueryExecutorImpl.sendSync(QueryExecutorImpl.java:692)
>         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:342)
>         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2693)
>         at BatchDeadLock.main(BatchDeadLock.java:40)
> 15:00:15.234 (1)  <=BE ParseComplete [S_1]
> 15:00:15.234 (1)  <=BE ParameterDescription
> 15:00:15.235 (1)  <=BE NoData
> 15:00:15.235 (1)  <=BE BindComplete [null]
> 15:00:15.235 (1)  <=BE CommandStatus(INSERT 0 1)
> 15:00:15.235 (1)  <=BE ParseComplete [S_2]
> 15:00:15.235 (1)  <=BE ParameterDescription
> 15:00:15.235 (1)  <=BE NoData
> [.. etc ..]

-O
Simon Riggs wrote:
> There was another hang earlier today. One session just went idle in
> transaction on the server, right in the middle of a long transaction:
> 1500 statements in about a second on the session, then nothing, while
> holding locks. People are adamant it is not the application; much
> checking has been done.

It's worth noting that this does not sound like the socket deadlock case.
In the socket deadlock case, the backend will be stuck in PARSE or
something similar, blocked trying to write to the JDBC client.

-O
We came across this thread today researching our issue. Tersely as
possible: JBoss 4.2.3, Solaris 10 (x64), PG 8.2 (with GIS),
Hibernate/EJB2.

We were storing a GIS column as a LOB in WKT format (e.g., POINT(23.22
23.22)) and occasionally would have apparent transaction timeouts trying
to insert into this table. (It turns out that a transaction doesn't time
out per se, it just never finishes, and the timeouts our customers see
are Ajax/session related.)

- JBoss logs would show the transaction committing from Hibernate.

- Thread dumps on the JVM would show the stuck thread in:

"http-0.0.0.0-8443-4" daemon prio=3 tid=0x00000000019ee400 nid=0x70 runnable [0xfffffd7eab923000..0xfffffd7eab9268a0]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0xfffffd7fad49c858> (a java.io.BufferedOutputStream)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at org.postgresql.core.PGStream.Send(PGStream.java:208)
        at org.postgresql.core.v3.SimpleParameterList.writeV3Value(SimpleParameterList.java:258)
        at org.postgresql.core.v3.QueryExecutorImpl.sendBind(QueryExecutorImpl.java:861)
        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1052)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:643)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:344)
        - locked <0xfffffd7fad4a1af8> (a org.postgresql.core.v3.QueryExecutorImpl)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2592)
        at sun.reflect.GeneratedMethodAccessor152.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:471)
        at $Proxy291.executeBatch(Unknown Source)
        at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:774)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
        ...

(No other monitors/blocks - this was the ONLY thread "running" at the
time!)

- PGAdmin3 server stats/locks would show a slew of locks on this table
(no other locks, so no traditional deadlock per se).

- truss(1M) on the postgres pid holding the locks showed:

% truss -p 14713
send(9, 0x082FBF00, 8192, 0)    (sleeping...)

pfiles(1M) on this pid showed that fd 9 was this same socket connected to
our JVM, and blocked clearly (at least to us) - the database was writing
to this socket (seen from truss) and the JVM was writing to this socket
(seen from the JVM stack).

- netstat(1M) showed no send/recvQ data on this socket.

- The PG log showed:
-- the query was a simple insert into FeatureName(xxx) values ($1, ...)
-- the geom text/LOB had a 22K bind (large country MULTIPOLYGON)

WORKAROUND

We're going to remove the string (we determined our client app doesn't
use the geom for this table), or at least convert it to the binary
version long term when we start using it again.

WAY FORWARD

We're planning on upgrading to 8.3.17 (? whatever is the latest) and hope
that this deadlock in the driver doesn't happen again.

Just posting this in case it might be useful for your testing.

--
View this message in context: http://www.nabble.com/Deadlock-detection-tp21580039p23067564.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
On Thu, 22 Jan 2009, Oliver Jowett wrote:
> Kris Jurka wrote:
>>
>> On Thu, 22 Jan 2009, Oliver Jowett wrote:
>>>
>>> This passes the regression testsuite with antiDeadlock=true, but I
>>> don't have a deadlock testcase on hand to try unfortunately.
>>
>> This has a test case which deadlocks consistently:
>>
>> http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php
>
> Perfect, thanks.
>
> My changes seem to pass BatchDeadLock with antiDeadlock=true (and still
> hang with antiDeadlock=false):

I'm not really sure what to do with this code. It does two different
things, and what it does best really wasn't the thrust of this thread.

1) It avoids deadlocks by buffering the writes in local memory. That's
great, but wasn't really the point of the thread. Would we suggest people
run with antiDeadlock=true in a production environment to avoid
deadlocks? If so, why wouldn't it be the default?

2) It warns when a deadlock has potentially occurred, with stack trace
information to provide the call location. That's good, but it can't be
100% accurate. With a small timeout it may warn when there wouldn't be an
actual deadlock, and with a large timeout it may not warn when the call
would have deadlocked without the intermediate buffer. Having the
intermediate buffer means that you aren't testing the same thing, so a
success with antiDeadlock=true has no implications as to whether your
application will deadlock with it turned off. Previously it was discussed
that you'd really want to cap the buffer size, so perhaps relating the
maximum buffer size to Socket.getSendBufferSize might make the results of
testing with antiDeadlock more representative of the behavior with it
turned off.

So it's a useful piece of code, but I'm not sure what context it should
really be used in. Is anyone using this? For what purpose?

Kris Jurka
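[A rough sketch of that capping idea - the class, method names, and
multiplier are guesses for illustration, not anything from the actual
patch. The point is to derive the internal cap from the kernel's real
send buffer rather than an arbitrary constant, so a run with
antiDeadlock=true stays roughly representative of one without it:]

    import java.io.IOException;
    import java.net.Socket;

    // Hypothetical buffer-cap policy for the anti-deadlock stream.
    class AntiDeadlockBufferPolicy {
        private final int cap;

        AntiDeadlockBufferPolicy(Socket socket, int multiplier)
                throws IOException {
            // Tie the cap to the socket's actual send buffer size.
            this.cap = socket.getSendBufferSize() * multiplier;
        }

        // Called from write() before growing the internal buffer.
        void checkGrowth(int bufferSize, int len) throws IOException {
            if (bufferSize + len > cap)
                throw new IOException("anti-deadlock buffer would exceed "
                    + cap + " bytes; failing the connection instead");
        }
    }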
Kris Jurka wrote:
> So it's a useful piece of code, but I'm not sure what context it should
> really be used in. Is anyone using this? For what purpose?

I'm not using it, it was just a quick hack to try to narrow down a
deadlock problem IIRC...

-O
On Mon, 2009-06-01 at 19:30 -0400, Kris Jurka wrote:
> 1) It avoids deadlocks by buffering the writes in local memory. That's
> great, but wasn't really the point of the thread. Would we suggest
> people run with antiDeadlock=true in a production environment to avoid
> deadlocks? If so, why wouldn't it be the default?

> So it's a useful piece of code, but I'm not sure what context it should
> really be used in. Is anyone using this? For what purpose?

The reason the code was written went away before the code could sensibly
be used. We removed the contention in the database by redesign rather
than by JDBC changes. Thanks to Oliver for taking the time.

Having said that, it is clear there were some issues there that are not
fully explained. It may be that it is a JDBC driver issue, but it may
also be a protocol/handling issue. Intermittent issues are hard to get a
full state on to debug, and we are left looking at spoor, or wishing we
had some.

I would like to see code added to the driver for (1), or other code. My
main request is that we have some way of responding to the problems that
we know can exist.

The main argument for not buffering everything would be performance, I
think. 99.9% of users are happy as is, so to act in a way that affects
them would be bad. Sometimes a "safe mode" would be a useful thing to be
able to enable, as a way of eliminating doubt.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support