Thread: Deadlock detection

Deadlock detection

From
Simon Riggs
Date:
As we know from recent versions, the JDBC driver can deadlock. We also
know that the deadlock detection as it exists does not detect all types
of deadlock and isn't foolproof either. Not criticism, just summarising.

We've recently tried to diagnose some sporadic hang situations and would
love to be able to rule out the JDBC driver. This has not yet been
possible. (Substantial work has already been done server side, so this
is not finger pointing, just working backwards.) Currently we can take
stack traces, but knowing when to take them is key, and since in problem
situations the priority is often cure rather than prevention, it has not
so far been possible to acquire useful diagnostic information.

It is possible that the cases under investigation are linked to the
occasional reports we get on various lists about idle-in-transaction. On
the server side, I'm expecting to implement SIGINT to cancel
idle-in-transaction sessions. This may just further bury any problems in
the driver.

The best solution would be to have a logging mode that would advise us
of the presence of a deadlock and log details for later analysis. If
this worked in an objective way that could reliably detect the general
case, we would be in a much better position. We may discover some
unknown cases; if not, we can at least rule the driver out of these
situations, and we know how important that can be in complex failure
analysis.

With me so far? Perhaps, perhaps not.

I'm thinking of some simple bookkeeping, so that before any session
issues a blockable call it records that fact in a shared array, and
clears that state again once the call returns. A single watcher thread
can then regularly scan the state array and see whether a session is
marked as waiting on both send and receive. Side threads are already
used for login timeout handling, so that mechanism is already in use.
Some loss of performance would be acceptable.
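
Roughly the kind of bookkeeping I have in mind, as an untested sketch
(all names are hypothetical, and I've used a map rather than a fixed
array for brevity):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Untested sketch: each session thread records what kind of blockable
    // call it is about to issue and clears the entry when the call returns;
    // a single watcher thread scans the shared state on a timer.
    final class BlockingStateRegistry {
        enum Blockable { SEND, RECEIVE }

        private static final Map<Thread, Blockable> state =
            new ConcurrentHashMap<Thread, Blockable>();

        static void entering(Blockable what) { state.put(Thread.currentThread(), what); }
        static void leaving()                { state.remove(Thread.currentThread()); }

        static void startWatcher(final long scanIntervalMillis) {
            Thread watcher = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        for (Map.Entry<Thread, Blockable> e : state.entrySet()) {
                            // What to conclude from this is the open question; for
                            // now, just log what each session is blocked on.
                            System.err.println("session thread " + e.getKey().getName()
                                    + " blocked on " + e.getValue());
                        }
                        try { Thread.sleep(scanIntervalMillis); } catch (InterruptedException ie) { return; }
                    }
                }
            }, "deadlock-watcher");
            watcher.setDaemon(true);
            watcher.start();
        }
    }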

Would you agree this would be worthwhile? Any practical blockers? If it
is unhelpful in this specific case, would that feature be generally
usable?

If not, other suggestions and viewpoints welcome. Thanks,

You might say the best way is to fix the underlying architecture. Maybe,
but we are a long way from that, and even if we did, some second- and
third-order effects would still exist that we would need new logging
modes to diagnose correctly.

Is there a deadline for the JDBC driver for 8.4? (When was it?)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:

> I'm thinking of some simple bookkeeping, so that before any session
> issues a blockable call it takes note that it is doing so in a shared
> array, and clears that state again once released. A single watcher
> thread can then regularly scan the state array and see if a session is
> marked as waiting on both send and receive. Side threads are already
> used for login timeout handling, so that mechanism is already in use.
> Some loss of performance would be acceptable.

I don't see how this can work. The whole point is that if we're blocked
on send, we're *not* blocked on receive by definition.

-O

Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:

> If not, other suggestions and viewpoints welcome. Thanks,

If you want a minimally invasive approach (in terms of rewriting driver
code) and some extra overhead isn't a problem then I would suggest
something like this:

(1) spawn a separate write thread for each connection; it reads from an
unbounded buffer

(2) instead of writes from the thread running a query going directly to
the real stream, they go into the buffer (without blocking), and the
corresponding write thread picks them up for immediate writing

(3) implement OutputStream.flush() to wait for the buffer to empty,
*with a timeout*. If you hit the timeout while waiting to flush, then
you've detected a potential deadlock case and can dump out some state
and then continue anyway.

The usual behaviour of the driver is (approximately): write some query
data, write a flush or sync message, flush the socket output stream,
start to read results. In the deadlock case, we run out of (socket)
write buffer space at some point before completing the stream flush, and
the socket buffer never empties because the server is blocked trying to
write to us.
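
For reference, the kind of client code that hits this is simply a large
enough batch. Something like the following can do it (connection URL and
table are placeholders, and whether it actually deadlocks depends on the
socket buffer sizes involved):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Placeholder repro: queue a very large batch, then execute it. During
    // executeBatch() the driver writes the whole batch before reading any
    // results; if the per-statement responses fill the client's receive
    // buffer and then the server's send buffer, both ends block on write.
    public class BigBatchExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "test", "test");
            conn.setAutoCommit(false);

            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO deadlock_test(v) VALUES (?)");
            for (int i = 0; i < 100000; i++) {
                ps.setInt(1, i);
                ps.addBatch();    // queued client-side, nothing sent yet
            }
            ps.executeBatch();    // the batch is written here; this is where it can hang
            conn.commit();
            conn.close();
        }
    }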

With the above changes, in that case the write thread will block and the
internal Java-side write buffer will grow. Then OutputStream.flush()
will block and eventually timeout, and the deadlock should be broken.

The obvious costs of this are the extra thread per connection, the
context switches needed to actually do the writes, and however much
space you need for all the write buffers.

Turn OutputStream.flush() into a no-op and you actually have a fix for
the deadlock case entirely - it's just not a very cheap fix.

You may want to put an upper bound on the internal buffer size and throw
IOException if you hit it. I suppose that a connection error is better
than either a silent deadlock or exhausting your heap..

-O

Re: Deadlock detection

From
Oliver Jowett
Date:
Oliver Jowett wrote:

> If you want a minimally invasive approach (in terms of rewriting driver
> code) and some extra overhead isn't a problem then I would suggest
> something like this:
>
> (1) spawn a separate write thread for each connection; it reads from an
> unbounded buffer
>
> (2) instead of writes from the thread running a query going directly to
> the real stream, they go into the buffer (without blocking), and the
> corresponding write thread picks them up for immediate writing
>
> (3) implement OutputStream.flush() to wait for the buffer to empty,
> *with a timeout*. If you hit the timeout while waiting to flush, then
> you've detected a potential deadlock case and can dump out some state
> and then continue anyway.

Also, if it wasn't obvious, I'm thinking you do this by a custom
OutputStream implementation that just sits on top of the real stream;
you don't need to change any of the callers so it's just a bit of setup
code then the vast majority of the driver guts are unchanged. You could
even make it conditional on a connection parameter ..

-O


Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:
> As we know from recent versions, the JDBC driver can deadlock. We also
> know that the deadlock detection as it exists does not detect all types
> of deadlock and isn't foolproof either. Not criticism, just summarising.

Do you mean the potential read-write socket deadlock (where we're
blocked on write waiting for the server to consume some data, but it's
blocked waiting for us to consume some data), or something else?

-O

Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:

> I think your proposal would work, but sounds fairly invasive. I'm
> concerned that if we interfere with the main line of processing we'll
> just introduce a new set of bugs that won't move us forwards. I would
> prefer a "watcher" approach, which doesn't directly interfere with the
> main line of processing and can be turned on/off.  Perhaps it isn't
> possible?

It's not really invasive at all, you need an OutputStream implementation
(that is completely new code and separate from the rest of the driver)
and a small amount of setup code when first establishing the connection
to hook it in as the stream that's actually written to. The setup code
can be conditional. The rest of the driver doesn't change at all.

> How about a watcher thread that waits for the main thread to respond;
> if there is no response within a timeout, it wakes.

That could work. Note that there is no single "main thread", it's just
whatever application threads happen to be running queries. You would
need to arm the watchdog before any write or flush, and disarm it before
reading or returning to the client (via a normal return or an
exception). If the watchdog on a particular thread stays armed for more
than some period, you trigger a warning.
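
Very roughly, and purely as an untested sketch with hypothetical names,
the arm/disarm side could look like this:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Untested sketch: arm before a potentially blocking write/flush,
    // disarm before reading or returning to the caller; a scanner thread
    // warns about any thread that stays armed longer than the threshold.
    final class WriteWatchdog {
        private static final Map<Thread, Long> armedAt =
            new ConcurrentHashMap<Thread, Long>();

        static void arm()    { armedAt.put(Thread.currentThread(), System.currentTimeMillis()); }
        static void disarm() { armedAt.remove(Thread.currentThread()); }

        static void startScanner(final long warnAfterMillis) {
            Thread scanner = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        long now = System.currentTimeMillis();
                        for (Map.Entry<Thread, Long> e : armedAt.entrySet()) {
                            if (now - e.getValue() > warnAfterMillis)
                                System.err.println("Warning: " + e.getKey().getName()
                                        + " armed for more than " + warnAfterMillis
                                        + "ms - possible deadlock");
                        }
                        try { Thread.sleep(1000); } catch (InterruptedException ie) { return; }
                    }
                }
            }, "write-watchdog");
            scanner.setDaemon(true);
            scanner.start();
        }
    }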

It might be tricky to get useful state out of the driver when that timer
goes off - you'll be in the wrong thread to get a useful stacktrace at
that point, I guess you'll have to go and abruptly close the underlying
socket to unblock the deadlocked query thread - which means you need to
have some way to correlate threads with sockets. And you kill the
connection by closing down the socket, obviously.

Seems like it would get a bit complicated..

-O

Re: Deadlock detection

From
Simon Riggs
Date:
On Wed, 2009-01-21 at 23:38 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:
>
> > If not, other suggestions and viewpoints welcome. Thanks,
>
> If you want a minimally invasive approach (in terms of rewriting driver
> code) and some extra overhead isn't a problem then I would suggest
> something like this:

...

> The obvious costs of this are the extra thread per connection, the
> context switches needed to actually do the writes, and however much
> space you need for all the write buffers.

Thanks very much for such a detailed design.

I'd like to identify situations where deadlocks occur so we can fix
them, or confirm they haven't happened at all.

I think your proposal would work, but sounds fairly invasive. I'm
concerned that if we interfere with the main line of processing we'll
just introduce a new set of bugs that won't move us forwards. I would
prefer a "watcher" approach, which doesn't directly interfere with the
main line of processing and can be turned on/off.  Perhaps it isn't
possible?

How about a watcher thread that waits for the main thread to respond;
if there is no response within a timeout, it wakes. We then pass
information to another agent which can compare it against server-side
information and identify deadlocks. A client/server deadlock detector,
rather than one within the driver as I described before.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Daniel Migowski
Date:
Simon Riggs schrieb:
> As we know from recent versions, the JDBC driver can deadlock. We also
> know that the deadlock detection as it exists does not detect all types
> of deadlock and isn't foolproof either. Not criticism, just summarising.
>
I don't know if this is of help for you, but one case of deadlocking can
occur because of filled stream buffers on both ends of the connection.
I had a similar situation where I issued a large script with 100,000
INSERT statements to the database via JDBC. Those statements all
returned their answer bytes after each invocation on the server; those
bytes filled the client receive buffer, after that the server send
buffer got filled, and then the server process got blocked. Then the
receive buffer of the server fills, and then the client send buffer, at
which point the client blocks as well.

I "solved" the problem for scripts of our size by drastically increasing
the input buffer size on the client side. I modified PGStream.java to
include the following line in "changeConnection()":

        connection.setReceiveBufferSize(256*1024);

I don't know if there is a more elegant way to handle this, but for us
it works. Maybe for you, too. As far as I know (which is not much about
the driver internals) this is the only way to make the driver hang.

With best regards,
Daniel Migowski


Re: Deadlock detection

From
Simon Riggs
Date:
On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
> Note that there is no single "main thread"

Yeh, I meant "the thread that executes the code for that session".

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Simon Riggs
Date:
On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:
>
> > I think your proposal would work, but sounds fairly invasive. I'm
> > concerned that if we interfere with the main line of processing we'll
> > just introduce a new set of bugs that won't move us forwards. I would
> > prefer a "watcher" approach, which doesn't directly interfere with the
> > main line of processing and can be turned on/off.  Perhaps it isn't
> > possible?
>
> It's not really invasive at all, you need an OutputStream implementation
> (that is completely new code and separate from the rest of the driver)
> and a small amount of setup code when first establishing the connection
> to hook it in as the stream that's actually written to. The setup code
> can be conditional. The rest of the driver doesn't change at all.
>
> > How about a watcher thread that waits for the main thread to respond;
> > if there is no response within a timeout, it wakes.
>
> That could work. Note that there is no single "main thread", it's just
> whatever application threads happen to be running queries. You would
> need to arm the watchdog before any write or flush, and disarm it before
> reading or returning to the client (via a normal return or an
> exception). If the watchdog on a particular thread stays armed for more
> than some period, you trigger a warning.
>
> It might be tricky to get useful state out of the driver when that timer
> goes off - you'll be in the wrong thread to get a useful stacktrace at
> that point, I guess you'll have to go and abruptly close the underlying
> socket to unblock the deadlocked query thread - which means you need to
> have some way to correlate threads with sockets. And you kill the
> connection by closing down the socket, obviously.
>
> Seems like it would get a bit complicated..

OK, I'll consider. Thanks for your advice.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Dave Cramer
Date:


On Wed, Jan 21, 2009 at 8:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
> Note that there is no single "main thread"

Yeh, I meant "the thread that executes the code for that session".


Well, that's actually the client's thread. There isn't really a "monitor" thread.

Dave

Re: Deadlock detection

From
Simon Riggs
Date:
On Wed, 2009-01-21 at 09:01 -0500, Dave Cramer wrote:

> On Wed, Jan 21, 2009 at 8:35 AM, Simon Riggs <simon@2ndquadrant.com>
> wrote:
>
>         On Thu, 2009-01-22 at 00:38 +1300, Oliver Jowett wrote:
>         > Note that there is no single "main thread"

>         Yeh, I meant "the thread that executes the code for that
>         session".

> Well, that's actually the client's thread. There isn't really a
> "monitor" thread.

Yup, I know how it works. Just trying to find language to show that the
main code is executed by one thread for each client, hence "main
thread", but yes that is the client's thread.

I understand there currently is not something that we might call a
monitor thread, but I was interested in creating one to allow us to
independently inspect the state of the client threads. Watcher, monitor
thread, call it whatever, but an objective observer of what is occurring
to allow us to record information and take action.

I would prefer to have a worker (client) thread and a watcher thread
than to break the task of the client thread into two.

Anyway, if you guys can see where I'm coming from then that's good. The
important thing for me is the objective, not any specific design: being
able to confirm, using automated logging suitable for a busy production
system, that the client/server interaction is not deadlocked, so we can
argue reasonably that the search for the root cause of problems can
exclude (or not) Postgres-related code. JDBC is the most important
client, IMHO.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Simon Riggs
Date:
On Wed, 2009-01-21 at 13:17 +0100, Daniel Migowski wrote:

> I don't know if there is a more elegant way to handle this, but for us
> it works. Maybe for you, too. As far as I know (which is not much
> about the driver internals) this is the only way to make the driver
> hang.

Thank you. This seems like simple, practical advice.

My "watcher" ideas seem like they would take too long and at best would
only identify a problem, not solve it. Thanks to those that helped out
there also.

If the only known way of making the driver hang is running out of
buffers then the best way to react is to increase the buffer allocation.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:
> On Wed, 2009-01-21 at 13:17 +0100, Daniel Migowski wrote:
>
>> I don't know if there is a more elegant way to handle this, but for us
>> it works. Maybe for you, too. As far as I know (which is not much
>> about the driver internals) this is the only way to make the driver
>> hang.
>
> Thank you. This seems like simple, practical advice.
>
> My "watcher" ideas seem like they would take too long and at best would
> only identify a problem, not solve it. Thanks to those that helped out
> there also.
>
> If the only known way of making the driver hang is running out of
> buffers then the best way to react is to increase the buffer allocation.

Well, the problem is that there's no "right" value for the buffer size
with the current strategy. The default works for all but the most
extreme cases (like Daniel's 100,000 INSERTs), but raising the default
just means that you need a larger set of queries to trigger it.

I have vague memories that another way to trigger the deadlock was to
have a relatively small number of queries that generated a lot of
NOTIFYs or, perhaps, server warning messages? Can't remember the
details, but it was an unusual case involving triggers.

I have a bit of time spare today, I might look at putting together that
OutputStream wrapper.

-O

Re: Deadlock detection

From
Simon Riggs
Date:
On Thu, 2009-01-22 at 13:19 +1300, Oliver Jowett wrote:
> Simon Riggs wrote:

> > If the only known way of making the driver hang is running out of
> > buffers then the best way to react is to increase the buffer allocation.
>
> Well, the problem is that there's no "right" value for the buffer size
> with the current strategy. The default works for all but the most
> extreme cases (like Daniel's 100,000 INSERTs), but raising the default
> just means that you need a larger set of queries to trigger it.
>
> I have vague memories that another way to trigger the deadlock was to
> have a relatively small number of queries that generated a lot of
> NOTIFYs or, perhaps, server warning messages? Can't remember the
> details, but it was an unusual case involving triggers.
>
> I have a bit of time spare today, I might look at putting together that
> OutputStream wrapper.

That would be good. Thanks.

There was another hang earlier today. One session just went idle in
transaction on the server right in the middle of a long transaction. 1500
statements in about a second on the session, then nothing, while holding
locks. People are adamant it is not the application; much checking has
been done.

I've got this strange feeling that this happens a lot, but DBAs ignore
it as some quirk the application guys have missed. And that's why we
have calls for an idle-in-transaction timeout and requests to support
cancelling them. (I want both, but I wonder about the reasons why.)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deadlock detection

From
Oliver Jowett
Date:
Oliver Jowett wrote:

> I have a bit of time spare today, I might look at putting together that
> OutputStream wrapper.

Try this. I have not tested at all - it compiles but that's as far as I
got - but it should give you an idea of what I had in mind.

-O
? org/postgresql/core/AntiDeadlockStream.java
Index: org/postgresql/core/PGStream.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/PGStream.java,v
retrieving revision 1.22
diff -u -r1.22 PGStream.java
--- org/postgresql/core/PGStream.java    8 Jan 2008 06:56:27 -0000    1.22
+++ org/postgresql/core/PGStream.java    22 Jan 2009 01:10:13 -0000
@@ -34,6 +34,7 @@
 {
     private final String host;
     private final int port;
+    private final boolean antiDeadlock;

     private final byte[] _int4buf;
     private final byte[] _int2buf;
@@ -52,12 +53,14 @@
      *
      * @param host the hostname to connect to
      * @param port the port number that the postmaster is sitting on
+     * @param antiDeadlock true to insert an anti-deadlock outputstream
      * @exception IOException if an IOException occurs below it.
      */
-    public PGStream(String host, int port) throws IOException
+    public PGStream(String host, int port, boolean antiDeadlock) throws IOException
     {
         this.host = host;
         this.port = port;
+        this.antiDeadlock = antiDeadlock;

         changeSocket(new Socket(host, port));
         setEncoding(Encoding.getJVMEncoding("US-ASCII"));
@@ -74,6 +77,10 @@
         return port;
     }

+    public boolean getAntiDeadlock() {
+        return antiDeadlock;
+    }
+
     public Socket getSocket() {
         return connection;
     }
@@ -110,6 +117,8 @@
         // Buffer sizes submitted by Sverre H Huseby <sverrehu@online.no>
         pg_input = new VisibleBufferedInputStream(connection.getInputStream(), 8192);
         pg_output = new BufferedOutputStream(connection.getOutputStream(), 8192);
+        if (antiDeadlock)
+            pg_output = new AntiDeadlockStream(pg_output, 8192, 30000);

         if (encoding != null)
             setEncoding(encoding);
Index: org/postgresql/core/v2/ConnectionFactoryImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v2/ConnectionFactoryImpl.java,v
retrieving revision 1.17
diff -u -r1.17 ConnectionFactoryImpl.java
--- org/postgresql/core/v2/ConnectionFactoryImpl.java    30 Sep 2008 03:42:48 -0000    1.17
+++ org/postgresql/core/v2/ConnectionFactoryImpl.java    22 Jan 2009 01:10:14 -0000
@@ -59,7 +59,7 @@
         PGStream newStream = null;
         try
         {
-            newStream = new PGStream(host, port);
+            newStream = new PGStream(host, port, Boolean.valueOf(info.getProperty("antiDeadlock")).booleanValue());

             // Construct and send an ssl startup packet if requested.
             if (trySSL)
@@ -147,7 +147,7 @@

             // We have to reconnect to continue.
             pgStream.close();
-            return new PGStream(pgStream.getHost(), pgStream.getPort());
+            return new PGStream(pgStream.getHost(), pgStream.getPort(), pgStream.getAntiDeadlock());

         case 'N':
             if (logger.logDebug())
Index: org/postgresql/core/v2/ProtocolConnectionImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v2/ProtocolConnectionImpl.java,v
retrieving revision 1.12
diff -u -r1.12 ProtocolConnectionImpl.java
--- org/postgresql/core/v2/ProtocolConnectionImpl.java    1 Apr 2008 07:19:20 -0000    1.12
+++ org/postgresql/core/v2/ProtocolConnectionImpl.java    22 Jan 2009 01:10:14 -0000
@@ -90,7 +90,7 @@
             if (logger.logDebug())
                 logger.debug(" FE=> CancelRequest(pid=" + cancelPid + ",ckey=" + cancelKey + ")");

-            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort());
+            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort(), false);
             cancelStream.SendInteger4(16);
             cancelStream.SendInteger2(1234);
             cancelStream.SendInteger2(5678);
Index: org/postgresql/core/v3/ConnectionFactoryImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/ConnectionFactoryImpl.java,v
retrieving revision 1.19
diff -u -r1.19 ConnectionFactoryImpl.java
--- org/postgresql/core/v3/ConnectionFactoryImpl.java    29 Nov 2008 07:40:30 -0000    1.19
+++ org/postgresql/core/v3/ConnectionFactoryImpl.java    22 Jan 2009 01:10:14 -0000
@@ -73,7 +73,7 @@
         PGStream newStream = null;
         try
         {
-            newStream = new PGStream(host, port);
+            newStream = new PGStream(host, port, Boolean.valueOf(info.getProperty("antiDeadlock")).booleanValue());

             // Construct and send an ssl startup packet if requested.
             if (trySSL)
@@ -178,7 +178,7 @@

             // We have to reconnect to continue.
             pgStream.close();
-            return new PGStream(pgStream.getHost(), pgStream.getPort());
+            return new PGStream(pgStream.getHost(), pgStream.getPort(), pgStream.getAntiDeadlock());

         case 'N':
             if (logger.logDebug())
Index: org/postgresql/core/v3/ProtocolConnectionImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/ProtocolConnectionImpl.java,v
retrieving revision 1.13
diff -u -r1.13 ProtocolConnectionImpl.java
--- org/postgresql/core/v3/ProtocolConnectionImpl.java    1 Apr 2008 07:19:20 -0000    1.13
+++ org/postgresql/core/v3/ProtocolConnectionImpl.java    22 Jan 2009 01:10:14 -0000
@@ -90,7 +90,7 @@
             if (logger.logDebug())
                 logger.debug(" FE=> CancelRequest(pid=" + cancelPid + ",ckey=" + cancelKey + ")");

-            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort());
+            cancelStream = new PGStream(pgStream.getHost(), pgStream.getPort(), false);
             cancelStream.SendInteger4(16);
             cancelStream.SendInteger2(1234);
             cancelStream.SendInteger2(5678);
package org.postgresql.core;

import java.io.*;

/**
 * Temporary hack to try to detect/avoid socket deadlocks caused
 * by blocking on write while we have lots of pending data to read
 * from the server (i.e. the server is also blocked on write).
 *
 * see the thread at http://archives.postgresql.org/pgsql-jdbc/2009-01/msg00045.php
 *
 * @author Oliver Jowett <oliver@opencloud.com>
 */
class AntiDeadlockStream extends OutputStream implements Runnable {
    private static final class BufferLock {}
    private final BufferLock bufferLock = new BufferLock();

    private final OutputStream wrapped;
    private final long flushTimeout;

    private byte[] buffer;
    private int bufferSize;
    private byte[] swapBuffer;

    private boolean closeRequest;
    private boolean flushRequest;
    private boolean closeComplete;

    private IOException failedException;

    AntiDeadlockStream(OutputStream wrapped, int initialSize, long flushTimeout) {
        this.wrapped = wrapped;
        this.flushTimeout = flushTimeout;
        this.buffer = new byte[initialSize];
        this.swapBuffer = new byte[initialSize];

        new Thread(this, "AntiDeadlock thread").start();
    }

    public void close() throws IOException {
        synchronized (bufferLock) {
            closeRequest = true;
            bufferLock.notifyAll();

            while (!closeComplete) {
                if (failedException != null)
                    throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));

                try {
                    bufferLock.wait();
                } catch (InterruptedException ie) {
                    throw new InterruptedIOException();
                }
            }
        }
    }

    public void flush() throws IOException {
        synchronized (bufferLock) {
            long expiry = -1;

            flushRequest = true;
            bufferLock.notifyAll();

            while (true) {
                if (failedException != null)
                    throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));
                if (closeRequest)
                    throw new IOException("Stream is closed");
                if (bufferSize == 0)
                    return;

                long delay;
                if (expiry == -1) {
                    delay = flushTimeout;
                    expiry = System.currentTimeMillis() + delay;
                } else {
                    delay = expiry - System.currentTimeMillis();
                }

                if (delay <= 0) {
                    System.err.println("Warning: possible socket deadlock detected (timeout=" + flushTimeout + ",
remainingbuffer=" + bufferSize); 
                    new Throwable("Deadlock call stack").fillInStackTrace().printStackTrace(System.err);
                    return;
                }

                try {
                    bufferLock.wait(delay);
                } catch (InterruptedException ie) {
                    throw new InterruptedIOException();
                }
            }
        }
    }

    public void write(int b) throws IOException {
        write(new byte[] { (byte)b }, 0, 1);
    }

    public void write(byte[] b) throws IOException {
        write(b, 0, b.length);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (b == null)
            throw new NullPointerException();

        if (off < 0 || len < 0 || off+len > b.length)
            throw new IndexOutOfBoundsException();

        synchronized (bufferLock) {
            if (closeRequest)
                throw new IOException("Stream is closed");

            if (failedException != null)
                throw (IOException) (new IOException("Write thread reported an error").initCause(failedException));

            int needs = bufferSize + len;
            int newSize = buffer.length;
            while (newSize < needs)
                newSize *= 2;

            if (newSize != buffer.length) {
                byte[] newBuffer = new byte[newSize];
                System.arraycopy(buffer, 0, newBuffer, 0, bufferSize);
                buffer = newBuffer;
            }

            if (bufferSize == 0)
                bufferLock.notifyAll();

            System.arraycopy(b, off, buffer, bufferSize, len);
            bufferSize += len;
        }
    }

    //
    // Runnable
    //

    public void run() {
        while (true) {
            boolean doFlush;
            boolean doClose;
            int writeLength;

            synchronized (bufferLock) {
                if (bufferSize == 0 && !closeRequest && !flushRequest) {
                    try {
                        bufferLock.wait();
                    } catch (InterruptedException ie) {
                        failedException = new InterruptedIOException("write thread interrupted");
                        bufferLock.notifyAll();
                        return;
                    }

                    continue;
                }

                byte[] oldBuffer = buffer;
                buffer = swapBuffer;
                swapBuffer = buffer;

                writeLength = bufferSize;
                doFlush = flushRequest;
                doClose = closeRequest;

                flushRequest = false;

                bufferLock.notifyAll();
            }

            try {
                if (writeLength > 0)
                    wrapped.write(swapBuffer, 0, writeLength);
                if (flushRequest)
                    wrapped.flush();
                if (closeRequest) {
                    wrapped.close();

                    synchronized (bufferLock) {
                        closeComplete = true;
                        bufferLock.notifyAll();
                    }

                    return;
                }
            } catch (IOException ioe) {
                synchronized (bufferLock) {
                    failedException = ioe;
                    bufferLock.notifyAll();
                    try {
                        wrapped.close();
                    } catch (IOException ioe2) {
                        // Ignore it.
                    }
                    return;
                }
            }
        }
    }
}

Re: Deadlock detection

From
Oliver Jowett
Date:
Oliver Jowett wrote:
> Oliver Jowett wrote:
>
>> I have a bit of time spare today, I might look at putting together that
>> OutputStream wrapper.
>
> Try this. I have not tested at all - it compiles but that's as far as I
> got - but it should give you an idea of what I had in mind.

And here are the fixes for the deliberate mistakes in the first version ;-)

This passes the regression testsuite with antiDeadlock=true, but I don't
have a deadlock testcase on hand to try unfortunately.

-O
--- crypt/pgjdbc/org/postgresql/core/AntiDeadlockStream.java.orig    2009-01-22 14:47:06.000000000 +1300
+++ crypt/pgjdbc/org/postgresql/core/AntiDeadlockStream.java    2009-01-22 14:44:47.000000000 +1300
@@ -159,23 +159,23 @@

                 byte[] oldBuffer = buffer;
                 buffer = swapBuffer;
-                swapBuffer = buffer;
+                swapBuffer = oldBuffer;

                 writeLength = bufferSize;
                 doFlush = flushRequest;
                 doClose = closeRequest;

                 flushRequest = false;
-
+                bufferSize = 0;
                 bufferLock.notifyAll();
             }

             try {
                 if (writeLength > 0)
                     wrapped.write(swapBuffer, 0, writeLength);
-                if (flushRequest)
+                if (doFlush)
                     wrapped.flush();
-                if (closeRequest) {
+                if (doClose) {
                     wrapped.close();

                     synchronized (bufferLock) {

Re: Deadlock detection

From
Kris Jurka
Date:

On Thu, 22 Jan 2009, Oliver Jowett wrote:

> Oliver Jowett wrote:
>
> This passes the regression testsuite with antiDeadlock=true, but I don't
> have a deadlock testcase on hand to try unfortunately.
>

This has a test case which deadlocks consistently:

http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php

Kris Jurka

Re: Deadlock detection

From
Oliver Jowett
Date:
Kris Jurka wrote:
>
>
> On Thu, 22 Jan 2009, Oliver Jowett wrote:
>
>> Oliver Jowett wrote:
>>
>> This passes the regression testsuite with antiDeadlock=true, but I don't
>> have a deadlock testcase on hand to try unfortunately.
>>
>
> This has a test case which deadlocks consistently:
>
> http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php

Perfect, thanks

My changes seem to pass BatchDeadLock with antiDeadlock=true (and still
hang with antiDeadlock=false):

> 14:59:45.233 (1)  FE=> Execute(portal=null,limit=1)
> 14:59:45.233 (1)  FE=> Sync
> Warning: possible socket deadlock detected (timeout=30000, remaining buffer=1299975
> java.lang.Throwable: Deadlock call stack
>     at org.postgresql.core.AntiDeadlockStream.flush(AntiDeadlockStream.java:83)
>     at org.postgresql.core.PGStream.flush(PGStream.java:517)
>     at org.postgresql.core.v3.QueryExecutorImpl.sendSync(QueryExecutorImpl.java:692)
>     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:342)
>     at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2693)
>     at BatchDeadLock.main(BatchDeadLock.java:40)
> 15:00:15.234 (1)  <=BE ParseComplete [S_1]
> 15:00:15.234 (1)  <=BE ParameterDescription
> 15:00:15.235 (1)  <=BE NoData
> 15:00:15.235 (1)  <=BE BindComplete [null]
> 15:00:15.235 (1)  <=BE CommandStatus(INSERT 0 1)
> 15:00:15.235 (1)  <=BE ParseComplete [S_2]
> 15:00:15.235 (1)  <=BE ParameterDescription
> 15:00:15.235 (1)  <=BE NoData
> [.. etc ..]

-O

Re: Deadlock detection

From
Oliver Jowett
Date:
Simon Riggs wrote:

> There was another hang earlier today. One session just went idle in
> transaction on the server right in the middle of a long transaction. 1500
> statements in about a second on the session, then nothing, while holding
> locks. People are adamant it is not the application; much checking has
> been done.

It's worth noting that this does not sound like the socket deadlock case.

In the socket deadlock case, the backend will be stuck in PARSE or
something similar, blocked trying to write to the JDBC client.

-O

Re: Deadlock detection

From
bdbusch
Date:
We came across this thread today researching our issue. As tersely as possible:

JBoss 4.2.3, Solaris 10 (x64), PG 8.2 (with GIS), Hibernate/EJB2.

We were storing a GIS column as a LOB in WKT format (e.g., POINT(23.22
23.22)) and occasionally would have transactions apparently time out
trying to insert into this table. (It turns out the transaction doesn't
time out per se; it just never finishes, and the timeouts our customers
see are Ajax/session related.)

- JBoss logs would show the transaction committing from Hibernate.

- Thread dumps on the JVM would show the stuck thread in:

"http-0.0.0.0-8443-4" daemon prio=3 tid=0x00000000019ee400 nid=0x70 runnable
[0xfffffd7eab923000..0xfffffd7eab9268a0]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0xfffffd7fad49c858> (a java.io.BufferedOutputStream)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at org.postgresql.core.PGStream.Send(PGStream.java:208)
        at
org.postgresql.core.v3.SimpleParameterList.writeV3Value(SimpleParameterList.java:258)
        at
org.postgresql.core.v3.QueryExecutorImpl.sendBind(QueryExecutorImpl.java:861)
        at
org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1052)
        at
org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:643)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:344)
        - locked <0xfffffd7fad4a1af8> (a
org.postgresql.core.v3.QueryExecutorImpl)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2592)
        at sun.reflect.GeneratedMethodAccessor152.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at

org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:471)
        at $Proxy291.executeBatch(Unknown Source)
        at
org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:774)
        at
org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
        at
org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
        at
org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
        at
org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
        at
org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
...

(No other monitors/blocks - this was the ONLY thread "running" at the time!)

- PGAdmin3 server stats/locks would show a slew of locks on this table (no
other locks, so no traditional deadlock per se).

- truss(1M) on the postgres pid holding the locks showed:

% truss -p 14713
send(9, 0x082FBF00, 8192, 0)    (sleeping...)

- pfiles(1M) on this pid showed that fd 9 was this same socket connected to
our JVM and blocked

Clearly (at least to us), the database was blocked writing to this socket
(seen from truss) and the JVM was blocked writing to the same socket (seen
from the JVM stack).


- netstat(1M) showed no send/recvQ data on this socket

- PG log showed

-- query was simple insert into FeatureName(xxx) values ($1, ...)
-- the geom text/LOB had a 22K bind (large country MULTIPOLYGON)

WORKAROUND

We're going to remove the string (we determined our client app doesn't use
the geom for this table) or at least convert it to the binary version long
term when we start using it again.

WAY FORWARD

We're planning on upgrading to 8.3.17 (? whatever is the latest) and hope
that this deadlock in the driver doesn't happen again.

Just posting this in case it might be useful for your testing.





Re: Deadlock detection

From
Kris Jurka
Date:

On Thu, 22 Jan 2009, Oliver Jowett wrote:

> Kris Jurka wrote:
>>
>> On Thu, 22 Jan 2009, Oliver Jowett wrote:
>>>
>>> This passes the regression testsuite with antiDeadlock=true, but I don't
>>> have a deadlock testcase on hand to try unfortunately.
>>>
>>
>> This has a test case which deadlocks consistently:
>>
>> http://archives.postgresql.org/pgsql-jdbc/2008-10/msg00045.php
>
> Perfect, thanks
>
> My changes seem to pass BatchDeadLock with antiDeadlock=true (and still
> hang with antiDeadlock=false):

I'm not really sure what to do with this code.  It does two different
things and what it does best really wasn't the thrust of this thread.

1) It avoids deadlocks by buffering the writes in local memory.  That's
great, but wasn't really the point of the thread.  Would we suggest people
run with antiDeadlock=true in a production environment to avoid deadlocks?
If so, why wouldn't it be the default?

2) It warns when a deadlock has potentially occurred with stack trace
information to provide the call location.  That's good, but it can't be
100% accurate.  With a small timeout it may warn when there wouldn't be an
actual deadlock and with a large timeout it may not warn when the call
would have deadlocked without the intermediate buffer.  Having the
intermediate buffer means that you aren't testing the same thing, so a
success with antiDeadlock=true has no implications as to whether your
application will deadlock with it turned off.  Previously it was discussed
that you'd really want to cap the buffer size, so perhaps relating the
maximum buffer size to Socket.getSendBufferSize might make the results of
testing with antiDeadlock more representative of the behavior with it
turned off.
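
Concretely, the kind of cap I mean might look like this; just a sketch,
since the posted AntiDeadlockStream has no maximum-size parameter and
all names here are hypothetical:

    import java.io.IOException;
    import java.net.Socket;

    // Sketch only: derive the wrapper's internal buffer cap from the socket's
    // send buffer size, so running with antiDeadlock=true buffers roughly no
    // more than the kernel would have anyway, and fail loudly rather than
    // growing forever.
    final class AntiDeadlockBufferPolicy {
        private final int maxBuffer;

        AntiDeadlockBufferPolicy(Socket socket) throws IOException {
            // Floor of 8k so we never cap below the driver's own stream buffer.
            this.maxBuffer = Math.max(socket.getSendBufferSize(), 8192);
        }

        // Hypothetical hook for the wrapper's write() path, called before
        // growing the internal buffer.
        void ensureRoom(int currentlyBuffered, int incoming) throws IOException {
            if (currentlyBuffered + incoming > maxBuffer)
                throw new IOException("anti-deadlock buffer limit of " + maxBuffer
                        + " bytes exceeded; connection appears deadlocked");
        }
    }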

So it's a useful piece of code, but I'm not sure what context it should
really be used in.  Is anyone using this?  For what purpose?

Kris Jurka

Re: Deadlock detection

From
Oliver Jowett
Date:
Kris Jurka wrote:

> So it's a useful piece of code, but I'm not sure what context it should
> really be used in.  Is anyone using this?  For what purpose?

I'm not using it, it was just a quick hack to try to narrow down a
deadlock problem IIRC ..

-O

Re: Deadlock detection

From
Simon Riggs
Date:
On Mon, 2009-06-01 at 19:30 -0400, Kris Jurka wrote:

> 1) It avoids deadlocks by buffering the writes in local memory.  That's
> great, but wasn't really the point of the thread.  Would we suggest people
> run with antiDeadlock=true in a production environment to avoid deadlocks?
> If so, why wouldn't it be the default.

> So it's a useful piece of code, but I'm not sure what context it should
> really be used in.  Is anyone using this?  For what purpose?

The reason the code was written went away before the code could sensibly
be used. We removed the contention in the database by redesign rather
than by JDBC changes. Thanks to Oliver for taking the time.

Having said that, it is clear there were some issues there that are not
fully explained. It may be that it is a JDBC driver issue, but also it
may be a protocol/handling issue. Intermittent issues are hard to get a
full state on to debug, and we are left looking at spoor or wishing we
had some.

I would like to see code added to the driver for (1), or other code. My
main request is that we have some way of responding to the problems that
we know can exist. The main argument for not buffering everything would
be performance, I think. 99.9% of users are happy as is, so to act in a
way that affects them would be bad. Sometimes, a "safe mode" would be a
useful thing to be able to enable, as a way of eliminating doubt.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support