Re: Infinite waitOnLock - Mailing list pgsql-jdbc

From Dave Cramer
Subject Re: Infinite waitOnLock
Date
Msg-id CADK3HH+v9i6-0BW0tj3AZafSqN8mpNA3BaiKL1SMVEpZBbnUOg@mail.gmail.com
Whole thread Raw
In response to Re: Infinite waitOnLock  (Leonard Meyer <lmeyer@excilys.com>)
Responses Re: Infinite waitOnLock  (Leonard Meyer <lmeyer@excilys.com>)
List pgsql-jdbc
So I'm curious why doesn't your app send a SIGTERM which would interrupt the threads.
Other than that I can think of two strategies

1) make isValid smarter
2) put a timeout in the wait, but of course we don't know how long to wait? However I am thinking that very large copies can't be that common.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 10 December 2014 at 08:21, Leonard Meyer <lmeyer@excilys.com> wrote:
Same behavior with jar from master. All threads are logging "org.postgresql.util.PSQLException: Database connection failed when canceling copy operation" and go into waiting state.

2014-12-10 12:34 GMT+01:00 Dave Cramer <pg@fastcrypt.com>:
So sure enough we do not timeout. The question is how long to wait. I haven't looked at this in detail, but copy could take a very long time depending on what people are copying in/out

Before we go chasing this can you build a driver from master and try it? 

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 10 December 2014 at 03:47, Leonard Meyer <lmeyer@excilys.com> wrote:
Hi,

I was advised to come here for this instead of the pgsql-bugs ML. As I already explained it, I'll just copy paste the problem :

We're doing an ETL Akka application using PostgreSQL 9.3.5 using driver 9.3-1102-jdbc41. We're currently testing our application resilience by shutting down the database in the middle of our processing. Basically we have 8 threads working and when shutting down, some threads just get stuck forever waiting on a lock.
We think it's coming from a bug inside the driver because we would expect the lock acquisition to timeout at some point. Here's the stacktrace of a frozen thread :

"xxxxxx-akka.actor.pinned-dispatcher-26" prio=10 tid=0x00007f3a1c007800 nid=0x3125 in Object.wait() [0x00007f3acaae8000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000744a14aa0> (a org.postgresql.core.v3.QueryExecutorImpl)
at java.lang.Object.wait(Object.java:503)
at org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(QueryExecutorImpl.java:91)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:228)
- locked <0x0000000744a14aa0> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:405)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:285)
at org.postgresql.jdbc4.AbstractJdbc4Connection.isValid(AbstractJdbc4Connection.java:130)
at org.postgresql.jdbc4.Jdbc4Connection.isValid(Jdbc4Connection.java:21)
at com.zaxxer.hikari.proxy.ConnectionProxy.isValid(ConnectionProxy.java:357)
at xx.xxxx.actors.flux.mclu.McluProcessor$$anonfun$receive$1.applyOrElse(McluProcessor.scala:65)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at xx.xxxx.actors.flux.mclu.McluProcessor.aroundReceive(McluProcessor.scala:30)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
   Locked ownable synchronizers:
- <0x000000074482e6d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

 We've first dropped by the Akka group, thinking it was a problem with our code, but nothing came up. HikariCP's main developper showed up with an explanation though :

It appears that you are using the PostgreSQL CopyManager, correct?  Looking at QueryExecutorImpl it appears that rollback() is trying to obtain a lock that was not released by the CopyManager.  I recommend using the CopyManager.copyIn() method that returns a CopyIn object, rather than using the convenience method that takes a reader.  Use the writeToCopy() to pump the data in, and be sure to catch SQLException.  If you get an SQLException, call cancelCopy() and retry or whatever your recovery scenario is, otherwise call endCopy().  I would have expected PostgreSQL to handle the severing of a Connection in the middle of a bulk copy better, but that is probably a question for the PostgreSQL group.

Tried it, no luck. Here the link for reference : https://groups.google.com/forum/#!topic/akka-user/Ehzioy3jVoU

 Here's what we can show you of our (scala) code :
 
 val cpManager = connection.unwrap(classOf[PGConnection]).getCopyAPI
    val stringBytes: Array[Byte] = batchStrings.map(_.toByte).toArray
    val copy = cpManager.copyIn(s"COPY temp FROM STDIN WITH CSV")
    try {
      copy.writeToCopy(stringBytes, 0, stringBytes.size)
      copy.endCopy()
    } finally {
      if (copy.isActive) {
        copy.cancelCopy()
      }
    }

 Connection used is closed further down in a finally block.

We're rather sure the lock is coming from here since we have another exactly similar processing but with regular inserts instead of this, and no issues so far. Thanks for any help.



pgsql-jdbc by date:

Previous
From: Leonard Meyer
Date:
Subject: Re: Infinite waitOnLock
Next
From: Leonard Meyer
Date:
Subject: Re: Infinite waitOnLock