Thread: Can't throw the dreaded 'idle in transaction' - need help!
Hello,
Much like other people on this list, we are a Java based developer that uses PG7.4.6 for our database. For a long time we have had intermittent and generally non-reproducible 'idle in transaction' problems that would occur from time to time. Now we're trying to resolve this issue once and for all and I was hoping if someone could enlighten me on the current 'state of play' with this issue and how I might go about finding what's causing it.
Our application makes several threaded servlet connections to PG via Jetty and a managed connection pool. We have tried several pool managers of various complexity and even tried direct (per thread) connections to the DB. We have tried the 7.4 production version of the JDBC driver and the newer 8.0 beta version (type 2 and type 3) all with the same issue (although the 8.0 beta seems better, it still occurs regularly).
From what we can establish, a singular (non-blocked) transaction is updating a row via the pool and then holding the row locked (idle in transaction). Mostly the driver seems to clear these events, but regularly (within 3 seconds) another event will occur seeking to acquire a lock for update on the same row. When this happens, the pool allocates all it's threads (set to 20) and Jetty finally blocks waiting for the update to commit. It almost seems like the second update is issued before the first has a chance to clear it's locks and the driver can't recover ( or something like this), requiring a Jetty restart.
Our problem is in debugging this:
1. We don't know how we can find out exactly which statement caused the idle-in-transaction, all we know is which is waiting for the row via pg_stat_activity. How can we find this information?
2. Is the idle in transaction a "normal" event for all update transactions? Should the database release the lock immediately after the commit, in all instances? Should we assume that when we see an 'idle in transaction' that some error event has occurred that we're not seeing, like a failed query etc, that's causing the connection to remain open?
3. We've tried to reduce all large transactions and complex queries effectively to small transaction 'blocks' or simply leave auto.commit to true and issue only single line updates. But this hasn't helped.
Most importantly, we'd like to understand why this event would cause the pool or driver to lockup, as this may help us in avoiding this issue.
Any help would be most appreciated.
Kind regards,
Andrew.
On Mon, 22 Nov 2004, Temp02 wrote: > 2. Is the idle in transaction a "normal" event for all update > transactions? Should the database release the lock immediately after the > commit, in all instances? Should we assume that when we see an 'idle in > transaction' that some error event has occurred that we're not seeing, > like a failed query etc, that's causing the connection to remain open? Idle in transaction is a normal state for every transaction that is not run in autocommit mode. When the first statement is run a transaction is started. When this statement completes the connection is "idle in transaction". It stays this way until the next statement is run, such that the transaction is no longer idle, or the transaction is ended by either a commit or rollback. > 3. We've tried to reduce all large transactions and complex queries > effectively to small transaction 'blocks' or simply leave auto.commit to > true and issue only single line updates. But this hasn't helped. Potential problems could be error handling code that bails out leaving the connection and transaction still in the idle-in-transaction state. Another problem in a multi-threaded multi-connection application is a deadlock involving both database and application resources. A deadlock in the database will be detected and broken, but the database doesn't know anything about locking code in the application. Kris Jurka
Temp02 wrote: > 1. We don't know how we can find out exactly which statement caused the > idle-in-transaction, all we know is which is waiting for the row via > pg_stat_activity. How can we find this information? You might want to turn on statement logging on the DB side. Once you identify the row that is locked, you should be able to backtrack through the logs and find a connection that locked that row but has not subsequently committed or rolled back. Note that there is no query that "causes" idle-in-transaction. The connection is *idle*; it is not processing a query. > 2. Is the idle in transaction a "normal" event for all update > transactions? It doesn't really have anything to do with update transactions; it's just that you only see problems if it is an update transaction that goes idle, as it will be holding locks. Idle in transaction just means that the connection is in a transaction (at the JDBC level, autocommit is off and a query has been issued) and the backend is waiting for a new query to arrive. > Should the database release the lock immediately after the > commit, in all instances? Yes. > Should we assume that when we see an 'idle in > transaction' that some error event has occurred that we're not seeing, > like a failed query etc, that's causing the connection to remain open? The usual cause is that your application is not closing (via commit/rollback) a transaction it has started. There are many possible reasons for this, but failing to deal with errors is a common one. I'd suggest checking your application code to make sure that transactions are always closed, even in the face of exceptions or other failures. Alternatively, you might have an application/db deadlock happening (thread 1 acquires DB lock; thread 2 acquires Java lock; thread 1 blocks waiting for thread 2 to release the Java lock; thread 2 blocks waiting for the transaction started by thread 1 to complete and release the DB lock; everything stops). Sending the JVM a SIGQUIT (causing a thread dump) might help if the responsible thread really is blocked and hasn't just leaked the connection or forgotten to close the transaction. -O
Thankyou for your feedback. Further to this issue, this is what we found: 1. Turning on statement logging for every query shows that the last transaction that was responsible for the 'idle in transaction' did in fact complete successfully. The code did not error during this processing and appeared to complete the transaction normally. 2. The lock being held is a row lock that pertains to only one user, so we don't know why this event would cause the entire servlet to block and hold, as other threads from the pool should be available for other users which are not reliant on this row. This to me seems like a jdbc driver issue, because Jetty is happy to service other servlets that require no database connection (indicating that Jetty is okay), and when we tried this without the use of any DB connection pool, the same situation occured. Can anyone validate if this seems logical? 3. Doing periodic 'ps' on the server frequently shows threads that remain 'idle in transaction' for up to about 3-5 secs before being normally cleared. Is this what we should expect even if we issue explicit con.commit statements after the statement closes? I would have not expected to see idle in transaction at all. Overall I don't really understand why this is happening or where we should now look. Stepping line by line through the statements between that produced under normal circumstances and that which is produced during this lock situation, seems identical. We don't use any Java locking with the application we leave it all up to the DB. Any thoughts on where I could go from here? Kind regards, Andrew. ----- Original Message ----- From: "Oliver Jowett" <oliver@opencloud.com> To: "Temp02" <temp02@bluereef.com.au> Cc: <pgsql-jdbc@postgresql.org> Sent: Monday, November 22, 2004 5:28 PM Subject: Re: [JDBC] Can't throw the dreaded 'idle in transaction' - need help! > Temp02 wrote: > > > 1. We don't know how we can find out exactly which statement caused the > > idle-in-transaction, all we know is which is waiting for the row via > > pg_stat_activity. How can we find this information? > > You might want to turn on statement logging on the DB side. Once you > identify the row that is locked, you should be able to backtrack through > the logs and find a connection that locked that row but has not > subsequently committed or rolled back. > > Note that there is no query that "causes" idle-in-transaction. The > connection is *idle*; it is not processing a query. > > > 2. Is the idle in transaction a "normal" event for all update > > transactions? > > It doesn't really have anything to do with update transactions; it's > just that you only see problems if it is an update transaction that goes > idle, as it will be holding locks. Idle in transaction just means that > the connection is in a transaction (at the JDBC level, autocommit is off > and a query has been issued) and the backend is waiting for a new query > to arrive. > > > Should the database release the lock immediately after the > > commit, in all instances? > > Yes. > > > Should we assume that when we see an 'idle in > > transaction' that some error event has occurred that we're not seeing, > > like a failed query etc, that's causing the connection to remain open? > > The usual cause is that your application is not closing (via > commit/rollback) a transaction it has started. There are many possible > reasons for this, but failing to deal with errors is a common one. I'd > suggest checking your application code to make sure that transactions > are always closed, even in the face of exceptions or other failures. > > Alternatively, you might have an application/db deadlock happening > (thread 1 acquires DB lock; thread 2 acquires Java lock; thread 1 blocks > waiting for thread 2 to release the Java lock; thread 2 blocks waiting > for the transaction started by thread 1 to complete and release the DB > lock; everything stops). > > Sending the JVM a SIGQUIT (causing a thread dump) might help if the > responsible thread really is blocked and hasn't just leaked the > connection or forgotten to close the transaction. > > -O > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html
Temp02 wrote: > 1. Turning on statement logging for every query shows that the last > transaction that was responsible for the 'idle in transaction' did in fact > complete successfully. The code did not error during this processing and > appeared to complete the transaction normally. What do you mean by "complete the transaction normally" -- is the last logged statement on that connection a COMMIT or ROLLBACK? Can you provide statement logging for one of the connections that gets stuck in idle-in-transaction up to the point it gets stuck? > 2. The lock being held is a row lock that pertains to only one user, so we > don't know why this event would cause the entire servlet to block and hold, > as other threads from the pool should be available for other users which are > not reliant on this row. The locks you need to obtain depends on the query you are executing, so it's hard to say. > This to me seems like a jdbc driver issue, because > Jetty is happy to service other servlets that require no database connection > (indicating that Jetty is okay), and when we tried this without the use of > any DB connection pool, the same situation occured. Can anyone validate if > this seems logical? The locks are all on the server side, and the JDBC driver knows nothing about them. So I can't really see it being a JDBC driver bug, unless the JDBC driver is getting transaction demarcation wrong. > 3. Doing periodic 'ps' on the server frequently shows threads that remain > 'idle in transaction' for up to about 3-5 secs before being normally > cleared. Is this what we should expect even if we issue explicit con.commit > statements after the statement closes? I would have not expected to see idle > in transaction at all. For a particular connection the lifecycle looks something like: connection made from client mark connection as idle receive BEGIN from client mark connection as busy start transaction send BEGIN result to client mark connection as idle in transaction receive query from client mark connection as busy execute query send results to client mark connection as idle in transaction [... more queries ...] receive COMMIT or ROLLBACK from client mark connection as busy commit or rollback transaction send COMMIT or ROLLBACK result to client mark connection as idle So it is normal to see connections that are "idle in transaction" for short periods; these are connections that are waiting for the next query from the client. -O
On Tuesday 23 November 2004 01:03, Temp02 wrote: > We don't use any Java locking with the application we leave it all > up to the DB. Not a single synchronized block of code in your entire application? Did you try sending a SIGQUIT to your application, as Oliver suggested in http://archives.postgresql.org/pgsql-jdbc/2004-11/msg00169.php | $ cat Main.java | public class Main { | public static void main(String[] _) throws InterruptedException { | while (true) { | Thread.sleep(1000); | } | } | } | | $ javac Main.java | $ java -cp . Main & | [1] 9346 | $ ps -ef | grep java | vadim 9346 7513 3 10:03 pts/0 00:00:00 java -cp . Main In response to a SIGQUIT, the JRE should dump stack traces for all the current threads, like so: | $ kill -QUIT 9346 | Full thread dump Java HotSpot(TM) Client VM (1.4.2_05-b04 mixed mode): | | "Signal Dispatcher" daemon prio=1 tid=0x08095950 nid=0x2482 waiting on condition [0..0] | | "Finalizer" daemon prio=1 tid=0x08090e90 nid=0x2482 in Object.wait() [4c792000..4c79287c] | at java.lang.Object.wait(Native Method) | - waiting on <0x44590490> (a java.lang.ref.ReferenceQueue$Lock) | at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111) | - locked <0x44590490> (a java.lang.ref.ReferenceQueue$Lock) | at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127) | at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) | | "Reference Handler" daemon prio=1 tid=0x080902e8 nid=0x2482 in Object.wait() [4c711000..4c71187c] | at java.lang.Object.wait(Native Method) | - waiting on <0x44590380> (a java.lang.ref.Reference$Lock) | at java.lang.Object.wait(Object.java:429) | at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115) | - locked <0x44590380> (a java.lang.ref.Reference$Lock) | | "main" prio=1 tid=0x0805ab88 nid=0x2482 waiting on condition [bfffd000..bfffd55c] | at java.lang.Thread.sleep(Native Method) | at Main.main(Main.java:4) | | "VM Thread" prio=1 tid=0x0808f088 nid=0x2482 runnable | | "VM Periodic Task Thread" prio=1 tid=0x080981a0 nid=0x2482 waiting on condition | "Suspend Checker Thread" prio=1 tid=0x08094f98 nid=0x2482 runnable Except yours will be a lot longer.