Thread: idle in transaction query makes server unresponsive
Hi everyone,
I have a problem that I've been struggling with for quite some time. Every once in a while I will get a connection that goes to idle in transaction on an in-house programmed application that connects with JDBC. That happens fairly regularly and the programmers are trying to clean that up, but sometimes the idle in transaction connection makes the PG server entirely unresponsive. I'm not getting connection refused, nothing. All connections existing or new, JDBC or psql, just hang. I've already got full query logging on to try to catch the problem query or connection so I can give the developers somewhere to look to resolve their issue with the application, but since queries are logged with runtimes I'm assuming they are only logged after they are complete. And since it's idle in transaction it never completes so it never gets logged. Our application is connecting as an unprivileged user named rmstomcat, and the database is limited to 400 connections out of 512. I'm not running out of connections as I've got reserved connections set, and even connecting as user postgres with psql the connection just hangs. The server doesn't appear to be running out of memory when this happens and nothing is printed in the log. The only thing that resolves it is doing a kill on the PID of any idle in transaction connections existing at the time causing them to roll back. Then everything else picks up right where it left off and works again.
Can anyone give me any hints about why PG becomes unresponsive? Or how to fix it so it doesn't?
My server is 9.1.2 right now. I will be upgrading to the latest 9.1 series soon, but until 9.2 can be run through our development/testing cycle I can't upgrade to 9.2. That will take about 6-10 months.
Thanks!
Scot Kreienkamp
This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by e-mail or by telephone at the above number. Thank you.
On 09/25/12 12:23 PM, Scot Kreienkamp wrote: > > I have a problem that I've been struggling with for quite some time. > Every once in a while I will get a connection that goes to idle in > transaction on an in-house programmed application that connects with > JDBC. That happens fairly regularly and the programmers are trying to > clean that up, but sometimes the idle in transaction connection makes > the PG server entirely unresponsive. I'm not getting connection > refused, nothing. All connections existing or new, JDBC or psql, just > hang. I've already got full query logging on to try to catch the > problem query or connection so I can give the developers somewhere to > look to resolve their issue with the application, but since queries > are logged with runtimes I'm assuming they are only logged after they > are complete. And since it's idle in transaction it never completes > so it never gets logged. Our application is connecting as an > unprivileged user named rmstomcat, and the database is limited to 400 > connections out of 512. I'm not running out of connections as I've > got reserved connections set, and even connecting as user postgres > with psql the connection just hangs. The server doesn't appear to be > running out of memory when this happens and nothing is printed in the > log. The only thing that resolves it is doing a kill on the PID of > any idle in transaction connections existing at the time causing them > to roll back. Then everything else picks up right where it left off > and works again. > > Can anyone give me any hints about why PG becomes unresponsive? Or > how to fix it so it doesn't? > that is a LOT of connections. you likely should be limiting that with a connection pooler, and configuring your application to ... 1) get connection from pool 2) execute transaction 3) release connection to pool then configure the pool to stall the requester when some sane number of connections has been reached, like no more than 2-3X the number of CPU cores or hardware threads you have. you'll likely get better overall throughput. if you have jobs that execute long running queries for reporting etc, have those use a seperate smaller pool. re: your logging.... <idle in transaction> means that connection has no query running but started a transaction. there's no pending query on that connection. these are normally only a concern when they go on for a long time, say 10 minutes or more. however, if that transaction has gotten locks on resources, and is then sitting on its thumbs doing nothing, OTHER connections likely will block. join pg_stat_activity with pg_locks to find out what all is going on.. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of John R Pierce > Sent: Tuesday, September 25, 2012 3:53 PM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] idle in transaction query makes server unresponsive > > On 09/25/12 12:23 PM, Scot Kreienkamp wrote: > > > > I have a problem that I've been struggling with for quite some time. > > Every once in a while I will get a connection that goes to idle in > > transaction on an in-house programmed application that connects with > > JDBC. That happens fairly regularly and the programmers are trying to > > clean that up, but sometimes the idle in transaction connection makes > > the PG server entirely unresponsive. I'm not getting connection > > refused, nothing. All connections existing or new, JDBC or psql, just > > hang. I've already got full query logging on to try to catch the > > problem query or connection so I can give the developers somewhere to > > look to resolve their issue with the application, but since queries > > are logged with runtimes I'm assuming they are only logged after they > > are complete. And since it's idle in transaction it never completes > > so it never gets logged. Our application is connecting as an > > unprivileged user named rmstomcat, and the database is limited to 400 > > connections out of 512. I'm not running out of connections as I've > > got reserved connections set, and even connecting as user postgres > > with psql the connection just hangs. The server doesn't appear to be > > running out of memory when this happens and nothing is printed in the > > log. The only thing that resolves it is doing a kill on the PID of > > any idle in transaction connections existing at the time causing them > > to roll back. Then everything else picks up right where it left off > > and works again. > > > > Can anyone give me any hints about why PG becomes unresponsive? Or > > how to fix it so it doesn't? > > > > > that is a LOT of connections. you likely should be limiting that with > a connection pooler, and configuring your application to ... > > 1) get connection from pool > 2) execute transaction > 3) release connection to pool > > then configure the pool to stall the requester when some sane number of > connections has been reached, like no more than 2-3X the number of CPU > cores or hardware threads you have. you'll likely get better overall > throughput. > > if you have jobs that execute long running queries for reporting etc, > have those use a seperate smaller pool. > > re: your logging.... <idle in transaction> means that connection has no > query running but started a transaction. there's no pending query on > that connection. these are normally only a concern when they go on > for a long time, say 10 minutes or more. however, if that transaction > has gotten locks on resources, and is then sitting on its thumbs doing > nothing, OTHER connections likely will block. join pg_stat_activity > with pg_locks to find out what all is going on.. > [Scot Kreienkamp] Hi John, The application is using a pooler and generally runs around 100 connections, but I've seen it as high as 200 during the dayfor normal use. It's on a large server; 64 cores total and about 500 gigs of memory. That's one of the reasons I leftit at 512 connections. The idle in transaction connections are getting locks and then going idle in transaction causingthe queries to be waiting in that database. That I can understand. My problem is that I can't run a query to seewhat exactly it's doing because the entire Postgres server is unresponsive. I can't even use psql to connect to the postgresuser database as user postgres so I can query pg_stat_activity, that hangs also until I kill the idle in transactionquery PID. That's what my dilemma is. The server hardware itself is not being stressed when that's happeningthough, so it doesn't appear to be a resource problem, but I can't check because I can't see what PG is doing. The problem is how do I investigate this when PG is entirely unresponsive? Why is it becoming unresponsive, and how do Iprevent the PG server from becoming unresponsive? Thanks! This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidentialinformation which is exempt from disclosure under applicable laws. If you are not the intended recipient, pleasenote that you are strictly prohibited from disseminating or distributing this information (other than to the intendedrecipient) or copying this information. If you have received this communication in error, please notify us immediatelyby e-mail or by telephone at the above number. Thank you.
On 09/25/12 1:35 PM, Scot Kreienkamp wrote: > The problem is how do I investigate this when PG is entirely unresponsive? Why is it becoming unresponsive, and how doI prevent the PG server from becoming unresponsive? I think I'd push that 9.1.latest upgrade ASAP, and then see if this problem continues. been a pile of critical fixes since 9.1.2, to whit... http://www.postgresql.org/docs/current/static/release-9-1-3.html http://www.postgresql.org/docs/current/static/release-9-1-4.html http://www.postgresql.org/docs/current/static/release-9-1-5.html http://www.postgresql.org/docs/current/static/release-9-1-6.html -- john r pierce N 37, W 122 santa cruz ca mid-left coast
Scot Kreienkamp wrote on 25.09.2012 22:35: > The application is using a pooler and generally runs around 100 > connections, but I've seen it as high as 200 during the day for > normal use. It's on a large server; 64 cores total and about 500 > gigs of memory. That's one of the reasons I left it at 512 > connections. We had several web applications where performance was *improved* by configuring the connection pool have a a lot less connections. There is a threshold where too many connections will simply flood the server. Lowering the number of processes fighting for resource makes each process faster. You might want to give it a try.
Scot Kreienkamp wrote: > I have a problem that I've been struggling with for quite some time. Every once in a while I will get > a connection that goes to idle in transaction on an in-house programmed application that connects with > JDBC. That happens fairly regularly and the programmers are trying to clean that up, but sometimes > the idle in transaction connection makes the PG server entirely unresponsive. I'm not getting > connection refused, nothing. All connections existing or new, JDBC or psql, just hang. I've already > got full query logging on to try to catch the problem query or connection so I can give the developers > somewhere to look to resolve their issue with the application, but since queries are logged with > runtimes I'm assuming they are only logged after they are complete. And since it's idle in > transaction it never completes so it never gets logged. Our application is connecting as an > unprivileged user named rmstomcat, and the database is limited to 400 connections out of 512. I'm not > running out of connections as I've got reserved connections set, and even connecting as user postgres > with psql the connection just hangs. The server doesn't appear to be running out of memory when this > happens and nothing is printed in the log. The only thing that resolves it is doing a kill on the PID > of any idle in transaction connections existing at the time causing them to roll back. Then > everything else picks up right where it left off and works again. > > Can anyone give me any hints about why PG becomes unresponsive? Or how to fix it so it doesn't? > > My server is 9.1.2 right now. I will be upgrading to the latest 9.1 series soon, but until 9.2 can be > run through our development/testing cycle I can't upgrade to 9.2. That will take about 6-10 months. Yes, see if upgrading to 9.1 makes the problem disappear. It is surprising that you cannot even start new connections. You could try to "strace" the postmaster during a connection attempt and see what happens. Maybe that helps to spot the place where things go wrong. Yours, Laurenz Albe
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Thomas Kellerer > Sent: Tuesday, September 25, 2012 5:25 PM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] idle in transaction query makes server unresponsive > > Scot Kreienkamp wrote on 25.09.2012 22:35: > > The application is using a pooler and generally runs around 100 > > connections, but I've seen it as high as 200 during the day for > > normal use. It's on a large server; 64 cores total and about 500 > > gigs of memory. That's one of the reasons I left it at 512 > > connections. > > We had several web applications where performance was *improved* > by configuring the connection pool have a a lot less connections. > > There is a threshold where too many connections > will simply flood the server. Lowering the number of processes > fighting for resource makes each process faster. > > You might want to give it a try. > [Scot Kreienkamp] Can I get the same effect by limiting the number of logons for the user that my application is connecting as to 250? Ordo I need to lower the number in postgresql.conf? I'd rather go with the first option because I can adjust it live withoutediting and reloading config files. This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidentialinformation which is exempt from disclosure under applicable laws. If you are not the intended recipient, pleasenote that you are strictly prohibited from disseminating or distributing this information (other than to the intendedrecipient) or copying this information. If you have received this communication in error, please notify us immediatelyby e-mail or by telephone at the above number. Thank you.
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Albe Laurenz > Sent: Wednesday, September 26, 2012 5:15 AM > To: Scot Kreienkamp; pgsql-general@postgresql.org > Subject: Re: [GENERAL] idle in transaction query makes server unresponsive > > Scot Kreienkamp wrote: > > I have a problem that I've been struggling with for quite some time. > Every once in a while I will get > > a connection that goes to idle in transaction on an in-house > programmed application that connects with > > JDBC. That happens fairly regularly and the programmers are trying to > clean that up, but sometimes > > the idle in transaction connection makes the PG server entirely > unresponsive. I'm not getting > > connection refused, nothing. All connections existing or new, JDBC or > psql, just hang. I've already > > got full query logging on to try to catch the problem query or > connection so I can give the developers > > somewhere to look to resolve their issue with the application, but > since queries are logged with > > runtimes I'm assuming they are only logged after they are complete. > And since it's idle in > > transaction it never completes so it never gets logged. Our > application is connecting as an > > unprivileged user named rmstomcat, and the database is limited to 400 > connections out of 512. I'm not > > running out of connections as I've got reserved connections set, and > even connecting as user postgres > > with psql the connection just hangs. The server doesn't appear to be > running out of memory when this > > happens and nothing is printed in the log. The only thing that > resolves it is doing a kill on the PID > > of any idle in transaction connections existing at the time causing > them to roll back. Then > > everything else picks up right where it left off and works again. > > > > Can anyone give me any hints about why PG becomes unresponsive? Or > how to fix it so it doesn't? > > > > My server is 9.1.2 right now. I will be upgrading to the latest 9.1 > series soon, but until 9.2 can be > > run through our development/testing cycle I can't upgrade to 9.2. > That will take about 6-10 months. > > Yes, see if upgrading to 9.1 makes the problem disappear. > > It is surprising that you cannot even start new connections. > > You could try to "strace" the postmaster during a connection attempt > and see what happens. Maybe that helps to spot the place where > things go wrong. > > Yours, > Laurenz Albe [Scot Kreienkamp] I'm willing to give it a try, but I've never done it before. What do I need to do? This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidentialinformation which is exempt from disclosure under applicable laws. If you are not the intended recipient, pleasenote that you are strictly prohibited from disseminating or distributing this information (other than to the intendedrecipient) or copying this information. If you have received this communication in error, please notify us immediatelyby e-mail or by telephone at the above number. Thank you.
Scot Kreienkamp wrote: >> You could try to "strace" the postmaster during a connection attempt >> and see what happens. Maybe that helps to spot the place where >> things go wrong. > [Scot Kreienkamp] > I'm willing to give it a try, but I've never done it before. What do I need to do? "man strace" is your friend. Assuming you are on Linux. Find out the process ID of the postmaster process (the parent of the other PostgreSQL processes), then try strace -f -o /some/large/directory/tracefile -p process_id The connect to PostgreSQL and see what gets logged. Press Ctrl+C to stop strace. Yours, Laurenz Albe
Chris Travers wrote: > If it is truly idle in a transaction, maybe it has locks that are holding up other transactions? > > Locks are usually held until commit time, except advisory locks iirc but those have to be explicitly > checked, so if you don't know if you are using them you probably aren't. Yes, but locks that block a new connection from succeeding? Yours, Laurenz Albe