Thread: How to diagnose application "hangs" in pg_receive?
Hi, A hibernate based application I wrote suffers from "hangs" - the thread which should fetch the data is stuck at pg_receive (the function reading back data from the socket). My first idea were threading errors, however after careful code review I couldn't find suspicious code. The application doesn't do any explicit locking, an excerpt of the locking table is at the end of the email. Any idea what could be going wrong here? Except a broken connction, what could be the reason for never arriving reply from the server? Thank you in advance, Clemens datname | relname | mode | granted | usename | substr | query_start | age | procpid -------------+--------------------------------+--------------------------+---------+----------+--------------------------------+-------------------------------+-----------------+--------- khbldb_prod | startkladde_pkey | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | startkladde | ShareUpdateExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_flugnr_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_schleppid_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_lehrer_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_nachplatzid_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_bmok_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_vonplatzid_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | | ExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_datum_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | statusindex | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_pilot_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | bzbisindex | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_flugzeug_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 khbldb_prod | kladde_geschlepptid_index | RowExclusiveLock | t | postgres | autovacuum: VACUUM public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 | 18241 //No eclusive locks are held by the application: khbldb_prod | verkauf_type_index | AccessShareLock | t | khbl | <IDLE> in transaction | 2011-05-21 13:04:53.930505+01 | 01:58:19.217147 | 18637 khbldb_prod | rechnungsstatus_pkey | AccessShareLock | t | khbl | <IDLE> in transaction | 2011-05-21 13:04:53.930505+01 | 01:58:19.217147 | 18637 khbldb_prod | bzbisindex | AccessShareLock | t | khbl | <IDLE> in transaction | 2011-05-21 13:04:53.930505+01 | 01:58:19.217147 | 18637 khbldb_prod | rechnungsstatus | AccessShareLock | t | khbl | <IDLE> in transaction | 2011-05-21 13:04:53.930505+01 | 01:58:19.217147 | 18637 khbldb_prod | kladde_lehrer_index | AccessShareLock | t | khbl | <IDLE> in transaction | 2011-05-21 13:04:53.930505+01 | 01:58:19.217147 | 18637
On Sat, 2011-05-21 at 16:08 +0200, Clemens Eisserer wrote: > Hi, > > A hibernate based application I wrote suffers from "hangs" - the > thread which should fetch the data is stuck at pg_receive (the > function reading back data from the socket). > My first idea were threading errors, however after careful code review > I couldn't find suspicious code. > The application doesn't do any explicit locking, an excerpt of the > locking table is at the end of the email. > > Any idea what could be going wrong here? I'm not an expert in JDBC so all the debugging advice is on server side :) > Except a broken connction, what could be the reason for never arriving > reply from the server? Maybe too small connection pool somewhere on the way, so the statement is waiting for connection, not for query answer ? to see if this is the case, set full logging in postgresql.conf and see if the query reaches the database at all. You could also start by setting statement_timeout to some sensible value so that you can rule out a hang statement / query maybe set statement_timeout to '5s'; would be a good value to start. > Thank you in advance, Clemens -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
Clemens Eisserer wrote: > A hibernate based application I wrote suffers from "hangs" - the > thread which should fetch the data is stuck at pg_receive (the > function reading back data from the socket). You have two things happening which look bad, and could cause you to experience extreme bloat. It may not be that your application is blocked, but that your database has gotten into such bad shape that queries are taking A Very Long Time. First, all those autovacuum processes which have been running for hours make me wonder if you're on a very old version of PostgreSQL -- we had some bugs for a while which might explain those if you haven't kept up with bug-fix minor releases: http://www.postgresql.org/support/versioning Please post the output from running the query on this page: http://wiki.postgresql.org/wiki/Server_Configuration Second, those " in transaction" connections which are hours old will prevent autovacuum (even if you don't have an old version with bugs) from cleaning up old versions of rows, making your queries progressively slower. These would be caused by poor programming technique in the application or the application framework, and should be considered to be serious bugs. -Kevin
Hi Kevin, The tables are so small and the queries so simple, I am very sure its not caused by bad shape of the database. (and the queries run fast, iff they run) I know that using those long-running transactions are very ugly, however its a low-client-count app (2-5 clients) which all disconnect at night where VACUUM and ANALYZE are performed once a week as cron job. Thanks, Clemens 2011/5/21 Kevin Grittner <Kevin.Grittner@wicourts.gov>: > Clemens Eisserer wrote: > >> A hibernate based application I wrote suffers from "hangs" - the >> thread which should fetch the data is stuck at pg_receive (the >> function reading back data from the socket). > > You have two things happening which look bad, and could cause you to > experience extreme bloat. It may not be that your application is > blocked, but that your database has gotten into such bad shape that > queries are taking A Very Long Time. > > First, all those autovacuum processes which have been running for > hours make me wonder if you're on a very old version of PostgreSQL > -- we had some bugs for a while which might explain those if you > haven't kept up with bug-fix minor releases: > > http://www.postgresql.org/support/versioning > > Please post the output from running the query on this page: > > http://wiki.postgresql.org/wiki/Server_Configuration > > Second, those " in transaction" connections which are hours > old will prevent autovacuum (even if you don't have an old version > with bugs) from cleaning up old versions of rows, making your > queries progressively slower. These would be caused by poor > programming technique in the application or the application > framework, and should be considered to be serious bugs. > > -Kevin > > >
> Any idea what could be going wrong here? > Except a broken connction, what could be the reason for never arriving > reply from the server? Stupid question, but could this be related to the issues discussed here: http://archives.postgresql.org/pgsql-jdbc/2011-04/msg00080.php ? Whatever happened to that patch? The e-mail thread peters out... --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
Please read this carefully: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems You haven't told us what version of PostgreSQL you're running, in spite of my asking that you post the results of a query which would have told us that and other useful information. There are a number of things this could be, alone or in combination, but you have not provided anywhere near enough factual information to rule much in or out. The only evidence that you have provided suggests that at least part of your problem has been bloat, but you dismissed that out of hand without showing any sign that you understand that issue or have checked for it. In follow-ups, please try to avoid hand-wavy terms like "forever"; tell us whether you waited a minute, an hour, ten days, whatever "forever" actually was, along with any observations on CPU and disk usage during this time. You didn't show the active query in your listings -- were those taking during the time the query was hanging? -Kevin
Hi Kevin, If you feel personally attacked, just because I highlighted that I am sure the problem is not caused by a query taking a long time to execute, I am sorry. Nontheless, please find my detailed answers below. Thanks, Clemens > You haven't told us what version of PostgreSQL you're running, PostgreSQL 8.4.7 postgresql-8.4-701.jdbc3.jar > spite of my asking that you post the results of a query which would > have told us that and other useful information. Because as I said, the problem is not the execution of the query. Last time I experienced such a hang it was a simple "select .... from ... where id=x" query, which executes usually in a few ms. > The only evidence that you have provided suggests that at > least part of your problem has been bloat, but you dismissed that > out of hand without showing any sign that you understand that issue > or have checked for it. The database has about 2MB of data stored and is mostly read-only. I know what bloat is, understand the issue and have given a clear indication that I know bloat is not the problem. Therefor narrowing down the possibilities, and to direct guessing into another direction. > In follow-ups, please try to avoid hand-wavy terms like "forever"; > tell us whether you waited a minute, an hour, ten days, whatever > "forever" actually was, along with any observations on CPU and disk > usage during this time. You didn't show the active query in your > listings -- were those taking during the time the query was hanging? Forever means, what forever usually means - as long as I terminate the client. I took this hand-wavy term to make clear that my understanding of the problem is that its not a slowly running query.
Clemens Eisserer wrote: > If you feel personally attacked Far from it -- I just feel like my time is being wasted. If you provide the information I requested in the prior emails, I'll probably take another shot at helping. -Kevin