On Aug 15, 2007, at 9:27 PM, Jon Jensen wrote:
> I've got a simple select query that runs every 10 minutes in order
> to update data in some external rrds (it lets us make pretty graphs
> and so forth). This has been working fine for months on end, when
> suddenly yesterday the badness happened. For some reason, this same
> query that normally takes a couple seconds has now been stuck
> running for over 24 hours, maxing the CPU and generally slowing
> other queries down.
>
> The external script that initiates the query has been restarted,
> and netstat no longer shows that connection. All subsequent calls
> of the same query are quick as usual, but the renegade process
> lingers on, unresponsive to signals. Some of the things I've tried
> so far (unsuccessfully):
>
> 1. I've tried killing the process using kill from the command-line
> (INT, TERM and HUP), as well as using pg_cancel_backend() via psql.
> 2. I've tried attaching gdb to the renegade process to see what
> it's doing, but that hangs, forcing me to kill gdb (no problems
> attaching to other postgres processes however).
>
> Any other ideas? I'd like to avoid doing a kill -9 if at all
> possible. The machine is Debian (sarge) running PostgreSQL 8.1.
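Before anything else, it's worth confirming from pg_stat_activity which backend
that is and what it claims to be running. A minimal check; on 8.1 the relevant
columns should be procpid and current_query (from memory), and current_query is
only populated when stats_command_string is enabled:

    SELECT procpid, usename, current_query
      FROM pg_stat_activity;
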
There are a lot of parts of the code that don't check for signals,
because normally they don't run for any real length of time... until
they do. :) The factorial calculation is an example that was recently
fixed. So it's possible that something in your query is in that same
condition. You may be stuck with a kill -9, but it would be good to
identify what part of the code is hung up so we can determine if it
makes sense to add signal handling.
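
If an interactive gdb attach keeps hanging, a non-interactive backtrace is
sometimes worth a try. A rough sketch; the PID 12345 is a placeholder and the
binary path is a guess, so adjust both for your install:

    # one-shot backtrace without an interactive session
    echo bt > /tmp/cmds.gdb
    gdb -batch -x /tmp/cmds.gdb /usr/lib/postgresql/8.1/bin/postgres 12345

    # see whether it's spinning in userspace (no syscalls) or stuck in the kernel
    strace -p 12345
    cat /proc/12345/wchan

A backtrace posted here would show which loop is missing a
CHECK_FOR_INTERRUPTS() call.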
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)