Thread: Backend process that won't die

Backend process that won't die

From
Susan Cassidy
Date:

I have a couple of backend processes that are “stuck”, and do not respond to a pg_cancel_backend.  This is PostgreSQL 8.3.5.  The pg_cancel_backend returns true, but the process keeps running.  I have also done a “kill 12345” from the command-line, with no effect.

The processes are running a “select function_x” statement that normally takes a fraction of a second to run.

No locks are shown when I do:

select relname,pg_locks.* from pg_class,pg_locks where relfilenode=relation and not granted;

We had a database crash last week, and had to reindex a bunch of tables, but this function has been working for several days on the same tables that should be being used by the function_x function.

Any ideas on how to get the processes to go away?

They are eating cpu cycles, for no good reason:

postgres 28396 85.0  1.4 4420768 242224 ?      Ss   Sep03 3193:40 postgres: userxx dbname1 172.27.43.9(1160) SELECT

Thanks,

Susan
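
For reference, the ps line above can be unpacked field by field. The sketch below assumes the usual "ps aux" column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and that TIME is accumulated CPU time in minutes:seconds. The class and function names are made up for illustration.

```python
from typing import NamedTuple


class PsRow(NamedTuple):
    """The subset of `ps aux` fields that matter for a runaway backend."""
    user: str
    pid: int
    cpu_percent: float
    mem_percent: float
    cpu_seconds: int
    command: str


def parse_ps_aux_line(line: str) -> PsRow:
    """Parse one `ps aux` data line (assumed 11 columns, COMMAND last)."""
    # Split on runs of whitespace, but keep the COMMAND column intact.
    fields = line.split(None, 10)
    if len(fields) != 11:
        raise ValueError("expected 11 ps aux columns, got %d" % len(fields))
    user, pid, cpu, mem, _vsz, _rss, _tty, _stat, _start, cpu_time, command = fields
    # Assumption: TIME is MMM:SS (accumulated CPU time), as in the line above.
    minutes, seconds = cpu_time.split(":")
    return PsRow(
        user=user,
        pid=int(pid),
        cpu_percent=float(cpu),
        mem_percent=float(mem),
        cpu_seconds=int(minutes) * 60 + int(seconds),
        command=command,
    )


if __name__ == "__main__":
    row = parse_ps_aux_line(
        "postgres 28396 85.0  1.4 4420768 242224 ?      Ss   Sep03 3193:40 "
        "postgres: userxx dbname1 172.27.43.9(1160) SELECT"
    )
    print(row.pid, row.cpu_percent, round(row.cpu_seconds / 3600.0, 1))
```

On that reading, backend 28396 has accumulated 191620 CPU seconds, a little over 53 hours, since it started on Sep03.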

Re: Backend process that won't die

From
Tom Lane
Date:
Susan Cassidy <scassidy@edgewave.com> writes:
> I have a couple of backend processes that are "stuck", and do not respond to a pg_cancel_backend.  This is PostgreSQL 8.3.5.  The pg_cancel_backend returns true, but the process keeps running.  I have also done a "kill 12345" from the command-line, with no effect.

> We had a database crash last week, and had to reindex a bunch of tables, but this function has been working for several days on the same tables that should be being used by the function_x function.

By "this function" you mean that the reindex is not finished, but
nonetheless you have got regular queries running with the corrupted
indexes?

> Any ideas on how to get the processes to go away?

It seems like a good bet that they're chasing circular links in the
corrupted indexes.  "kill -9" would get rid of them, but it would force
a database-wide restart, which would also take out your reindex process,
so maybe that wouldn't be a good idea.

If they're significantly interfering with the progress of the reindex
then maybe you should bite the bullet and kill them anyway.  Otherwise
I'd be inclined to let them go until you can afford a restart.

            regards, tom lane
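
A rough illustration of why the softer signals can appear to do nothing. As far as I know, pg_cancel_backend sends SIGINT to the backend and a plain "kill" sends SIGTERM, and in both cases the backend's handler only records the request, which is acted on the next time the running code checks for interrupts. A loop that never reaches such a check never notices. The Python sketch below is a toy model of that idea, not PostgreSQL code: the child process, its flag, and the helper names are all invented, and it is POSIX-only. It also does not model the postmaster, which treats a SIGKILLed backend as a crash and restarts every session, as described above.

```python
import signal
import subprocess
import sys

# Toy "stuck backend": its handlers record the request in a flag,
# but the loop below never looks at that flag.
CHILD_SOURCE = """
import signal, time
cancel_requested = False

def note_request(signum, frame):
    global cancel_requested
    cancel_requested = True   # recorded, but nothing ever reads it

signal.signal(signal.SIGINT, note_request)
signal.signal(signal.SIGTERM, note_request)
print("ready", flush=True)
while True:                   # stand-in for a loop with no interrupt check
    time.sleep(0.01)
"""


def try_signals(proc, signals=(signal.SIGINT, signal.SIGTERM, signal.SIGKILL),
                grace=0.5):
    """Send each signal in turn; return the name of the one that ended proc."""
    for sig in signals:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
            return signal.Signals(sig).name
        except subprocess.TimeoutExpired:
            continue  # still running, escalate to the next signal
    return None


def run_demo():
    """Start the toy child, escalate signals, report what finally stopped it."""
    with subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                          stdout=subprocess.PIPE, text=True) as proc:
        proc.stdout.readline()  # wait until the child has installed its handlers
        stopped_by = try_signals(proc)
    return stopped_by, proc.returncode


if __name__ == "__main__":
    print(run_demo())
```

In the toy model only SIGKILL ends the child, because it is the one signal a process cannot catch or defer. That is the same reason "kill -9" works on the stuck backends when the cancel request and the ordinary kill do not.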

Re: Backend process that won't die

From
Susan Cassidy
Date:
-----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, September 06, 2011 9:57 AM
> To: Susan Cassidy
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Backend process that won't die

> Susan Cassidy <scassidy@edgewave.com> writes:
>> I have a couple of backend processes that are "stuck", and do not respond to a pg_cancel_backend.  This is PostgreSQL 8.3.5.  The pg_cancel_backend returns true, but the process keeps running.  I have also done a "kill 12345" from the command-line, with no effect.

>> We had a database crash last week, and had to reindex a bunch of tables, but this function has been working for several days on the same tables that should be being used by the function_x function.

> By "this function" you mean that the reindex is not finished, but
> nonetheless you have got regular queries running with the corrupted
> indexes?

No, the reindexes that I knew were needed have already been done.

>> Any ideas on how to get the processes to go away?

> It seems like a good bet that they're chasing circular links in the
> corrupted indexes.  "kill -9" would get rid of them, but it would force
> a database-wide restart, which would also take out your reindex process,
> so maybe that wouldn't be a good idea.

> If they're significantly interfering with the progress of the reindex
> then maybe you should bite the bullet and kill them anyway.  Otherwise
> I'd be inclined to let them go until you can afford a restart.

>            regards, tom lane

Without any error messages about indexes, which I have not seen lately, I have no idea which indexes still might need
rebuilding.

So, you think I should go ahead and kill -9 the "stuck" processes, and let the database restart?  It is a 2-system
cluster, with failover, so I'll let the IT guy handle that, I guess.

Thanks,
Susan