Thread: Hang in semop() in 'waiting' backends

Hang in semop() in 'waiting' backends

From: Robert Fielding
Date:

Hi,

I'm afraid the information on this one is a little sketchy, but I'll
describe what's going on as best I can. Your assistance is
appreciated:

platform: Linux 2.4.20 #19 SMP, Red Hat 9 base, PG 7.4.5 (also seen on 7.4.2)
problem: postgres backends waiting forever

strace on a waiting backend shows:

semop(19529737, 0xbfffe020, 1

The statement that hangs:

SELECT domain_add('robf', 'testdomain123456.uk', '212.69.194.46');
<hangs forever>
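The unfinished strace line is a single-operation semop() (the final `1` is nsops) that has not returned. PostgreSQL 7.4 on Linux uses System V semaphores, and a backend that must wait, for example for a lock, sleeps in semop() on its own semaphore until another process increments it. The sketch below reproduces that call from Python through ctypes. It assumes Linux/glibc constant values (`IPC_CREAT`, `IPC_NOWAIT`, `IPC_RMID`), and the names `Sembuf`, `semop1` and `demo` are hypothetical. `IPC_NOWAIT` is used so the decrement fails with EAGAIN instead of hanging the way the backend does.

```python
import ctypes
import ctypes.util
import errno
import os

# Linux/glibc values from <sys/ipc.h> (assumed platform, not from the thread).
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_NOWAIT = 0o4000
IPC_RMID = 0

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


class Sembuf(ctypes.Structure):
    # struct sembuf from <sys/sem.h>
    _fields_ = [
        ("sem_num", ctypes.c_ushort),
        ("sem_op", ctypes.c_short),
        ("sem_flg", ctypes.c_short),
    ]


libc.semget.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]
libc.semop.argtypes = [ctypes.c_int, ctypes.POINTER(Sembuf), ctypes.c_size_t]
libc.semctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]


def semop1(semid: int, op: int, flags: int = 0) -> int:
    """Apply one operation to semaphore 0; return 0 or the errno value."""
    buf = Sembuf(0, op, flags)
    if libc.semop(semid, ctypes.byref(buf), 1) == -1:
        return ctypes.get_errno()
    return 0


def demo() -> tuple[bool, bool]:
    semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)
    if semid == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    try:
        # Linux initialises a new semaphore to 0, so a decrement would sleep;
        # with IPC_NOWAIT it returns EAGAIN instead of blocking.
        would_block = semop1(semid, -1, IPC_NOWAIT) == errno.EAGAIN
        # The "wakeup" another process would send.
        semop1(semid, +1)
        # Now the decrement succeeds immediately.
        acquired = semop1(semid, -1, IPC_NOWAIT) == 0
        return would_block, acquired
    finally:
        # Remove the set so it does not linger in `ipcs -s`.
        libc.semctl(semid, 0, IPC_RMID)


if __name__ == "__main__":
    print(demo())
```

Without `IPC_NOWAIT`, the first decrement would sleep indefinitely, which is what the stuck backend's strace shows.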

domain_add is a very complex function which calls many other SELECT
functions and performs string manipulation on its input.

I've tried reindexing both the domains_pkey and domains_vs_idx indexes
on the domains table, with no effect. I've also restarted the service
several times; this only clears the waiting processes temporarily and
still doesn't let domains be added.

I have also suspended service, dumped and recreated the DNS database,
and upgraded to PG 7.4.5 (I know it's not the latest, but this machine
doesn't have external access). The dump and restore resolved the issue
only temporarily.

ipcs shows that all shared memory is released when the service is
stopped and allocated again when it is restarted.

Individual calls to domain_add() work, but when several calls are made
within one transaction, the first call blocks. After that, individual
calls from a psql shell also fail. The only fix is to kill the waiting
backends.

This is a production system that has suddenly gone wrong. The last
schema update was committed to CVS on 2005/03/21 and included UPDATE
and DELETE constraints and foreign key updates. It seems strange that
updates from the 23rd of last month would only start causing problems
now. I will need to check with the developer responsible in case any
updates weren't committed to CVS, although that is not our practice.
Given the hang, could this still be described as a bug?

Best regards,

Rob Fielding
Designer Servers
Business Serve plc


Re: Hang in semop() in 'waiting' backends

From: Tom Lane
Date:

Robert Fielding <rob@dsvr.net> writes:
> strace shows:
> semop(19529737, 0xbfffe020, 1

Can you attach to a few of the stuck processes with gdb and get stack
traces?  Are they all doing exactly the same query when they hang?

            regards, tom lane
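Tom's request can be scripted. The sketch below is a hypothetical helper; `waiting_backend_pids` and `gdb_backtrace_cmd` are illustrative names, not part of PostgreSQL. It lists processes with `ps -eo pid,args` and assumes that a PostgreSQL 7.4 backend blocked on a lock has a process title starting with `postgres:` and ending in `waiting`. It then runs `gdb -batch -ex bt -p PID` against each match. Attaching needs ptrace permission over the backends, typically root or the postgres user. Meaningful function names in the trace also depend on the server binary having symbols.

```python
import shutil
import subprocess


def waiting_backend_pids(ps_output: str) -> list[int]:
    """Return PIDs whose title looks like a waiting PostgreSQL backend
    (assumed format: 'postgres: ... waiting')."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        # Skip blank lines and the "PID COMMAND" header.
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        pid, title = int(fields[0]), fields[1].rstrip()
        if title.startswith("postgres:") and title.endswith("waiting"):
            pids.append(pid)
    return pids


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """gdb invocation that attaches, prints a backtrace, and detaches on exit."""
    return ["gdb", "-batch", "-ex", "bt", "-p", str(pid)]


def main() -> None:
    if shutil.which("ps") is None or shutil.which("gdb") is None:
        print("ps and gdb must both be on PATH")
        return
    ps = subprocess.run(["ps", "-eo", "pid,args"],
                        capture_output=True, text=True, check=True)
    for pid in waiting_backend_pids(ps.stdout):
        print(f"=== backend {pid} ===")
        trace = subprocess.run(gdb_backtrace_cmd(pid),
                               capture_output=True, text=True)
        print(trace.stdout or trace.stderr)


if __name__ == "__main__":
    main()
```

Running it a few times while the backends are stuck shows whether each one is blocked at the same point.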