Re: INSERT/UPDATE waiting (another example) - Mailing list pgsql-sql

From Wayne Piekarski
Subject Re: INSERT/UPDATE waiting (another example)
Date
Msg-id 199905060331.NAA26678@helpdesk.senet.com.au
Whole thread Raw
Responses Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)
List pgsql-sql
> Date: Tue,  4 May 1999 11:33:34 +0200 (CEST)
> From: Dirk Lutzebaeck <lutzeb@aeccom.com>
> Subject: INSERT/UPDATE waiting
> 
> Hello,
> 
> somehow the backend is hanging on my system (6.5beta1, Linux 2.2.6):
> 
> postgres 29957  0.0  1.5  4124  2048  ?  S    18:23   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE
waiting
 
> postgres 29980  0.0  1.6  4124  2064  ?  S    18:25   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE
waiting
 
> postgres 30005  0.0  1.6  4124  2088  ?  S    18:27   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE
waiting
 
> postgres 30012  0.0  2.1  4532  2696  ?  S    18:28   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs INSERT
waiting
 
> postgres 30030  0.0  3.0  5780  3916  ?  S    18:28   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs idle

 
> 
> [about 34 processes]
> 
> What is happening here? Can't find anything in the documentation.

Hi everyone,

We just deployed a large system a few weeks ago which involves using
PostgreSQL 6.4.2 and CGI based interfaces, and we usually have around
10-20 connections always running, the system is very busy during 9-5
hours. The machine is running FreeBSD 2.2.7 and has 256 mb of RAM,
lots of file descriptors.

We are experiencing exactly the same problem as above - during the day,
all of a sudden Postgres will completely jam up, with all processing in
one of the following states: (from ps -axwwwwww)

SELECT waiting
DELETE waiting
INSERT waiting
idle waiting
idle

Sometimes the ps output will also return postgres backends with garbage
high-ascii characters in their name, got no idea why here either....

Originally, we thought it was a bad query somewhere with lock statements
in the wrong order, causing the deadlock, and so we reduced one of the
constants compiled into the backend which controls how often deadlocks
were checked for and set it to something like ten seconds.

I then forced the database to go into a real deadlock by doing:

1:   BEGIN
2:   BEGIN
1:   LOCK table a
2:   LOCK table b
1:   LOCK table b
2:   LOCK table a

Ten seconds later, one is aborted due to the deadlock checking - this is
great, as everything clears up and continues no problems.

However, during operation, every so often for no apparent reason, all the
backends doing work all jam up with "waiting" in their name, otherwise
they are idle - and nothing happens. I let it sit for about 5 minutes one
time, and nothing happened. I had to kill everything so the staff could
actually use the system.

For a while, we went hunting through our code looking for improperly
ordered lock statements and things like that. We found one which was fixed
but the problem still happens - there may be more of these in the code,
there probably are in fact, but I'm under the impression the deadlock
checking is supposed to get around that and kill backends nicely to
resolve the conflict?

Also, one other thing we discovered was making the following deadlock:

S1: BEGIN
S1: LOCK table a
S2: SELECT * FROM a, b where a.id = b.id
S1: LOCK table b

If we did explain on the SELECT, and it chose to scan A first, it would work
but if we used an index or rearranged the select statement, and B was scanned first
we would get a deadlock, since the select couldn't complete. Now I would have
hoped that locks for the join would be acquired either all at once, or none
at all. I don't want to have to wrap lock statements around the select because
all I want to do is read some data, I don't want to make updates!




Anyways, back to the real problem, I enabled a constant in the code and
whenever the deadlock checking occurs during this lockup problem, and it
prints the following out to the log file: 

DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL

Not sure what this means - I'd really like it to show what kind of locks
the postgres processes are waiting for, as it could help me resolve the
problem. Is there anything else I can compile in to do this? I notice
there are quite a few symbols you can define for locks.


So right now I am at a loss on what to do - whenever everything jams up, I
do a ps -axwwww and kill off all postgres processes which are idle
waiting, then SELECT waiting, and sometimes it clears up. Failing that
I then kill the ones making changes like delete, insert, update - although
I hate doing that because I found that when I went around ruthlessly
killing backends to resolve conflicts, tables got corrupted and we
experienced problems like BTP_CHAIN, and vacuums would start failing,
requiring us to dump the tables and reload, which was pretty bad :(

I've had other problems as well, but I'll save them for another email.

So if anyone can offer any advice, I'd be eternally grateful! Right now
I'm getting lots of flak from people, saying I should have used Oracle, or
mSQL, etc, but Postgres works great and allows me to embed code into the
backend, and all kinds of other cool features. I really am looking forward
to 6.5 as the MVCC stuff sounds great, but right now I need to get this
working reliably until then.

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au



pgsql-sql by date:

Previous
From: Michael J Davis
Date:
Subject: RE: [GENERAL] which guru know this ??
Next
From: Greg Frith
Date:
Subject: Re: [SQL] User and Groups