Thread: locked backend
Hi all, I've had today a strange lockup on our postgres data base. Postgres 8.0.3 Debian GNU/Linux 3.1 autovacuum set up The situation (diagnosed via pg_stat_activity): one table was locked by an update, a VACUUM ANALYZE was running for the same table (triggered by autovacuum), and a handful of inserts were waiting on the same table. After some checking around (see below for details), I decided to kill the UPDATE (it was holding up the inserts which means lost data for us and danger of connection starvation), and did it with "kill processid" from the command line, but no success: the backend didn't respond. Then I killed the VACUUM which exited promptly, the UPDATE was still blocked. Then I remembered there is a pg_cancel_backend function (seen it recently on this list), and tried that too, without success - the update was still blocked, and blocking all inserts. In this situation I decided to kill -9 the blocked backend, knowingly that this will basically restart the data base server, which did happen. Only it took a veeeery long time to come up again (~15-20 minutes), which I also find very strange... Now my actual question is: what can cause a backend to block so that it won't answer to kill or pg_cancel_backend ? Is there any other solution than kill -9 in these situations ? I'm pretty sure the update would never have finished, the box was mostly idle when I did all this, so I can exclude that the update was busy working. I actually have seen this a few times before, and had to do the same kill -9 to get back on track. Some details of the circumstances this happened: I'm attaching a file with some diagnostic output, together with the SQLs used to get them. I also checked ps on the command line to make sure the pg_stat_activity is not reporting phantom SQLs (as I had some occasions it did). Unfortunately the attached diagnostic is not complete, as I forgot to save it from the beginning, and some of it rolled out of my terminal's buffer... particularly, the VACUUM is not visible anymore, I already killed it by then. As you see, the update was the oldest query running (and for more than 3 hours too !!!), and when I looked first there was a VACUUM also running, being the immediate oldest entry after the UPDATE. The inserts came after it, and checking their locks, they were waiting for a lock on the "problem_table", I don't remember exactly what kind of lock. I have no clue as of what's happening here, so I would be grateful for any help in diagnosing this further... Maybe I should mention that with the same data base (but a different box !) I have already seen this behavior 1-2 times, and 2 times the system locked up completely (the machine was not answering even for ssh, but then after restart nothing was suspect, an extensive hardware test did not find anything either). TIA for any help. Cheers, Csaba.
Attachment
Csaba Nagy <nagy@ecircle-ag.com> writes: > The situation (diagnosed via pg_stat_activity): one table was locked by > an update, a VACUUM ANALYZE was running for the same table (triggered by > autovacuum), and a handful of inserts were waiting on the same table. Updates do not block inserts, and neither does vacuum, so there's something you're not telling us. In particular an UPDATE wouldn't take an ExclusiveLock on the table, so that lock must have come from some other operation in the same transaction. regards, tom lane
Csaba Nagy <nagy@ecircle-ag.com> writes: > The situation (diagnosed via pg_stat_activity): one table was locked by > an update, a VACUUM ANALYZE was running for the same table (triggered by > autovacuum), and a handful of inserts were waiting on the same table. > After some checking around (see below for details), I decided to kill > the UPDATE (it was holding up the inserts which means lost data for us > and danger of connection starvation), and did it with "kill processid" > from the command line, but no success: the backend didn't respond. Then > I killed the VACUUM which exited promptly, the UPDATE was still blocked. > Then I remembered there is a pg_cancel_backend function (seen it > recently on this list), and tried that too, without success - the update > was still blocked, and blocking all inserts. Are there any foreign key relationships involved here? Because barring foreign key constraints none of these should have blocked any of the others. The only thing that would block a plain VACUUM (as opposed to VACUUM FULL) would be if you had a transaction pending that had a schema change like ALTER TABLE. Or something that had done an explicit LOCK TABLE. And the only thing that would block the INSERTs are the above or if there's a foreign key relationship to another table and that record in the other is locked, from another INSERT that refers to it or from an UPDATE. The only way your explanation that the UPDATE was holding up the inserts makes sense is if the records you were UPDATEing were referred to by the records you were inserting in a foreign key reference. Even so it shouldn't have held up the VACUUM at all. -- greg
On Wed, 2005-11-16 at 19:41, Tom Lane wrote: > Csaba Nagy <nagy@ecircle-ag.com> writes: > > The situation (diagnosed via pg_stat_activity): one table was locked by > > an update, a VACUUM ANALYZE was running for the same table (triggered by > > autovacuum), and a handful of inserts were waiting on the same table. > > Updates do not block inserts, and neither does vacuum, so there's > something you're not telling us. In particular an UPDATE wouldn't > take an ExclusiveLock on the table, so that lock must have come from > some other operation in the same transaction. Well, if I'm not telling you something is because I don't know it myself :-) OK, that makes sense with something else done before blocking the inserts and not the update. In any case the transaction of the update was locking the rest, as nothing else was running at the moment I checked. BTW, is the "ExclusiveLock" a table lock ? From the documentation of "pg_locks" it is not completely clear (it refers to http://www.postgresql.org/docs/8.0/static/explicit-locking.html#LOCKING-TABLES which does not enumerates these names usedin pg_lock). I wonder what would take an exclusive lock on the table ? I would exclude any alter table, we don't do that from our application, and the other person who could have done an alter table beside me sits next to me and he didn't do it (the update's SQL comes from the application actually). There are no foreign keys on the table, just a primary key on a field populated from a sequence (by our application, not via a default clause). We do not lock the table explicitly. The only locking is done via a SELECT...FOR UPDATE, could that cause a table lock ? But whatever the cause of the lock would be, I still wonder why was the UPDATE hanging ? This table is a temporary table, it is regularly filled-emptied, and usually it is relatively small (max a few tens of thousands of rows), so an update running more than 3 hours is not kosher. The update is part of the emptying procedure actually. If it was some kind of deadlock, why was it not detected ? And why the backend didn't respond to the kill signal ? I'm shooting around in the dark, but I have to find out what happened, so I can avoid it next time... or at least be able to shut down efficiently a backend which blocks my server's activity... Thanks, Csaba.