Re: a heavy duty operation on an "unused" table kills my server - Mailing list pgsql-performance

From Eduardo Piombino
Subject Re: a heavy duty operation on an "unused" table kills my server
Date
Msg-id e24c1d9d1001130954y3553f145raf2a0a367b8a5397@mail.gmail.com
Whole thread Raw
In response to Re: a heavy duty operation on an "unused" table kills my server  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: a heavy duty operation on an "unused" table kills my server  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance

OK, I'm not entirely sure this table is not still locking something
else.  If you make a copy by doing something like:

select * into test_table from a;

and then alter test_table do you still get the same problems?  If so,
then it is an IO issue, most likely.  If not, then there is some
client connection still referencing this table or something and that
could cause this type of behaviour as well.

I can guarantee you that the table is not being referenced by any other thread, table or process, and that it is totally unrelated to everything else in the system.

Its just a plain table, with 1.8 million records, that no thread knows it exists. It has no foreign keys that would allow thinking of a possible "lock" on the parent table, nor it is being referenced by any other table in the model. It has no triggers associated, and no indexes. It could very well even be on another database on the same physical server, and still do the same damage. I did not try this, but I'm pretty sure of the outcome. I would'nt like to bring the server down just to prove this, but I will do it if I find it necessary.

The only things that are common to this table and other tables in the system, as I see are:
RAM, IO, and CPU, at a very low level. One of these is being stressed out by the thread executing the ALTER, and the other threads (not just pgsql application threads, but system processes in general) suffer from the lack of this resource. All the previous discussions tend to induce that the resource we are talking about is IO.

The fact that the Task Manager does not come up, would also not be explained by a lock in a client thread.
Besides all that, all the client queries are NO WAIT, thus any lock would just return immediately, and no retry would be done until the response gets back to the user and the user confirms it. In that case, all the errors presented to the final users would be "The element is being processed some other place", as my default handler to pgsql error code "55P03", instead of the horrible "Operation timed out", that is what final users got during the huge slowdown/downtime.





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PERFORMANCE] work_mem vs temp files issue
Next
From: Scott Marlowe
Date:
Subject: Re: a heavy duty operation on an "unused" table kills my server