Re: a heavy duty operation on an "unused" table kills my server - Mailing list pgsql-performance

From: Eduardo Piombino
Subject: Re: a heavy duty operation on an "unused" table kills my server
Msg-id: e24c1d9d1001122147l1c0e2111kce25dc2c7f393087@mail.gmail.com
In response to: Re: a heavy duty operation on an "unused" table kills my server (Craig James <craig_james@emolecules.com>)
List: pgsql-performance
I'm sorry.

The server is a production HP ProLiant; I don't remember the exact model, but the key features were:
4 cores, over 2 GHz each (I'm sorry, I don't remember the actual specs), and I think 16 GB of RAM (if that is possible?).
It has two 320 GB disks in RAID 1 (mirrored).

I don't even have the emails with the specs here, but I can give you the exact configuration by tomorrow.

Operating system: Windows Server 2003, with the latest patches.
Postgres version: 8.2.4, with all defaults except DateStyle and TimeZone.
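In other words, the only postgresql.conf lines that differ from stock are of this form (the values shown here are just placeholders, not necessarily what the server has):

  # postgresql.conf (8.2) -- the only non-default settings; values illustrative
  datestyle = 'iso, dmy'
  timezone = 'America/Argentina/Buenos_Aires'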

In normal operation, the server sits at 0% CPU load, with peaks of 1-2% at most.

I've been digging a little in the archives, and one thing it helped me realize is that I don't actually remember seeing high CPU usage (a fact that surprised me), but I do remember seeing high I/O activity. I'm sorry, it's pretty late here.
I know this single statement is enough to almost change everything I've just asked.
Please try interpreting my original mail again, considering that when I said "high CPU usage" it might very well have been "high I/O usage".

The final effect was that the server went non-responsive for all practical purposes; not even the Task Manager would come up when I hit Ctrl-Alt-Del, and of course every client suffered horrific delays (20+ seconds) on the simplest operations, like SELECT NOW();
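To give an idea of how I observed this from psql (the timing figure below is illustrative, not an actual capture):

  \timing
  SELECT NOW();
  Time: 21384.562 ms    -- versus ~1 ms in normal conditions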

I've just made a little modification to my original questions, to extend them to the possibility of an I/O usage issue instead of just CPU.


Bottom line is, I can't seem to do any heavy processing on the database (or any operation that drives the server into high CPU or I/O usage) and still expect the server to behave normally. Any heavy-duty operation, DDL or DML, on any table (related or unrelated), from any thread, tears down my server's responsiveness.

My question then is: is there a way to limit the CPU or I/O usage assigned to a specific connection?
I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

Something like:
pg_set_max_cpu_or_io_usage(2/100);
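To be clear, as far as I know no such function exists; I'm just imagining something along these lines (hypothetical API, and the column name is only an example):

  -- purely hypothetical; PostgreSQL 8.2 has no such knob
  SELECT pg_set_max_cpu_or_io_usage(0.02);  -- cap this session at ~2%
  ALTER TABLE a_full ALTER COLUMN some_field TYPE char(250);  -- now throttled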


On Wed, Jan 13, 2010 at 2:14 AM, Craig James <craig_james@emolecules.com> wrote:
Eduardo Piombino wrote:
Hi list, I'm having a problem when dealing with operations that ask too much CPU from the server.
The scenario is this:

A nice description below, but ... you give no information about your system: number of CPUs, disk types and configuration, how much memory, what have you changed in your Postgres configuration?  And what operating system, what version of Postgres, etc., etc.  The more information you give, the better the answer.

If you're operating on a single disk with a tiny amount of memory and an old, misconfigured Postgres on a laptop computer, that's a whole different problem than if you're on a big system with 16 CPUs and a huge RAID 1+0 with battery-backed cache.
 
Craig



I have a multithreaded server, each thread with its own connection to the database. Everything is working fine, actually great, actually outstandingly well, in normal operation.

I have a table named "a" with 1.8 million records, and growing, but I'm OK with it, at least for the moment. Maybe in the near future we will cut it down, back up old data, and free it up. But this is not the issue; as I said, everything is working great. I have a couple of indexes to help some queries, and that's it.

Now my problem started when I tried to do some model refactoring on this production table.

First I tried a dumb approach.
I connected from pgAdmin and opened a new session.
I tried an ALTER TABLE on this table, just to turn a char(255) field into char(250), and it locked up my system.
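The statement was of this form (the column name here is just an example); note that in 8.2 this kind of type change rewrites the entire 1.8-million-row table while holding an exclusive lock on it:

  ALTER TABLE a ALTER COLUMN some_field TYPE char(250);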

No surprise, since I had many threads waiting for this ALTER TABLE to finish. What I did not foresee was that it would take so much time. OK, my fault, for not having estimated how long the ALTER TABLE would take to complete.

Now, with this experience, I tried a simple workaround.
I created an empty version of "a" named "a_empty", identical in every sense, then renamed "a" to "a_full" and "a_empty" to "a". This procedure cost me essentially zero seconds of downtime, and everything kept working smoothly. Maybe a couple of operations could have failed if they tried to write in the very second there was actually no table named "a", but since everything was transactional, the worst case was that a failed operation would simply be reported to the user with a request to retry. No big deal.

Now, this table, which is totally unattached to the system in every way (no one references it; it's like a dumpster for old records), is not being accessed by any other thread in the system, so an ALTER TABLE on it, to turn a char(255) into a char(250), should have had no effect on the system.

So, with this in mind, I ran the ALTER TABLE, this time on the "a_full" (totally unrelated) table.
The system went non-responsive again, and this time it had nothing to do with threads waiting for the ALTER TABLE to complete. The pgAdmin GUI went non-responsive, as did the application server's GUI, whose threads kept working in the background but took more and more time for every client request (up to 25 seconds, which is just ridiculous and completely unacceptable in normal conditions).

This caused my client applications to start disconnecting after their operations failed due to timeouts, and the system basically went down again from the users' point of view.

This time, since I saw no relation between my operation on a totally unrelated table and the server's big slowdown, I blamed the server's memory.

After some tests, I came to the conclusion that any heavy-duty operation on any thread (an ALTER TABLE on a 1.8-million-record table, updates on that table, or even an infinite loop, just to make my point) would affect the whole server.
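For the load-generation part of those tests, even a trivial query like the following would do (an illustrative example; any sufficiently long-running statement shows the same effect):

  -- artificial load, just to make the point
  SELECT count(*) FROM generate_series(1, 100000000);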

Bottom line is, I can't seem to do any heavy processing on the database (or any operation that drives the server into high CPU or I/O usage) and still expect the server to behave normally. Any heavy-duty operation, DDL or DML, on any table (related or unrelated), from any thread, tears down my server's responsiveness.

My question then is: is there a way to limit the CPU or I/O usage assigned to a specific connection?
I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

Something like:
pg_set_max_cpu_or_io_usage(2/100);

and rest assured that no matter what that thread asks the database to do, it just won't affect the other running threads. Obviously, this assumes the process itself does not involve any locking that blocks the other threads.

Is something like that possible?

Thanks in advance,
Eduardo.


