Re: a heavy duty operation on an "unused" table kills my server - Mailing list pgsql-performance

From Craig Ringer
Subject Re: a heavy duty operation on an "unused" table kills my server
Msg-id 4B4DA763.1030604@postnewspapers.com.au
In response to Re: a heavy duty operation on an "unused" table kills my server  (Eduardo Piombino <drakorg@gmail.com>)
List pgsql-performance
On 13/01/2010 3:03 PM, Eduardo Piombino wrote:
> One last question: this I/O issue I'm facing, do you think it is just a
> matter of RAID configuration speed, or a matter of queue gluttony (and
> not leaving time for other processes to get into the I/O queue in a
> reasonable time)?

Hard to say with the data provided. It's not *just* a matter of a slow
array, but that might contribute.

Specifically, though, by "slow array" in this case I'm looking at
latency rather than throughput, particularly read latency under heavy
write load. Simple write throughput isn't really the issue, though bad
write throughput can make it fall apart under a lighter load than it
would otherwise.

High read latencies may not be caused by deep queuing, though that's one
possible cause. A controller that prioritizes batching sequential writes
efficiently over serving random reads would cause it too - though
reducing its queue depth so it can't see as many writes to batch would help.
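
A toy model may make the queue-depth point concrete. It is only a sketch under loudly stated assumptions, not a model of any real controller: the device serves one request at a time in strict FIFO order, every request takes a fixed 5 ms, a bulk writer (standing in for the ALTER TABLE) keeps `queue_depth` writes outstanding at all times, and a light reader issues one read every 50 ms. The function name and all timings are hypothetical.

```python
from collections import deque


def mean_read_latency_ms(queue_depth, n_reads=200, write_ms=5.0,
                         read_ms=5.0, read_every_ms=50.0):
    """Toy FIFO device queue: a bulk writer keeps `queue_depth` writes
    outstanding while a light reader issues one read every `read_every_ms`.
    Returns the mean read latency in ms. All timings are assumptions."""
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    queue = deque()          # pending requests: (kind, arrival_time_ms)
    writes_queued = 0        # writes currently waiting in the queue
    now = 0.0
    next_read_at = read_every_ms
    latencies = []
    while len(latencies) < n_reads:
        # Reads that arrived while the previous request was being served.
        while next_read_at <= now:
            queue.append(("read", next_read_at))
            next_read_at += read_every_ms
        # The bulk writer refills the queue as soon as a slot frees up.
        while writes_queued < queue_depth:
            queue.append(("write", now))
            writes_queued += 1
        # The device serves strictly in FIFO order, one request at a time.
        kind, arrived = queue.popleft()
        if kind == "write":
            now += write_ms
            writes_queued -= 1
        else:
            now += read_ms
            latencies.append(now - arrived)
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    for depth in (1, 4, 32):
        print(f"queue depth {depth:>2}: mean read latency "
              f"{mean_read_latency_ms(depth):6.1f} ms")
```

In this model every read waits behind all the writes already queued, so its latency grows roughly linearly with queue depth even though the device is equally busy in every case. A real controller reorders and batches requests, so the numbers are not predictions; only the shape of the effect carries over.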

Let me stress, again, that if you have a decent RAID controller with a
battery-backed cache unit you can enable write caching and most of these
issues just go away. Using an array format with better read/write
concurrency, like RAID 10, may help as well.

Honestly, though, at this point you need to collect data on what the
system is actually doing, what's slowing it down and where. *Then* look
into how to address it. I can't advise you much on that as you're using
Windows, but there must be lots of info on optimising Windows I/O
latencies and throughput on the 'net...

> Because if it was just a matter of speed, OK: with my current RAID
> configuration, let's say it takes 10 minutes to process the ALTER TABLE
> (leaving no room for other I/O until the ALTER TABLE is done). Let's say
> I then put in the fastest possible RAID setup, or even remove RAID for
> the sake of speed, and it completes in, let's say, 10 seconds (an
> unrealistic assumption). But if my table then grows 60 times, I would be
> facing the very same problem again, even with the best RAID configuration.

Only if the issue is one of pure write throughput. I don't think it is.
You don't care how long the ALTER takes, only how much it impacts other
users. Reducing the impact on other users so your ALTER can complete in
its own time without stamping all over other work is the idea.
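
One way to let a heavy job "complete in its own time" is to pace it: do the work in batches and pause whenever a work budget is used up, so other requests get a turn at the disks. PostgreSQL applies a similar cost-based delay to VACUUM (`vacuum_cost_delay`, `vacuum_cost_limit`). The sketch below is a generic application-side version and assumes the job can be split into batches, which a single ALTER TABLE statement cannot; `run_throttled`, `chunked` and `copy_batch` are hypothetical names, and treating rows written as the cost unit is an assumption.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Split items into consecutive batches of at most `size`."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def run_throttled(batches: Iterable[List[int]],
                  process: Callable[[List[int]], int],
                  cost_limit: int, delay_s: float,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `process` on each batch; whenever the accumulated cost reaches
    `cost_limit`, pause for `delay_s` so other clients' I/O can be served.
    Returns how many pauses were taken."""
    accumulated = 0
    pauses = 0
    for batch in batches:
        accumulated += process(batch)   # process() reports the work it did
        if accumulated >= cost_limit:
            sleep(delay_s)
            pauses += 1
            accumulated = 0
    return pauses


if __name__ == "__main__":
    rows = list(range(1000))
    copied: List[int] = []

    def copy_batch(batch: List[int]) -> int:
        copied.extend(batch)   # hypothetical stand-in for writing the batch
        return len(batch)      # cost = rows written (an assumption)

    pauses = run_throttled(chunked(rows, 100), copy_batch,
                           cost_limit=250, delay_s=0.01)
    print(len(copied), pauses)  # prints "1000 3"
```

The pauses make the job take longer in total but cap how long it can monopolise the disks in one go; `cost_limit` and `delay_s` trade total runtime against the impact on other users.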

> The problem would seem to be in the way the OS (or hardware, or someone
> else, or all of them) is/are inserting the IO requests into the queue.

It *might* be. There's just not enough information to tell that yet.
You'll need to do quite a bit more monitoring. I don't have the
expertise to advise you on what to do and how to do it under Windows.

> What can I do to control the order in which these IO requests are
> finally entered into the queue?

No idea. You probably need to look into I/O priorities on Windows.

Ideally you shouldn't have to, though. If you can keep read latencies at
sane levels under high write load on your array, you don't *need* to
mess with this.

Note that I'm still guessing about the issue being high read latencies
under write load. It fits what you describe, but there isn't enough data
to be sure, and I don't know how to collect it on Windows.

> What options do I have to manipulate the order in which the I/O requests
> are entered into the "queue"?
> Can I disable this queue?
> Should I turn the disks' I/O operation caches off?
> Should I avoid some specific disk/RAID vendor, for instance?

Don't know. Contact your RAID card tech support, Google, search MSDN, etc.

--
Craig Ringer
