Re: a heavy duty operation on an "unused" table kills my server - Mailing list pgsql-performance

From Craig Ringer
Subject Re: a heavy duty operation on an "unused" table kills my server
Date
Msg-id 4B4D61DA.7080008@postnewspapers.com.au
Whole thread Raw
In response to Re: a heavy duty operation on an "unused" table kills my server  (Eduardo Piombino <drakorg@gmail.com>)
Responses Re: a heavy duty operation on an "unused" table kills my server  (Eduardo Piombino <drakorg@gmail.com>)
List pgsql-performance
On 13/01/2010 1:47 PM, Eduardo Piombino wrote:
> I'm sorry.
>
> The server is a production server HP Proliant, I don't remember the
> exact model, but the key features were:
> 4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), I
> think it had 16G of RAM (if that is possible?)
> It has two 320G disks in RAID (mirrored).

Plain 'ol SATA disks in RAID-1?

Hardware RAID (and if so, controller model)? With battery backup? Write
cache on or off?

Or software RAID? If so, Windows build-in sw raid, or some vendor's
fakeraid (Highpoint, Promise, Adaptec, etc) ?

Anyway, with two disks in RAID-1 I'm not surprised you're seeing some
performance issues with heavy writes, especially since it seems unlikely
that you have a BBU hardware RAID controller. In RAID-1 a write must hit
both disks, so a 1Mb write effectively costs twice as much as a 1Mb
read. Since many controllers try for high throughput (because it looks
good in benchmarks) at the expense of latency they also tend to try to
batch writes into long blocks, which keeps the disks busy in extended
bursts. That slaughters read latencies.

I had this sort of issue with a 3Ware 8500-8, and landed up modifying
and recompiling the driver to reduce its built-in queue depth. I also
increased readahead. It was still pretty awful as I was working with
RAID 5 on SATA disks, but it made a big difference and more importantly
meant that my Linux server was able to honour `ionice' priorities and
feed more important requests to the controller first.

On windows, I really don't know what to do about it beyond getting a
better I/O subsystem. Google may help - look into I/O priorities, queue
depths, reducing read latencies, etc.

> I don't even have the emails with the specs here, but I can give you the
> exact configuration by tomorrow.
>
> Operating system: Windows 2003 server, with latest patches.
> Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.

Urk. 8.2 ?

Pg on Windows improves a lot with each release, and that's an old buggy
version of 8.2 at that. Looking into an upgrade would be a really,
REALLY good idea.

> Please try interpreting again my original mail, considering that when I
> said "high CPU usage" It might very well be "high IO usage".
>
> The final effect was that the server went non-responsive, for all
> matters, not even the TaskManager would come up when i hit CTRL-ALT-DEL,
> and of course, every client would suffer horrific (+20 secs) for the
> simplest operations like SELECT NOW();

That sounds a LOT like horrible read latencies caused by total I/O
overload. It could also be running out of memory and swapping heavily,
so do keep an eye out for that, but I wouldn't expect to see that with
an ALTER TABLE - especially on a 16GB server.

>     / My question then is: is there a way to limit the CPU* or **IO
>     USAGE* assigned to a specific connection?/

In win32 you can set CPU priorities manually in Task Manager, but only
once you already know the process ID of the Pg backend that's going to
be hammering the machine. Not helpful.

I don't know of any way to do per-process I/O priorities in Win32, but I
only use win32 reluctantly and don't use it for anything I care about
(like a production Pg server) so I'm far from a definitive source.

--
Craig Ringer

pgsql-performance by date:

Previous
From: Eduardo Piombino
Date:
Subject: Re: a heavy duty operation on an "unused" table kills my server
Next
From: Jaime Casanova
Date:
Subject: Re: [PERFORMANCE] work_mem vs temp files issue