Thread: High load average with PostgreSQL 7.4.2 on debian/ibm eserver.

High load average with PostgreSQL 7.4.2 on debian/ibm eserver.

From
eleven@ludojad.itpp.pl
Date:
Hello,

I'm using PostgreSQL 7.4.2 (package from backports.org)
on a Debian (woody) box. The machine is IBM eServer 345
with two 2.8 Xeon CPUs, it has 1024MB of RAM and
two 15k RPM SCSI disks running in hardware RAID1, which
is provided by the onboard LSI Logic controller (LSI53C1030).

The database consists of two rather large tables
(currently about 15 million rows in one table
and about 5 million in the other one). Both tables
have 5 indexes (4 btree/1 hash).
Application running on the server INSERTs a lot of
stuff to the tables (which is not the target use of
the DB, it'll add data periodically, about
300 rows per 10 minutes). Queries (SELECTs) run perfectly
fine on the database, thanks to the indexes we have
here probably.

Performance issue, I'm experiencing here, is somewhat
weird - server gets high average load (from 5 up to 15,
8 on average). Standard performance monitoring
utilities (like top) show that CPUs are not loaded
(below 20%, often near zero).
With kernel 2.6.x which I was using earlier,
top showed very high "wa" values (which indicate I/O waiting, AFAIK).
I've googled some issues with 2.6 kernels and LSI Logic controllers
running RAID, so I've downgraded the kernel to 2.4.26.
The machine started to behave a bit better, but still high
load states look weird. Unfortunately, top with 2.4 kernels
does not show "wa" column, so I can't be sure if the load is
caused by waiting for disks, but high idle values and high average
load would suggest it. With kernel 2.6 swap was almost always 100% free,
with 2.4.26 Linux eats below 5 megabytes of swapspace.
PostgreSQL is running with shared_mem set to 48000,
sort_mem = 4096, fsync off.

Whole config is available here:
http://ludojad.itpp.pl/~eleven/pg-high-load.conf

I've also made some iostat report (using iostat 3 1000 as
suggested in one of the posts):
http://ludojad.itpp.pl/~eleven/iostat.log

Any solutions I should consider?
I'd be grateful getting some hints on this.

--
11.

On Tue, 29 Jun 2004 17:55:37 +0200, eleven@ludojad.itpp.pl
<eleven@ludojad.itpp.pl> wrote:

> Performance issue, I'm experiencing here, is somewhat
> weird - server gets high average load (from 5 up to 15,
> 8 on average). Standard performance monitoring
> utilities (like top) show that CPUs are not loaded
> (below 20%, often near zero).

So ... you never actually say what the performance issue you
experience is.  Having a high load average is not necessarily a
performance issue.

What is it that you want to fix?

Re: High load average with PostgreSQL 7.4.2 on debian/ibm

From
Bill Montgomery
Date:
eleven@ludojad.itpp.pl wrote:

> <>I'm using PostgreSQL 7.4.2 (package from backports.org)
> on a Debian (woody) box. The machine is IBM eServer 345
> with two 2.8 Xeon CPUs, it has 1024MB of RAM and
> two 15k RPM SCSI disks running in hardware RAID1, which
> is provided by the onboard LSI Logic controller (LSI53C1030).

<snip>

> <>With kernel 2.6.x which I was using earlier,
> top showed very high "wa" values (which indicate I/O waiting, AFAIK)


It sounds like you are very much bound by disk I/O. Your iostat output
indicates a good amount of I/O going on--I bet an iostat -x /dev/sdX
would show very high await times (time in ms before an IO request to the
device is serviced).

If your RAID controller has a battery-backed cache, check that you have
write-back (as opposed to write-through) enabled. This will cause the
controller to report data written only to RAID cache and not yet flushed
to disk as sync'd. You can experience large gains in write performance
this way.

If write-back is already enabled, or enabling it does not give a large
enough performance boost, you may need to buy more disks. In general, if
you have the budget for lots of disks, RAID 10 is the best you can do
performance-wise; if your budget for disks is limited, RAID 5 is the
next best thing. Also, you will get more bang for your buck with a
larger number of 10k disks than a smaller number of 15k disks.

Good luck,

Bill Montgomery

Re: High load average with PostgreSQL 7.4.2 on debian/ibm eserver.

From
eleven@ludojad.itpp.pl
Date:
On Tue, Jun 29, 2004 at 09:17:36AM -0700, Marc wrote:

> > Performance issue, I'm experiencing here, is somewhat
> > weird - server gets high average load (from 5 up to 15,
> > 8 on average). Standard performance monitoring
> > utilities (like top) show that CPUs are not loaded
> > (below 20%, often near zero).
> So ... you never actually say what the performance issue you
> experience is.  Having a high load average is not necessarily a
> performance issue.

Well, if the server's CPUs are idle and the machine
is starting to hog itself, one can
suspect something bad going on.

> What is it that you want to fix?

Basically, I'm wondering if I'm already on the edge of
performance capabilities of this machine/configuration, or maybe
there's some abnormal behaviour happening (which
could be noticed by somebody from this mailing list, hopefully).

In particular - could someone tell me if those iostat
values can tell if I'm close to upper performance boundary
of fast SCSI (Ultra 320, 15k RPM) disks?

--
11.

Re: High load average with PostgreSQL 7.4.2 on

From
"Scott Marlowe"
Date:
On Tue, 2004-06-29 at 09:55, eleven@ludojad.itpp.pl wrote:
> Hello,
>
> I'm using PostgreSQL 7.4.2 (package from backports.org)
> on a Debian (woody) box. The machine is IBM eServer 345
> with two 2.8 Xeon CPUs, it has 1024MB of RAM and
> two 15k RPM SCSI disks running in hardware RAID1, which
> is provided by the onboard LSI Logic controller (LSI53C1030).
>
> The database consists of two rather large tables
> (currently about 15 million rows in one table
> and about 5 million in the other one). Both tables
> have 5 indexes (4 btree/1 hash).
> Application running on the server INSERTs a lot of
> stuff to the tables (which is not the target use of
> the DB, it'll add data periodically, about
> 300 rows per 10 minutes). Queries (SELECTs) run perfectly
> fine on the database, thanks to the indexes we have
> here probably.
>
> Performance issue, I'm experiencing here, is somewhat
> weird - server gets high average load (from 5 up to 15,
> 8 on average). Standard performance monitoring
> utilities (like top) show that CPUs are not loaded
> (below 20%, often near zero).

Anytime you have high load but low CPU utilization, you usually have an
I/O bound system.

Ad disks to your RAID (big RAID 5 or RAID 1+0) and make sure you have
battery backed cache set to write back.  also, put as memory as you can
in the machine.  Jumping up to 2 Gigs may help quite a bit as well.


Re: High load average with PostgreSQL 7.4.2 on debian/ibm eserver.

From
Josh Berkus
Date:
Eleven,

> In particular - could someone tell me if those iostat
> values can tell if I'm close to upper performance boundary
> of fast SCSI (Ultra 320, 15k RPM) disks?

It's quite possible that you need to improve your disk array; certainly I
would have spec'd a lot more disk than you're using (like raid 0+1 with 6
disks or RAID 5 with seven disks).

However, there's the other end as well; it's quite possible that your queries
are doing seq scans and other disk-intensive operations that could be
avoided.   Have you analyed this at all?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: High load average with PostgreSQL 7.4.2 on debian/ibm eserver.

From
Gaetano Mendola
Date:
eleven@ludojad.itpp.pl wrote:


> Whole config is available here:
> http://ludojad.itpp.pl/~eleven/pg-high-load.conf

effective_cache_size = 4000    # typically 8KB each
#random_page_cost = 4        # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01        # (same)
#cpu_index_tuple_cost = 0.001    # (same)
#cpu_operator_cost = 0.0025    # (same)


These values are too higher for your hardware, try to execute the
explain analyze for the queries that are running on your box and
repeat it lowering these values, I bet postgres is running seq scan
instead of an index scan.

These are the value that I use for a configuration closer to your:


effective_cache_size = 20000
random_page_cost = 2.0
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.0025


last question, do you use the autovacuum daemon ?
If no => you have to use it
If yes => did you apply the patch that will not fail with
           big tables like yours ?


if you can post the autovacuum daemon log ( last lines ).



Regards
Gaetano Mendola