Thread: Looking for bottleneck during load test

Looking for bottleneck during load test

From
"Hell, Robert"
Date:
We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel Xeon Cores and 32 GB RAM (64-bit Linux
2.6.18).Data is stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm).  
When we do random I/O with a small test tool (reading random 8k blocks from big files in 200 threads) on the disk we
retrievedata with about 25 MB/s. 

For testing purpose a test set of about 700.000 queries (those were logged during a problem situation) are executed
againstthe database in 180 concurrent threads. 
Some of the queries are very small and fast - other ones read more than 50000 blocks. All queries are selects (using
cursors)- there is only read activity on the database. 

By setting tuple_fraction for cursors to 0.0 instead of 0.1
(http://archives.postgresql.org/pgsql-performance/2008-04/msg00018.php)we reduced reads during the test
(pg_statio_all_tables):
- 8.2.7
  reads from disk: 4.395.276, reads from cache: 471.575.925
- 8.2.7 cursor_tuple_fraction=0.0
  Reads from disk: 3.406.164, reads from cache: 37.924.625

But the duration of the test was only reduced by 18 % (from 110 minutes to 90 minutes).

When running the test with tuple_fraction=0.0 we observe the following on the server:
- avg read from disk is at 7 MB/s
- when we start the random I/O tool during the test we again read data with about 25 MB/s from disk (for me it seems
thatdisk isn't the bottleneck) 
- cpu time is divided between idle and iowait - user and system cpu are practically zero
- there are from 5000 to 10000 context switches per second

I can't see a bottleneck here. Does anyone has an explanation for that behavior?

Regards,
Robert


Re: Looking for bottleneck during load test

From
Ivan Voras
Date:
Hell, Robert wrote:
> We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel Xeon Cores and 32 GB RAM (64-bit
Linux2.6.18). Data is stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm).  
> When we do random I/O with a small test tool (reading random 8k blocks from big files in 200 threads) on the disk we
retrievedata with about 25 MB/s. 

How do you test random IO? Do you use this utility:
http://arctic.org/~dean/randomio/ ?

If not, try using it, with same parameters. It might be that the latency
is destroying your performance. Do you use NFS or are you accessing the
storage as SAN?

Re: Looking for bottleneck during load test

From
"Hell, Robert"
Date:
I tried different other tools for random IO (including a self written one which does random lseek and read).

This tool, started during one of our tests, achieves 2 iops (8k each).
Started alone I get something about 1,500 iops with an avg latency of 100 ms.

We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our bottleneck?
Any other tuning ideas?

Regards,
Robert

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Ivan Voras
Sent: Montag, 07. April 2008 14:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Looking for bottleneck during load test

Hell, Robert wrote:
> We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel Xeon Cores and 32 GB RAM (64-bit
Linux2.6.18). Data is stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). 
 
> When we do random I/O with a small test tool (reading random 8k blocks from big files in 200 threads) on the disk we
retrievedata with about 25 MB/s.
 

How do you test random IO? Do you use this utility:
http://arctic.org/~dean/randomio/ ?

If not, try using it, with same parameters. It might be that the latency
is destroying your performance. Do you use NFS or are you accessing the
storage as SAN?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Looking for bottleneck during load test

From
Bill Moran
Date:
In response to "Hell, Robert" <Robert.Hell@fabasoft.com>:

> I tried different other tools for random IO (including a self written one which does random lseek and read).
>
> This tool, started during one of our tests, achieves 2 iops (8k each).
> Started alone I get something about 1,500 iops with an avg latency of 100 ms.
>
> We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our bottleneck?
> Any other tuning ideas?

You know, with all the performance problems people have been bringing up
with regard to SANs, I'm putting SAN in the same category as RAID-5 ...


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Ivan Voras
> Sent: Montag, 07. April 2008 14:38
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Looking for bottleneck during load test
>
> Hell, Robert wrote:
> > We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel Xeon Cores and 32 GB RAM (64-bit
Linux2.6.18). Data is stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm).  
> > When we do random I/O with a small test tool (reading random 8k blocks from big files in 200 threads) on the disk
weretrieve data with about 25 MB/s. 
>
> How do you test random IO? Do you use this utility:
> http://arctic.org/~dean/randomio/ ?
>
> If not, try using it, with same parameters. It might be that the latency
> is destroying your performance. Do you use NFS or are you accessing the
> storage as SAN?
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

Re: Looking for bottleneck during load test

From
Greg Smith
Date:
On Mon, 7 Apr 2008, Bill Moran wrote:

> You know, with all the performance problems people have been bringing up
> with regard to SANs, I'm putting SAN in the same category as RAID-5 ...

Not really fair, because unlike RAID5 it's at least *possible* to get good
write performance out of a SAN.  Just harder than most people think it is.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Looking for bottleneck during load test

From
Bill Moran
Date:
In response to Greg Smith <gsmith@gregsmith.com>:

> On Mon, 7 Apr 2008, Bill Moran wrote:
>
> > You know, with all the performance problems people have been bringing up
> > with regard to SANs, I'm putting SAN in the same category as RAID-5 ...
>
> Not really fair, because unlike RAID5 it's at least *possible* to get good
> write performance out of a SAN.  Just harder than most people think it is.

*shrug*  "in theory" it's possible to get good write performance out of
RAID-5 as well, with fast enough disks and enough cache ...

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Looking for bottleneck during load test

From
Ivan Voras
Date:
Hell, Robert wrote:
> > I tried different other tools for random IO (including a self
written one which does random lseek and read).
> >
> > This tool, started during one of our tests, achieves 2 iops (8k each).
> > Started alone I get something about 1,500 iops with an avg latency
of 100 ms.

1500 iops looks about right for 4x2 RAID 10 volume. What's your worst
latency (as reported by the tool)? iowait is mostly seek time.

> > We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really
our bottleneck?

Depending on your access pattern to the database, it could be (if you
have lots of random IO, and 180 concurrent database threads can make any
IO random enough). Are your queries read-mostly or a mix?

> > Any other tuning ideas?

Only generic ones:

- Are your queries optimized, use indexes, etc.?
- Try PostgreSQL 8.3 - if you have sequential seeks it can in theory
make better use of data between connections.
- Do you have enough memory dedicated to data caches, both in PostgreSQL
and in the OS? (i.e. what is your shared_buffers setting?)
- If the SAN can configure parameters such as prefetch (pre-read) and
stripe size, try lowering them (should help if you have random IO).

Re: Looking for bottleneck during load test

From
"Hell, Robert"
Date:
Worst latency was at ~600 ms.

In this test case we have only selects - so it's read only.

I tried 8.3 - it's better there (10-15 %) - but IO rates stay the same.
I use 18 GB shared memory on a machine with 32 GB during the test, I think this shouldn't be a problem.

I did some further testing and I now really think that disk is the bottleneck here. 
I'm really surprised that those 7 MB/s in average are the maximum for that disks.

Any filesystem (ext3) settings that could help here?

Regards,
Robert


-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Ivan Voras
Sent: Dienstag, 08. April 2008 11:03
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Looking for bottleneck during load test

Hell, Robert wrote:
> > I tried different other tools for random IO (including a self
written one which does random lseek and read).
> >
> > This tool, started during one of our tests, achieves 2 iops (8k each).
> > Started alone I get something about 1,500 iops with an avg latency
of 100 ms.

1500 iops looks about right for 4x2 RAID 10 volume. What's your worst
latency (as reported by the tool)? iowait is mostly seek time.

> > We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really
our bottleneck?

Depending on your access pattern to the database, it could be (if you
have lots of random IO, and 180 concurrent database threads can make any
IO random enough). Are your queries read-mostly or a mix?

> > Any other tuning ideas?

Only generic ones:

- Are your queries optimized, use indexes, etc.?
- Try PostgreSQL 8.3 - if you have sequential seeks it can in theory
make better use of data between connections.
- Do you have enough memory dedicated to data caches, both in PostgreSQL
and in the OS? (i.e. what is your shared_buffers setting?)
- If the SAN can configure parameters such as prefetch (pre-read) and
stripe size, try lowering them (should help if you have random IO).


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Looking for bottleneck during load test

From
Ivan Voras
Date:
Hell, Robert wrote:
> I tried different other tools for random IO (including a self written one which does random lseek and read).
>
> This tool, started during one of our tests, achieves 2 iops (8k each).
> Started alone I get something about 1,500 iops with an avg latency of 100 ms.

1500 iops looks about right for 4x2 RAID 10 volume. What's your worst
latency (as reported by the tool)? iowait is mostly seek time.

> We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our bottleneck?

Depending on your access pattern to the database, it could be (if you
have lots of random IO, and 180 concurrent database threads can make any
IO random enough). Are your queries read-mostly or a mix?

> Any other tuning ideas?

Only generic ones:

- Are your queries optimized, use indexes, etc.?
- Try PostgreSQL 8.3 - if you have sequential seeks it can in theory
make better use of data between connections.
- Do you have enough memory dedicated to data caches, both in PostgreSQL
and in the OS? (i.e. what is your shared_buffers setting?)
- If the SAN can configure parameters such as prefetch (pre-read) and
stripe size, try lowering them (should help if you have random IO).


Attachment