Thread: When does sequential performance matter in PG?

From:
henk de wit
Date:

Hi,

It is frequently said that for PostgreSQL the number 1 thing to pay attention to when increasing performance is the amount of IOPS a storage system is capable of. Now I wonder if there is any situation in which sequential IO performance comes into play. E.g. perhaps during a tablescan on a non-fragmented table, or during a backup or restore?

The reason I'm asking is that we're building a storage array and for some reason are unable to increase the number of random IOPS beyond a certain threshold when we add more controllers or more (SSD) disks to the system. However, the sequential performance keeps increasing when we do that.

Would this extra sequential performance be of any benefit to PG or would it just be wasted?

Kind regards


From:
Matthew Wakeling
Date:

On Tue, 10 Mar 2009, henk de wit wrote:
> It is frequently said that for PostgreSQL the number 1 thing to pay
> attention to when increasing performance is the amount of IOPS a storage
> system is capable of. Now I wonder if there is any situation in which
> sequential IO performance comes into play. E.g. perhaps during a
> tablescan on a non-fragmented table, or during a backup or restore?

Yes, up to a point. That point is when a single CPU can no longer handle
the sequential transfer rate. Yes, there are some parallel restore
possibilities which will get you further. Generally it only takes a few
discs to max out a single CPU though.

> The reason I'm asking is that we're building a storage array and for
> some reason are unable to increase the number of random IOPS beyond a
> certain threshold when we add more controllers or more (SSD) disks to
> the system. However, the sequential performance keeps increasing when we
> do that.

Are you sure you're measuring the maximum IOPS, rather than measuring the
IOPS capable in a single thread? The advantage of having more discs is
that you can perform more operations in parallel, so if you have lots of
simultaneous requests they can be spread over the disc array.
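To make the distinction between single-thread and aggregate IOPS concrete, here is a minimal sketch of a multi-threaded random-read test. It is not the benchmark used in this thread; the function name, the 8 kB block size and the duration are assumptions chosen for illustration. Unless the test file is much larger than RAM (or the cache is bypassed some other way), the numbers will mostly reflect the OS page cache rather than the discs.

```python
import os
import random
import threading
import time

BLOCK_SIZE = 8192  # assumed block size; matches PostgreSQL's default page size


def measure_random_read_iops(path, threads, duration_s=5.0):
    """Return aggregate random reads per second across `threads` workers."""
    blocks = os.path.getsize(path) // BLOCK_SIZE
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    counts = [0] * threads
    deadline = time.monotonic() + duration_s

    def worker(slot):
        # Each worker has its own descriptor and its own random generator.
        fd = os.open(path, os.O_RDONLY)
        try:
            rng = random.Random(slot)
            done = 0
            while time.monotonic() < deadline:
                offset = rng.randrange(blocks) * BLOCK_SIZE
                os.pread(fd, BLOCK_SIZE, offset)  # one random read
                done += 1
            counts[slot] = done
        finally:
            os.close(fd)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(counts) / duration_s
```

Running it with increasing `threads` values (for example 1, 10 and 40) against the same large file shows whether the aggregate figure keeps growing or flattens out.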

Matthew

--
 [About NP-completeness] These are the problems that make efficient use of
 the Fairy Godmother.                    -- Computer Science Lecturer

From:
henk de wit
Date:

Hi,

> On Tue, 10 Mar 2009, henk de wit wrote:
> > Now I wonder if there is any situation in which
> > sequential IO performance comes into play. E.g. perhaps during a
> > tablescan on a non-fragmented table, or during a backup or restore?
>
> Yes, up to a point. That point is when a single CPU can no longer handle
> the sequential transfer rate. Yes, there are some parallel restore
> possibilities which will get you further. Generally it only takes a few
> discs to max out a single CPU though.

I see, but I take it you are only referring to a backup or a restore? It's of course unlikely (even highly undesirable) that multiple processes are doing a backup, but it doesn't seem unlikely that multiple queries are doing a table scan ;)

> Are you sure you're measuring the maximum IOPS, rather than measuring the
> IOPS capable in a single thread?

I'm pretty sure we're not testing the number of IOPS for a single thread, as we're testing with 1, 10 and 40 threads. There is a significant (2x) increase in the total number of IOPS when going from 1 to 10 threads, but no increase when going from 10 to 40 threads. You can read more details about the setup I used and the problems I ran into here: http://www.xtremesystems.org/forums/showthread.php?p=3707365

Henk


From:
Greg Smith
Date:

On Tue, 10 Mar 2009, henk de wit wrote:

> Now I wonder if there is any situation in which sequential IO
> performance comes into play. E.g. perhaps during a tablescan on a
> non-fragmented table, or during a backup or restore?

If you're doing a sequential scan of data that was loaded in a fairly
large batch, you can approach reading at the sequential I/O rate of the
drives.  Doing a backup using pg_dump is one situation where you might
actually do that.

Unless your disk performance is really weak, restores in PostgreSQL are
usually CPU bound right now.  There's a new parallel restore feature in
8.4 that may make sequential write performance a more likely upper bound
to run into, assuming your table structure is amenable to loading in
parallel (situations with just one giant table won't benefit as much).
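As a rough illustration of the parallel restore mentioned above, the sketch below only builds a `pg_restore` command line with the `--jobs` option. The helper name and its arguments are hypothetical, and it assumes the dump was written in pg_dump's custom format (`pg_dump -Fc`), which parallel restore in 8.4 requires.

```python
def parallel_restore_command(dump_path, dbname, jobs):
    """Build an argv list for a parallel pg_restore (hypothetical helper)."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        "pg_restore",
        "--jobs", str(jobs),   # number of concurrent restore workers
        "--dbname", dbname,    # restore directly into this database
        dump_path,             # assumed to be a custom-format archive
    ]
```

The resulting list can be handed to `subprocess.run`. How many jobs are worth using depends on the number of tables and on how many cores and discs are available.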

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Scott Carey
Date:




On 3/10/09 6:28 AM, "Matthew Wakeling" <> wrote:

> On Tue, 10 Mar 2009, henk de wit wrote:
> > It is frequently said that for PostgreSQL the number 1 thing to pay
> > attention to when increasing performance is the amount of IOPS a storage
> > system is capable of. Now I wonder if there is any situation in which
> > sequential IO performance comes into play. E.g. perhaps during a
> > tablescan on a non-fragmented table, or during a backup or restore?
>
> Yes, up to a point. That point is when a single CPU can no longer handle
> the sequential transfer rate. Yes, there are some parallel restore
> possibilities which will get you further. Generally it only takes a few
> discs to max out a single CPU though.

This is not true if you have concurrent sequential scans. In that case an array can be tuned for total throughput under concurrent access. Single-thread sequential measurements are about as useful as single-thread random I/O measurements: not a test of how the DB will actually behave, but a useful starting point for tuning.
I'm at the point where a single thread can no longer keep up with the disks on a sequential scan. For the simplest select * queries, a single thread tops out at about 800MB/sec for me.
For queries with more complicated processing or filtering it's much less; 400MB/sec is usually a pretty good rate for a single thread.
However, our raw array does about 1200MB/sec, and we can get about 75% of that with between 4 and 8 concurrent sequential scans. It took significant tuning and testing time to make this work, and to balance it against random I/O requirements.
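A quick back-of-the-envelope check of those figures, using only the numbers quoted above. The even split between scans is an assumption made for illustration, not something that was measured.

```python
RAW_ARRAY_MB_S = 1200       # raw sequential rate of the array
EFFICIENCY = 0.75           # share of the raw rate reached with concurrent scans
SINGLE_SIMPLE_MB_S = 800    # one thread, simplest select * queries
SINGLE_COMPLEX_MB_S = 400   # one thread, heavier processing/filtering


def aggregate_mb_s(raw_mb_s, efficiency):
    """Total throughput delivered across all concurrent scans."""
    return raw_mb_s * efficiency


def per_scan_mb_s(total_mb_s, scans):
    """Per-scan share, assuming the scans split the total evenly."""
    return total_mb_s / scans


total = aggregate_mb_s(RAW_ARRAY_MB_S, EFFICIENCY)
print(total)                    # 900.0
print(per_scan_mb_s(total, 4))  # 225.0
print(per_scan_mb_s(total, 8))  # 112.5
```

So the concurrent case delivers about 900MB/sec in total, more than the roughly 800MB/sec a single simple scan can consume and more than twice the 400MB/sec of a single complex one.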

Furthermore, higher sequential rates help your random IOPS when you have sequential access concurrent with random access. You can tune OS parameters (readahead in Linux, I/O scheduler types) to bias throughput or latency towards either random IOPS or sequential MB/sec. Having faster sequential disk access means a smaller share of time is spent on sequential I/O, leaving more time for random I/O. It only goes so far, but it does help with mixed loads.
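For reference, on Linux the readahead size and I/O scheduler of a block device are exposed under `/sys/block/<device>/queue/`. The sketch below only reads the current values; the function name and the `sysfs_root` parameter are illustrative assumptions, and which settings suit a given mixed workload still has to be found by testing.

```python
from pathlib import Path


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Read the readahead size and I/O scheduler for one block device."""
    queue = Path(sysfs_root) / device / "queue"
    readahead_kb = int((queue / "read_ahead_kb").read_text().strip())
    raw = (queue / "scheduler").read_text().strip()
    names = raw.split()
    # The active scheduler is shown in brackets, e.g. "noop deadline [cfq]".
    active = next((n.strip("[]") for n in names if n.startswith("[")), raw)
    return {
        "read_ahead_kb": readahead_kb,
        "scheduler": active,
        "available": [n.strip("[]") for n in names],
    }
```

Calling `read_queue_settings("sda")` returns a small dict describing the device named `sda`, if one exists. Changing the values means writing to the same files as root.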

Overall, it depends a lot on how important sequential scans are to your use case.