Thread: [PERFORMANCE] expanding to SAN: which portion best to move

[PERFORMANCE] expanding to SAN: which portion best to move

From
Willy-Bas Loos
Date:
Hi,

Our database has gotten rather large and we are running out of disk space.
Our disks are 15K rpm SAS disks in RAID 10.

We are going to rent some space on a FibreChannel SAN.
That gives us the opportunity to separate the data and the indexes.
Now I thought it would be best to move the indexes to the SAN and leave the data on the disks, since the disks are better at sequential I/O and the SAN will have lots of random I/O anyway, because there are lots of users on it.
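
For concreteness, the way I picture doing that is with a tablespace on the SAN mount (the table and index names below are just made-up examples):

  -- hypothetical sketch: a tablespace on the SAN mount point
  CREATE TABLESPACE san_space LOCATION '/mnt/san/pgdata';

  -- move an existing index there (this rewrites the index and locks it meanwhile)
  ALTER INDEX orders_created_at_idx SET TABLESPACE san_space;

  -- or place a new index there directly at creation time
  CREATE INDEX orders_customer_idx ON orders (customer_id) TABLESPACE san_space;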

Is that a wise thing to do?

Cheers,

WBL

--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Jesper Krogh
Date:
On 2011-05-03 17:52, Willy-Bas Loos wrote:
> Our database has gotten rather large and we are running out of disk space.
> our disks are 15K rpm SAS disks in RAID 10.
>
> We are going to rent some space on a FibreChannel SAN.
> That gives us the opportunity to separate the data and the indexes.
> Now i thought it would be best to move the indexes to the SAN and leave the
> data on the disks, since the disks are better at sequential I/O and the SAN
> will have lots of random I/O since there are lots of users on it.
>
> Is that a wise thing to do?

If you're satisfied with the current performance, then it should be safe
to keep the indices and move the data; the risk of the SAN performing
worse on sequential I/O is not that high. But without testing and
knowledge about the SAN it is hard to say whether what you currently
have is better or worse than the SAN. The vendor may have a "way better
SAN", but it may also be shared among 200 other hosts connected over
iSCSI or FC, so your share may be even worse than what you currently have.

Without insight and testing it is hard to guess. I've pretty much come
to the conclusion of going the DAS way every time, but it all depends on
what your end looks like.

--
Jesper

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Willy-Bas Loos
Date:
Are you saying that, generally speaking, moving the data would be better unless the SAN performs worse than the disks?
Besides your point that it depends on what our end looks like, I mean.
(And what do you mean by "the DAS way"? Sorry, not a native speaker.)

cheers,

wbl

On Wed, May 4, 2011 at 6:43 AM, Jesper Krogh <jesper@krogh.cc> wrote:

If you're satisfied with the current performance, then it should be safe
to keep the indices and move the data; the risk of the SAN performing
worse on sequential I/O is not that high. But without testing and
knowledge about the SAN it is hard to say whether what you currently
have is better or worse than the SAN. The vendor may have a "way better
SAN", but it may also be shared among 200 other hosts connected over
iSCSI or FC, so your share may be even worse than what you currently have.

Without insight and testing it is hard to guess. I've pretty much come
to the conclusion of going the DAS way every time, but it all depends on
what your end looks like.

--
Jesper



--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Jesper Krogh
Date:
On 2011-05-04 07:25, Willy-Bas Loos wrote:
> are you saying that, generally speaking, moving the data would be better
> unless the SAN performs worse than the disks?
It was more, "given all the uncertainties, that seems like the least
risky option".
The SAN might actually perform worse than what you currently
have; you don't know yet, I guess?

> besides your point that it depends on what our end looks like i mean.
> (and what do you mean by "the DAS way", sry no native speaker)
DAS way => direct-attached storage: a disk array where the database has
sole access to the hardware (not shared among other systems). Examples
are the Dell MD1200/1220 or similar.

--
Jesper


Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Willy-Bas Loos
Date:
I'm asking them for (real) benchmarks, thanks for the advice. (fio is not available to us right now to do it myself, grmbl.)

It just occurred to me that it is not necessarily the case that reading the indexes causes a lot of random I/O (on the index itself).
I mean, maybe the index is generally read sequentially and then, when retrieving the data, there is a lot of random I/O.

If it's a long story, any tips for info about this (book or web site)?

cheers,

wbl

On Wed, May 4, 2011 at 7:33 AM, Jesper Krogh <jesper@krogh.cc> wrote:
On 2011-05-04 07:25, Willy-Bas Loos wrote:
Are you saying that, generally speaking, moving the data would be better
unless the SAN performs worse than the disks?
It was more, "given all the uncertainties, that seems like the least risky option".
The SAN might actually perform worse than what you currently
have; you don't know yet, I guess?


Besides your point that it depends on what our end looks like, I mean.
(And what do you mean by "the DAS way"? Sorry, not a native speaker.)
DAS way => direct-attached storage: a disk array where the database has
sole access to the hardware (not shared among other systems). Examples
are the Dell MD1200/1220 or similar.

--
Jesper




--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Haas
Date:
On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
> available for us now to do it myself, grmbl)
> It just occurred to me that it is not necessarily the case that reading the
> indexes causes a lot of random I/O (on the index itself).
> I mean, maybe the index is generally read sequentially and then, when
> retrieving the data, there is a lot of random I/O.
> if it's a long story, any tips for info about this (book or web site)?

If you don't do anything special, and if the query plan says "Index
Scan" rather than "Bitmap Index Scan", then both the index I/O and the
table I/O are likely to be fairly random.  However there are a number
of cases in which you can expect the table I/O to be sequential:

- In some cases, you may happen to insert rows with an ordering that
matches the index.  For example, if you have a table with not too many
updates and deletes, and an index on a serial column, then new rows
will have a higher value in that column than old rows, and will also
typically be physically after older rows in the file.  Or you might be
inserting timestamped data from oldest to newest.
- If the planner chooses a Bitmap Index Scan, it effectively scans the
index to figure out which table blocks to read, and then reads those
table blocks in block number order, so that the I/O is sequential,
with skips.
- If you CLUSTER the table on a particular index, it will be
physically ordered to match the index's key ordering.  As the table is
further modified the degree of clustering will gradually decline;
eventually you may wish to re-CLUSTER.
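
To make that concrete, here is roughly how you'd check which case you're in and how clustering is done (the table and index names are hypothetical):

  -- see whether the planner picks "Index Scan" or "Bitmap Heap Scan"
  EXPLAIN ANALYZE
  SELECT * FROM measurements WHERE station_id = 42;

  -- physically rewrite the table in index order; heap access that follows
  -- this index is then mostly sequential, until the ordering degrades again
  CLUSTER measurements USING measurements_station_id_idx;
  ANALYZE measurements;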

It's also worth keeping in mind that the index itself won't
necessarily be accessed in physically sequential order.  The point of
the index is to emit the rows in key order, but if the table is
heavily updated, it won't necessarily be the case that a page
containing lower-valued keys physically precedes a page containing
higher-valued keys.  I'm actually somewhat fuzzy on how this works,
and to what extent it's a problem in practice, but I am fairly sure it
can happen.
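
If the pgstattuple contrib module happens to be installed, you can get a
rough feel for how physically scrambled an index has become (the index
name is just an example):

  -- leaf_fragmentation is the percentage of leaf pages that are out of
  -- order relative to key order; avg_leaf_density hints at bloat
  SELECT leaf_pages, avg_leaf_density, leaf_fragmentation
  FROM pgstatindex('orders_created_at_idx');

A heavily fragmented index means even an in-key-order scan turns into
fairly random physical reads; REINDEX rebuilds it into compact, ordered
pages again.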

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Klemme
Date:
On Fri, May 13, 2011 at 9:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
>> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
>> available for us now to do it myself, grmbl)
>> It just occurred to me that it is not necessarily the case that reading the
>> indexes causes a lot of random I/O (on the index itself).
>> I mean, maybe the index is generally read sequentially and then, when
>> retrieving the data, there is a lot of random I/O.
>> if it's a long story, any tips for info about this (book or web site)?
>
> If you don't do anything special, and if the query plan says "Index
> Scan" rather than "Bitmap Index Scan", then both the index I/O and the
> table I/O are likely to be fairly random.  However there are a number
> of cases in which you can expect the table I/O to be sequential:
>
> - In some cases, you may happen to insert rows with an ordering that
> matches the index.  For example, if you have a table with not too many
> updates and deletes, and an index on a serial column, then new rows
> will have a higher value in that column than old rows, and will also
> typically be physically after older rows in the file.  Or you might be
> inserting timestamped data from oldest to newest.
> - If the planner chooses a Bitmap Index Scan, it effectively scans the
> index to figure out which table blocks to read, and then reads those
> table blocks in block number order, so that the I/O is sequential,
> with skips.

Are these two separate phases (i.e. first scan index completely, then
access table)?

> - If you CLUSTER the table on a particular index, it will be
> physically ordered to match the index's key ordering.  As the table is
> further modified the degree of clustering will gradually decline;
> eventually you may wish to re-CLUSTER.
>
> It's also worth keeping in mind that the index itself won't
> necessarily be accessed in physically sequential order.  The point of
> the index is to emit the rows in key order, but if the table is
> heavily updated, it won't necessarily be the case that a page
> containing lower-valued keys physically precedes a page containing
> higher-valued keys.  I'm actually somewhat fuzzy on how this works,
> and to what extent it's a problem in practice, but I am fairly sure it
> can happen.

Separating indexes and tables might not be such a good idea
in general.  Richard Foote has an excellent article about Oracle, but I
assume at least a few things apply to PostgreSQL as well - it's at
least worth checking PostgreSQL's access patterns against:


http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/

I would probably rather try to separate data by the nature and
frequency of accesses.  One reasonable separation would be to leave
all frequently accessed tables *and* their indexes on local RAID and
move less frequently accessed data to the SAN.  Such a separation
is easy to identify if you have separate tables for current and
historic data.
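
As a sketch of what I mean (the tablespace, table and index names are
invented for the example):

  -- a tablespace on the rented SAN volume
  CREATE TABLESPACE san LOCATION '/mnt/san/pgdata';

  -- cold, rarely accessed data goes to the SAN ...
  ALTER TABLE orders_archive SET TABLESPACE san;
  ALTER INDEX orders_archive_pkey SET TABLESPACE san;

  -- ... while the hot, current tables and their indexes simply stay where
  -- they are on the local RAID 10

Keep in mind that ALTER TABLE ... SET TABLESPACE physically copies the
files and takes an exclusive lock while doing so, so it is something for a
maintenance window.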

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Haas
Date:
On Mon, May 16, 2011 at 4:19 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
>> - If the planner chooses a Bitmap Index Scan, it effectively scans the
>> index to figure out which table blocks to read, and then reads those
>> table blocks in block number order, so that the I/O is sequential,
>> with skips.
>
> Are these two separate phases (i.e. first scan index completely, then
> access table)?

Yes.

> Separating index and tables might not be a totally good idea
> generally.  Richard Foote has an excellent article about Oracle but I
> assume at least a few things do apply to PostgreSQL as well - it's at
> least worth as something to check PostgreSQL's access patterns
> against:
>
>
http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/
>
> I would probably rather try to separate data by the nature and
> frequency of accesses.  One reasonable separation would be to leave
> all frequently accessed tables *and* their indexes on local RAID and
> moving less frequently accessed data to the SAN.  This separation
> could be easily identified if you have separate tables for current and
> historic data.

Yeah, I think the idea of putting tables and indexes in separate
tablespaces is mostly to bring more I/O bandwidth to bear on the same
data.  But there are other reasonable things you might do also - e.g.
put the indexes on an SSD, and the tables on a spinning disk, figuring
that the SSD is less reliable but you can always rebuild the index if
you need to...

Also, a lot of people have reported big speedups from putting pg_xlog
on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
partition.  So those sorts of divisions should be considered also.
Your idea of dividing things by access frequency is another good
thought.
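
For illustration, the tablespace side of the SSD idea looks like this (the
names are placeholders); pg_xlog itself is not managed through tablespaces
and is usually relocated at the filesystem level:

  -- indexes on the (less trusted) SSD; rebuildable with REINDEX if lost
  CREATE TABLESPACE ssd LOCATION '/mnt/ssd/pg_indexes';
  ALTER INDEX big_table_pkey SET TABLESPACE ssd;

  -- pg_xlog: the usual approach is to stop the server, move the
  -- $PGDATA/pg_xlog directory onto the dedicated RAID 1 volume, and
  -- leave a symlink behind in its place.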

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Klemme
Date:
On Mon, May 16, 2011 at 4:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 16, 2011 at 4:19 AM, Robert Klemme
> <shortcutter@googlemail.com> wrote:
>>> - If the planner chooses a Bitmap Index Scan, it effectively scans the
>>> index to figure out which table blocks to read, and then reads those
>>> table blocks in block number order, so that the I/O is sequential,
>>> with skips.
>>
>> Are these two separate phases (i.e. first scan index completely, then
>> access table)?
>
> Yes.

So then a single query will only ever access one of the two at a time.

>> Separating index and tables might not be a totally good idea
>> generally.  Richard Foote has an excellent article about Oracle but I
>> assume at least a few things do apply to PostgreSQL as well - it's at
>> least worth as something to check PostgreSQL's access patterns
>> against:
>>
>>
http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/
>>
>> I would probably rather try to separate data by the nature and
>> frequency of accesses.  One reasonable separation would be to leave
>> all frequently accessed tables *and* their indexes on local RAID and
>> moving less frequently accessed data to the SAN.  This separation
>> could be easily identified if you have separate tables for current and
>> historic data.
>
> Yeah, I think the idea of putting tables and indexes in separate
> tablespaces is mostly to bring more I/O bandwidth to bear on the same
> data.

Richard commented on that as well, I believe it was in
http://richardfoote.wordpress.com/2008/04/18/separate-indexes-from-tables-some-thoughts-part-ii-there-there/

The main point is that you do not benefit from the larger IO bandwidth
if access patterns do not permit parallel access to both disks (e.g.
because you first need to read index blocks in order to know the table
blocks to read).  The story might be different though if you have a
lot of concurrent accesses.  But even then, if the table is a hotspot
chances are that index blocks are cached and you only need physical IO
for table blocks...

>  But there are other reasonable things you might do also - e.g.
> put the indexes on an SSD, and the tables on a spinning disk, figuring
> that the SSD is less reliable but you can always rebuild the index if
> you need to...

Richard commented on that theory as well:
http://richardfoote.wordpress.com/2008/05/02/indexes-in-their-own-tablespace-recoverability-advantages-get-back/

The point: if you do the math you might find that lost indexes lead
to so much downtime that you don't want to risk it, and the rebuild
isn't all that simple (in terms of time).  For a reasonably sized
database, recovery might be significantly faster than rebuilding.

> Also, a lot of people have reported big speedups from putting pg_xlog
> on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
> partition.  So those sorts of divisions should be considered also.

Now, this is something I'd seriously consider, because access patterns
to pg_xlog are vastly different from those of table and index data!
So you want to have pg_xlog on a device with high reliability and high
write speed.

> Your idea of dividing things by access frequency is another good
> thought.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Craig Ringer
Date:
On 05/17/2011 03:00 PM, Robert Klemme wrote:

> The main point is that you do not benefit from the larger IO bandwidth
> if access patterns do not permit parallel access to both disks (e.g.
> because you first need to read index blocks in order to know the table
> blocks to read).

This makes me wonder if Pg attempts to pre-fetch blocks of interest for
areas where I/O needs can be known in advance, while there's still other
work or other I/O to do. For example, pre-fetching for the next
iteration of a nested loop while still executing the prior one. Is it
even possible?

I'm guessing not, because (AFAIK) Pg uses only synchronous blocking I/O,
and with that there isn't really a way to pre-fetch w/o threads or
helper processes. Linux (at least) supports buffered async I/O, so it'd
be possible to submit such prefetch requests ... on modern Linux
kernels. Portably doing so, though - not so much.

--
Craig Ringer

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Cédric Villemain
Date:
2011/5/17 Craig Ringer <craig@postnewspapers.com.au>:
> On 05/17/2011 03:00 PM, Robert Klemme wrote:
>
>> The main point is that you do not benefit from the larger IO bandwidth
>> if access patterns do not permit parallel access to both disks (e.g.
>> because you first need to read index blocks in order to know the table
>> blocks to read).
>
> This makes me wonder if Pg attempts to pre-fetch blocks of interest for
> areas where I/O needs can be known in advance, while there's still other
> works or other I/O to do. For example, pre-fetching for the next iteration
> of a nested loop while still executing the prior one. Is it even possible?
>
> I'm guessing not, because (AFAIK) Pg uses only synchronous blocking I/O, and
> with that there isn't really a way to pre-fetch w/o threads or helper
> processes. Linux (at least) supports buffered async I/O, so it'd be possible
> to submit such prefetch requests ... on modern Linux kernels. Portably doing
> so, though - not so much.

Prefetching is used in the bitmap heap scan. The GUC
effective_io_concurrency allows you to increase the prefetch window.
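
For example (the value is only a starting point to experiment with,
roughly the number of spindles in the array):

  -- per session, for experimenting
  SET effective_io_concurrency = 8;

  -- or in postgresql.conf for all sessions:
  -- effective_io_concurrency = 8

It only helps bitmap heap scans, and only on platforms that provide
posix_fadvise (Linux being the main one).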

>
> --
> Craig Ringer
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Klemme
Date:
On Tue, May 17, 2011 at 11:47 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 05/17/2011 03:00 PM, Robert Klemme wrote:
>
>> The main point is that you do not benefit from the larger IO bandwidth
>> if access patterns do not permit parallel access to both disks (e.g.
>> because you first need to read index blocks in order to know the table
>> blocks to read).
>
> This makes me wonder if Pg attempts to pre-fetch blocks of interest for
> areas where I/O needs can be known in advance, while there's still other
> works or other I/O to do. For example, pre-fetching for the next iteration
> of a nested loop while still executing the prior one. Is it even possible?
>
> I'm guessing not, because (AFAIK) Pg uses only synchronous blocking I/O, and
> with that there isn't really a way to pre-fetch w/o threads or helper
> processes. Linux (at least) supports buffered async I/O, so it'd be possible
> to submit such prefetch requests ... on modern Linux kernels. Portably doing
> so, though - not so much.

There is a much more serious obstacle than the merely technical one (if
that was an obstacle at all): prefetching is only reasonable if you can
predict which data you need with high probability (say >= 80%).  If you
can't, you'll have much more IO than without prefetching, and overall
performance likely suffers.  Naturally that probability depends on the
data at hand and the access pattern.  As Cédric wrote, there seems to
be at least one case where it's done.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Greg Smith
Date:
On 05/17/2011 05:47 AM, Craig Ringer wrote:
> This makes me wonder if Pg attempts to pre-fetch blocks of interest
> for areas where I/O needs can be known in advance, while there's still
> other works or other I/O to do. For example, pre-fetching for the next
> iteration of a nested loop while still executing the prior one. Is it
> even possible?

Well, remember that a nested loop isn't directly doing any I/O.  It's
pulling rows from some lower level query node.  So the useful question
to ask is "how can pre-fetch speed up the table access methods?"  That
worked out like this:

Sequential Scan:  logic here was added and measured as useful for one
system with terrible I/O.  Everywhere else it was tried on Linux, the
read-ahead logic in the kernel seems to make this redundant.  Punted as
too much complexity relative to measured average gain.  You can try to
tweak this on a per-file basis in an application, but the kernel has
almost as much information to make that decision usefully as the
database does.

Index Scan:  It's hard to know what you're going to need in advance here
and pipeline the reads, so this hasn't really been explored yet.

Bitmap heap scan:  Here, the exact list of blocks to fetch is known in
advance, they're random, and it's quite possible for the kernel to
schedule them more efficiently than serial access of them can do.  This
was added as the effective_io_concurrency feature (it's the only thing
that feature impacts), which so far is only proven to work on Linux.
Any OS implementing the POSIX API used will also get this however;
FreeBSD was the next likely candidate that might benefit when I last
looked around.

> I'm guessing not, because (AFAIK) Pg uses only synchronous blocking
> I/O, and with that there isn't really a way to pre-fetch w/o threads
> or helper processes. Linux (at least) supports buffered async I/O, so
> it'd be possible to submit such prefetch requests ... on modern Linux
> kernels. Portably doing so, though - not so much.

Linux supports the POSIX_FADV_WILLNEED advisory call, which is perfect
for suggesting what blocks will be accessed in the near future in the
bitmap heap scan case.  That's how effective_io_concurrency works.

Both Solaris and Linux also have async I/O mechanisms that could be used
instead.  Greg Stark built a prototype and there's an obvious speed-up
there to be had.  But the APIs for this aren't very standard, and it's
really hard to rearchitect the PostgreSQL buffer manager to operate in a
less synchronous way.  Hoping that more kernels support the "will need"
API usefully, which meshes very well with how PostgreSQL thinks about
the problem, is where things are at right now.  With so many bigger
PostgreSQL sites on Linux, that's worked out well so far.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Vitalii Tymchyshyn
Date:
On 24.05.11 21:48, Greg Smith wrote:
>
> Bitmap heap scan: Here, the exact list of blocks to fetch is known in
> advance, they're random, and it's quite possible for the kernel to
> schedule them more efficiently than serial access of them can do. This
> was added as the effective_io_concurrency feature (it's the only thing
> that feature impacts), which so far is only proven to work on Linux.
> Any OS implementing the POSIX API used will also get this however;
> FreeBSD was the next likely candidate that might benefit when I last
> looked around.
FreeBSD unfortunately does not have the support :(
It has AIO, but does not have the call needed to enable this setting.

Best regards, Vitalii Tymchyshyn

Re: [PERFORMANCE] expanding to SAN: which portion best to move

From
Willy-Bas Loos
Date:


On Mon, May 16, 2011 at 10:19 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
On Fri, May 13, 2011 at 9:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Separating indexes and tables might not be such a good idea
in general.  Richard Foote has an excellent article about Oracle, but I
assume at least a few things apply to PostgreSQL as well - it's at
least worth checking PostgreSQL's access patterns against:

http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/

I would probably rather try to separate data by the nature and
frequency of accesses.  One reasonable separation would be to leave
all frequently accessed tables *and* their indexes on local RAID and
move less frequently accessed data to the SAN.  Such a separation
is easy to identify if you have separate tables for current and
historic data.

Well, after reading your article I have been reading some material about it on the internet, stating that separating indexes from data for performance benefits is a myth.
I found your comment "So then a single query will only ever access one of the two at a time." very smart (no sarcasm there).
I also found a thread on AskTom whose main point was "the goal is to achieve even IO" (that makes absolute sense).

In my situation, where I need extra space on a SAN, it seems logical to separate the tables from the indexes to achieve just that: roughly even IO (put the tables on the SAN, leave the indexes on the RAID 10 array).
Or am I being silly?

Cheers,

WBL
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move

From
Greg Smith
Date:
On 06/09/2011 07:43 AM, Willy-Bas Loos wrote:
Well, after reading your article I have been reading some material about it on the internet, stating that separating indexes from data for performance benefits is a myth.
I found your comment "So then a single query will only ever access one of the two at a time." very smart (no sarcasm there).
I also found a thread on AskTom whose main point was "the goal is to achieve even IO" (that makes absolute sense).

The idea that separating indexes and tables from one another via a tablespace is inherently good is a myth.  That's particularly true nowadays, when the fastest part of a drive is nearly twice as fast as the slowest part in sequential transfers, and the ratio between sequential and random I/O is huge.  Trying to get clever about breaking out a tablespace is unlikely to outsmart what you'd get if you just let the OS deal with that stuff.

What is true is that when you have multiple tiers of storage speeds available, allocating the indexes and tables among them optimally is both difficult and potentially worthwhile.  A customer of mine has two drive arrays, one of which is about 50% faster than the other; the second was added as an expansion once the first filled up.  Nowadays, both are 75% full, and I/O on each has to be carefully balanced.  Making sure the heavily hit indexes are on the fast array, and that less critical things are not placed there, is the difference between that site staying up or going down.

The hidden surprise in this problem for most people is the day they discover that *the* most popular indexes, the ones they figured had to go on the fastest storage around, are actually sitting in RAM all the time anyway.  It's always fun and sad at the same time to watch someone spend a fortune on some small expensive storage solution, move their most performance-critical data to it, and discover nothing changed.  Some days that works great; others, it's no faster at all, because that data was already in memory.
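
A cheap sanity check before spending the money is to look at the buffer
cache statistics PostgreSQL already keeps (standard views; the picture is
rough because hits in the OS cache are not visible here):

  -- indexes whose reads are almost always served from shared_buffers
  -- will not get any faster by moving them to faster storage
  SELECT indexrelname, idx_blks_hit, idx_blks_read,
         round(100.0 * idx_blks_hit /
               nullif(idx_blks_hit + idx_blks_read, 0), 1) AS hit_pct
  FROM pg_statio_user_indexes
  ORDER BY idx_blks_read DESC
  LIMIT 20;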

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move

From
Robert Klemme
Date:


On Thu, Jun 9, 2011 at 7:44 PM, Greg Smith <greg@2ndquadrant.com> wrote:
On 06/09/2011 07:43 AM, Willy-Bas Loos wrote:
Well, after reading your article I have been reading some material about it on the internet, stating that separating indexes from data for performance benefits is a myth.
I found your comment "So then a single query will only ever access one of the two at a time." very smart (no sarcasm there).
I also found a thread on AskTom whose main point was "the goal is to achieve even IO" (that makes absolute sense).

The idea that separating indexes and tables from one another via a tablespace is inherently good is a myth.  That's particularly true nowadays, when the fastest part of a drive is nearly twice as fast as the slowest part in sequential transfers, and the ratio between sequential and random I/O is huge.  Trying to get clever about breaking out a tablespace is unlikely to outsmart what you'd get if you just let the OS deal with that stuff.

What is true is that when you have multiple tiers of storage speeds available, allocating the indexes and tables among them optimally is both difficult and potentially worthwhile.  A customer of mine has two drive arrays, one of which is about 50% faster than the other; the second was added as an expansion once the first filled up.  Nowadays, both are 75% full, and I/O on each has to be carefully balanced.  Making sure the heavily hit indexes are on the fast array, and that less critical things are not placed there, is the difference between that site staying up or going down.

The hidden surprise in this problem for most people is the day they discover that *the* most popular indexes, the ones they figured had to go on the fastest storage around, are actually sitting in RAM all the time anyway.  It's always fun and sad at the same time to watch someone spend a fortune on some small expensive storage solution, move their most performance-critical data to it, and discover nothing changed.  Some days that works great; others, it's no faster at all, because that data was already in memory.

Adding a few more thoughts to this: it is important to understand the very different nature of read and write IO.  While write IO can usually be done concurrently to different IO channels (devices), for read IO there are typically dependencies, e.g. you need to read the index before you know which part of the table you need to read.  Thus the two cannot be done concurrently for a single select unless the whole query is partitioned and executed in parallel (Oracle can do that, for example).  Even then each parallel executor has this dependency between index and table data.  That's the same situation as with concurrent queries on the same table and index.  There are of course exceptions, e.g. during a sequential table scan you can know beforehand which blocks need to be read next and fetch them while processing the current block(s).  The buffering strategy also plays an important role here.

Bottom line: one needs to look at each case individually, do the math, and ideally also take measurements.
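
One concrete measurement to start with, using the standard statistics views
("read" here only means "not found in shared_buffers", so OS cache hits
still count as reads):

  -- how block reads split between heap and index, per table
  SELECT relname,
         heap_blks_read, heap_blks_hit,
         idx_blks_read,  idx_blks_hit
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read + idx_blks_read DESC
  LIMIT 20;

If most of the misses are on the heap side, moving only the indexes to
separate storage buys little.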

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/