Thread: table spaces

table spaces

From
Gregg Jaskiewicz
Date:
Performance-related question.
With Linux (CentOS 6.3+, 64-bit) and ext4 in mind, how would you guys go about distributing write load across disks?

Let's say I have quite a few disks, and I can partition them the way I want, in a mirrored configuration (to get some resilience to hardware failure). Should I separate tables from indexes onto separate RAID arrays?

I know WAL is supposed to go on a separate disk for added performance.

I'm looking for your experiences, and most importantly for how you go about deciding which way is best, i.e. which combinations make sense to try out first, short of all permutations :-)

Thanks. 



--
GJ

Re: table spaces

From
Bèrto ëd Sèra
Date:
Hi Gregg

Yes, keep the indexes on a separate channel. Much depends on how the
data is mapped and accessed; sometimes even distributing the data
itself across different tablespaces can help.

If you do a lot of logging (say, you feed a heavy stream of query
logs to pgFouine), you would want that on yet another separate
channel, too.

There is no silver bullet for this; iostat will eventually tell you
whether your load distribution is good enough or not.
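
For instance, something along these lines (the tablespace name, mount
point, database and index names are all made-up placeholders here; the
directory has to exist and be owned by the postgres user):

  # put indexes on their own array via a tablespace
  psql -d mydb -c "CREATE TABLESPACE idx_space LOCATION '/mnt/idx_raid'"
  psql -d mydb -c "ALTER INDEX orders_pkey SET TABLESPACE idx_space"

  # then watch per-device utilisation while the load runs,
  # refreshed every 5 seconds
  iostat -x 5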

Cheers
Bèrto

On 9 March 2013 17:51, Gregg Jaskiewicz <gryzman@gmail.com> wrote:
> Performance-related question.
> With Linux (CentOS 6.3+, 64-bit) and ext4 in mind, how would you guys
> go about distributing write load across disks?
>
> Let's say I have quite a few disks, and I can partition them the way I
> want, in a mirrored configuration (to get some resilience to hardware
> failure). Should I separate tables from indexes onto separate RAID
> arrays?
>
> I know WAL is supposed to go on a separate disk for added performance.
>
> I'm looking for your experiences, and most importantly for how you go
> about deciding which way is best, i.e. which combinations make sense
> to try out first, short of all permutations :-)
>
> Thanks.
>
>
>
> --
> GJ



--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.


Re: table spaces

From
Scott Marlowe
Date:
On Sat, Mar 9, 2013 at 10:51 AM, Gregg Jaskiewicz <gryzman@gmail.com> wrote:
> Performance-related question.
> With Linux (CentOS 6.3+, 64-bit) and ext4 in mind, how would you guys
> go about distributing write load across disks?
>
> Let's say I have quite a few disks, and I can partition them the way I
> want, in a mirrored configuration (to get some resilience to hardware
> failure). Should I separate tables from indexes onto separate RAID
> arrays?
>
> I know WAL is supposed to go on a separate disk for added performance.
>
> I'm looking for your experiences, and most importantly for how you go
> about deciding which way is best, i.e. which combinations make sense
> to try out first, short of all permutations :-)

First get a baseline for how things work with just pg_xlog on one
small set (RAID 1 is often plenty) and RAID-10 on all the rest with
all the data (i.e. the base directory) there. With a fast HW RAID
controller this is often just about as fast as any amount of breaking
things out will be. But if you do break things out and they are
faster, then you'll know by how much. If it's slower, then you know
you've got a really busy set and some not so busy ones. And so on...
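
To make the comparison concrete, a rough sketch with pgbench (scale
factor, client count and run time are just examples; shape them after
your real workload):

  # initialise a test database and run a write-heavy benchmark
  createdb bench
  pgbench -i -s 100 bench            # ~1.5 GB of test data
  pgbench -c 16 -j 4 -T 600 bench    # 16 clients, 4 threads, 10 minutes

Run it once per disk layout and compare the reported TPS, watching
iostat alongside.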


Re: table spaces

From
Gregg Jaskiewicz
Date:



On 10 March 2013 02:19, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> First get a baseline for how things work with just pg_xlog on one
> small set (RAID 1 is often plenty) and RAID-10 on all the rest with
> all the data (i.e. the base directory) there. With a fast HW RAID
> controller this is often just about as fast as any amount of breaking
> things out will be. But if you do break things out and they are
> faster, then you'll know by how much. If it's slower, then you know
> you've got a really busy set and some not so busy ones. And so on...
(Side note: Google Mail, in its infinite evilness, makes it tricky to reply below the quoted post in its webapp if you're not careful; beware.)

I might have a table that takes heavy writes, and while it doesn't necessarily have to be fast TPS-wise, I don't want it to bog down the rest of the database.
Reads are OK, as I'm planning for the DB to fit in the RAM cache, so once a page has been read it will be there, more or less.
It's distributing writes that I care about mostly.

I'll try iostat whilst running characterisation scenarios. That was my plan anyway.
I had no idea separating indexes from tables might help too. I would have thought they are so interconnected in the code that dividing them up wouldn't help much.


What about table partitioning? For heavy writes, would some sort of strategy there make a difference?


--
GJ

Re: table spaces

From
Scott Marlowe
Date:
On Sun, Mar 10, 2013 at 6:23 AM, Gregg Jaskiewicz <gryzman@gmail.com> wrote:
>
>
>
> On 10 March 2013 02:19, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>
>> First get a baseline for how things work with just pg_xlog on one
>> small set (RAID 1 is often plenty) and RAID-10 on all the rest with
>> all the data (i.e. the base directory) there. With a fast HW RAID
>> controller this is often just about as fast as any amount of breaking
>> things out will be. But if you do break things out and they are
>> faster, then you'll know by how much. If it's slower, then you know
>> you've got a really busy set and some not so busy ones. And so on...
>
> (Side note: Google Mail, in its infinite evilness, makes it tricky to
> reply below the quoted post in its webapp if you're not careful;
> beware.)
>
> I might have a table that takes heavy writes, and while it doesn't
> necessarily have to be fast TPS-wise, I don't want it to bog down the
> rest of the database.
> Reads are OK, as I'm planning for the DB to fit in the RAM cache, so
> once a page has been read it will be there, more or less.
> It's distributing writes that I care about mostly.

A large drive set with a HW caching RAID controller is usually pretty
good at that. There are times when breaking the data up onto separate
disks helps. Often what happens when you split your data up is that
one set or another gets suboptimal performance by having fewer drives
available to it, etc. So at least have a benchmark of all the disks
together first to compare against. It's often nice to have two
machines: one with one big array, and one you can reconfigure for
testing, to compare with in real time.

> I'll try iostat whilst running characterisation scenarios. That was
> my plan anyway.
> I had no idea separating indexes from tables might help too. I would
> have thought they are so interconnected in the code that dividing
> them up wouldn't help much.

It can definitely help if you only have a few drives. The more drives
you have, the less likely each read or write is to bump into another
read or write on the same drive; i.e. as the number of drives
approaches infinity, the chance of a collision approaches zero.

> What about table partitioning? For heavy writes, would some sort of
> strategy there make a difference?

Partitioning is more about handling large data sets and splitting off
the most used parts into separate partitions so you're not
sequentially scanning a huge data set to get to the small, more
commonly used parts.
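
If you do go down that road on a 9.x release, it's inheritance plus
CHECK constraints; a bare-bones sketch (table and column names are
invented for the example):

  psql -d mydb -c "CREATE TABLE measurements (ts timestamptz NOT NULL, payload text)"
  # one child table per month; the CHECK constraint lets the planner
  # skip children outside the queried range (constraint_exclusion =
  # partition, the default)
  psql -d mydb -c "CREATE TABLE measurements_2013_03 (
      CHECK (ts >= '2013-03-01' AND ts < '2013-04-01')
  ) INHERITS (measurements)"

Inserts then have to be routed to the right child, either by the
application or by a trigger on the parent, so it only pays off when
the access pattern really is that skewed.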


Re: table spaces

From
Gregg Jaskiewicz
Date:
OK,

So by that token (the more drives the better), should I have RAID 5 (or whichever level will work) with all 6 drives in it?

I was thinking about splitting it up like this. I have 6 drives (and one spare). Combine them into 3 separate logical drives in a mirrored configuration (for some hardware redundancy).
Use one for the base system and some less frequently read tables, a second for WAL, and a third for whatever tables/indexes happen to need separate space (subject to the outcome of characterisation).

I was basically under the impression that separating WAL out is a big plus, and that having a separate partition to hold some other data would help too.
But it sounds, from what you said, like having it all on a single logical drive will work, because the RAID card will spread the load among a number of drives.
Am I understanding that correctly?


Re: table spaces

From
John R Pierce
Date:
On 3/12/2013 2:31 PM, Gregg Jaskiewicz wrote:
> I was basically under the impression that separating WAL out is a big
> plus, and that having a separate partition to hold some other data
> would help too.
> But it sounds, from what you said, like having it all on a single
> logical drive will work, because the RAID card will spread the load
> among a number of drives.
> Am I understanding that correctly?
>

Both those models have merits.

Doing a single RAID 10 should distribute the I/O workload fairly
evenly, given adequate concurrency and a suitable stripe size and
alignment. There are scenarios where a hand-tuned spindle layout can
be more efficient, but there's also the possibility of getting
write-bound on any one of those 3 separate RAID 1s while the other
disks sit idle.
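
(For what it's worth, on ext4 that alignment is set at mkfs time; a
sketch assuming a 64k chunk size, 4k blocks and 3 data spindles in the
6-disk RAID 10, so check the arithmetic against your own geometry and
man mke2fs:)

  # stride = chunk / block = 64k / 4k = 16
  # stripe-width = stride * data spindles = 16 * 3 = 48
  mkfs.ext4 -E stride=16,stripe-width=48 /dev/sdb1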





--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: table spaces

From
Gregg Jaskiewicz
Date:

On 12 March 2013 21:59, John R Pierce <pierce@hogranch.com> wrote:
> On 3/12/2013 2:31 PM, Gregg Jaskiewicz wrote:
>> I was basically under the impression that separating WAL out is a
>> big plus, and that having a separate partition to hold some other
>> data would help too.
>> But it sounds, from what you said, like having it all on a single
>> logical drive will work, because the RAID card will spread the load
>> among a number of drives.
>> Am I understanding that correctly?
>
> Both those models have merits.
>
> Doing a single RAID 10 should distribute the I/O workload fairly
> evenly, given adequate concurrency and a suitable stripe size and
> alignment. There are scenarios where a hand-tuned spindle layout can
> be more efficient, but there's also the possibility of getting
> write-bound on any one of those 3 separate RAID 1s while the other
> disks sit idle.

I'm trying to get an understanding of all the options.

So out of 6 disks, that would mean having 4 in a RAID 1+0 configuration and the other two in a mirror for WAL. That's another option for me to test, then.

Re: table spaces

From
Alban Hertroys
Date:
On 12 March 2013 22:31, Gregg Jaskiewicz <gryzman@gmail.com> wrote:
> OK,
>
> So by that token (the more drives the better), should I have RAID 5
> (or whichever level will work) with all 6 drives in it?

RAID 5 is usually advised against, as in many scenarios it won't perform very well. For example, see: http://www.revsys.com/writings/postgresql-performance.html
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: table spaces

From
Shaun Thomas
Date:
On 03/12/2013 05:49 PM, Gregg Jaskiewicz wrote:

> So out of 6 disks, that would mean having 4 in a RAID 1+0
> configuration and the other two in a mirror for WAL. That's another
> option for me to test, then.

That is an option, but it's not necessarily a good one. If all you have
are six disks, you are probably better off just doing a big RAID-10 for
everything.

As Alban mentioned, you would do best to stay away from RAID-5 as that
will basically obliterate your write speeds, which is especially
critical for WAL files.

There is one caveat to all of this, of course. What you plan to use the
server for has a critical impact on these decisions.

What it boils down to is that six spindles is not very many. You don't
really have enough to dedicate any to your WAL files without drastically
cutting the available IOPS for your regular data. In addition, with any
of these setups, you'll want a hot spare in case of drive failure. If
you have six disks available, you really only have five. That's not an
even number, so one is wasted in RAID-10, and you may find in your
RAID-5 tests that it doesn't work well for long-term use.

The truth is that you can get by for quite a while on this no matter how
you split up those six spindles. Until you need a lot of heavy disk
reads or writes. Then it's going to hurt. So ask yourself what kind of
load or usage you expect, and plan accordingly.

I have to tell you though, we had a server with twelve spindles three
years ago, and it barely kept up with our transaction load. We had two
hot spares, a RAID-1, and 8 disks in a RAID-10. Several pgbench tests
back then showed that our RAID-10 could only adequately serve 1800 TPS
directly, and we needed at least 6000. Ultimately, it led to us
switching to NVRAM (SSD) for high-TPS data, and creating a tablespace
on a RAID-10 for archived or low-priority data.

But we got by on those original 12 spindles for a couple years. If your
data needs are less, you can probably do OK with six. For a while,
anyway. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: table spaces

From
John R Pierce
Date:
On 3/13/2013 6:26 AM, Shaun Thomas wrote:
> I have to tell you though, we had a server with twelve spindles three
> years ago, and it barely kept up with our transaction load. We had two
> hot spares, a RAID-1, and 8 disks in a RAID-10. Several pgbench tests
> back then showed that our RAID-10 could only adequately serve 1800 TPS
> directly, and we needed at least 6000. Ultimately, it led to us
> switching to NVRAM (SSD) for high-TPS data, and creating a tablespace
> on a RAID-10 for archived or low-priority data.

I've got a server in my lab that we use for benchmarking and testing;
it has a 20-disk RAID 10 of 15k 150 GB SAS2 drives. It is faaassssst.
I don't remember the IOPS numbers off the top of my head, but 4 SSDs
in a RAID 0 were only a little bit faster at heavy, write-intensive,
small-transaction OLTP operations.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: table spaces

From
Greg Jaskiewicz
Date:

On 13 Mar 2013, at 13:26, Shaun Thomas <sthomas@optionshouse.com> wrote:

> On 03/12/2013 05:49 PM, Gregg Jaskiewicz wrote:
>
>> So out of 6 disks, that would mean having 4 in a RAID 1+0
>> configuration and the other two in a mirror for WAL. That's another
>> option for me to test, then.
>
> That is an option, but it's not necessarily a good one. If all you
> have are six disks, you are probably better off just doing a big
> RAID-10 for everything.
>
> As Alban mentioned, you would do best to stay away from RAID-5 as
> that will basically obliterate your write speeds, which is especially
> critical for WAL files.
>
> There is one caveat to all of this, of course. What you plan to use
> the server for has a critical impact on these decisions.
>
> What it boils down to is that six spindles is not very many. You
> don't really have enough to dedicate any to your WAL files without
> drastically cutting the available IOPS for your regular data. In
> addition, with any of these setups, you'll want a hot spare in case
> of drive failure. If you have six disks available, you really only
> have five. That's not an even number, so one is wasted in RAID-10,
> and you may find in your RAID-5 tests that it doesn't work well for
> long-term use.
I do have spares, but there was no point in mentioning them.


>
> The truth is that you can get by for quite a while on this no matter
> how you split up those six spindles. Until you need a lot of heavy
> disk reads or writes. Then it's going to hurt. So ask yourself what
> kind of load or usage you expect, and plan accordingly.
>
> I have to tell you though, we had a server with twelve spindles three
> years ago, and it barely kept up with our transaction load. We had
> two hot spares, a RAID-1, and 8 disks in a RAID-10. Several pgbench
> tests back then showed that our RAID-10 could only adequately serve
> 1800 TPS directly, and we needed at least 6000. Ultimately, it led to
> us switching to NVRAM (SSD) for high-TPS data, and creating a
> tablespace on a RAID-10 for archived or low-priority data.
>
> But we got by on those original 12 spindles for a couple years. If
> your data needs are less, you can probably do OK with six. For a
> while, anyway. :)

The problem I have is that this isn't just for a single installation. The product can be installed (and will be) in a few
places in the future. We just provide the software. But I need tools to plan these things on a per-installation basis,
taking into account that it will practically be different for each customer, not to mention variations in hardware.

Is that SSD mixed in with other disks?

Re: table spaces

From
Shaun Thomas
Date:
On 03/13/2013 10:30 AM, Greg Jaskiewicz wrote:

> Is that SSD mixed in with other disks?

Kinda. We chose a PCIe-based SSD (FusionIO). We have a RAID-10 for
low-transaction and archived data.

It worked for us, but it's pretty spendy.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: table spaces

From
Scott Marlowe
Date:
On Wed, Mar 13, 2013 at 7:26 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 03/12/2013 05:49 PM, Gregg Jaskiewicz wrote:
>
>> So out of 6 disks, that would mean having 4 in a RAID 1+0
>> configuration and the other two in a mirror for WAL. That's another
>> option for me to test, then.
>
>
> That is an option, but it's not necessarily a good one. If all you have are
> six disks, you are probably better off just doing a big RAID-10 for
> everything.

When you've got a smallish number of drives, one big RAID-10 should be
the starting point, and until benchmarks or testing show otherwise
it's usually the best answer.

Note that due to some issues with fsync (especially on ext2/3
partitions) it is often STILL useful to make a separate partition on
that one big RAID-10 for pg_xlog to live on.
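
The usual recipe for that is to stop the server and symlink pg_xlog
over; roughly like this (the paths assume a stock 9.2 PGDG install on
CentOS with the new partition already mounted at /pg_xlog_part, so
adjust to your layout):

  # only do this with the server stopped
  service postgresql-9.2 stop
  mv /var/lib/pgsql/9.2/data/pg_xlog /pg_xlog_part/pg_xlog
  ln -s /pg_xlog_part/pg_xlog /var/lib/pgsql/9.2/data/pg_xlog
  service postgresql-9.2 start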

The only real exception to one big RAID-10 on a smallish number of
disks is an almost read-only database. In that case using a RAID-5
might be acceptable for the greater amount of storage you can get.
Something like a log analysis or reporting DB, for instance.