Thread: Hardware advice for scalable warehouse db

From: chris
Hi list,

My employer is about to receive a donated NetApp FAS 3040 SAN [1], and we
want to run our warehouse DB on it. The pg9.0 DB currently comprises ~1.5TB
of tables and 200GB of indexes, and grows ~5%/month. The DB is not
update-critical, but frequently undergoes large read and insert operations.
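To put the ~5%/month figure in perspective, here is a quick compound-growth projection. It is only a sketch: it assumes the combined ~1.7TB (tables plus indexes) keeps growing at a steady 5% per month, which real warehouse growth rarely does exactly.

```python
def projected_size_tb(start_tb: float, monthly_growth: float, months: int) -> float:
    """Compound growth: size after `months` at a fixed monthly rate."""
    return start_tb * (1.0 + monthly_growth) ** months

# Assumed starting point: ~1.5TB of tables + ~0.2TB of indexes.
start = 1.5 + 0.2
for years in (1, 2, 3):
    size = projected_size_tb(start, 0.05, 12 * years)
    print(f"after {years} year(s): ~{size:.1f} TB")
```

Under that assumption the database is roughly 3.1TB after one year, 5.5TB after two, and close to 10TB after three.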

My employer is a university with little funding, and we have to find a
cheap way to scale for the next 3 years, so the SAN seems like a good
opportunity for us. We are now looking for the remaining server parts to
maximize DB performance at a cost of <= $4000. I dug up the following
configuration with the discount we receive from Dell:

  1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
  16 GB (4x4GB) Low Volt DDR3 1066MHz
  PERC H700 SAS RAID controller
  4 x 300 GB 10k SAS 6Gbps 2.5" in RAID 10

I was thinking of putting the WAL and the indexes on the local disks, and
the rest on the SAN. If funds allow, we might downgrade the disks to
SATA and add a 50 GB SATA SSD for the WAL (mixing SAS and SATA is not possible).
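Putting the indexes on the local RAID is normally done with a tablespace. The sketch below only generates the SQL; the tablespace name `local_idx`, the mount point `/local/pg_idx` and the index names are hypothetical placeholders. The WAL directory (`pg_xlog` in 9.0) cannot be placed in a tablespace; it is usually relocated with `initdb --xlogdir`, or by moving the directory and symlinking it while the server is stopped.

```python
def quote_ident(name: str) -> str:
    """Minimal identifier quoting: wrap in double quotes, double any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    """Minimal string-literal quoting: wrap in single quotes, double any embedded ones."""
    return "'" + value.replace("'", "''") + "'"

def index_move_statements(tablespace: str, location: str, indexes: list[str]) -> list[str]:
    """SQL that creates a tablespace on the local RAID and moves the given indexes into it."""
    ts = quote_ident(tablespace)
    stmts = [f"CREATE TABLESPACE {ts} LOCATION {quote_literal(location)};"]
    for idx in indexes:
        # Copies the index files to the new tablespace; the index is locked while this runs.
        stmts.append(f"ALTER INDEX {quote_ident(idx)} SET TABLESPACE {ts};")
    return stmts

# Hypothetical tablespace name, mount point and index names.
for sql in index_move_statements("local_idx", "/local/pg_idx", ["facts_date_idx", "facts_user_idx"]):
    print(sql)
```

The target directory has to exist, be empty, and be owned by the PostgreSQL service user before `CREATE TABLESPACE` will accept it. Because the names are double-quoted, pass unqualified, lowercase index names (or extend the helper to handle schemas).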

Any comments on the configuration? Any experiences with iSCSI vs. Fibre
Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
cheap way to connect as many as 16 x 2TB disks as DAS?

Thanks so much!

Best,
Chris

[1]: http://www.b2net.co.uk/netapp/fas3000.pdf

Re: Hardware advice for scalable warehouse db

From: Greg Smith
chris wrote:
> My employer is a university with little funding, and we have to find a
> cheap way to scale for the next 3 years, so the SAN seems like a good
> opportunity for us.

A SAN is rarely ever the cheapest way to scale anything; you're paying
extra for reliability instead.


> I was thinking of putting the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (mixing SAS and SATA is not possible).
>

If you want to keep the bulk of the data on the SAN, this is a
reasonable way to go, performance-wise.  But be aware that losing the
WAL means your database is likely corrupted.  That means that much of
the reliability benefit of the SAN is lost in this configuration.


> Any experiences with iSCSI vs. Fibre
> Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
> cheap way to connect as many as 16 x 2TB disks as DAS?
>

I've never heard anyone recommend iSCSI if you care at all about
performance, while FC works fine for this sort of job.  The physical
dimensions of 3.5" drives make getting 16 of them into one reasonably
sized enclosure normally just out of reach.  But a Dell PowerVault
MD1000 will give you 15 x 2TB as inexpensively as possible in a single
3U space (well, as cheaply as you want to go--you might build your own
giant box cheaper, but I wouldn't recommend it).  I've tested MD1000,
MD1200, and MD1220 arrays before, and always gotten seriously good
performance relative to the dollars spent with that series.  The only
one of these Dell storage arrays I've heard two disappointing reports
about (though I haven't tested it directly yet) is the MD3220.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD



Re: Hardware advice for scalable warehouse db

From: jesper@krogh.cc
>   1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
>   16 GB (4x4GB) Low Volt DDR3 1066Mhz
>   PERC H700 SAS RAID controller
>   4 x 300 GB 10k SAS 6Gbps 2.5" in RAID 10

Apart from Greg's excellent recommendations, I would strongly suggest
more memory. 16GB in 2011 is really on the low side.

PG uses memory (either shared_buffers or the OS cache) for
keeping frequently accessed data. Good recommendations are hard
without knowledge of the data and access patterns, but 64, 128 and 256GB
systems are quite common when you have data that can't all fit
in memory at once.
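A common starting point from the PostgreSQL tuning guidance is shared_buffers at about 25% of RAM, with effective_cache_size at roughly half to three quarters of RAM to account for the OS cache. The sketch below applies those rules of thumb (starting points only, not tuned values) and shows how small a share of the assumed ~1.7TB database each RAM size could hold.

```python
def starting_settings(ram_gb: int) -> dict:
    """Rule-of-thumb starting values; real tuning depends on the workload."""
    return {
        "shared_buffers_gb": ram_gb // 4,            # ~25% of RAM
        "effective_cache_size_gb": ram_gb * 3 // 4,  # ~75% of RAM (a planner hint; allocates nothing)
    }

DB_GB = 1700  # assumed: ~1.5TB tables + ~200GB indexes
for ram in (16, 64, 128, 256):
    s = starting_settings(ram)
    share = 100.0 * ram / DB_GB  # upper bound on the fraction of the DB that could be cached
    print(f"{ram:>3} GB RAM: shared_buffers={s['shared_buffers_gb']}GB, "
          f"effective_cache_size={s['effective_cache_size_gb']}GB, "
          f"RAM is ~{share:.1f}% of the DB")
```

Even at 256GB only about 15% of the data fits in memory, which is why the frequently accessed subset, not the total size, drives the RAM decision.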

SANs are nice, but I think you can buy a good DAS unit each year
for just the support cost of a NetApp, though you might have gotten a
really good deal there too. But you are getting a huge number of
advanced configuration features and potential ways of sharing and..
and .. just see the specs.

.. and if you need those, the SAN is a good way to go, but
they do come with a huge price tag.

Jesper


Re: Hardware advice for scalable warehouse db

From: Robert Schnabel
On 7/15/2011 2:10 AM, Greg Smith wrote:
> I've never heard anyone recommend iSCSI if you care at all about
> performance, while FC works fine for this sort of job.  The physical
> dimensions of 3.5" drives make getting 16 of them into one reasonably
> sized enclosure normally just out of reach.  But a Dell PowerVault
> MD1000 will give you 15 x 2TB as inexpensively as possible in a single
> 3U space (well, as cheaply as you want to go--you might build your own
> giant box cheaper, but I wouldn't recommend it).

I'm curious what people think of these:
http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp

I currently have my database on two of these and for my purpose they
seem to be fine and are quite a bit less expensive than the Dell
MD1000.  I actually have three more of the 3G versions with expanders
for mass storage arrays (RAID0) and haven't had any issues with them in
the three years I've had them.

Bob




Re: Hardware advice for scalable warehouse db

From: Scott Marlowe
On Fri, Jul 15, 2011 at 12:34 AM, chris <chricki@gmx.net> wrote:
> I was thinking of putting the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (mixing SAS and SATA is not possible).

Just to add to the conversation, there's no real advantage to putting
WAL on SSD.  Indexes can benefit from them, but WAL is mostly
sequential throughput, and for that a pair of 1TB SATA drives at
7200RPM works just fine for most folks.  For example, in one big server
we're running, we have 24 drives in a RAID-10 for the /data/base dir
and 4 drives in a RAID-10 for pg_xlog, and those 4 drives tend to
show the same %util under iostat as the 24 drives under normal
usage.  It takes a special kind of load (lots of inserts happening in
large transactions quickly) for the 4-drive RAID-10 to ever exceed
50% util.
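For anyone wanting to reproduce that comparison without iostat, %util can be derived from two snapshots of Linux's /proc/diskstats: it is the change in "milliseconds spent doing I/O" divided by the elapsed wall-clock time. This is a minimal sketch of that calculation, assuming the standard diskstats column layout (major, minor, device name, then the stat fields).

```python
def parse_io_ticks(diskstats_text: str) -> dict[str, int]:
    """Map device name -> cumulative milliseconds spent doing I/O.

    Each /proc/diskstats line is: major minor name, then the stat fields;
    the 10th stat field ("time spent doing I/Os", in ms) is column index 12.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        cols = line.split()
        if len(cols) >= 13:
            ticks[cols[2]] = int(cols[12])
    return ticks

def util_percent(before: str, after: str, interval_s: float) -> dict[str, float]:
    """%util as iostat reports it: share of the interval the device had I/O in flight."""
    t0, t1 = parse_io_ticks(before), parse_io_ticks(after)
    return {dev: 100.0 * (t1[dev] - t0[dev]) / (interval_s * 1000.0)
            for dev in t1 if dev in t0}
```

To use it, read /proc/diskstats twice a few seconds apart and pass both snapshots plus the interval; comparing the pg_xlog array's device against the data array's device gives the same picture as the iostat numbers above.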

Re: Hardware advice for scalable warehouse db

From: Scott Marlowe
On Fri, Jul 15, 2011 at 10:39 AM, Robert Schnabel
<schnabelr@missouri.edu> wrote:
> I'm curious what people think of these:
> http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp
>
> I currently have my database on two of these and for my purpose they seem to
> be fine and are quite a bit less expensive than the Dell MD1000.  I actually
> have three more of the 3G versions with expanders for mass storage arrays
> (RAID0) and haven't had any issues with them in the three years I've had
> them.

I have a co-worker who's familiar with them, and they seem a lot like
the 16-drive units we use from Aberdeen, which, fully outfitted with
15k SAS drives, run $5k to $8k depending on the drives, etc.

Re: Hardware advice for scalable warehouse db

From: Josh Berkus
> Just to add to the conversation, there's no real advantage to putting
> WAL on SSD.  Indexes can benefit from them, but WAL is mostly
> sequential throughput, and for that a pair of 1TB SATA drives at
> 7200RPM works just fine for most folks.

Actually, there's a strong disadvantage to putting WAL on SSD.  SSDs are
very prone to fragmentation if you're doing a lot of deleting and
replacing of files.  I've implemented data warehouses where the database
was on SSD but WAL was still on HDD.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Hardware advice for scalable warehouse db

From: "chris r."
Hi list,

Thanks a lot for your very helpful feedback!

> I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
> seriously good performance relative to the dollars spent
Great hint, but I'm afraid that's too expensive for us. But it's a great
way to scale over the years; I'll keep that in mind.

I had a look at other server vendors who offer 4U servers with slots for
16 disks for $4k in total (without disks); maybe that's an even
cheaper/better solution for us. If you had the choice between 16 x 2TB
SATA disks and a server with some SSDs for WAL/indexes plus a SAN (with
SATA disks) for data, which would you choose performance-wise?

Again, thanks so much for your help.

Best,
Chris

Re: Hardware advice for scalable warehouse db

From: Rob Wultsch
On Fri, Jul 15, 2011 at 11:49 AM, chris r. <chricki@gmx.net> wrote:
> If you had the choice between 16 x 2TB
> SATA disks and a server with some SSDs for WAL/indexes plus a SAN (with
> SATA disks) for data, which would you choose performance-wise?

SATA drives can easily flip bits, and Postgres does not checksum data,
so it will not automatically detect corruption for you. I would steer
well clear of SATA unless you are going to be using a filesystem like ZFS,
which checksums data. I would hope that a SAN would detect this for
you, but I have no idea.
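To illustrate what block-level checksumming buys you, here is a toy sketch that stores a CRC32 per 8KB block (PostgreSQL's default page size) and detects a single flipped bit on re-read. This is not how ZFS implements it (ZFS uses its own checksum algorithms, such as fletcher4 or SHA-256, kept in its block pointers); it only shows the principle of verifying data on read.

```python
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size

def block_checksums(data: bytes) -> list[int]:
    """CRC32 of each fixed-size block (the last block may be shorter)."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]

def corrupted_blocks(data: bytes, expected: list[int]) -> list[int]:
    """Indexes of blocks whose current CRC no longer matches the stored one."""
    return [i for i, (now, then) in enumerate(zip(block_checksums(data), expected)) if now != then]

# Write three blocks, record their checksums, then flip one bit in block 1.
original = bytes(range(256)) * (3 * BLOCK_SIZE // 256)
sums = block_checksums(original)
damaged = bytearray(original)
damaged[BLOCK_SIZE + 100] ^= 0x01
print(corrupted_blocks(bytes(damaged), sums))  # -> [1]
```

Without such a check, the flipped bit would simply be returned to the database as valid data.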


--
Rob Wultsch
wultsch@gmail.com

Re: Hardware advice for scalable warehouse db

From: Josh Berkus
On 7/14/11 11:34 PM, chris wrote:
> Any comments on the configuration? Any experiences with iSCSI vs. Fibre
> Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
> cheap way to connect as many as 16 x 2TB disks as DAS?

Here's the problem with iSCSI: on gigabit Ethernet, your maximum
possible throughput is about 100MB/s, which means that your likely maximum
database throughput (for a seq scan or vacuum, for example) is 30MB/s.
That's about a third of what you can get with good internal RAID.
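The arithmetic behind those numbers, as a rough sketch: gigabit Ethernet carries 1000 megabits per second, i.e. 125 megabytes per second before protocol overhead, which lands near the ~100 megabytes per second figure. The 30MB/s effective rate is Josh's estimate, the 300MB/s "local RAID" rate is only an assumed comparison point, and the ~1.5TB table size comes from the original post.

```python
def link_mb_per_s(link_mbit: float = 1000.0) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte), ignoring overhead."""
    return link_mbit / 8.0

def full_scan_hours(size_tb: float, mb_per_s: float) -> float:
    """Hours to read `size_tb` terabytes sequentially at `mb_per_s` (decimal units)."""
    return size_tb * 1_000_000 / mb_per_s / 3600.0

print(f"raw GigE: {link_mb_per_s():.0f} MB/s")
for label, rate in [("iSCSI effective (estimate)", 30),
                    ("GigE practical", 100),
                    ("local RAID (assumed)", 300)]:
    print(f"{label:>26}: {full_scan_hours(1.5, rate):5.1f} h to scan 1.5TB")
```

At 30MB/s a single full pass over 1.5TB takes roughly 14 hours, versus about 4 hours at 100MB/s, which matters for a warehouse that does large sequential reads.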

While multichannel iSCSI is possible, it's hard to configure, and
doesn't really allow you to spread a *single* request across multiple
channels.  So: go with Fibre Channel if you're using a SAN.

iSCSI also has horrible latency, but you don't care about that so much
for a DW.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Hardware advice for scalable warehouse db

From: Terry Schmitt
Hi Chris,

A couple of comments on the NetApp SAN.
We use NetApp, primarily with Fibre Channel connectivity and FC drives. All of the Postgres files are located on the SAN, and this configuration works well.
We have tried iSCSI, but performance was horrible. Same with SATA drives.
The SAN will definitely be more costly than local drives. It really depends on what your needs are.
The biggest benefit of the SAN for me is the special features it offers. We use snapshots and FlexClones, which are a great way to back up and clone large databases.

Cheers,
Terry



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance