optimal ZFS filesystem on JBOD layout - Mailing list pgsql-performance

From Joachim Worringen
Subject optimal ZFS filesystem on JBOD layout
Date
Msg-id 4C2C94B8.6080606@iathh.de
Whole thread Raw
Responses Re: optimal ZFS filesystem on JBOD layout
List pgsql-performance
Greetings,

we are running a few databases of currently 200GB (growing) in total for
data warehousing:
- new data via INSERTs for (up to) millions of rows per day; sometimes
with UPDATEs
- most data in a single table (=> 10 to 100s of millions of rows)
- queries SELECT subsets of this table via an index
- for effective parallelisation, queries create (potentially large)
non-temporary tables which are deleted at the end of the query => lots
of simple INSERTs and SELECTs during queries
- large transactions: they may contain millions of INSERTs/UPDATEs
- running version PostgreSQL 8.4.2

We are moving all this to a larger system - the hardware is available,
therefore fixed:
- Sun X4600 (16 cores, 64GB)
- external SAS JBOD with 24 2,5" slots:
   - 18x SAS 10k 146GB drives
   - 2x SAS 10k 73GB drives
   - 4x Intel SLC 32GB SATA SSD
- JBOD connected to Adaptec SAS HBA with BBU
- Internal storage via on-board RAID HBA:
   - 2x 73GB SAS 10k for OS (RAID1)
   - 2x Intel SLC 32GB SATA SSD for ZIL (RAID1) (?)
- OS will be Solaris 10 to have ZFS as filesystem (and dtrace)
- 10GigE towards client tier (currently, another X4600 with 32cores and
64GB)

What would be the optimal storage/ZFS layout for this? I checked
solarisinternals.com and some PostgreSQL resources and came to the
following concept - asking for your comments:
- run the JBOD without HW-RAID, but let all redundancy be done by ZFS
for maximum flexibility
- create separate ZFS pools for tablespaces (data, index, temp) and WAL
on separate devices (LUNs):
- use the 4 SSDs in the JBOD as Level-2 ARC cache (can I use a single
cache for all pools?) w/o redundancy
- use the 2 SSDs connected to the on-board HBA as RAID1 for ZFS ZIL

Potential issues that I see:
- the ZFS ZIL will not benefit from a BBU (as it is connected to the
backplane, driven by the onboard-RAID), and might be too small (32GB for
~2TB of data with lots of writes)?
- the pools on the JBOD might have the wrong size for the tablespaces -
like: using the 2 73GB drives as RAID 1 for temp might become too small,
but adding a 146GB drive might not be a good idea?
- with 20 spindles, does it make sense at all to use dedicated devices
for the tabelspaces, or will the load be distributed well enough across
the spindles anyway?

thanks for any comments & suggestions,

   Joachim


pgsql-performance by date:

Previous
From: Thom Brown
Date:
Subject: Re: What is the best way to optimize this query
Next
From: Greg Smith
Date:
Subject: Re: optimal ZFS filesystem on JBOD layout