Re: Postgresql 9.4 and ZFS? - Mailing list pgsql-general

From Tomas Vondra
Subject Re: Postgresql 9.4 and ZFS?
Date
Msg-id 560C3ED0.1080303@2ndquadrant.com
In response to Re: Postgresql 9.4 and ZFS?  (Benjamin Smith <lists@benjamindsmith.com>)
Responses Re: Postgresql 9.4 and ZFS?
List pgsql-general

On 09/30/2015 07:33 PM, Benjamin Smith wrote:
> On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
>> I think this really depends on the workload - if you have a lot of
>> random writes, CoW filesystems will perform significantly worse than
>> e.g. EXT4 or XFS, even on SSD.
>
> I'd be curious about the information you have that leads you to this
> conclusion. As with many (most?) "rules of thumb", the devil is quite
> often in the details.

A lot of testing I've done recently, and also experience with other CoW
filesystems (e.g. BTRFS explicitly warns against workloads with a lot of
random writes).
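
For context, the tests were essentially repeated pgbench runs on each
filesystem. A minimal sketch of such a run (the scale factor, client
counts and database name here are illustrative, not the exact values I
used):

    # dataset larger than RAM, so writes actually hit the device
    pgbench -i -s 1000 bench

    # default TPC-B-like read/write mix: many random UPDATEs
    # on pgbench_accounts
    pgbench -c 16 -j 4 -T 300 bench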

>>> We've been running both on ZFS/CentOS 6 with excellent results, and
>>> are considering putting the two together. In particular, the CoW
>>> nature (and subsequent fragmentation/thrashing) of ZFS becomes
>>> largely irrelevant on SSDs; the very act of wear leveling on an SSD
>>> is itself a form of intentional thrashing that doesn't affect
>>> performance since SSDs have no meaningful seek time.
>>
>> I don't think that's entirely true. Sure, SSD drives handle random I/O
>> much better than rotational storage, but it's not entirely free and
>> sequential I/O is still measurably faster.
>>
>> It's true that the drives do internal wear leveling, but it probably
>> uses tricks that are impossible to do at the filesystem level (which is
>> oblivious to internal details of the SSD). CoW also increases the
>> number of blocks that need to be reclaimed.
>>
>> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x
>> faster than ZFS. But of course, if the ZFS features are useful to
>> you, that may be a reasonable price to pay.
>
> Again, the details would be highly interesting to me. What memory
> optimization was done? Status of snapshots? Was the pool RAIDZ or
> mirrored vdevs? How many vdevs? Was compression enabled? What ZFS
> release was this? Was this on Linux, Free/Open/NetBSD, Solaris, or
> something else?

I'm not sure what you mean by "memory optimization" so the answer is
probably "no".

FWIW I don't have much experience with ZFS in production; all I have is
data from benchmarks I've done recently, precisely with the goal of
educating myself on the differences between current filesystems.

The tests were done on Linux, with kernel 4.0.4 / zfs 0.6.4. So fairly
recent versions, IMHO.

My goal was to test the filesystems under the same conditions, so I used
a single device (an Intel S3700 SSD). I'm aware that this is not a
perfect test and that ZFS offers interesting options (e.g. moving the
ZIL to a separate device). I plan to benchmark some additional
configurations with more devices and such.
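
For instance, moving the ZIL to a dedicated SLOG device is just a matter
of adding a log vdev to the pool. A sketch (device names are
hypothetical):

    # single-device pool, as in my benchmarks
    zpool create tank /dev/sda

    # add a dedicated log device, e.g. a small write-optimized SSD
    zpool add tank log /dev/sdb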

>
> A 2x performance difference is almost inconsequential in my
> experience, where growth is exponential. 2x performance change
> generally means 1 to 2 years of advancement or deferment against the
> progression of hardware; our current, relatively beefy DB servers
> are already older than that, and have an anticipated life cycle of at
> least another couple of years.

I'm not sure I understand what you suggest here. What I'm saying is that
when I do a stress test on the same hardware, I do get ~2x the
throughput with EXT4/XFS, compared to ZFS.

> // Our situation // Lots of RAM for the workload: 128 GB of ECC RAM
> with an on-disk DB size of ~ 150 GB. Pretty much, everything runs
> straight out of RAM cache, with only writes hitting disk. SMART
> reports a 4/96 read/write ratio.

So your active set fits into RAM? I'd guess all your writes are then WAL
+ checkpoints, which probably makes them rather sequential.
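
If you want to verify that, pg_stat_bgwriter is a good place to look. A
rough sketch of what I'd check (column names as in 9.4):

    # buffers written at checkpoints vs. directly by backends;
    # if most writes happen at checkpoints, the I/O is fairly sequential
    psql -c "SELECT checkpoints_timed, checkpoints_req,
                    buffers_checkpoint, buffers_backend
             FROM pg_stat_bgwriter;"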

If that's the case, CoW filesystems may perform quite well - I was
mostly referring to workloads with a lot of random writes to the device.

> Query load: Constant, heavy writes and heavy use of temp tables in
> order to assemble very complex queries. Pretty much the "worst case"
> mix of reads and writes, average daily peak of about 200-250
> queries/second.

I'm not sure how much random I/O that actually translates to. According
to the numbers I've posted to this thread a few hours ago, a tuned ZFS
on a single SSD device handles ~2.5k tps (with a dataset ~2x the RAM).
But those are OLTP queries - your queries may write much more data. OTOH
it really does not matter that much if your active set fits into RAM,
because then it's mostly about writing to the ZIL.
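
By "tuned" I mean roughly the usual ZFS properties for a PostgreSQL
dataset; a sketch (the dataset name and values here are illustrative):

    # match recordsize to the 8kB PostgreSQL page size
    zfs set recordsize=8k tank/pgdata

    # bias the ZIL towards throughput rather than latency
    zfs set logbias=throughput tank/pgdata

    # lz4 compression is usually cheap enough to leave enabled
    zfs set compression=lz4 tank/pgdata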

>
> 16 Core XEON servers, 32 HT "cores".
>
> SAS 3 Gbps
>
> CentOS 6 is our O/S of choice.
>
> Currently, we're running Intel 710 SSDs in a software RAID1 without
> trim enabled and generally happy with the reliability and performance
> we see. We're planning to upgrade storage soon (since we're over 50%
> utilization) and in the process, bring the magic goodness of
> snapshots/clones from ZFS.

I presume by "software RAID1" you mean "mirrored vdev zpool", correct?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

