Re: Postgresql 9.4 and ZFS? - Mailing list pgsql-general
From: Tomas Vondra
Subject: Re: Postgresql 9.4 and ZFS?
Msg-id: 560C3ED0.1080303@2ndquadrant.com
In response to: Re: Postgresql 9.4 and ZFS? (Benjamin Smith <lists@benjamindsmith.com>)
Responses: Re: Postgresql 9.4 and ZFS?
List: pgsql-general
On 09/30/2015 07:33 PM, Benjamin Smith wrote:
> On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
>> I think this really depends on the workload - if you have a lot of
>> random writes, CoW filesystems will perform significantly worse than
>> e.g. EXT4 or XFS, even on SSD.
>
> I'd be curious about the information you have that leads you to this
> conclusion. As with many (most?) "rules of thumb", the devil is quite
> often in the details.

A lot of testing done recently, and also experience with other CoW filesystems (e.g. BTRFS explicitly warns about workloads with a lot of random writes).

>>> We've been running both on ZFS/CentOS 6 with excellent results, and
>>> are considering putting the two together. In particular, the CoW
>>> nature (and subsequent fragmentation/thrashing) of ZFS becomes
>>> largely irrelevant on SSDs; the very act of wear leveling on an SSD
>>> is itself a form of intentional thrashing that doesn't affect
>>> performance since SSDs have no meaningful seek time.
>>
>> I don't think that's entirely true. Sure, SSD drives handle random I/O
>> much better than rotational storage, but it's not entirely free and
>> sequential I/O is still measurably faster.
>>
>> It's true that the drives do internal wear leveling, but it probably
>> uses tricks that are impossible to do at the filesystem level (which is
>> oblivious to internal details of the SSD). CoW also increases the
>> amount of blocks that need to be reclaimed.
>>
>> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x
>> faster than ZFS. But of course, if the ZFS features are interesting
>> for you, maybe it's a reasonable price.
>
> Again, the details would be highly interesting to me. What memory
> optimization was done? Status of snapshots? Was the pool RAIDZ or
> mirrored vdevs? How many vdevs? Was compression enabled? What ZFS
> release was this? Was this on Linux, Free/Open/Net BSD, Solaris, or
> something else?

I'm not sure what you mean by "memory optimization", so the answer is probably "no". FWIW I don't have much experience with ZFS in production; all I have is data from benchmarks I've done recently, exactly with the goal of educating myself on the differences between current filesystems.

The tests were done on Linux, with kernel 4.0.4 / zfs 0.6.4 - fairly recent versions, IMHO.

My goal was to test the filesystems under the same conditions, so I used a single device (an Intel S3700 SSD). I'm aware that this is not a perfect test and that ZFS offers interesting options (e.g. moving the ZIL to a separate device). I plan to benchmark some additional configurations with more devices and such.
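Something along these lines is what I mean by testing under the same conditions - this is only a sketch, and the device name, mount points, tuning options and pgbench parameters below are illustrative placeholders, not the exact settings from my runs:

    # placeholder device/paths/parameters - not the exact values from my runs

    # EXT4 on the SSD
    mkfs.ext4 /dev/sdb
    mount -o noatime /dev/sdb /mnt/pgdata

    # ZFS on the same SSD (recordsize=8k matches the PostgreSQL block size)
    zpool create -f -o ashift=12 tank /dev/sdb
    zfs create -o recordsize=8k -o compression=lz4 -o atime=off \
        -o mountpoint=/mnt/pgdata tank/pgdata

    # identical pgbench workload against a cluster created on each filesystem
    pgbench -i -s 3000 bench          # scale chosen so the dataset is ~2x RAM
    pgbench -c 32 -j 8 -T 600 bench   # read/write OLTP run, compare tps

A separate log device (zpool add tank log <dev>) would be one of the additional configurations mentioned above.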
> A 2x performance difference is almost inconsequential in my
> experience, where growth is exponential. 2x performance change
> generally means 1 to 2 years of advancement or deferment against the
> progression of hardware; our current, relatively beefy DB servers
> are already older than that, and have an anticipated life cycle of at
> least another couple years.

I'm not sure I understand what you're suggesting here. What I'm saying is that when I run a stress test on the same hardware, I get ~2x the throughput with EXT4/XFS compared to ZFS.

> // Our situation //
> Lots of RAM for the workload: 128 GB of ECC RAM with an on-disk DB
> size of ~150 GB. Pretty much everything runs straight out of RAM
> cache, with only writes hitting disk. SMART reports a 4/96
> read/write ratio.

So your active set fits into RAM? I'd guess all your writes are then WAL + checkpoints, which probably makes them rather sequential. If that's the case, CoW filesystems may perform quite well - I was mostly referring to workloads with a lot of random writes to the device.

> Query load: Constant, heavy writes and heavy use of temp tables in
> order to assemble very complex queries. Pretty much the "worst case"
> mix of reads and writes, average daily peak of about 200-250
> queries/second.

I'm not sure how much random I/O that actually translates to. According to the numbers I posted to this thread a few hours ago, a tuned ZFS on a single SSD device handles ~2.5k tps (with a dataset ~2x the RAM). But those are OLTP queries - your queries may write much more data.

OTOH it really does not matter that much if your active set fits into RAM, because then it's mostly about writing to the ZIL.

> 16 Core XEON servers, 32 HT "cores".
>
> SAS 3 Gbps
>
> CentOS 6 is our O/S of choice.
>
> Currently, we're running Intel 710 SSDs in a software RAID1 without
> trim enabled and are generally happy with the reliability and
> performance we see. We're planning to upgrade storage soon (since
> we're over 50% utilization) and, in the process, bring the magic
> goodness of snapshots/clones from ZFS.

I presume by "software RAID1" you mean "mirrored vdev zpool", correct?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services