Thread: ZFS vs. UFS
Hello,
Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)
- ZFS is journaled, and it is more independent of the hardware, so if the computer fails I can move the ZFS array to a different server.
- UFS is not journaled. I also have to rely on the RAID card to build the RAID array; if there is a hw problem with it, I won't be able to recover the data easily.
I wonder whether UFS has better performance. Or can you suggest another fs? This is just for the PGDATA directory.
Thanks,
Laszlo
On 07/24/2012 03:51 PM, Laszlo Nagy wrote:
> Under FreeBSD 9, what filesystem should I use for PostgreSQL?

Hi. As far as I know, UFS is faster than ZFS on FreeBSD 9.0. Some users have reported stability problems with ZFS on amd64, so UFS may be the better choice.

Best regards,
Georgi
On 24/07/2012 14:51, Laszlo Nagy wrote:
> I wonder whether UFS has better performance. Or can you suggest
> another fs? This is just for the PGDATA directory.

Hi, I think you might actually get a bit more performance out of ZFS, depending on your load, server configuration and (more so) the tuning of ZFS... however, UFS is IMO more stable, so I use it more often. A hardware RAID card would be good to have, but you can use soft-RAID the usual way and not be locked in by the controller. You can activate softupdates journaling on UFS if you really want it, but I find that regular softupdates is perfectly fine for PostgreSQL, which has its own journaling.
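For reference, enabling softupdates journaling on an existing UFS filesystem looks roughly like this on FreeBSD 9 (the device and mount point are assumptions):

    # the filesystem must be unmounted first
    umount /data
    tunefs -j enable /dev/ada0p2    # SU+J: softupdates plus journal
    mount /dev/ada0p2 /data
    # plain softupdates, without the journal, is toggled with -n
    tunefs -n enable /dev/ada0p2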
Hello,

The Postgres 9.0 database we use gets about 20K inserts per minute. As long as you don't query at the same time, the database copes fine. However, long-running queries seem to delay the db so much that the application server has to buffer the incoming data, because it cannot insert it fast enough. The server has 4 HDs: one is used for the archive (past static tables), the second for the indexes of the current live tables, the third for the current data, and the fourth for the OS.

The server specs are:
Intel(R) Xeon(R) CPU W3520 @ 2.67GHz
4 cores
18GB RAM

Do you think this workload is so high that it requires an upgrade to a cluster or RAID 10 to cope with it?

Kind Regards,
Yiannis
On 24.07.2012 14:51, Laszlo Nagy wrote:
> * UFS is not journaled.

There is journal support for UFS as far as I know. Please have a look at the gjournal manpage.

Greetings,
Torsten
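A minimal gjournal setup, per that manpage, looks roughly like this (device and mount point are assumptions):

    gjournal load                    # load the GEOM journal module
    gjournal label da0               # creates /dev/da0.journal
    newfs -O 2 -J /dev/da0.journal   # UFS2 with the gjournal flag set
    mount -o async /dev/da0.journal /data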
On Tue, Jul 24, 2012 at 6:22 AM, Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:
You need to learn more about what exactly is your bottleneck ... memory, CPU, or I/O. That said, I suspect you'd be way better off with this hardware if you built a single software RAID 10 array and put everything on it.
Right now, the backup disk and the OS disk are sitting idle most of the time. With a RAID10 array, you'd at least double, maybe quadruple your I/O. And if you added a battery-backed RAID controller, you'd have a pretty fast system.
Craig
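A sketch of the usual first checks for classifying a bottleneck (standard Linux utilities, matching the iostat output quoted later in the thread):

    iostat -x 5    # per-disk %util, await, queue depth -> disk-bound?
    vmstat 5       # si/so columns -> swapping? wa column -> stuck in I/O wait?
    top            # per-process CPU; a pegged core suggests a CPU-bound query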
On 24/07/2012 14:51, Laszlo Nagy wrote:
>
> Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell
> PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)
>
> * ZFS is journaled, and it is more independent of the hardware, so if
>   the computer fails I can move the ZFS array to a different server.
> * UFS is not journaled. I also have to rely on the RAID card to build
>   the RAID array; if there is a hw problem with it, I won't be able
>   to recover the data easily.
>
> I wonder whether UFS has better performance. Or can you suggest
> another fs? This is just for the PGDATA directory.
Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer. You should figure out the best file system for your application, and separately figure out a recovery strategy, one that can survive the failure of *any* component in your system, including the disk itself.
Craig
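One simple building block for such a strategy is a logical dump shipped to another machine; a sketch, with the database name, paths and backup host assumed:

    # nightly compressed dump, copied off the box
    pg_dump -Fc mydb > /backup/mydb-$(date +%F).dump
    scp /backup/mydb-$(date +%F).dump backuphost:/srv/pg-backups/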
On 24/07/2012 15:30, Craig James wrote:
> You need to learn more about what exactly is your bottleneck ... memory, CPU, or I/O.

I can only assume that it is an I/O issue. At least, this is what I can read from iostat:
Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00  277.50    0.00   20.00      0.00  2344.00    117.20      0.09    2.25   4.50   9.00
sdb        1.00    0.50  207.50    4.50  45228.00    33.50    213.50      2.40   11.34   4.13  87.50
sdc        0.00    0.00   29.50    0.00   4916.00     0.00    166.64      0.11    3.73   1.36   4.00
sdd        0.00    0.00    4.00  179.50     96.00  3010.00     16.93    141.25  828.77   5.45 100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           7.60   0.00     2.08    46.45    0.00  43.87

Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00   61.50    0.00   28.00      0.00   704.00     25.14      0.04    3.04   1.43   4.00
sdb        2.00    0.00   90.50  162.00  19560.00  2992.00     89.31     78.92  194.26   3.76  95.00
sdc        0.00    0.00   10.50    0.00   2160.00     0.00    205.71      0.02    1.90   1.90   2.00
sdd        0.00    0.00    1.50  318.50     24.00  5347.00     16.78    134.72  572.81   3.12 100.00
Here sdb is the data disk and sdd is the index disk. top hardly ever reports more than 10% CPU per postgres process, while these iostat numbers are consistently high whenever the query is running. At least I can tell that the buffering starts the moment the index disk hits 100% util. Is there any other way that I can identify bottlenecks more conclusively?
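One more place to look is inside PostgreSQL itself; a sketch using the 9.0 catalog column names:

    # what each backend is doing while the inserts stall
    psql -c "SELECT procpid, waiting, current_query FROM pg_stat_activity;"
    # lock requests that have not been granted, i.e. blocked sessions
    psql -c "SELECT locktype, relation::regclass, pid, granted FROM pg_locks WHERE NOT granted;"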
> Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer.

This is why I use a RAID array of 10 disks, so there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
> There is journal support for UFS as far as I know. Please have a look
> at the gjournal manpage.

Yes, but gjournal works on disk devices. I would have to rely on the hw card for the RAID; when the card goes wrong, I won't be able to access my data. I could buy an identical RAID card, or in fact a complete backup server, but right now I don't have the money for that. So I would like to use a solution that lets me recover from a failure even if the RAID card goes wrong. It might also be possible to combine gmirror + gjournal, but that is not good enough: the performance and stability of a simple gmirror with two disks are much worse than those of a raidz array of 10 disks (with a hot spare), or even a RAID 1+0 (with a hot spare) supported by the hw RAID card. So I would like to stick with UFS + hw card support (and then I need to buy an identical RAID card if I can), or ZFS.
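The raidz layout described there would be created roughly like this (the pool and device names are assumptions):

    # raidz2 over nine disks, plus one hot spare
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8
    zpool add tank spare da9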
On Tue, Jul 24, 2012 at 11:27 AM, Laszlo Nagy <gandalf@shopzeus.com> wrote:
> This is why I use a RAID array of 10 disks, so there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
Only you can answer that because it depends on your application. If you're operating PayPal, you probably want 24/7 100% reliability. If you're operating a social networking site for teenagers, losing data is probably not a catastrophe.
In my experience, most data loss is NOT from equipment failure. It's from software bugs and operator errors. If your recovery plan doesn't cover this, you have a problem.
Craig
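Continuous WAL archiving is one way to cover the operator-error case, since it allows a restore to a point just before the mistake; a sketch for 9.0, with the archive path assumed:

    # postgresql.conf
    wal_level = archive
    archive_mode = on
    archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'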
> Yes, but gjournal works on disk devices.

That isn't completely correct! gjournal works with all GEOM devices, which can be not only disk devices but also (remote) disk devices, (remote) files, (remote) software RAIDs, etc. It is very easy to mirror a *complete* disk from one *server* to another. I use this technique for customers who need cheap backups of their complete servers. But a RAID card will be much faster than this; I just wanted to make this clear.

Greetings,
Torsten
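A rough sketch of that technique with ggate(8) plus gmirror (IP addresses and device names are assumptions):

    # backup server: export the disk and start the daemon
    echo "192.168.0.10 RW /dev/da1" > /etc/gg.exports
    ggated
    # primary server: attach the remote disk, then mirror onto it
    ggatec create -o rw 192.168.0.11 /dev/da1    # appears as /dev/ggate0
    gmirror label -v remotebackup /dev/da1 /dev/ggate0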
On 07/24/2012 08:51 AM, Laszlo Nagy wrote:
Which Intel RAID controller is that? All of the ones on the motherboard are pretty much useless, if that's what you have. Those are slower than software RAID, and they add driver issues you could otherwise avoid. Better to connect the drives to the non-RAID ports, or configure the controller in JBOD mode first.
Using one of the better RAID controllers, one of Dell's good PERC models for example, is one of the biggest hardware upgrades you could make to this server. If your database is mostly read traffic, it won't matter very much. Write-heavy loads really benefit from a good RAID controller's write cache.
You should be able to get UFS working with a software mirror and journaling using gstripe/gmirror or vinum. It doesn't matter that much for PostgreSQL though. The data writes are journaled by the database, and it tries to sync data to disk after updating metadata too. There are plenty of PostgreSQL installs on FreeBSD/UFS that work fine.
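A sketch of such a software RAID 10, striping across mirrored pairs (device names are assumptions):

    gmirror label -v gm0 da0 da1
    gmirror label -v gm1 da2 da3
    gstripe label -v st0 mirror/gm0 mirror/gm1
    newfs -U /dev/stripe/st0    # UFS2 with softupdates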
ZFS needs more RAM and has higher CPU overhead than UFS does. It's a heavier filesystem all around than UFS is. Your server is fast enough that you should be able to afford it though, and the feature set is nice. In addition to the RAID setup being simple to handle, having checksums on your data is a good safety feature for PostgreSQL.
ZFS will heavily use server RAM for caching by default, much more so than UFS. Make sure you check into that, and leave enough RAM for the database to run too. (Doing *some* caching that way is good for Postgres; you just don't want *all* the memory to be used for that)
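A sketch of the usual knobs (the 8G cap is an assumption for a 24G box, and the dataset name is made up):

    # /boot/loader.conf: cap the ARC so PostgreSQL keeps its share of RAM
    vfs.zfs.arc_max="8G"

    # commonly suggested: match the record size to Postgres' 8K pages
    zfs set recordsize=8k tank/pgdata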
Moving disks to another server is a very low probability fix for a broken system. The disks are a likely place for the actual failure to happen in the first place. I like to think more in terms of "how can I create a real-time replica of this data?" to protect databases, and the standby server for that doesn't need to be an expensive system. That said, there is no reason to set things up so that they only work with that Intel RAID controller, given that it's not a very good piece of hardware anyway.
--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
> Which Intel RAID controller is that? [...] Using one of the better RAID
> controllers, one of Dell's good PERC models for example, is one of the
> biggest hardware upgrades you could make to this server.

Actually, it is a PERC with write-cache and BBU.

> ZFS will heavily use server RAM for caching by default, much more so
> than UFS. Make sure you check into that, and leave enough RAM for the
> database to run too.

Right now, the size of the database is below 5GB, so I guess it will fit into memory. I'm concerned about data safety and availability. I have been in a situation where the RAID card went wrong and I was not able to recover the data, because I could not get an identical RAID card in time. I have also been in a situation where the system was crashing twice a day and we didn't know why. (As it turned out, it was a bug in the "stable" kernel, and we could not identify this for two weeks.) However, we had to do fsck after every crash; with a 10TB disk array, that was extremely painful. ZFS is much better: short recovery time, and it is RAID-card independent. So I think I have answered my own question - I'm going to use ZFS for better availability, even if it leads to poorer performance. (That was the original question: how bad is it to use ZFS for PostgreSQL, instead of the native UFS?)

> Moving disks to another server is a very low probability fix for a
> broken system. The disks are a likely place for the actual failure to
> happen in the first place.

Yes, but we don't have to worry about that: raidz2 + hot spare is safe enough. The RAID card is the only single point of failure.

> I like to think more in terms of "how can I create a real-time replica
> of this data?" to protect databases, and the standby server for that
> doesn't need to be an expensive system.

I'm not sure how to create a real-time replica. This database is updated frequently; there is always a process that reads/writes into it. I was thinking about using Slony to create slave databases, but I have no experience with that. We have a 100Mbit connection, and I'm not sure how much bandwidth we need to maintain a real-time slave database. It might be a good idea. I'm sorry, I feel I'm being off-topic.
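Besides Slony, 9.0's built-in streaming replication would also give a real-time replica; a minimal sketch, with host addresses, user and password assumed:

    # primary postgresql.conf
    wal_level = archive        # or hot_standby, to allow queries on the standby
    max_wal_senders = 3

    # primary pg_hba.conf: let the standby connect for replication
    host  replication  repluser  192.168.0.20/32  md5

    # standby recovery.conf, after taking a base backup of the primary
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.0.10 user=repluser password=secret'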
On Tue, Jul 31, 2012 at 1:50 AM, Laszlo Nagy <gandalf@shopzeus.com> wrote:
Last time I checked, "PERC" was a meaningless name. Dell put that label on a variety of different controllers ... some were quite good, some were terrible. The latest PERC controllers are pretty good. If your machine is a few years old, the PERC controller may be a piece of junk.
Craig