Thread: ZFS vs. UFS
Hello,
Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)
- ZFS is journaled, and it is more independent of the hardware, so if the computer fails I can move the ZFS array to a different server.
- UFS is not journaled. I also have to rely on the RAID card to build the RAID array; if there is a hw problem with it, I won't be able to recover the data easily.
I wonder whether UFS has better performance. Or can you suggest another fs? This is just for the PGDATA directory.
Thanks,
Laszlo
On 07/24/2012 03:51 PM, Laszlo Nagy wrote:
> Under FreeBSD 9, what filesystem should I use for PostgreSQL?

Hi. As far as I know, UFS is faster than ZFS on FreeBSD 9.0. Some users have reported stability problems with ZFS on amd64, so UFS may be the better choice.

Best regards,
Georgi
On 24/07/2012 14:51, Laszlo Nagy wrote:
> I wonder whether UFS has better performance. Or can you suggest
> another fs? This is just for the PGDATA directory.

Hi, I think you might actually get a bit more performance out of ZFS, depending on your load, server configuration and (more so) the tuning of ZFS... however, UFS is IMO more stable, so I use it more often. A hardware RAID card would be good to have, but you can use soft-RAID the usual way and not be locked in by the controller. You can activate softupdates journaling on UFS if you really want it, but I find that regular softupdates is perfectly fine for PostgreSQL, which has its own journaling.
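For reference, enabling softupdates journaling on an existing UFS filesystem looks roughly like this on FreeBSD 9 (the device and mount point are assumptions):

    # the filesystem must be unmounted first
    umount /data
    tunefs -j enable /dev/ada0p2    # SU+J: softupdates plus journal
    mount /dev/ada0p2 /data
    # plain softupdates, without the journal, is toggled with -n
    tunefs -n enable /dev/ada0p2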
Hello,

The Postgres 9.0 database we use gets about 20K inserts per minute. As long as you don't query at the same time, the database copes fine. However, long-running queries seem to delay the db so much that the application server has to buffer the incoming data, because it cannot insert it fast enough. The server has 4 HDs: one is used for the archive (past static tables), the second for the indexes of the current live tables, the third for the current data, and the fourth for the OS.

The server specs are:
Intel(R) Xeon(R) CPU W3520 @ 2.67GHz
4 cores
18GB RAM

Do you think this workload is so high that it requires an upgrade to a cluster or RAID 10 to cope with it?

Kind Regards,
Yiannis
On 24.07.2012 14:51, Laszlo Nagy wrote:
> * UFS is not journaled.

There is journal support for UFS as far as I know. Please have a look at the gjournal manpage.

Greetings,
Torsten
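A minimal gjournal setup, per that manpage, looks roughly like this (device and mount point are assumptions):

    gjournal load                    # load the GEOM journal module
    gjournal label da0               # creates /dev/da0.journal
    newfs -O 2 -J /dev/da0.journal   # UFS2 with the gjournal flag set
    mount -o async /dev/da0.journal /data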
On Tue, Jul 24, 2012 at 6:22 AM, Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:
You need to learn more about what exactly is your bottleneck ... memory, CPU, or I/O. That said, I suspect you'd be way better off with this hardware if you built a single software RAID 10 array and put everything on it.
Right now, the backup disk and the OS disk are sitting idle most of the time. With a RAID10 array, you'd at least double, maybe quadruple your I/O. And if you added a battery-backed RAID controller, you'd have a pretty fast system.
Craig
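A sketch of the usual first checks for classifying a bottleneck (standard Linux utilities, matching the iostat output quoted later in the thread):

    iostat -x 5    # per-disk %util, await, queue depth -> disk-bound?
    vmstat 5       # si/so columns -> swapping? wa column -> stuck in I/O wait?
    top            # per-process CPU; a pegged core suggests a CPU-bound query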
On 24/07/2012 14:51, Laszlo Nagy wrote:
>
> Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell
> PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)
>
> * ZFS is journaled, and it is more independent of the hardware, so if
>   the computer fails I can move the ZFS array to a different server.
> * UFS is not journaled. I also have to rely on the RAID card to build
>   the RAID array; if there is a hw problem with it, I won't be able
>   to recover the data easily.
>
> I wonder whether UFS has better performance. Or can you suggest
> another fs? This is just for the PGDATA directory.
Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer. You should figure out the best file system for your application, and separately figure out a recovery strategy, one that can survive the failure of *any* component in your system, including the disk itself.
Craig
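One simple building block for such a strategy is a logical dump shipped to another machine; a sketch, with the database name, paths and backup host assumed:

    # nightly compressed dump, copied off the box
    pg_dump -Fc mydb > /backup/mydb-$(date +%F).dump
    scp /backup/mydb-$(date +%F).dump backuphost:/srv/pg-backups/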
On 24/07/2012 15:30, Craig James wrote:
> You need to learn more about what exactly is your bottleneck ... memory, CPU, or I/O.

I can only assume that it is an I/O issue. At least, this is what I can read from iostat:
Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00  277.50    0.00   20.00      0.00  2344.00    117.20      0.09    2.25   4.50   9.00
sdb        1.00    0.50  207.50    4.50  45228.00    33.50    213.50      2.40   11.34   4.13  87.50
sdc        0.00    0.00   29.50    0.00   4916.00     0.00    166.64      0.11    3.73   1.36   4.00
sdd        0.00    0.00    4.00  179.50     96.00  3010.00     16.93    141.25  828.77   5.45 100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           7.60   0.00     2.08    46.45    0.00  43.87

Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00   61.50    0.00   28.00      0.00   704.00     25.14      0.04    3.04   1.43   4.00
sdb        2.00    0.00   90.50  162.00  19560.00  2992.00     89.31     78.92  194.26   3.76  95.00
sdc        0.00    0.00   10.50    0.00   2160.00     0.00    205.71      0.02    1.90   1.90   2.00
sdd        0.00    0.00    1.50  318.50     24.00  5347.00     16.78    134.72  572.81   3.12 100.00
Here sdb is the data disk and sdd is the index disk. top hardly ever reports more than 10% CPU per postgres process, while these iostat numbers are consistently high whenever the query is running. At least I can tell that the buffering starts the moment the index disk hits 100% util. Is there any other way that I can identify bottlenecks more conclusively?
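One more place to look is inside PostgreSQL itself; a sketch using the 9.0 catalog column names:

    # what each backend is doing while the inserts stall
    psql -c "SELECT procpid, waiting, current_query FROM pg_stat_activity;"
    # lock requests that have not been granted, i.e. blocked sessions
    psql -c "SELECT locktype, relation::regclass, pid, granted FROM pg_locks WHERE NOT granted;"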
> Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer.

This is why I use a RAID array of 10 disks, so there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
> There is journal support for UFS as far as I know. Please have a look
> at the gjournal manpage.

Yes, but gjournal works on disk devices. I would have to rely on the hw card for the RAID; when the card goes wrong, I won't be able to access my data. I could buy an identical RAID card, or in fact a complete backup server, but right now I don't have the money for that. So I would like to use a solution that lets me recover from a failure even if the RAID card goes wrong. It might also be possible to combine gmirror + gjournal, but that is not good enough: the performance and stability of a simple gmirror with two disks are much worse than those of a raidz array of 10 disks (with a hot spare), or even a RAID 1+0 (with a hot spare) supported by the hw RAID card. So I would like to stick with UFS + hw card support (and then I need to buy an identical RAID card if I can), or ZFS.
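The raidz layout described there would be created roughly like this (the pool and device names are assumptions):

    # raidz2 over nine disks, plus one hot spare
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8
    zpool add tank spare da9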
On Tue, Jul 24, 2012 at 11:27 AM, Laszlo Nagy <gandalf@shopzeus.com> wrote:
> This is why I use a RAID array of 10 disks, so there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
Only you can answer that because it depends on your application. If you're operating PayPal, you probably want 24/7 100% reliability. If you're operating a social networking site for teenagers, losing data is probably not a catastrophe.
In my experience, most data loss is NOT from equipment failure. It's from software bugs and operator errors. If your recovery plan doesn't cover this, you have a problem.
Craig
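Continuous WAL archiving is one way to cover the operator-error case, since it allows a restore to a point just before the mistake; a sketch for 9.0, with the archive path assumed:

    # postgresql.conf
    wal_level = archive
    archive_mode = on
    archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'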
> Yes, but gjournal works on disk devices.

That isn't completely correct! gjournal works with all GEOM devices, which can be not only disk devices but also (remote) disk devices, (remote) files, (remote) software RAIDs, etc. It is very easy to mirror a *complete* disk from one *server* to another. I use this technique for customers who need cheap backups of their complete servers. But a RAID card will be much faster than this; I just wanted to make this clear.

Greetings,
Torsten
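A rough sketch of that technique with ggate(8) plus gmirror (IP addresses and device names are assumptions):

    # backup server: export the disk and start the daemon
    echo "192.168.0.10 RW /dev/da1" > /etc/gg.exports
    ggated
    # primary server: attach the remote disk, then mirror onto it
    ggatec create -o rw 192.168.0.11 /dev/da1    # appears as /dev/ggate0
    gmirror label -v remotebackup /dev/da1 /dev/ggate0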
On 07/24/2012 08:51 AM, Laszlo Nagy wrote:
Which Intel RAID controller is that? All of the ones on the motherboard are pretty much useless, if that's what you have. Those are slower than software RAID, and they add driver issues you could otherwise avoid. Better to connect the drives to the non-RAID ports, or configure the controller in JBOD mode first.
Using one of the better RAID controllers, one of Dell's good PERC models for example, is one of the biggest hardware upgrades you could make to this server. If your database is mostly read traffic, it won't matter very much. Write-heavy loads really benefit from a good RAID controller's write cache.
You should be able to get UFS working with a software mirror and journaling using gstripe/gmirror or vinum. It doesn't matter that much for PostgreSQL though. The data writes are journaled by the database, and it tries to sync data to disk after updating metadata too. There are plenty of PostgreSQL installs on FreeBSD/UFS that work fine.
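A sketch of such a software RAID 10, striping across mirrored pairs (device names are assumptions):

    gmirror label -v gm0 da0 da1
    gmirror label -v gm1 da2 da3
    gstripe label -v st0 mirror/gm0 mirror/gm1
    newfs -U /dev/stripe/st0    # UFS2 with softupdates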
ZFS needs more RAM and has higher CPU overhead than UFS does. It's a heavier filesystem all around than UFS is. Your server is fast enough that you should be able to afford it though, and the feature set is nice. In addition to the RAID setup being simple to handle, having checksums on your data is a good safety feature for PostgreSQL.
ZFS will heavily use server RAM for caching by default, much more so than UFS. Make sure you check into that, and leave enough RAM for the database to run too. (Doing *some* caching that way is good for Postgres; you just don't want *all* the memory to be used for that)
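A sketch of the usual knobs (the 8G cap is an assumption for a 24G box, and the dataset name is made up):

    # /boot/loader.conf: cap the ARC so PostgreSQL keeps its share of RAM
    vfs.zfs.arc_max="8G"

    # commonly suggested: match the record size to Postgres' 8K pages
    zfs set recordsize=8k tank/pgdata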
Moving disks to another server is a very low probability fix for a broken system. The disks are a likely place for the actual failure to happen in the first place. I like to think more in terms of "how can I create a real-time replica of this data?" to protect databases, and the standby server for that doesn't need to be an expensive system. That said, there is no reason to set things up so that they only work with that Intel RAID controller, given that it's not a very good piece of hardware anyway.
--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
> Which Intel RAID controller is that? [...] Using one of the better RAID
> controllers, one of Dell's good PERC models for example, is one of the
> biggest hardware upgrades you could make to this server.

Actually, it is a PERC with write-cache and BBU.

> ZFS will heavily use server RAM for caching by default, much more so
> than UFS. Make sure you check into that, and leave enough RAM for the
> database to run too.

Right now, the size of the database is below 5GB, so I guess it will fit into memory. I'm concerned about data safety and availability. I have been in a situation where the RAID card went wrong and I was not able to recover the data, because I could not get an identical RAID card in time. I have also been in a situation where the system was crashing twice a day and we didn't know why. (As it turned out, it was a bug in the "stable" kernel, and we could not identify this for two weeks.) However, we had to do fsck after every crash; with a 10TB disk array, that was extremely painful. ZFS is much better: short recovery time, and it is RAID-card independent. So I think I have answered my own question - I'm going to use ZFS for better availability, even if it leads to poorer performance. (That was the original question: how bad is it to use ZFS for PostgreSQL, instead of the native UFS?)

> Moving disks to another server is a very low probability fix for a
> broken system. The disks are a likely place for the actual failure to
> happen in the first place.

Yes, but we don't have to worry about that: raidz2 + hot spare is safe enough. The RAID card is the only single point of failure.

> I like to think more in terms of "how can I create a real-time replica
> of this data?" to protect databases, and the standby server for that
> doesn't need to be an expensive system.

I'm not sure how to create a real-time replica. This database is updated frequently; there is always a process that reads/writes into it. I was thinking about using Slony to create slave databases, but I have no experience with that. We have a 100Mbit connection, and I'm not sure how much bandwidth we need to maintain a real-time slave database. It might be a good idea. I'm sorry, I feel I'm being off-topic.
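Besides Slony, 9.0's built-in streaming replication would also give a real-time replica; a minimal sketch, with host addresses, user and password assumed:

    # primary postgresql.conf
    wal_level = archive        # or hot_standby, to allow queries on the standby
    max_wal_senders = 3

    # primary pg_hba.conf: let the standby connect for replication
    host  replication  repluser  192.168.0.20/32  md5

    # standby recovery.conf, after taking a base backup of the primary
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.0.10 user=repluser password=secret'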
On Tue, Jul 31, 2012 at 1:50 AM, Laszlo Nagy <gandalf@shopzeus.com> wrote:
Last time I checked, "PERC" was a meaningless name. Dell put that label on a variety of different controllers ... some were quite good, some were terrible. The latest PERC controllers are pretty good. If your machine is a few years old, the PERC controller may be a piece of junk.
Craig