Thread: Best filesystem for a high load db
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, I need to install a Postgresql 9.4 cluster with two servers at first instance. The database will be used in an enviroment with high load, a lot number of clients will do queries that require fast replies. I have to install Ubuntu 14.04 as OS and I would like to know which filesystem you would suggest me to use. I have read that ZFS with ZIL is now mature for a production server: do you think it is that true or that it would be better to stay safer with a common EXT4? Thanks in advance. Maila Fatticcioni -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlR0o68ACgkQi2q3wPb3FcM0/ACgwWEPZ+8dhvJPcjvoZq0GSfDJ 1p8AnjNFd2Mb9Nms4jTfRywn0MBmtVAW =wvKu -----END PGP SIGNATURE-----
On Tue, 25 Nov 2014 16:43:49 +0100 Maila Fatticcioni <mfatticcioni@mbigroup.it> wrote: > Hello, > I need to install a Postgresql 9.4 cluster with two servers at first > instance. The database will be used in an enviroment with high load, a > lot number of clients will do queries that require fast replies. > I have to install Ubuntu 14.04 as OS and I would like to know which > filesystem you would suggest me to use. I have read that ZFS with ZIL > is now mature for a production server: do you think it is that true or > that it would be better to stay safer with a common EXT4? We did a lot of research into this at my last employer. Unfortunatley, I don't remember most of it, but I'll relay what I do remember. Anything with a journal is a performance problem. PostgreSQL effectivly does its own journalling with the WAL logs. That's not to say that there's no value to crash recovery to having a journalling filesystem, but it's just to say that our experience showed journaling filesystems to be slower. That rules out ext4, unless you disable the journal. I seem to remember ext4 with journalling disabled being one of the faster filesystems, but I could be remembering wrong. At the time we tested it, ZFS performance was terrible for databases. ZFS continues to evolve, so that information might not be accurate any more. Now ... if I remember correctly, we decided that XFS was the fastest of the ones we tested. If I'm remembering correctly, the performance of XFS was only a little bit better than ext4 with journalling disabled and there was considerable debate as to whether it was worth going with something less widely used and supported (XFS) to gain only a tiny % of improvement over something heavily used and tested (ext4). Sorry I can't be more definite, but I don't have access to the information any more. On a related note, I remember that the sysops guys had a list of mount options that they used to improve performance ... noatime being one that I remember, but whatever filesystem you use, be sure to research and tweak the mount options. -- Bill Moran I need your help to succeed: http://gamesbybill.com
Re: Bill Moran 2014-11-25 <20141125111630.d05d58a9eb083c7cf80ed9f8@potentialtech.com> > Anything with a journal is a performance problem. PostgreSQL effectivly > does its own journalling with the WAL logs. That's not to say that there's > no value to crash recovery to having a journalling filesystem, but it's > just to say that our experience showed journaling filesystems to be slower. > That rules out ext4, unless you disable the journal. I seem to remember > ext4 with journalling disabled being one of the faster filesystems, but I > could be remembering wrong. If you are using a non-journalling FS, you'll be waiting for a full fsck after a system crash. Not sure that's an improvement. Christoph -- cb@df7cb.de | http://www.df7cb.de/
On Tue, 25 Nov 2014 17:27:18 +0100 Christoph Berg <cb@df7cb.de> wrote: > Re: Bill Moran 2014-11-25 <20141125111630.d05d58a9eb083c7cf80ed9f8@potentialtech.com> > > Anything with a journal is a performance problem. PostgreSQL effectivly > > does its own journalling with the WAL logs. That's not to say that there's > > no value to crash recovery to having a journalling filesystem, but it's > > just to say that our experience showed journaling filesystems to be slower. > > That rules out ext4, unless you disable the journal. I seem to remember > > ext4 with journalling disabled being one of the faster filesystems, but I > > could be remembering wrong. > > If you are using a non-journalling FS, you'll be waiting for a full > fsck after a system crash. Not sure that's an improvement. It's an improvement if: a) You're investing in high-quality hardware, so the chance of a system crash is very low. b) The database is replicated, so your plan in the event of a primary crash is to fail over to the backup anyway. If both of those are in place (as they were at my previous job) then the time it takes to fsck isn't an issue, and taking action that causes the database to run faster when nothing is wrong can be considered. Obviously, the OP needs to assess the specific needs of the product in question. Your point is very valid, and I'm glad you brought it up (as a lot of people forget about it) but sometimes it's not the most important factor. -- Bill Moran I need your help to succeed: http://gamesbybill.com
On 11/25/2014 9:43 AM, Maila Fatticcioni wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > I need to install a Postgresql 9.4 cluster with two servers at first > instance. The database will be used in an enviroment with high load, a > lot number of clients will do queries that require fast replies. > I have to install Ubuntu 14.04 as OS and I would like to know which > filesystem you would suggest me to use. I have read that ZFS with ZIL > is now mature for a production server: do you think it is that true or > that it would be better to stay safer with a common EXT4? > > Thanks in advance. > > Maila Fatticcioni > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1 > > iEYEARECAAYFAlR0o68ACgkQi2q3wPb3FcM0/ACgwWEPZ+8dhvJPcjvoZq0GSfDJ > 1p8AnjNFd2Mb9Nms4jTfRywn0MBmtVAW > =wvKu > -----END PGP SIGNATURE----- > > A long while back I benched Ext4 vs xfs, and xfs won. I've been using xfs for a long while now and love it. You don't need to guess with mount options anymore. The defaults are best. -Andy
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/25/2014 05:54 PM, Bill Moran wrote: > On Tue, 25 Nov 2014 17:27:18 +0100 Christoph Berg <cb@df7cb.de> > wrote: > >> Re: Bill Moran 2014-11-25 >> <20141125111630.d05d58a9eb083c7cf80ed9f8@potentialtech.com> >>> Anything with a journal is a performance problem. PostgreSQL >>> effectivly does its own journalling with the WAL logs. That's >>> not to say that there's no value to crash recovery to having a >>> journalling filesystem, but it's just to say that our >>> experience showed journaling filesystems to be slower. That >>> rules out ext4, unless you disable the journal. I seem to >>> remember ext4 with journalling disabled being one of the faster >>> filesystems, but I could be remembering wrong. >> >> If you are using a non-journalling FS, you'll be waiting for a >> full fsck after a system crash. Not sure that's an improvement. > > It's an improvement if: a) You're investing in high-quality > hardware, so the chance of a system crash is very low. b) The > database is replicated, so your plan in the event of a primary > crash is to fail over to the backup anyway. > > If both of those are in place (as they were at my previous job) > then the time it takes to fsck isn't an issue, and taking action > that causes the database to run faster when nothing is wrong can be > considered. > > Obviously, the OP needs to assess the specific needs of the product > in question. Your point is very valid, and I'm glad you brought it > up (as a lot of people forget about it) but sometimes it's not the > most important factor. > Thank you a lot to have shared with me your experiences. Indeed we will have two servers in cluster with high quality hardware so a fsck restore shouldn't be a big problem. I will analize the xfs option as well and then I will decide. Thank you again, Maila Fatticcioni - -- ______________________________________________________________ Maila Fatticcioni ______________________________________________________________ Mediterranean Broadband Infrastructure s.r.l. via Francesco Squartini n°7 56121 Ospedaletto (PI) - ITALY ______________________________________________________________ Phone: +39(050)3870851 Fax: +39(050)3870809 ______________________________________________________________ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlR1qI4ACgkQi2q3wPb3FcO72QCg2zEq+5SRfpcVkq8+QprPHiu1 SZ4An3cVJCRePrIlNDQFLJde3uLYoS0k =/FjW -----END PGP SIGNATURE-----
On 11/26/2014 4:16 AM, Maila Fatticcioni wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 11/25/2014 05:54 PM, Bill Moran wrote: >> On Tue, 25 Nov 2014 17:27:18 +0100 Christoph Berg <cb@df7cb.de> >> wrote: >> >>> Re: Bill Moran 2014-11-25 >>> <20141125111630.d05d58a9eb083c7cf80ed9f8@potentialtech.com> >>>> Anything with a journal is a performance problem. PostgreSQL >>>> effectivly does its own journalling with the WAL logs. That's >>>> not to say that there's no value to crash recovery to having a >>>> journalling filesystem, but it's just to say that our >>>> experience showed journaling filesystems to be slower. That >>>> rules out ext4, unless you disable the journal. I seem to >>>> remember ext4 with journalling disabled being one of the faster >>>> filesystems, but I could be remembering wrong. >>> >>> If you are using a non-journalling FS, you'll be waiting for a >>> full fsck after a system crash. Not sure that's an improvement. >> >> It's an improvement if: a) You're investing in high-quality >> hardware, so the chance of a system crash is very low. b) The >> database is replicated, so your plan in the event of a primary >> crash is to fail over to the backup anyway. >> >> If both of those are in place (as they were at my previous job) >> then the time it takes to fsck isn't an issue, and taking action >> that causes the database to run faster when nothing is wrong can be >> considered. >> >> Obviously, the OP needs to assess the specific needs of the product >> in question. Your point is very valid, and I'm glad you brought it >> up (as a lot of people forget about it) but sometimes it's not the >> most important factor. >> > > Thank you a lot to have shared with me your experiences. > Indeed we will have two servers in cluster with high quality hardware > so a fsck restore shouldn't be a big problem. > I will analize the xfs option as well and then I will decide. > > Thank you again, > Maila Fatticcioni > Also, if you do some timings, please share it with us, it'd be nice to have some more data points. -Andy
Currently I use FreeBSD 10 with ZFS filesystem for our Production database. Speed wise it's fine, i'm sure other filesystems could be faster, even though we have never compared it with other filesystems. The reason we do ZFS is to take advantage of the data compression and snapshots. It is very easy to generate a new slave just by copying the filesystem to another machine. Having different compression for tablespaces that don't get accessed as much, or tablespaces on faster disks. Doing big data migrations or pushes we are able to rollback if something fails. Also when upgrading to a newer version of Postgres, just take a snapshot and upgrade that.
Same with database backups. We issue a pg_start_backup(), take a few snapshots, issue pg_stop_backup(). Then ship the entire filesystem to a different machine and that's your backup.
One thing I am pushing to do is using SSDs for the ZIL and L2ARC. This would allow for a pretty nice boost in speed.
-Joseph
On Wed, Nov 26, 2014 at 9:50 AM, Andy Colson <andy@squeakycode.net> wrote:
On 11/26/2014 4:16 AM, Maila Fatticcioni wrote:-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/25/2014 05:54 PM, Bill Moran wrote:On Tue, 25 Nov 2014 17:27:18 +0100 Christoph Berg <cb@df7cb.de>
wrote:Re: Bill Moran 2014-11-25
<20141125111630.d05d58a9eb083c7cf80ed9f8@potentialtech.com>Anything with a journal is a performance problem. PostgreSQL
effectivly does its own journalling with the WAL logs. That's
not to say that there's no value to crash recovery to having a
journalling filesystem, but it's just to say that our
experience showed journaling filesystems to be slower. That
rules out ext4, unless you disable the journal. I seem to
remember ext4 with journalling disabled being one of the faster
filesystems, but I could be remembering wrong.
If you are using a non-journalling FS, you'll be waiting for a
full fsck after a system crash. Not sure that's an improvement.
It's an improvement if: a) You're investing in high-quality
hardware, so the chance of a system crash is very low. b) The
database is replicated, so your plan in the event of a primary
crash is to fail over to the backup anyway.
If both of those are in place (as they were at my previous job)
then the time it takes to fsck isn't an issue, and taking action
that causes the database to run faster when nothing is wrong can be
considered.
Obviously, the OP needs to assess the specific needs of the product
in question. Your point is very valid, and I'm glad you brought it
up (as a lot of people forget about it) but sometimes it's not the
most important factor.
Thank you a lot to have shared with me your experiences.
Indeed we will have two servers in cluster with high quality hardware
so a fsck restore shouldn't be a big problem.
I will analize the xfs option as well and then I will decide.
Thank you again,
Maila Fatticcioni
Also, if you do some timings, please share it with us, it'd be nice to have some more data points.
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Wed, Nov 26, 2014 at 10:53 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:
One thing I am pushing to do is using SSDs for the ZIL and L2ARC. This would allow for a pretty nice boost in speed.
I use ZFS on freebsd as my large production server configuration as well. I have SSDs for the L2ARC and that works *really* well. I do not use a ZIL at all. Using the ZFS compression is a big win, along with all the other management advantages you already outlined.
There is a minor speed penalty, but the other advantages far outweigh those. That and throwing 256GB of RAM at the problem makes it more or less go away.