Thread: PostgreSQL Reliability when fsync = false on Linux-XFS
(Please follow Mail-Followup-To, I'm not on the pgsql-performance mailing list but am on the Linux-XFS mailing list. My apologies too for the cross-post. I'm cc'ing the Linux-XFS mailing list in case people there will be interested in this, too.) Hi, We have a server running PostgreSQL v7.3.3 on Debian GNU/Linux with Linux kernel 2.4.21-xfs. The PostgreSQL data is stored on an XFS[1] partition mounted with the options "rw,noatime,logbufs=8". The machine is an Intel Pentium III 733MHz with 512MB RAM and a four-disk hardware IDE RAID-5 array with a 3ware controller. Among other databases, we have a centralized Snort[2] database that is analyzed by ACIDLab[3]. I noticed performance problems during SELECT and INSERT operations when the tables reach around 200,000 records. Because of timeout issues, the PHP-based ACIDLab can't be used properly. I read the performance section of the ACID FAQ[4] as well as the PostgreSQL "Managing Kernel Resources" document , and so far have tuned my system by setting /proc/sys/kernel/{shmall,shmmax} to 134217728. I also turned off fsync in /etc/postgresql/postgresql.conf. The latter did a LOT to improve INSERT performance, which is now CPU-bound instead of I/O-bound. However, as expected, I am concerned about the reliability penalty this will cause. Our server has been up and running without problems for 67 days since the last reboot, but this doesn't mean it will never hiccup either because of some random problem or because of an extended power outage. Would anyone have "authoritative" information with respect to: - the way PostgreSQL expects data to be written to disk without the fsync calls for things not to get corrupted in the event of a crash, and - the way XFS writes data to disk without the fsync calls that PostgreSQL normally does and how this will affect PostgreSQL data integrity in the event of a system crash? I know that at the end of the day, if I value my data, I must (1) back it up regularly, and (2) keep fsync enabled in PostgreSQL. However given the significance performance hit (at least as far as massive INSERT or UPDATE operations are concerned) and the journalling component of XFS, it would be great to find out just how bad the odds are if the system goes down unexpectedly. Thank you very much for your time. :) --> Jijo Note- I should also have selected RAID10 instead of RAID5, but that's a change I can't afford to do at this point so I have to explore other options. [1] http://oss.sgi.com/projects/xfs/ [2] http://www.snort.org [3] http://acidlab.sourceforge.net [4] http://www.andrew.cmu.edu/~rdanyliw/snort/acid_faq.html#faq_c9 [5] http://developer.postgresql.org/docs/postgres/kernel-resources.html -- Federico Sevilla III : http://jijo.free.net.ph : When we speak of free Network Administrator : The Leather Collection, Inc. : software we refer to GnuPG Key ID : 0x93B746BE : freedom, not price.
> - the way PostgreSQL expects data to be written to disk without the > fsync calls for things not to get corrupted in the event of a crash, > and If you want the filesystem to deal with this, I believe it is necessary for it to write the data out in the same order the write requests are supplied in between ALL PostgreSQL processes. If you can accomplish this, you do not need WAL. There are shortcuts which can be taken in the above, which is where WAL comes in. WAL writes are ordered between processes and WAL of a single process always hits disk prior to commit -- fsync forces both of these. Due to WAL being in place, data can be written at almost any time. The benefit to WAL is a single file fsync rather than the entire database requiring one (PostgreSQL pre-7.1 method). > I know that at the end of the day, if I value my data, I must (1) back > it up regularly, and (2) keep fsync enabled in PostgreSQL. However given > the significance performance hit (at least as far as massive INSERT or If you want good performance, invest in a SCSI controller that has battery backed write cache. A few megs will do it. You will find performance similar to fsync being off (you don't wait for disk rotation) but without the whole dataloss issue. Another alternative is to buy a small 15krpm disk dedicated for WAL. In theory you can achieve one commit per rotation. I assume your inserts are not supplied in Bulk. The fsync overhead is per transaction, not per insert.
Attachment
Rod Taylor kirjutas N, 04.09.2003 kell 06:36: > Another alternative is > to buy a small 15krpm disk dedicated for WAL. In theory you can achieve > one commit per rotation. One commit per rotation would still be only 15000/60. = 250 tps, but fortunately you can get better results if you use multiple concurrent backends, then in the best case you can get one commit per backend per rotation. ----------------- Hannu
On 3 Sep 2003 at 23:36, Rod Taylor wrote: > > - the way PostgreSQL expects data to be written to disk without the > > fsync calls for things not to get corrupted in the event of a crash, > > and > > If you want the filesystem to deal with this, I believe it is necessary > for it to write the data out in the same order the write requests are > supplied in between ALL PostgreSQL processes. If you can accomplish > this, you do not need WAL. > > There are shortcuts which can be taken in the above, which is where WAL > comes in. WAL writes are ordered between processes and WAL of a single > process always hits disk prior to commit -- fsync forces both of these. > Due to WAL being in place, data can be written at almost any time. The > benefit to WAL is a single file fsync rather than the entire database > requiring one (PostgreSQL pre-7.1 method). > > > I know that at the end of the day, if I value my data, I must (1) back > > it up regularly, and (2) keep fsync enabled in PostgreSQL. However given > > the significance performance hit (at least as far as massive INSERT or > > If you want good performance, invest in a SCSI controller that has > battery backed write cache. A few megs will do it. You will find > performance similar to fsync being off (you don't wait for disk > rotation) but without the whole dataloss issue. Another alternative is > to buy a small 15krpm disk dedicated for WAL. In theory you can achieve > one commit per rotation. Just wonderin. What if you symlink WAL to a directory which is on mounted USB RAM drive? Will that increase any throughput? I am sure a 256/512MB flash drive will cost lot less than a SCSI disk. May be even a GB on flash drive would do.. Just a thought.. Bye Shridhar -- Ambition, n: An overmastering desire to be vilified by enemies while living and made ridiculous by friends when dead. -- Ambrose Bierce
> Just wonderin. What if you symlink WAL to a directory which is on > mounted USB RAM drive? USB 2.0 you mean? It supposedly runs at 1394 speeds, but USB 1.0/1.1 runs at 1MB/s under ideal circumstances... that's slower than even old IDE drives. > Will that increase any throughput? Probably not... > I am sure a 256/512MB flash drive will cost lot less than a SCSI > disk. May be even a GB on flash drive would do.. That's true... but on a per $$/MB, you're better off investing in RAM and increasing your effective_cache_size. If dd to a flash card is faster than to an IDE drive, please let me know. :) -sc -- Sean Chittenden UNIX(TM), a BSD like Operating System
Sean Chittenden <sean@chittenden.org> writes: >> Just wonderin. What if you symlink WAL to a directory which is on >> mounted USB RAM drive? > USB 2.0 you mean? It supposedly runs at 1394 speeds, but USB 1.0/1.1 > runs at 1MB/s under ideal circumstances... that's slower than even old > IDE drives. >> Will that increase any throughput? > Probably not... Also, doesn't flash memory have a very limited lifetime in write cycles? Using it as WAL, you'd wear it out PDQ. regards, tom lane
On Thu, 4 Sep 2003, Federico Sevilla III wrote: > (Please follow Mail-Followup-To, I'm not on the pgsql-performance > mailing list but am on the Linux-XFS mailing list. My apologies too for > the cross-post. I'm cc'ing the Linux-XFS mailing list in case people > there will be interested in this, too.) > > > Hi, > > We have a server running PostgreSQL v7.3.3 on Debian GNU/Linux with > Linux kernel 2.4.21-xfs. The PostgreSQL data is stored on an XFS[1] Two points. 1: 7.3.3 has a data loss issue fixed in 7.3.4. You should upgrade to avoid the pain associated with this problem. 2: When you turn off fsync, all bets are off. If the data doesn't get written in the right order, your database may be corrupted if power is shut off.