Thread: -F option
The manual page states:

    -F  Disable an automatic fsync() call after each transaction. This
        option improves performance, but an operating system crash while a
        transaction is in progress may cause the loss of the most recently
        entered data. Without the fsync() call the data is buffered by the
        operating system, and written to disk sometime later.

What I would like to know is what "sometime later" means. Is it one hour?
30 seconds? 30 minutes? 24 hours? I really don't mind losing the last 3
minutes or so of data, but if we are talking about 10 hours or so then I
will not use that switch. I'm not looking for an answer with millisecond
accuracy; an upper bound within plus or minus 5 minutes will be fine.

My usage is frequent lookups of small pieces of information, occasional
inserts of small pieces of information, and even less frequent updates of
small pieces of information.

I have 7.0.3 on Linux 2.2.18, accessed from Apache::DBI under mod_perl. I
have about three tables or so, with the largest table being over 1000 rows
and growing. Is Postgres overkill for this amount of data?

Thanks
On Mon, Dec 11, 2000 at 06:45:22PM -0500, newsreader@mediaone.net wrote:
> The manual page states:
>
>     -F  Disable an automatic fsync() call after each transaction. This
>         option improves performance, but an operating system crash while
>         a transaction is in progress may cause the loss of the most
>         recently entered data. Without the fsync() call the data is
>         buffered by the operating system, and written to disk sometime
>         later.
>
> What I would like to know is what "sometime later" means. Is it one
> hour? 30 seconds? 30 minutes? 24 hours? I really don't mind losing the
> last 3 minutes or so of data, but if we are talking about 10 hours or
> so then I will not use that switch. I'm not looking for an answer with
> millisecond accuracy; an upper bound within plus or minus 5 minutes
> will be fine.

Read "man fsync()". The answer is, "When your operating system gets
around to it."

The answer depends on a great many factors: hard drive speed, drive bus
speed (SCSI or IDE), RAID overhead (if any), file system overhead,
processor load, and whether the proper sacrifices to Murphy were made.
Just to list the ones that come easily to mind.

Having written disk drivers, I can tell you that you should be OK with a
5-minute latency requirement. In fact, on a reasonably loaded system, I
would hope all the data would be written within a minute. But I shan't be
greatly concerned unless latency is routinely greater than a minute.

I am assuming these are local disk writes. With network writes (say, via
NFS) all bets are off.

--
C^2

No windows were crashed in the making of this email.

Looking for fine software and/or web pages?
http://w3.trib.com/~ccurley
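To make the call under discussion concrete, here is a minimal C sketch
(the file path is made up for illustration). The write() returns as soon
as the data is in the kernel's buffer cache; the fsync() -- the call that
-F suppresses -- blocks until the kernel has pushed the data toward the
drive:

    /*
     * Minimal sketch: write a record, then force it out. Without the
     * fsync(), the data sits in the buffer cache until the syncer daemon
     * flushes it "sometime later". /tmp/demo.dat is a hypothetical path.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *rec = "one committed transaction\n";
        int fd = open("/tmp/demo.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (write(fd, rec, strlen(rec)) < 0) {
            perror("write");
            return 1;
        }
        /* This is the call that -F disables after each transaction. */
        if (fsync(fd) < 0) {
            perror("fsync");
            return 1;
        }
        close(fd);
        return 0;
    }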
Thank you. I'm doing disk writes, not NFS -- a plain IDE drive in a plain
Dell Celeron box. I see, though, that "man fsync" does not come down to my
level of literacy. I'm quite reassured by your email and will turn on the
-F switch in no time. Coupled with Linux's stability, I have a feeling the
probability of my losing any data to an OS crash is smaller than that of
some natural disaster striking the server.

kz

On Mon, Dec 11, 2000 at 05:20:18PM -0700, Charles Curley wrote:
> On Mon, Dec 11, 2000 at 06:45:22PM -0500, newsreader@mediaone.net wrote:
> > The manual page states:
> >
> >     -F  Disable an automatic fsync() call after each transaction. This option
>
> Read "man fsync()".
>
> The answer is, "When your operating system gets around to it." The answer
> depends on a great many factors: hard drive speed, drive bus speed (SCSI
newsreader@mediaone.net writes:
> What I would like to know is what "sometime later" means.
> Is it one hour? 30 seconds? 30 minutes? 24 hours?

On a typical Unix setup it's the cycle length of your syncer daemon
(typically 30 seconds), plus however long it physically takes the OS to
push the data out to the drive and then the drive to get around to
writing it. The nearby estimate of 1 minute sounds good to me as a
(fairly conservative) upper bound, at least under normal conditions.

The standard advice about -F is that it's cool if you trust your OS,
your hardware, and your UPS. You do *not* need to worry about Postgres
crashes --- the backend will write the data to the kernel at commit in
any case. The only question is whether we try to encourage the kernel to
push the data down to disk before we report that the transaction has
been committed.

There was a long thread on pghackers recently to the effect that even
without -F you are at the mercy of disk drive and power supply failures,
because fsync() only guarantees that the kernel has given the data to
the disk drive; modern disk drives may buffer the data for a while
before they plop it down onto the platter. So you probably want a UPS in
any case. Beyond that, how many kernel crashes and hardware failures
have you seen lately?

> My usage is frequent lookups of small pieces of information, occasional
> inserts of small pieces of information, and even less frequent updates
> of small pieces of information.

OTOH, if you are not doing a lot of insert/update/delete, then -F gains
little performance anyway...

			regards, tom lane
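To put a rough number on what -F would buy for a given workload, here is
a sketch of a micro-benchmark (my own illustration; the file path and
iteration count are arbitrary) that times write()+fsync() pairs,
approximating the per-transaction cost that -F removes:

    /*
     * Hypothetical micro-benchmark: time N write()+fsync() pairs to
     * estimate the per-transaction fsync cost on this machine.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        const int n = 100;
        const char rec = 'x';
        struct timeval t0, t1;
        double ms;
        int fd, i;

        fd = open("/tmp/fsync_bench.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        gettimeofday(&t0, NULL);
        for (i = 0; i < n; i++) {
            if (write(fd, &rec, 1) < 0 || fsync(fd) < 0) {
                perror("write/fsync");
                return 1;
            }
        }
        gettimeofday(&t1, NULL);

        ms = (t1.tv_sec - t0.tv_sec) * 1000.0
           + (t1.tv_usec - t0.tv_usec) / 1000.0;
        printf("%d fsyncs in %.1f ms (%.2f ms each)\n", n, ms, ms / n);
        close(fd);
        return 0;
    }

If each honest fsync costs on the order of a disk rotation, a workload of
occasional single-row inserts will barely notice it -- which is the point
about -F gaining little here.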
On Mon, Dec 11, 2000 at 09:02:54PM -0500, Tom Lane wrote:
> newsreader@mediaone.net writes:
> > What I would like to know is what "sometime later" means.
> > Is it one hour? 30 seconds? 30 minutes? 24 hours?
>
> to writing it. The nearby estimate of 1 minute sounds good to me as
>
> > occasional inserts of small pieces of information and even less
> > frequent updates of small pieces of information.
>
> OTOH, if you are not doing a lot of insert/update/delete, then -F gains
> little performance anyway...
>
> 			regards, tom lane

No OS or hardware crashes for as long as I can remember on this box, and
I also have a UPS running properly.

Anyway, before I started using Postgres -- just last week -- I was
reading the data in from a dbm file. The main purpose was just to look up
one record out of over 1000, but I felt that as the size of that file
grew I might start to see a performance hit. As far as I can see,
Apache::DBI is keeping the backends alive: whenever I check top, there
are always exactly as many Apache children as Postgres backends. So... my
semi-educated guess is that data is being retrieved at least as fast as
by opening a dbm file and picking out a record. No?

In any case, the real reason I started using Postgres was that after
graduating my server to mod_perl I was having a nightmare getting my
previously working (?) file-locking mechanisms to work right. With
Postgres I just don't have that kind of problem. As a side effect, my
code is much cleaner now, because of the sheer power of Postgres.

Thanks much

kz
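For what it's worth, the Apache::DBI behavior described above -- one
persistent backend per Apache child -- amounts to the following pattern,
sketched here in libpq terms (the dbname, table, and key are all
hypothetical): connect once, then reuse the connection for every lookup.

    /*
     * Sketch of the Apache::DBI pattern in libpq: connect once, reuse
     * the connection for many single-row lookups. The dbname and table
     * are made up; compile with -lpq.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn;
        PGresult *res;

        conn = PQconnectdb("dbname=mydb");   /* once per Apache child */
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* Reused for every request this child serves. */
        res = PQexec(conn, "SELECT val FROM lookup WHERE key = 'foo'");
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
            printf("val = %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        PQfinish(conn);
        return 0;
    }

Skipping the per-request connection setup is what makes repeated small
lookups cheap; for a 1000-row table with a keyed lookup, that is
plausibly as fast as opening a dbm file and fetching a record.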