Re: FAQ Q - Mailing list pgsql-admin
From | Tom Lane |
---|---|
Subject | Re: FAQ Q |
Date | |
Msg-id | 10978.1023759032@sss.pgh.pa.us Whole thread Raw |
In response to | Re: FAQ Q (Tim Ellis <Tim.Ellis@gamet.com>) |
Responses |
Re: FAQ Q
|
List | pgsql-admin |
Tim Ellis <Tim.Ellis@gamet.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You do not have to trust Postgres itself: in all cases we push the log >> entries out to the OS before declaring a transaction committed. > In all cases while fsync mode is enabled, not in all cases (including > no-fsync) right? All cases is what I said, and all cases is what I meant. fsync mode only controls whether there's an fsync() after the write(). > So no-fsync simply means that "commit" != "sync'd to disk"? How is this > different than the big boys do it? From Sybase internals classes and > Oracle documentation, I know that when you say "COMMIT" you don't get any > response back until the RDBMS has written that transaction TO DISK. Or thinks it has. One of the points that people don't much like to talk about is the difficulty of knowing how far the data has really been pushed. In fsync mode we report commit when the kernel has told us it's written the data; we have no way of knowing whether the kernel lied, and even less way of knowing whether the disk drive has actually written the information or only cached it on the controller board. On modern disk drives the truth is likely to be that the bits have only gone as far as the drive controller; so you may lose data if you lose power and the power supply does not have enough capacity to hold up the drive while it's finishing its pending writes. However, the big boys running on the same hardware are going to have the same problem. If you can figure out how to configure your drives not to report write complete until it's really complete, then you can feel secure with either Postgres or the big boys. Or you can buy a UPS and make sure you are configured to shut down before the UPS runs out of steam. In no-fsync mode we report commit after pushing the data out to the kernel, but we don't try to force the kernel to push it out to disk. So you are safe against a Postgres crash, but not against kernel or hardware failure. Your transaction could be lost if the system dies before the kernel gets around to syncing it out to disk (typically 30 sec max, on most Unixen). > Is the FAQ being too generous in speed and too unfair in recoverability to > the commercial offerings here? I didn't write the FAQ and won't take responsibility for its claims about commercial databases. I really don't know what tradeoffs they offer in this area. > Sybase at least just tries to avoid writing tranlogs to disk except in > batches, so when you say COMMIT, it pauses until X amount of time passes > or Y number of committed transactions have piled up, then it writes those > trans to disk. But your client is blocked on the COMMIT until the > transaction actually makes it to disk. We have that too, although it's not on by default, and probably needs further tuning work. > In 7.1/7.2, when you COMMIT, you are guaranteed that WAL (and probably > nothing but WAL) is written to disk before you get a return and that > therefore if your instance immediately thereafter crashes, that after > recovery (which is required), you will have a consistent (not-corrupted) > database. Modulo the above issues, yes. Note also that WAL fsync protects you against data corruption in the case of a mid-transaction system-level failure: if all the WAL updates have made it to disk, then we will be able to fix any incomplete or missing writes in the data files. Without fsync there is a distinct risk of corrupted data. This has to do with forcing fsync on the WAL files before we start to modify data pages intra-transaction. regards, tom lane
pgsql-admin by date: