Thread: pgsql-server: Add: > > * Allow buffered WAL writes and fsync > >
From: momjian@svr1.postgresql.org (Bruce Momjian)
Log Message:
-----------
Add:

>
> * Allow buffered WAL writes and fsync
>
> Instead of guaranteeing recovery of all committed transactions, this
> would provide improved performance by delaying WAL writes and fsync
> so an abrupt operating system restart might lose a few seconds of
> committed transactions but still be consistent. We could perhaps
> remove the 'fsync' parameter (which results in an inconsistent
> database) in favor of this capability.

Modified Files:
--------------
    pgsql-server/doc:
        TODO (r1.1328 -> r1.1329)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql-server/doc/TODO.diff?r1=1.1328&r2=1.1329)
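The trade-off in this TODO item can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not PostgreSQL code: the class `DelayedFlushWal` and its `flush_interval` parameter are hypothetical. Commits are acknowledged as soon as they are buffered, and the buffer is made durable in log order at most once per interval, so a crash loses only a suffix of recent commits and what survives is a consistent prefix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DelayedFlushWal:
    """Hypothetical toy model of buffered WAL writes; not PostgreSQL code."""

    flush_interval: float                             # max seconds between flushes
    buffer: List[int] = field(default_factory=list)   # acknowledged, not yet durable
    durable: List[int] = field(default_factory=list)  # survives an OS crash
    last_flush: float = 0.0

    def commit(self, xid: int, now: float) -> None:
        # The commit is acknowledged as soon as its record is buffered.
        self.buffer.append(xid)
        # In this sketch the interval is only checked on commit; a real
        # design would also flush from a background timer.
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        # Stand-in for write() + fsync() of the WAL: the buffered tail
        # becomes durable at once, in log order.
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.last_flush = now

    def crash(self) -> List[int]:
        # An abrupt OS restart discards whatever was never flushed.
        self.buffer.clear()
        return list(self.durable)


wal = DelayedFlushWal(flush_interval=1.0)
for xid, t in enumerate([0.2, 0.5, 1.1, 1.4, 1.8], start=1):
    wal.commit(xid, now=t)
survivors = wal.crash()
print(survivors)  # [1, 2, 3]; transactions 4 and 5 were acknowledged but lost
```

What survives is always a prefix of the commit order, which is what "still be consistent" relies on. The amount lost is bounded by the interval only if something also flushes when no further commits arrive, which this sketch omits.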
momjian@svr1.postgresql.org (Bruce Momjian) writes:
> Add:
>> * Allow buffered WAL writes and fsync
>>
>> Instead of guaranteeing recovery of all committed transactions, this
>> would provide improved performance by delaying WAL writes and fsync
>> so an abrupt operating system restart might lose a few seconds of
>> committed transactions but still be consistent.

Who exactly signed onto this as a good idea? It sure doesn't square
with my ideas of an ACID database. Committed means committed, not
"maybe if you're lucky committed".

regards, tom lane
Tom Lane wrote:
> momjian@svr1.postgresql.org (Bruce Momjian) writes:
> > Add:
> >> * Allow buffered WAL writes and fsync
> >>
> >> Instead of guaranteeing recovery of all committed transactions, this
> >> would provide improved performance by delaying WAL writes and fsync
> >> so an abrupt operating system restart might lose a few seconds of
> >> committed transactions but still be consistent.
>
> Who exactly signed onto this as a good idea? It sure doesn't square
> with my ideas of an ACID database. Committed means committed, not
> "maybe if you're lucky committed".

True but we support fsync. Certainly it would be more useful than
fsync, and it might allow us to remove fsync.

No one has to sign TODO items, BTW. They are added and removed as
requested.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Who exactly signed onto this as a good idea? It sure doesn't square
>> with my ideas of an ACID database. Committed means committed, not
>> "maybe if you're lucky committed".

> True but we support fsync. Certainly it would be more useful than
> fsync, and it might allow us to remove fsync.

How so? fsync off is for I-don't-care-about-this-data-at-all cases
(primarily development, though loading already-archived data can
qualify too). I'm not seeing a use-case for "I care about this data,
but only once it's more than N seconds old". It certainly does not
replace "just go as fast as you can", which is what fsync off means.

> No one has to sign TODO items, BTW. They are added and removed as
> requested.

[ shrug... ] So if I request removal of this item, it will go away
again? It hasn't reached the age needed to guarantee commit ;-)

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Who exactly signed onto this as a good idea? It sure doesn't square
> >> with my ideas of an ACID database. Committed means committed, not
> >> "maybe if you're lucky committed".
>
> > True but we support fsync. Certainly it would be more useful than
> > fsync, and it might allow us to remove fsync.
>
> How so? fsync off is for I-don't-care-about-this-data-at-all cases
> (primarily development, though loading already-archived data can
> qualify too). I'm not seeing a use-case for "I care about this data,
> but only once it's more than N seconds old". It certainly does not
> replace "just go as fast as you can", which is what fsync off means.
>
> > No one has to sign TODO items, BTW. They are added and removed as
> > requested.
>
> [ shrug... ] So if I request removal of this item, it will go away
> again? It hasn't reached the age needed to guarantee commit ;-)

Many databases offer this feature. The submitter asked for it, and I
think it is a good idea. For cases where you are running an in-house
app, you can tell your employees to re-key the stuff they did just
before the crash. It doesn't work for web apps and stuff, but for
smaller cases it is fine.

With Informix, the logic used by most customers I dealt with was that
unbuffered logging was too slow and they were willing to do a few rekeys
for the performance gain.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Many databases offer this feature. The submitter asked for it,

Actually he didn't --- AFAICS you misinterpreted the thread completely.
The original suggestion was that we might be able to exploit a
transactional filesystem to improve performance *without* sacrificing
any correctness guarantees. Delayed fsync has nothing to do with that.

(I'm dubious whether there's any performance improvement to be had that
would be worth the code uglification involved, since we're surely not
going to *require* a transactional filesystem and so two very different
code paths seem to be needed. But it's at least something to think about.)

Again, the fact that Oracle offers such a feature doesn't make it a good
idea.

regards, tom lane
On Fri, 13 Aug 2004, Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>> Tom Lane wrote:
>>>> Who exactly signed onto this as a good idea? It sure doesn't square
>>>> with my ideas of an ACID database. Committed means committed, not
>>>> "maybe if you're lucky committed".
>>
>>> True but we support fsync. Certainly it would be more useful than
>>> fsync, and it might allow us to remove fsync.
>>
>> How so? fsync off is for I-don't-care-about-this-data-at-all cases
>> (primarily development, though loading already-archived data can
>> qualify too). I'm not seeing a use-case for "I care about this data,
>> but only once it's more than N seconds old". It certainly does not
>> replace "just go as fast as you can", which is what fsync off means.
>>
>>> No one has to sign TODO items, BTW. They are added and removed as
>>> requested.
>>
>> [ shrug... ] So if I request removal of this item, it will go away
>> again? It hasn't reached the age needed to guarantee commit ;-)
>
> Many databases offer this feature. The submitter asked for it, and I
> think it is a good idea. For cases where you are running an in-house
> app, you can tell your employees to re-key the stuff they did just
> before the crash. It doesn't work for web apps and stuff, but for
> smaller cases it is fine.
>
> With Informix, the logic used by most customers I dealt with was that
> unbuffered logging was too slow and they were willing to do a few rekeys
> for the performance gain.

I tend to agree with Tom that this is a bad idea, but ... if we do
foolishly implement this, can it be a disfeature that is only available
via a special configure flag on compile, that creates a special GUC
variable that defaults to the standard behaviour?

Basically, if you desire to risk cutting off your left hand for the sake
of speed, put them through a couple of hoops to get there first ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
On Sat, 14 Aug 2004, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Many databases offer this feature. The submitter asked for it,
>
> Actually he didn't --- AFAICS you misinterpreted the thread completely.
> The original suggestion was that we might be able to exploit a
> transactional filesystem to improve performance *without* sacrificing
> any correctness guarantees. Delayed fsync has nothing to do with that.
>
> (I'm dubious whether there's any performance improvement to be had that
> would be worth the code uglification involved, since we're surely not
> going to *require* a transactional filesystem and so two very different
> code paths seem to be needed. But it's at least something to think about.)

Just to expand on the 'dubiousness' ... remember a while back when I worked
through the 'no-WAL' version of PostgreSQL to test loading a database with
WAL disabled? The performance improvements on loading a database weren't
enough, I seem to recall, to warrant getting rid of WAL altogether ... so
I can't see 'delayed WAL' being faster than 'no WAL' ...
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Many databases offer this feature. The submitter asked for it,
>
> Actually he didn't --- AFAICS you misinterpreted the thread completely.
> The original suggestion was that we might be able to exploit a
> transactional filesystem to improve performance *without* sacrificing
> any correctness guarantees. Delayed fsync has nothing to do with that.
>
> (I'm dubious whether there's any performance improvement to be had that
> would be worth the code uglification involved, since we're surely not
> going to *require* a transactional filesystem and so two very different
> code paths seem to be needed. But it's at least something to think about.)
>
> Again, the fact that Oracle offers such a feature doesn't make it a good
> idea.

Agreed. I was addressing his second question:

> > Is there also a possibility to tell Postgres : "I don't care if I lose 30
> > seconds of transactions on this table if the power goes out, I just want
> > to be sure it's still ACID et al. compliant but you can fsync less often
> > and thus be faster" (with a possibility of setting that on a per-table
> > basis) ?

I disagree on the per-table part but I can see cases where this middle
mode would be useful.
Marc G. Fournier wrote:
> > With Informix, the logic used by most customers I dealt with was that
> > unbuffered logging was too slow and they were willing to do a few rekeys
> > for the performance gain.
>
> I tend to agree with Tom that this is a bad idea, but ... if we do
> foolishly implement this, can it be a disfeature that is only available
> via a special configure flag on compile, that creates a special GUC
> variable that defaults to the standard behaviour?
>
> Basically, if you desire to risk cutting off your left hand for the sake
> of speed, put them through a couple of hoops to get there first ...

It isn't going to be any worse than fsync. At least the system is
consistent while fsync leaves it inconsistent. In cases where people are
using fsync, I bet some would prefer this middle ground.

Now if you want to make fsync have the same restriction, that would make
sense.
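The consistency claim above depends on write ordering, not on how often WAL is flushed. The following is a minimal sketch, assuming a single transaction whose change must touch two pages together. The `heap`/`index` page names and the function names are hypothetical, and this is not how PostgreSQL recovery is implemented. It enumerates crash outcomes with and without the write-ahead rule, which says a data page may reach disk only after the WAL record describing it is durable. With fsync off, nothing forces that ordering onto the disk.

```python
from itertools import combinations

PAGES = ("heap", "index")  # hypothetical: one change that touches two pages


def subsets(items):
    # Every combination of pages the OS might have written back by crash time.
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def crash_outcomes(enforce_write_ahead):
    """Post-recovery page states for one transaction, over all crash timings."""
    outcomes = []
    for wal_durable in (False, True):
        for written in subsets(PAGES):
            # Write-ahead rule: a data page may reach disk only after the WAL
            # record describing its change is durable. With fsync off there is
            # no such guarantee, so every combination is possible.
            if enforce_write_ahead and written and not wal_durable:
                continue
            # Recovery redoes the change on every page if its WAL record survived.
            outcomes.append({p: (p in written) or wal_durable for p in PAGES})
    return outcomes


def consistent(state):
    # Both halves of the change are present, or neither is.
    return len(set(state.values())) == 1


bad_fsync_off = [s for s in crash_outcomes(False) if not consistent(s)]
bad_ordered = [s for s in crash_outcomes(True) if not consistent(s)]
print(len(bad_fsync_off), len(bad_ordered))  # 2 0
```

The two runs differ only in the write-ahead check. Delaying the WAL flush changes how much committed work can be lost, but as long as that ordering is kept, every crash outcome in this toy is consistent.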
Marc G. Fournier wrote:
> On Sat, 14 Aug 2004, Tom Lane wrote:
>
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Many databases offer this feature. The submitter asked for it,
> >
> > Actually he didn't --- AFAICS you misinterpreted the thread completely.
> > The original suggestion was that we might be able to exploit a
> > transactional filesystem to improve performance *without* sacrificing
> > any correctness guarantees. Delayed fsync has nothing to do with that.
> >
> > (I'm dubious whether there's any performance improvement to be had that
> > would be worth the code uglification involved, since we're surely not
> > going to *require* a transactional filesystem and so two very different
> > code paths seem to be needed. But it's at least something to think about.)
>
> Just to expand on the 'dubiousness' ... remember a while back when I worked
> through the 'no-WAL' version of PostgreSQL to test loading a database with
> WAL disabled? The performance improvements on loading a database weren't
> enough, I seem to recall, to warrant getting rid of WAL altogether ... so
> I can't see 'delayed WAL' being faster than 'no WAL' ...

Uh, you mean fsync isn't a performance hit as it once was?
On Sat, 14 Aug 2004, Bruce Momjian wrote:
> Marc G. Fournier wrote:
>> On Sat, 14 Aug 2004, Tom Lane wrote:
>>
>>> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>>> Many databases offer this feature. The submitter asked for it,
>>>
>>> Actually he didn't --- AFAICS you misinterpreted the thread completely.
>>> The original suggestion was that we might be able to exploit a
>>> transactional filesystem to improve performance *without* sacrificing
>>> any correctness guarantees. Delayed fsync has nothing to do with that.
>>>
>>> (I'm dubious whether there's any performance improvement to be had that
>>> would be worth the code uglification involved, since we're surely not
>>> going to *require* a transactional filesystem and so two very different
>>> code paths seem to be needed. But it's at least something to think about.)
>>
>> Just to expand on the 'dubiousness' ... remember a while back when I worked
>> through the 'no-WAL' version of PostgreSQL to test loading a database with
>> WAL disabled? The performance improvements on loading a database weren't
>> enough, I seem to recall, to warrant getting rid of WAL altogether ... so
>> I can't see 'delayed WAL' being faster than 'no WAL' ...
>
> Uh, you mean fsync isn't a performance hit as it once was?

No, I mean that writing WAL doesn't appear to be a performance hit ...
removing WAL writing and doing a large db load, the load is a bit faster,
but not as big as one would hope ...
Marc G. Fournier wrote:
> On Sat, 14 Aug 2004, Bruce Momjian wrote:
>
> > Marc G. Fournier wrote:
> >> On Sat, 14 Aug 2004, Tom Lane wrote:
> >>
> >>> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >>>> Many databases offer this feature. The submitter asked for it,
> >>>
> >>> Actually he didn't --- AFAICS you misinterpreted the thread completely.
> >>> The original suggestion was that we might be able to exploit a
> >>> transactional filesystem to improve performance *without* sacrificing
> >>> any correctness guarantees. Delayed fsync has nothing to do with that.
> >>>
> >>> (I'm dubious whether there's any performance improvement to be had that
> >>> would be worth the code uglification involved, since we're surely not
> >>> going to *require* a transactional filesystem and so two very different
> >>> code paths seem to be needed. But it's at least something to think about.)
> >>
> >> Just to expand on the 'dubiousness' ... remember a while back when I worked
> >> through the 'no-WAL' version of PostgreSQL to test loading a database with
> >> WAL disabled? The performance improvements on loading a database weren't
> >> enough, I seem to recall, to warrant getting rid of WAL altogether ... so
> >> I can't see 'delayed WAL' being faster than 'no WAL' ...
> >
> > Uh, you mean fsync isn't a performance hit as it once was?
>
> No, I mean that writing WAL doesn't appear to be a performance hit ...
> removing WAL writing and doing a large db load, the load is a bit faster,
> but not as big as one would hope ...

OK. My idea is to remove fsync (poorly named) and add a new parameter
called transaction_loss. It would specify the number of seconds or
milliseconds before an abrupt server restart that you were willing to
lose transactions. The default would be zero (no loss), and we can
support -1 for the same as fsync off. Positive values would delay WAL
write/fsync for that many seconds/milliseconds.
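The proposed transaction_loss setting is only described in prose here. As a sketch only, its three cases could be interpreted as shown below. The sketch assumes the value is in milliseconds, since the message leaves seconds versus milliseconds open, and that values below -1 are rejected. `WalPolicy` and `interpret_transaction_loss` are hypothetical names and are not part of PostgreSQL.

```python
from enum import Enum
from typing import Optional, Tuple


class WalPolicy(Enum):
    SYNC_ON_COMMIT = "flush WAL before acknowledging each commit"
    DELAYED_FLUSH = "acknowledge commits, flush WAL within the window"
    NO_FORCED_FLUSH = "never force WAL to disk (like fsync = off)"


def interpret_transaction_loss(value_ms: int) -> Tuple[WalPolicy, Optional[int]]:
    """Map the proposed transaction_loss setting to a WAL flush policy.

    Hypothetical: assumes the unit is milliseconds and rejects values < -1.
    Returns the policy and the maximum flush delay in ms (None = unbounded).
    """
    if value_ms == 0:
        return WalPolicy.SYNC_ON_COMMIT, 0      # default: no committed loss
    if value_ms == -1:
        return WalPolicy.NO_FORCED_FLUSH, None  # same as fsync off
    if value_ms > 0:
        return WalPolicy.DELAYED_FLUSH, value_ms
    raise ValueError("transaction_loss must be -1, 0, or a positive delay")


for setting in (0, 200, -1):
    policy, max_delay_ms = interpret_transaction_loss(setting)
    print(setting, policy.name, max_delay_ms)
```

Keeping 0 as the default preserves the current commit guarantee unless an administrator explicitly opts into a loss window.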
On Sat, 14 Aug 2004, Bruce Momjian wrote:
> Marc G. Fournier wrote:
>> On Sat, 14 Aug 2004, Bruce Momjian wrote:
>>
>>> Marc G. Fournier wrote:
>>>> On Sat, 14 Aug 2004, Tom Lane wrote:
>>>>
>>>>> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>>>>> Many databases offer this feature. The submitter asked for it,
>>>>>
>>>>> Actually he didn't --- AFAICS you misinterpreted the thread completely.
>>>>> The original suggestion was that we might be able to exploit a
>>>>> transactional filesystem to improve performance *without* sacrificing
>>>>> any correctness guarantees. Delayed fsync has nothing to do with that.
>>>>>
>>>>> (I'm dubious whether there's any performance improvement to be had that
>>>>> would be worth the code uglification involved, since we're surely not
>>>>> going to *require* a transactional filesystem and so two very different
>>>>> code paths seem to be needed. But it's at least something to think about.)
>>>>
>>>> Just to expand on the 'dubiousness' ... remember a while back when I worked
>>>> through the 'no-WAL' version of PostgreSQL to test loading a database with
>>>> WAL disabled? The performance improvements on loading a database weren't
>>>> enough, I seem to recall, to warrant getting rid of WAL altogether ... so
>>>> I can't see 'delayed WAL' being faster than 'no WAL' ...
>>>
>>> Uh, you mean fsync isn't a performance hit as it once was?
>>
>> No, I mean that writing WAL doesn't appear to be a performance hit ...
>> removing WAL writing and doing a large db load, the load is a bit faster,
>> but not as big as one would hope ...
>
> OK. My idea is to remove fsync (poorly named) and add a new parameter
> called transaction_loss. It would specify the number of seconds or
> milliseconds before an abrupt server restart that you were willing to
> lose transactions. The default would be zero (no loss), and we can
> support -1 for the same as fsync off. Positive values would delay WAL
> write/fsync for that many seconds/milliseconds.
'k, right now we have a separate variable for wal_fsync vs regular fsync
... are you looking at merging them into one? Or leaving them as being
treated separately?

One question, more directed to Tom here ... since they are separate right
now, if WAL is fsync, and "regular writing" is no-fsync, doesn't that
potentially open us up to some *serious* problems? WAL sees the
transaction as complete, but the write for the rest of the system hasn't
happened yet?
Marc G. Fournier wrote:
> >>>> Just to expand on the 'dubiousness' ... remember a while back when I worked
> >>>> through the 'no-WAL' version of PostgreSQL to test loading a database with
> >>>> WAL disabled? The performance improvements on loading a database weren't
> >>>> enough, I seem to recall, to warrant getting rid of WAL altogether ... so
> >>>> I can't see 'delayed WAL' being faster than 'no WAL' ...
> >>>
> >>> Uh, you mean fsync isn't a performance hit as it once was?
> >>
> >> No, I mean that writing WAL doesn't appear to be a performance hit ...
> >> removing WAL writing and doing a large db load, the load is a bit faster,
> >> but not as big as one would hope ...
> >
> > OK. My idea is to remove fsync (poorly named) and add a new parameter
> > called transaction_loss. It would specify the number of seconds or
> > milliseconds before an abrupt server restart that you were willing to
> > lose transactions. The default would be zero (no loss), and we can
> > support -1 for the same as fsync off. Positive values would delay WAL
> > write/fsync for that many seconds/milliseconds.
>
> 'k, right now we have a separate variable for wal_fsync vs regular fsync
> ... are you looking at merging them into one? Or leaving them as being
> treated separately?

I was going to leave wal_fsync alone because it controls the method of
fsync, not the frequency of fsync. We could call it wal_transaction_loss
to clarify what it controls.

I am also thinking we should make -1 fsync enough so it maintains a
consistent database, and not even allow the behavior we have now with
fsync=off where the database is left inconsistent after an OS crash. I
will have to run some tests but I bet that fsync to maintain consistency
would provide similar performance to fsync=off, but with the benefit of
returning a consistent database.

> One question, more directed to Tom here ... since they are separate right
> now, if WAL is fsync, and "regular writing" is no-fsync, doesn't that
> potentially open us up to some *serious* problems? WAL sees the
> transaction as complete, but the write for the rest of the system hasn't
> happened yet?

"regular writing"? Are you talking about the fsync we do from the
background writer (in 8.0) during checkpoint? That has to be done to
maintain consistency no matter what delay they choose.
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> One question, more directed to Tom here ... since they are separate right
> now, if WAL is fsync, and "regular writing" is no-fsync, doesn't that
> potentially open us up to some *serious* problems? WAL sees the
> transaction as complete, but the write for the rest of the system hasn't
> happened yet?

No, that's pretty much the whole point of WAL: once you've fsynced the
transaction's log entries to WAL, it's committed. You don't have to
fsync the data-file writes, or even write the changes out at all.
(In most cases the pages stay in shared buffer cache, dirty, until the
background writer gets to them.) If you crash then the data-file
changes will get redone by replaying the WAL entries.

You do have to fsync data-file writes when trying to complete a
checkpoint, but that's outside the critical path for normal
transactions.

One of the reasons I dislike Bruce's proposal is that I don't think it
pays any attention to this basic duality between normal operations
(fsync WAL) and checkpoints (fsync data). We just finished finding a
long-standing bug in this area, so I'm pretty hesitant to whack it
around on the basis of unproven ideas about performance improvements.

regards, tom lane
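The duality described above (commit fsyncs WAL, checkpoint fsyncs data) can be made concrete with a toy model. This is a minimal sketch under simplifying assumptions. It is not PostgreSQL's buffer manager or recovery code, it ignores the background writer, and the class and attribute names are hypothetical. At commit, each transaction's changes are logged durably. Pages are changed only in a stand-in for shared buffers. A checkpoint writes the dirty pages and only then advances a redo pointer. Crash recovery replays durable WAL from that pointer.

```python
class ToyWalDatabase:
    """Hypothetical model of commit = WAL fsync, checkpoint = data fsync."""

    def __init__(self):
        self.data_files = {}  # page -> value as last written to the data files
        self.buffers = {}     # shared-buffer stand-in; lost on crash
        self.dirty = set()    # pages changed in buffers since their last write
        self.wal = []         # durable (fsynced) WAL records: (page, value)
        self.redo_start = 0   # WAL position where crash replay must begin

    def commit(self, changes):
        # The transaction is committed once its WAL records are durable;
        # appending to self.wal stands in for write() + fsync() of the log.
        self.wal.extend(changes.items())
        # Data pages are changed in the buffer cache only and left dirty.
        self.buffers.update(changes)
        self.dirty.update(changes)

    def checkpoint(self):
        # Write out (and, in a real system, fsync) every dirty page...
        for page in self.dirty:
            self.data_files[page] = self.buffers[page]
        self.dirty.clear()
        # ...and only then move the redo pointer past the WAL it covers.
        self.redo_start = len(self.wal)

    def crash_and_recover(self):
        # Shared memory does not survive; durable WAL and data files do.
        self.buffers.clear()
        self.dirty.clear()
        # Redo: replay WAL written since the last completed checkpoint.
        for page, value in self.wal[self.redo_start:]:
            self.data_files[page] = value

    def read(self, page):
        return self.buffers.get(page, self.data_files.get(page))


db = ToyWalDatabase()
db.commit({"A": 1})
db.checkpoint()
db.commit({"A": 2, "B": 7})
stale = dict(db.data_files)  # committed changes are not yet in the data files
db.crash_and_recover()
print(db.read("A"), db.read("B"))  # 2 7
```

The second commit never touches the data files, yet it survives the crash because its WAL records were durable. Moving `redo_start` forward before the dirty pages were written would break this, which is why the checkpoint's data-file fsync cannot be skipped.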
Tom Lane wrote:
> "Marc G. Fournier" <scrappy@postgresql.org> writes:
> > One question, more directed to Tom here ... since they are separate right
> > now, if WAL is fsync, and "regular writing" is no-fsync, doesn't that
> > potentially open us up to some *serious* problems? WAL sees the
> > transaction as complete, but the write for the rest of the system hasn't
> > happened yet?
>
> No, that's pretty much the whole point of WAL: once you've fsynced the
> transaction's log entries to WAL, it's committed. You don't have to
> fsync the data-file writes, or even write the changes out at all.
> (In most cases the pages stay in shared buffer cache, dirty, until the
> background writer gets to them.) If you crash then the data-file
> changes will get redone by replaying the WAL entries.
>
> You do have to fsync data-file writes when trying to complete a
> checkpoint, but that's outside the critical path for normal
> transactions.
>
> One of the reasons I dislike Bruce's proposal is that I don't think it
> pays any attention to this basic duality between normal operations
> (fsync WAL) and checkpoints (fsync data). We just finished finding a
> long-standing bug in this area, so I'm pretty hesitant to whack it
> around on the basis of unproven ideas about performance improvements.

I have to show some performance numbers to make it worthwhile.