Thread: commit_delay, siblings
Hackers: I've been trying to get a test result for 8.1 that shows that we can eliminate commit_delay and commit_siblings, as I believe that these settings no longer have any real effect on performance. However, the checkpointing performance issues have so far prevented me from getting a good test result for this. Just a warning, because I might bring it up after feature freeze. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > I've been trying to get a test result for 8.1 that shows that we can eliminate > commit_delay and commit_siblings, as I believe that these settings no longer > have any real effect on performance. I don't think they ever did :-(. The theory is good, but useful values for commit_delay would probably be under a millisecond, and there isn't any portable way to sleep for such short periods. We've been leaving them there just in case somebody can find a use for 'em, but I wouldn't object to taking them out. regards, tom lane
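For readers new to the two settings: as documented, commit_delay (in microseconds) is a pause between writing a commit record to the WAL buffer and flushing it, and it is applied only when at least commit_siblings other transactions are active. The decision can be sketched as below. This is a minimal sketch: the struct and function names are hypothetical, and the check that fsync is enabled is an assumption modeled on the documented behavior rather than a copy of the backend code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GUC settings (not PostgreSQL's variables). */
struct commit_settings {
    int  commit_delay_us;  /* commit_delay, in microseconds; 0 disables it */
    int  commit_siblings;  /* other active transactions required before delaying */
    bool fsync_enabled;    /* assumed: no point delaying if WAL is never fsync'd */
};

/* Returns how long a committing backend would sleep before flushing WAL,
 * given how many other transactions are currently active. */
static int commit_sleep_us(const struct commit_settings *s, int other_active_xacts)
{
    if (s->commit_delay_us > 0 && s->fsync_enabled &&
        other_active_xacts >= s->commit_siblings)
        return s->commit_delay_us;  /* hope siblings' commits share the next flush */
    return 0;                       /* flush immediately */
}
```

Note that nothing in this logic waits for the siblings to actually reach commit; the sleep is a fixed guess made before the flush.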
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >>I've been trying to get a test result for 8.1 that shows that we can eliminate >>commit_delay and commit_siblings, as I believe that these settings no longer >>have any real effect on performance. > > > I don't think they ever did :-(. The theory is good, but useful values > for commit_delay would probably be under a millisecond, and there isn't > any portable way to sleep for such short periods. We've been leaving > them there just in case somebody can find a use for 'em, but I wouldn't > object to taking them out. > > regards, tom lane We have done extensive testing some time ago. We could not see any difference on any platform we have tested (AIX, Linux, Solaris). I don't think that there is one at all - at least not on common systems. best regards, hans -- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/664/393 39 74 www.cybertec.at, www.postgresql.at
Hans, Tom, > We have done extensive testing some time ago. > We could not see any difference on any platform we have tested (AIX, > Linux, Solaris). I don't think that there is one at all - at least not > on common systems. Keen, then. Any objections to removing the GUC? We desperately need means to cut down on GUC options. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Hans-Jürgen Schönig <postgres@cybertec.at> writes: > > The theory is good, but useful values for commit_delay would probably be > > under a millisecond, and there isn't any portable way to sleep for such > > short periods. Just because there's no "portable" way to be sure it'll work doesn't mean there's no point in trying. If one user sets it to 5ms and it's effective for him, there's no reason to take out the option for him just because it doesn't work out as well on all platforms. Linux, for example, has moved to higher timer (clock tick) frequencies precisely because things like movie and music players need to control their timing with much more precision than 10ms. -- greg
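Whether a sub-millisecond sleep is actually honored is easy to check on a given platform. A minimal sketch using POSIX nanosleep() and clock_gettime(CLOCK_MONOTONIC) that reports how long a requested sleep really took; the helper names are hypothetical, this is not how PostgreSQL itself sleeps, and the overshoot you see depends on the kernel's timer resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Sleeps for `usec` microseconds and returns the elapsed time actually observed. */
static long long measured_sleep_us(long usec)
{
    struct timespec req = { usec / 1000000, (usec % 1000000) * 1000L };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* if a signal interrupts the sleep, nanosleep() stores the remainder in req */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    return ns / 1000;
}

/* Prints requested vs. observed sleep for a few commit_delay-sized values. */
static void report_sleep_granularity(void)
{
    const long requests[] = { 10, 50, 100, 1000, 5000 };
    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("requested %5ld us, slept %6lld us\n",
               requests[i], measured_sleep_us(requests[i]));
}
```

If the kernel rounds sleeps up to its timer tick, the small requests would all report roughly the same value, which is the platform dependence discussed above.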
"Josh Berkus" <josh@agliodbs.com> writes > Hackers: > > I've been trying to get a test result for 8.1 that shows that we can eliminate > commit_delay and commit_siblings, as I believe that these settings no longer > have any real effect on performance. However, the checkpointing performance > issues have so far prevented me from getting a good test result for this. > In my understanding, the commit_delay/commit_siblings combination simulates the background xlog writer mechanism found in some databases like Oracle. This might be a separate issue, but we have code in XLogFlush() like:

    /* done already? */
    if (!XLByteLE(record, LogwrtResult.Flush))
    {
        /* now wait for the write lock */
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        if (XLByteLE(record, LogwrtResult.Flush))
            LWLockRelease(WALWriteLock);    /* if done already, then release the lock */
        else
            /* do it */

If testing shows that the "LWLockRelease(WALWriteLock)" branch is actually taken often, then it indicates that we waste some time acquiring WALWriteLock. Would commit_delay/commit_siblings help, or would it be better to have a background xlog writer that notifies us when xlogflush has completed (so we don't compete for this lock)? Regards, Qingqing
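The proposed measurement amounts to counting how often the re-check under WALWriteLock finds the work already done. A single-threaded sketch of that instrumentation follows. The struct, the counters and the two lock_* functions are hypothetical; the lock acquire is modeled as a callback so that the "someone else flushed while I waited" case can be reproduced deterministically, and the XLogRecPtr layout is assumed to be the two-part 8.x form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8.x-style WAL position: log file id plus byte offset (layout assumed). */
typedef struct { uint32_t xlogid; uint32_t xrecoff; } XLogRecPtr;

/* a <= b, the comparison the XLByteLE macro performs. */
static bool xl_byte_le(XLogRecPtr a, XLogRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/* Hypothetical shared WAL state with a counter for each outcome. */
struct wal_state {
    XLogRecPtr flushed;                /* plays the role of LogwrtResult.Flush */
    XLogRecPtr peer_target;            /* used only by lock_after_peer_flush() */
    void (*lock)(struct wal_state *);  /* stands in for acquiring WALWriteLock */
    unsigned long fast_path;           /* already flushed, lock never taken */
    unsigned long wasted_lock;         /* lock taken, but nothing left to do */
    unsigned long writes;              /* lock taken and this caller flushed */
};

/* Uncontended acquire: nothing happens while we "wait". */
static void lock_uncontended(struct wal_state *w) { (void)w; }

/* Contended acquire: models another backend flushing up to peer_target
 * while this one was blocked on the lock. */
static void lock_after_peer_flush(struct wal_state *w)
{
    if (!xl_byte_le(w->peer_target, w->flushed))
        w->flushed = w->peer_target;
}

/* The check / lock / re-check pattern quoted above, instrumented. */
static void flush_upto(struct wal_state *w, XLogRecPtr record)
{
    if (xl_byte_le(record, w->flushed)) {  /* done already? */
        w->fast_path++;
        return;
    }
    w->lock(w);                            /* now wait for the write lock */
    if (xl_byte_le(record, w->flushed)) {
        w->wasted_lock++;                  /* someone else flushed it meanwhile */
    } else {
        w->flushed = record;               /* "do it": write + fsync would go here */
        w->writes++;
    }
    /* releasing the lock would go here */
}
```

A high wasted_lock count relative to writes would suggest that commits are already being ganged behind whichever backend holds the lock.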
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: > Would commit_delay/commit_siblings helps or we need a > background xlog writer and notify us the completion of xlogflush is better > (so we don't compete for this lock)? The existing bgwriter already does a certain amount of xlog flushing (since it must flush WAL at least as far as the LSN of any dirty page it wants to write out). However I'm not sure that this is very effective --- in a few strace tests that I've done, it seemed that committing backends still ended up doing the bulk of the xlog writes, especially if they were doing small transactions. It'd be interesting to look into making the bgwriter (or a new dedicated xlog bgwriter) responsible for all xlog writes. You could imagine a loop like

    forever do
        if (something new in xlog)
            write and flush it;
        else
            sleep 10 msec;
    done

together with some kind of IPC to waken backends once xlog was flushed past the point they needed. (Designing that is the hard part.) But in any case, the existing commit_delay doesn't seem like it's got anything to do with a path to a better answer, so this is not an argument against removing it. regards, tom lane
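The writer loop above can be modeled as a single iteration step. This is a hypothetical, single-threaded model (the struct and field names are invented for illustration); it leaves out the hard part, the IPC that wakes waiting backends, and it does not actually sleep.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a dedicated xlog writer; positions are byte offsets. */
struct xlog_model {
    uint64_t insert_pos;  /* end of WAL inserted by backends so far */
    uint64_t flush_pos;   /* end of WAL known to be on disk */
    unsigned flushes;     /* write+fsync rounds issued */
    unsigned idle;        /* iterations that found nothing to do */
};

/* One trip around "if (something new in xlog) write and flush it; else sleep". */
static void xlog_writer_iteration(struct xlog_model *x)
{
    if (x->insert_pos > x->flush_pos) {
        /* write() and fsync() everything up to insert_pos in one go; a real
         * writer would then wake every backend waiting on a position <= flush_pos */
        x->flush_pos = x->insert_pos;
        x->flushes++;
    } else {
        /* a real writer would sleep about 10 ms here; the model only counts */
        x->idle++;
    }
}
```

Because each iteration flushes everything inserted since the previous one, any number of commits arriving between two iterations share a single fsync.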
"Tom Lane" <tgl@sss.pgh.pa.us> writes > together with some kind of IPC to waken backends once xlog was flushed > past the point they needed. (Designing that is the hard part.) I think we could use the ProcSendSignal()/ProcWaitForSignal() mechanism to cope with the problem, because it won't lose any wake-ups. So there will be a MaxBackends-sized shared memory array where each cell is a structure:

    XLogRecPtr recptr;   /* record request */
    bool       status;   /* execution results */

The initial value of each cell is <(0, 0), *doesn't matter*>. We also need a spinlock to protect the "recptr" value, since it is not a sig_atomic_t value. A backend that requests an xlogflush will do:

    spinlock_acquire;
    fill in the XLogRecPtr value;
    spinlock_release;
    ProcWaitForSignal();

After being woken up, it will examine the "status" value and act accordingly.

The xlog writer is the only process that does real xlog writes in postmaster mode. It does not run in standalone mode or recovery mode. It works in a periodic loop, and is also woken up when the xlog buffer is 70% full. A cancel/die interrupt could happen during the wait, so we will plug a ProcCancelWaitForSignal() into AbortTransaction() or into the error handling of the xlog-writer loop. Various error conditions could also occur during its life: any error during xlogflush will be a PANIC, while some small errors in the loop will hopefully be recoverable. If everything is good, it would scan the array and, for each cell, do:

    spinlock_acquire;
    make a local copy of XLogRecPtr;
    spinlock_release;

    if (recptr is (0, 0))
        nothing to do;                    /* no request at all */
    else if (recptr is satisfied)
    {
        set XLogRecPtr to (0, 0);
        status = true;                    /* successfully done */
        ProcSendSignal(targetbackendid);
    }
    else
    {
        check if recptr is past the end of the xlog file; if so:
            set XLogRecPtr to (0, 0);
            set status = false;           /* bad request */
            ProcSendSignal(targetbackendid);
    }

I am not sure how to check for a bad recptr. Currently we could do this by comparing the request with the real flush point after xlogwrite(request).
However, this does not seem to be a solution for the xlog-writer case. Regards, Qingqing
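The scan step of this proposal can be sketched as an ordinary loop. A minimal single-threaded sketch: the spinlock copy and ProcSendSignal() are represented only by comments and a woken[] output array, all names are hypothetical, the XLogRecPtr layout is assumed to be the two-part 8.x form, and "past the end of the xlog" is approximated by comparing against a caller-supplied end position, since the message leaves that check open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8.x-style WAL position (layout assumed for illustration). */
typedef struct { uint32_t xlogid; uint32_t xrecoff; } XLogRecPtr;

/* One cell of the proposed MaxBackends-sized shared array. */
struct flush_slot {
    XLogRecPtr recptr;  /* requested flush point; (0, 0) means "no request" */
    bool       status;  /* result reported back to the waiting backend */
};

static bool rec_le(XLogRecPtr a, XLogRecPtr b)
{
    return a.xlogid < b.xlogid || (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

static bool rec_is_zero(XLogRecPtr p) { return p.xlogid == 0 && p.xrecoff == 0; }

/* One scan by the (hypothetical) xlog writer. woken[i] is set where the design
 * would call ProcSendSignal() for backend i. Returns the number woken. */
static int scan_flush_requests(struct flush_slot *slots, int nslots,
                               XLogRecPtr flushed, XLogRecPtr end_of_log,
                               bool *woken)
{
    const XLogRecPtr none = { 0, 0 };
    int nwoken = 0;

    for (int i = 0; i < nslots; i++) {
        XLogRecPtr req = slots[i].recptr;  /* real design: copy under the spinlock */
        woken[i] = false;

        if (rec_is_zero(req)) {
            continue;                      /* no request at all */
        } else if (rec_le(req, flushed)) {
            slots[i].recptr = none;
            slots[i].status = true;        /* successfully done */
        } else if (!rec_le(req, end_of_log)) {
            slots[i].recptr = none;
            slots[i].status = false;       /* bad request: beyond end of WAL */
        } else {
            continue;                      /* not flushed yet; leave it waiting */
        }
        woken[i] = true;
        nwoken++;
    }
    return nwoken;
}
```

Requests that are valid but not yet satisfied simply stay in their slots for the next scan, so no wake-up is lost as long as the writer keeps looping.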
Josh Berkus wrote: > Hackers: > > I've been trying to get a test result for 8.1 that shows that we can eliminate > commit_delay and commit_siblings, as I believe that these settings no longer > have any real effect on performance. However, the checkpointing performance > issues have so far prevented me from getting a good test result for this. > > Just a warning, because I might bring it up after feature freeze. If we yank them (and I agree), I think we have to do it before feature freeze. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce, > > Just a warning, because I might bring it up after feature freeze. > > If we yank them ( and I agree) I think we have to do it before feature > freeze. I believe that we have consensus to yank them. Hans says that he did extensive testing back as far as 7.4 and the options had no effect. -- Josh Berkus Aglio Database Solutions San Francisco
> > > Just a warning, because I might bring it up after feature freeze. > > > > If we yank them ( and I agree) I think we have to do it before feature > > freeze. > > I believe that we have consensus to yank them. Hans says that he did > extensive testing back as far as 7.4 and the options had no effect. In my opinion, we'd better test with at least 8.0, or even better with current sources. I think I can do the testing after July 1 if those features still remain. I have a dual Xeon system with a 15000RPM SCSI disk system in my office. -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >>> If we yank them ( and I agree) I think we have to do it before feature >>> freeze. >> >> I believe that we have consensus to yank them. Hans says that he did >> extensive testing back as far as 7.4 and the options had no effect. > My opinion is, we'd better test with at least 8.0, or even better with > current. I think I can do the testing after Jul 1 if those features > are remained. I have a dual Xeon system with a 15000RPM SCSI disk > system in my office. Well, the proposal is on the table, and the implementation is pretty obvious. If you want to be sticky about the feature freeze rule, someone could generate a diff to remove the variables and post it to -patches before July 1, and then it would be fully per-rules to evaluate it after July 1. I vote not to require ourselves to go through that pushup. If Tatsuo can do some testing next week, I'm happy to hold off removing the variables until then. regards, tom lane
On Tue, Jun 28, 2005 at 10:35:43AM -0400, Tom Lane wrote: > Tatsuo Ishii <t-ishii@sra.co.jp> writes: > >>> If we yank them ( and I agree) I think we have to do it before feature > >>> freeze. > >> > >> I believe that we have consensus to yank them. Hans says that he did > >> extensive testing back as far as 7.4 and the options had no effect. > > > My opinion is, we'd better test with at least 8.0, or even better with > > current. I think I can do the testing after Jul 1 if those features > > are remained. I have a dual Xeon system with a 15000RPM SCSI disk > > system in my office. > > Well, the proposal is on the table, and the implementation is pretty > obvious. If you want to be sticky about the feature freeze rule, > someone could generate a diff to remove the variables and post it to > -patches before July 1, and then it would be fully per-rules to evaluate > it after July 1. That'd be needlessly legalistic ... I propose we stick to the "spirit" of the rules, rather than the letter. > I vote not to require ourselves to go through that pushup. I agree. -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Everyone is who they are, and goes down the stairs however they like" (JMSerrat)
Tom, Incidentally, I have tests in the queue. It's just that the STP has been very unreliable for the last month, so I've not been able to get definitive test results. More important than commit_*, of course, is the WAL/CRC work on checkpoint cost, which I'm also getting impatient to test. Will be setting up my own test machines today ... -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote: > Hans, Tom, > > > We have done extensive testing some time ago. > > We could not see any difference on any platform we have tested (AIX, > > Linux, Solaris). I don't think that there is one at all - at least not > > on common systems. > > Keen then. Any objections to removing the GUC? We desperately need means > to cut down on GUC options. Group commit is a well-documented technique for improving performance, but the gains only show themselves on very busy systems. It is possible that in earlier testing any apparent value was actually hidden by the BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as being very nearly the highest routine in the oprofile output. That tells me that it could now be time for group commit to show us some value, if any exists. DB2 and Berkeley-DB use group commit, while other RDBMSs use log writer processes, which effectively provide the same thing. It would surprise me if we were unable to make use of such a technique, and worry me too. I would ask that we hold off on their execution, at least for the complete 8.1 beta performance test cycle. We may yet see gains, although, as Tom points out, the benefit may be possible only on some platforms. Best Regards, Simon Riggs
Simon Riggs wrote: > Group commit is a well-documented technique for improving performance, > but the gains only show themselves on very busy systems. It is possible > in earlier testing any apparent value was actually hidden by the > BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as > being very nearly the highest routine on the oprofile. That tells me > that it could now be time for group commit to show us some value, if any > exists. > > DB2 and Berkeley-DB use group commit, while other rdbms use log writer > processes which effectively provide the same thing. It would surprise me > if we were unable to make use of such a technique, and worry me too. > > I would ask that we hold off on their execution, at least for the > complete 8.1 beta performance test cycle. We may yet see gains albeit, > as Tom points out, that benefit may only be possible on only some > platforms. I don't remember the details exactly, but isn't it so that postgres has some kind of group commits even without the commit_delay option? I.e. when several backends are waiting for commit concurrently, the one to get to commit will actually commit wal for all waiting transactions to disk? I remember the term "ganged wal writes" or something similar. Tom, can you elaborate on this? Please tell me if I am totally off track. ;-) Best Regards, Michael Paesold
Simon Riggs wrote: > On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote: > > Hans, Tom, > > > > > We have done extensive testing some time ago. > > > We could not see any difference on any platform we have tested (AIX, > > > Linux, Solaris). I don't think that there is one at all - at least not > > > on common systems. > > > > Keen then. Any objections to removing the GUC? We desperately need means > > to cut down on GUC options. > > Group commit is a well-documented technique for improving performance, > but the gains only show themselves on very busy systems. It is possible > in earlier testing any apparent value was actually hidden by the > BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as > being very nearly the highest routine on the oprofile. That tells me > that it could now be time for group commit to show us some value, if any > exists. > > DB2 and Berkeley-DB use group commit, while other rdbms use log writer > processes which effectively provide the same thing. It would surprise me > if we were unable to make use of such a technique, and worry me too. > > I would ask that we hold off on their execution, at least for the > complete 8.1 beta performance test cycle. We may yet see gains albeit, > as Tom points out, that benefit may only be possible on only some > platforms. Interesting. I didn't know other databases used group commits. Your idea of keeping it for the 8.1 testing cycle has merit. -- Bruce Momjian
Simon Riggs <simon@2ndquadrant.com> writes: > Group commit is a well-documented technique for improving performance, The issue here is not "is group commit a good idea in the abstract?". It is "is the commit_delay implementation of the idea worth a dime?" ... and the evidence we have all points to the answer "NO". We should not let theoretical arguments blind us to this. I posted an analysis some time ago showing that under heavy load, we already have the effect of ganged commits, without commit_delay: http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php It's likely that there is more we can and should do, but that doesn't mean that commit_delay is the right answer. commit_delay doesn't do anything to encourage ganging of writes, it just inserts an arbitrary delay that's not synchronized to anything, and is probably an order of magnitude too large anyway on most platforms. > I would ask that we hold off on their execution, at least for the > complete 8.1 beta performance test cycle. I'm willing to wait a week while Tatsuo runs some fresh tests. I'm not willing to wait indefinitely for evidence that I'm privately certain will not be forthcoming. regards, tom lane
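The claim that heavy load already gangs commits can be illustrated with a toy queueing model. This is a hypothetical simulation, not PostgreSQL's code: one disk, one fsync at a time, each fsync covering every record inserted before it starts; an optional fixed delay stands in for commit_delay, and no commit_siblings check is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: commit i inserts its WAL record at time arrive[i] (sorted) and
 * then, after an optional fixed delay, asks for a flush. The disk runs one
 * fsync at a time, each taking fsync_time. With ganging, an fsync covers every
 * record inserted before it starts; without ganging it covers only one. */
static size_t count_fsyncs(const double *arrive, size_t n, double fsync_time,
                           double delay, bool gang)
{
    size_t flushed = 0;      /* records [0, flushed) are durable */
    size_t fsyncs = 0;
    double disk_free = 0.0;  /* time at which the disk becomes idle */

    while (flushed < n) {
        /* the oldest waiter starts the next fsync once its delay is over
         * and the disk is free */
        double start = arrive[flushed] + delay;
        if (start < disk_free)
            start = disk_free;

        size_t upto = flushed + 1;
        if (gang)
            while (upto < n && arrive[upto] <= start)
                upto++;

        flushed = upto;
        disk_free = start + fsync_time;
        fsyncs++;
    }
    return fsyncs;
}
```

With commits arriving every time unit and an fsync taking 4 units, ganging alone cuts 8 fsyncs to 3 with no delay at all. In these inputs the fixed delay reduces fsyncs only in a light-load case (arrivals 0, 1, 2, 3 with a 0.5-unit fsync and a 3-unit delay: 4 fsyncs become 1), at the cost of 3 extra units of latency for the first commit. Real effects such as lock contention and timer granularity are not captured, so the model only shows the mechanism.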
On Wed, Jun 29, 2005 at 08:14:36AM +0100, Simon Riggs wrote: > > Group commit is a well-documented technique for improving performance, > but the gains only show themselves on very busy systems. It is possible > in earlier testing any apparent value was actually hidden by the > BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as > being very nearly the highest routine on the oprofile. That tells me > that it could now be time for group commit to show us some value, if any > exists. > > DB2 and Berkeley-DB use group commit, while other rdbms use log writer > processes which effectively provide the same thing. It would surprise me > if we were unable to make use of such a technique, and worry me too. > > I would ask that we hold off on their execution, at least for the > complete 8.1 beta performance test cycle. We may yet see gains albeit, > as Tom points out, that benefit may only be possible on only some > platforms. > > Best Regards, Simon Riggs I would like to weigh in on Simon's side on this issue. The fact that no benefit has been seen from group commit yet may be partly due to the current WAL fsync structure, where a page at a time is sync'd. I recently saw a patch/test mentioned that showed dramatic performance improvements, up to the level of "fsync = off", by writing multiple blocks with a gather algorithm. I would hope that with a similar patch, we should begin to see the benefit of the commit_delay GUC. Ken Marshall
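The gather-write idea can be illustrated with POSIX writev(): several page-sized buffers go to disk in one system call, followed by a single fsync(). A minimal sketch; the 8 KB page size, the helper names and the temporary file are assumptions for illustration, and this is not the patch mentioned above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ 8192  /* assumed WAL page size, for illustration only */

/* Hypothetical helper: writes npages buffers with one writev() and one fsync().
 * Returns bytes written, or -1 on error. A real implementation would also
 * retry short writes. */
static ssize_t gather_write_pages(int fd, char *const pages[], int npages)
{
    struct iovec iov[16];

    if (npages < 1 || npages > 16)
        return -1;
    for (int i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = PAGE_SZ;
    }
    ssize_t n = writev(fd, iov, npages);  /* one system call for all pages */
    if (n < 0 || fsync(fd) != 0)          /* one flush for the whole batch */
        return -1;
    return n;
}

/* Demo: writes three pages to a temporary file; returns total bytes written. */
static ssize_t demo_gather_write(void)
{
    static char p0[PAGE_SZ], p1[PAGE_SZ], p2[PAGE_SZ];
    char *pages[] = { p0, p1, p2 };
    char path[] = "/tmp/walgatherXXXXXX";

    memset(p0, 'a', PAGE_SZ);
    memset(p1, 'b', PAGE_SZ);
    memset(p2, 'c', PAGE_SZ);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t n = gather_write_pages(fd, pages, 3);
    close(fd);
    unlink(path);
    return n;
}
```

Compared with writing and syncing one page at a time, the batch pays for a single system call and a single flush regardless of how many pages it carries.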
On Wed, 2005-06-29 at 10:16 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Group commit is a well-documented technique for improving performance, > > The issue here is not "is group commit a good idea in the abstract?". > It is "is the commit_delay implementation of the idea worth a dime?" > ... and the evidence we have all points to the answer "NO". We should > not let theoretical arguments blind us to this. OK, sometimes I sound too theoretical when I do my World History of RDBMS notes :-) ... all I meant was "let's hold off till we've measured it". > > I would ask that we hold off on their execution, at least for the > > complete 8.1 beta performance test cycle. > > I'm willing to wait a week while Tatsuo runs some fresh tests. I'm > not willing to wait indefinitely for evidence that I'm privately > certain will not be forthcoming. I'm inclined to agree with you, but I see no need to move quickly. The code's been there a while now. Best Regards, Simon Riggs
Hi, > Simon Riggs <simon@2ndquadrant.com> writes: > > Group commit is a well-documented technique for improving performance, > > The issue here is not "is group commit a good idea in the abstract?". > It is "is the commit_delay implementation of the idea worth a dime?" > ... and the evidence we have all points to the answer "NO". We should > not let theoretical arguments blind us to this. > > I posted an analysis some time ago showing that under heavy load, > we already have the effect of ganged commits, without commit_delay: > http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php > > It's likely that there is more we can and should do, but that doesn't > mean that commit_delay is the right answer. commit_delay doesn't do > anything to encourage ganging of writes, it just inserts an arbitrary > delay that's not synchronized to anything, and is probably an order > of magnitude too large anyway on most platforms. > > > I would ask that we hold off on their execution, at least for the > > complete 8.1 beta performance test cycle. > > I'm willing to wait a week while Tatsuo runs some fresh tests. I'm > not willing to wait indefinitely for evidence that I'm privately > certain will not be forthcoming. > > regards, tom lane Here are the results from the testing I did this morning.

Summary: The effect of commit_delay cannot be ignored. I got almost a 3x performance difference among different commit_delay settings.

Details:

    Xeon 2.8GHz x2, HT on, mem 2GB, Ultra 320 SCSI, 15000RPM
    Red Hat AS 3 / kernel 2.4.21 (2.4.21-9.30AXsmp)
    PostgreSQL current (July 2 12:18 JST)

FS:

    Filesystem         Size  Used  Avail  Use%  Mounted on
    /dev/cciss/c0d0p3   28G  2.1G    25G    8%  /
    /dev/cciss/c0d0p1  985M   28M   907M    3%  /boot
    /dev/cciss/c0d1p1   67G  1.7G    62G    3%  /data1
    /dev/cciss/c0d2p1   67G   33M    64G    1%  /data2
    /dev/cciss/c0d3p1   67G   33M    64G    1%  /data3
    none               1.3G     0   1.3G    0%  /dev/shm

OS & PostgreSQL binaries are on /. Data is on /data1.
All postgresql.conf directives are set to defaults except:

    max_connections = 512
    shared_buffers = 10000

Benchmarking is done using pgbench. The test database was initialized with the following command:

    pgbench -i -s 100 test        (10,000,000 rows in accounts table)

Each case was run as:

    $ time pgbench -N -c 128 -t 100 test        (128 concurrent users)

and every run reported:

    starting vacuum...end.
    transaction type: Update only accounts
    scaling factor: 100
    number of clients: 128
    number of transactions per client: 100
    number of transactions actually processed: 12800/12800

Results:

    case  commit_delay (us)  tps (incl. conn.)  tps (excl. conn.)  real       user      sys
    1     0                  47.400291          47.509689          4m30.065s  0m3.530s  0m11.210s
    2     10                 140.024294         141.038901         1m31.431s  0m2.340s  0m5.850s
    3     50                 137.207500         138.083489         1m33.312s  0m2.790s  0m6.490s
    4     100                133.458149         134.298841         1m35.931s  0m2.750s  0m7.030s

As you can see, commit_delay = 10 outperforms commit_delay = 0 by about 3 times. -- Tatsuo Ishii
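A quick consistency check on the figures above: throughput computed from the 12800 transactions and the wall-clock ("real") times agrees with what pgbench reports, and the speedup of case 2 over case 1 comes out just under 3x. Note that commit_delay is specified in microseconds, so case 2 requested a 10-microsecond delay.

$$
\frac{12800}{270.065\ \mathrm{s}} \approx 47.4\ \mathrm{tps}, \qquad
\frac{12800}{91.431\ \mathrm{s}} \approx 140.0\ \mathrm{tps}, \qquad
\frac{141.04}{47.51} \approx 2.97
$$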