Thread: commit_delay, siblings

commit_delay, siblings

From
Josh Berkus
Date:
Hackers:

I've been trying to get a test result for 8.1 that shows that we can eliminate 
commit_delay and commit_siblings, as I believe that these settings no longer 
have any real effect on performance.  However, the checkpointing performance 
issues have so far prevented me from getting a good test result for this. 

Just a warning, because I might bring it up after feature freeze.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: commit_delay, siblings

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I've been trying to get a test result for 8.1 that shows that we can eliminate 
> commit_delay and commit_siblings, as I believe that these settings no longer 
> have any real effect on performance.

I don't think they ever did :-(.  The theory is good, but useful values
for commit_delay would probably be under a millisecond, and there isn't
any portable way to sleep for such short periods.  We've been leaving
them there just in case somebody can find a use for 'em, but I wouldn't
object to taking them out.
        regards, tom lane


Re: commit_delay, siblings

From
Hans-Jürgen Schönig
Date:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> 
>>I've been trying to get a test result for 8.1 that shows that we can eliminate 
>>commit_delay and commit_siblings, as I believe that these settings no longer 
>>have any real effect on performance.
> 
> 
> I don't think they ever did :-(.  The theory is good, but useful values
> for commit_delay would probably be under a millisecond, and there isn't
> any portable way to sleep for such short periods.  We've been leaving
> them there just in case somebody can find a use for 'em, but I wouldn't
> object to taking them out.
> 
>             regards, tom lane
> 


We did extensive testing some time ago.
We could not see any difference on any platform we tested (AIX,
Linux, Solaris). I don't think there is one at all, at least not
on common systems.
best regards,
    hans

-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at



Re: commit_delay, siblings

From
Josh Berkus
Date:
Hans, Tom,

> We have done extensive testing some time ago.
> We could not see any difference on any platform we have tested (AIX,
> Linux, Solaris). I don't think that there is one at all - at least not
> on common systems.

Keen then.  Any objections to removing the GUC?   We desperately need means 
to cut down on GUC options.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: commit_delay, siblings

From
Greg Stark
Date:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:

> > The theory is good, but useful values for commit_delay would probably be
> > under a millisecond, and there isn't any portable way to sleep for such
> > short periods.

Just because there's no "portable" way to be sure it'll work doesn't mean
there's no point in trying. If one user sets it to 5ms and it's effective for
him, there's no reason to take out the option for him just because it doesn't
work out as well on all platforms.

Linux, for example, has moved to higher timer (HZ) rates precisely because
things like movie and music players need to be able to control their timing
with much more precision than 10ms.
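Whether a sub-millisecond sleep is achievable is a property of each platform, and it is easy to check. The sketch below, assuming a POSIX system that provides clock_nanosleep() and CLOCK_MONOTONIC, requests commit_delay-sized sleeps and reports how long they actually took. The function names are illustrative; this is not PostgreSQL's own sleep routine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Sleep for 'usec' microseconds on CLOCK_MONOTONIC and return how many
 * microseconds actually elapsed. An absolute deadline is used so that an
 * EINTR restart cannot shorten the total sleep. */
static long measured_sleep_usec(long usec)
{
    struct timespec start, deadline, end;

    assert(usec >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    deadline = start;
    deadline.tv_sec += usec / 1000000;
    deadline.tv_nsec += (usec % 1000000) * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep() returns an error number rather than setting errno. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) == EINTR)
        ;

    clock_gettime(CLOCK_MONOTONIC, &end);
    long long ns = (long long) (end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);
    return (long) (ns / 1000);
}

/* Print requested vs. observed sleep for a few commit_delay-sized values. */
static void report_sleep_granularity(void)
{
    const long requests[] = { 10, 50, 100, 1000 };

    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("requested %5ld usec, slept %6ld usec\n",
               requests[i], measured_sleep_usec(requests[i]));
}
```

On a kernel with a coarse scheduler tick, a 10 usec request may only return after a full tick (milliseconds); on a kernel with high-resolution timers it should land much closer to the request. Running it on each target platform shows which side of the argument applies there.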

-- 
greg



Re: commit_delay, siblings

From
"Qingqing Zhou"
Date:
"Josh Berkus" <josh@agliodbs.com> writes
> Hackers:
>
> I've been trying to get a test result for 8.1 that shows that we can eliminate
> commit_delay and commit_siblings, as I believe that these settings no longer
> have any real effect on performance.  However, the checkpointing performance
> issues have so far prevented me from getting a good test result for this.
>

In my understanding, the commit_delay/commit_siblings combination simulates
the background xlog writer mechanism in some databases, such as Oracle.

This might be a separate issue. We have code in xlogflush() like:

    /* done already? */
    if (!XLByteLE(record, LogwrtResult.Flush))
    {
        /* now wait for the write lock */
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        if (XLByteLE(record, LogwrtResult.Flush))
            LWLockRelease(WALWriteLock);    /* if done already, then release the lock */
        else
            /* do it */

If testing shows that the "LWLockRelease(WALWriteLock)" branch is actually
taken often, then we are wasting some time acquiring WALWriteLock. Would
commit_delay/commit_siblings help, or would it be better to have a background
xlog writer that notifies us when our xlogflush has completed (so we don't
compete for this lock)?
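That question can be reasoned about with a small, single-threaded model of the check / lock / recheck pattern quoted above. It is a sketch under stated assumptions: lsn_t, wal_state and the function names are hypothetical, lock acquisition is modeled as strictly serial, and whoever does the flush is assumed to write out everything inserted so far rather than only its own record.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* hypothetical stand-in for XLogRecPtr */

/* Hypothetical model of the shared WAL positions plus two counters. */
typedef struct {
    lsn_t insert_pos;     /* end of WAL inserted so far */
    lsn_t flush_pos;      /* end of WAL known to be on disk */
    int   flushes;        /* physical write+fsync calls */
    int   wasted_locks;   /* WALWriteLock taken only to find the work done */
} wal_state;

/* A backend appends a commit record of 'len' bytes; returns its end LSN. */
static lsn_t wal_insert(wal_state *w, lsn_t len)
{
    w->insert_pos += len;
    return w->insert_pos;
}

/* The unlocked "done already?" test: nonzero means queue for the lock. */
static int flush_needs_lock(const wal_state *w, lsn_t record)
{
    return record > w->flush_pos;
}

/* The part that runs while holding WALWriteLock (modeled as strictly serial). */
static void flush_locked(wal_state *w, lsn_t record)
{
    if (record <= w->flush_pos) {
        w->wasted_locks++;              /* the LWLockRelease()-only branch */
    } else {
        w->flush_pos = w->insert_pos;   /* write out everything inserted so far */
        w->flushes++;
    }
}

/* 'n' backends insert commit records and all pass the unlocked test
 * before the first of them gets WALWriteLock. */
static void simulate_commit_burst(wal_state *w, int n, lsn_t rec_len)
{
    lsn_t recs[256];
    int queued[256];

    assert(n > 0 && n <= 256);
    for (int i = 0; i < n; i++)
        recs[i] = wal_insert(w, rec_len);
    for (int i = 0; i < n; i++)
        queued[i] = flush_needs_lock(w, recs[i]);
    for (int i = 0; i < n; i++)
        if (queued[i])
            flush_locked(w, recs[i]);
}
```

With eight commits arriving in a burst, the model performs a single flush, and the other seven backends take the lock only to hit the LWLockRelease()-only branch. In other words, the count proposed above is also a count of commits that were ganged into someone else's flush.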

Regards,
Qingqing






Re: commit_delay, siblings

From
Tom Lane
Date:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> Would commit_delay/commit_siblings helps or we need a
> background xlog writer and notify us the completion of xlogflush is better
> (so we don't compete for this lock)?

The existing bgwriter already does a certain amount of xlog flushing
(since it must flush WAL at least as far as the LSN of any dirty page it
wants to write out).  However I'm not sure that this is very effective
--- in a few strace tests that I've done, it seemed that committing
backends still ended up doing the bulk of the xlog writes, especially
if they were doing small transactions.  It'd be interesting to look into
making the bgwriter (or a new dedicated xlog bgwriter) responsible for
all xlog writes.  You could imagine a loop like

    forever do
        if (something new in xlog)
            write and flush it;
        else
            sleep 10 msec;
    done

together with some kind of IPC to waken backends once xlog was flushed
past the point they needed.  (Designing that is the hard part.)
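The loop plus the wake-up step can be sketched as follows. This is a minimal illustration under loud assumptions: POSIX threads in one process stand in for separate backend processes, a pthread mutex and condition variable stand in for the unspecified IPC, and write_and_flush() is a hypothetical placeholder for the real write() and fsync() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t lsn_t;   /* hypothetical stand-in for XLogRecPtr */

/* Shared state; the mutex and condvar stand in for cross-process IPC. */
static pthread_mutex_t wal_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wal_flushed = PTHREAD_COND_INITIALIZER;
static lsn_t insert_pos, flush_pos;
static int   writer_stop;

/* Hypothetical placeholder for write() + fsync() of WAL up to 'upto'. */
static void write_and_flush(lsn_t upto) { (void) upto; }

/* The dedicated xlog writer: flush whatever is new, else nap 10 msec. */
static void *xlog_writer_main(void *arg)
{
    const struct timespec nap = { 0, 10L * 1000 * 1000 };
    (void) arg;

    for (;;) {
        pthread_mutex_lock(&wal_mu);
        lsn_t target = insert_pos;
        int   work   = target > flush_pos;
        int   stop   = writer_stop;
        pthread_mutex_unlock(&wal_mu);

        if (work) {
            write_and_flush(target);          /* slow I/O without the mutex */
            pthread_mutex_lock(&wal_mu);
            flush_pos = target;               /* only the writer advances this */
            pthread_cond_broadcast(&wal_flushed);
            pthread_mutex_unlock(&wal_mu);
        } else if (stop) {
            return NULL;
        } else {
            nanosleep(&nap, NULL);
        }
    }
}

/* A committing backend inserts its record, then waits until the writer
 * has flushed past it. Backends never write WAL themselves. */
static void backend_commit(lsn_t rec_len)
{
    pthread_mutex_lock(&wal_mu);
    insert_pos += rec_len;
    lsn_t mine = insert_pos;
    while (flush_pos < mine)
        pthread_cond_wait(&wal_flushed, &wal_mu);
    pthread_mutex_unlock(&wal_mu);
}

static void *backend_main(void *arg)
{
    int ncommits = *(const int *) arg;
    for (int i = 0; i < ncommits; i++)
        backend_commit(64);
    return NULL;
}

/* Run 'nbackends' committers against one writer; returns the final flush LSN. */
static lsn_t run_writer_demo(int nbackends, int ncommits)
{
    pthread_t writer, backends[16];

    assert(nbackends > 0 && nbackends <= 16);
    pthread_create(&writer, NULL, xlog_writer_main, NULL);
    for (int i = 0; i < nbackends; i++)
        pthread_create(&backends[i], NULL, backend_main, &ncommits);
    for (int i = 0; i < nbackends; i++)
        pthread_join(backends[i], NULL);

    pthread_mutex_lock(&wal_mu);
    writer_stop = 1;
    pthread_mutex_unlock(&wal_mu);
    pthread_join(writer, NULL);
    return flush_pos;
}
```

run_writer_demo(4, 20) returns 5120 (4 backends x 20 commits x 64 bytes): every commit was flushed by the writer and none by a backend. The 10 msec nap is the visible cost of this design, since a commit arriving just after the writer found nothing to do can wait up to that long. Replacing the condition variable with something that works across processes is the hard part.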

But in any case, the existing commit_delay doesn't seem like it's got
anything to do with a path to a better answer, so this is not an
argument against removing it.
        regards, tom lane


Re: commit_delay, siblings

From
"Qingqing Zhou"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes
>
>
> together with some kind of IPC to waken backends once xlog was flushed
> past the point they needed.  (Designing that is the hard part.)
>

I think we could use the ProcSendSignal()/ProcWaitForSignal() mechanism to
cope with the problem, because it won't lose any wake-ups.

So there will be a MaxBackend-sized shared memory array in which each cell is a

    XLogRecPtr recptr;  /* record request */
    bool       status;  /* execution result */

structure. The initial value of each cell is <(0, 0), *doesn't matter*>.
Also, we need a spinlock to protect the "recptr" value, since it is not a
sig_atomic_t value.

A backend requesting an xlogflush will do:

    spinlock_acquire;
    fill in the XLogRecPtr value;
    spinlock_release;
    ProcWaitForSignal();

After being woken up, it will examine the "status" value and act accordingly.


The xlog-writer is the only process that does real xlog writes in postmaster
mode. It does not run in standalone mode or recovery mode. It works as a
periodic loop, and is also woken up when the xlog buffer is 70% full. A
cancel/die interrupt could happen during the wait, so we will plug in a
ProcCancelWaitForSignal() at AbortTransaction() or in the error handling of
the xlog-writer loop. Various error conditions could also arise during its
lifetime. Any error that happens during xlogflush will be a PANIC; some small
errors in the loop will hopefully be recoverable. If everything is good, it
scans the array, and for each cell does:

    spinlock_acquire;
    make a local copy of XLogRecPtr;
    spinlock_release;
    if (recptr is (0, 0))
        nothing to do;                      /* no request at all */
    if (recptr is satisfied)
        set XLogRecPtr to (0, 0);
        status = true;                      /* successfully done */
        ProcSendSignal(targetbackendid);
    else
        check if the recptr is past the end of xlog file, if so
            set XLogRecPtr to (0, 0);
            set status = false;             /* bad request */
            ProcSendSignal(targetbackendid);

I am not sure how to check for a bad recptr. Currently we could do this by
comparing the request with the real flush point after xlogwrite(request).
However, that does not seem to be a solution for the xlog-writer case.
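The writer-side scan in this proposal can be modeled in a few lines. A sketch under assumptions: lsn_t and flush_request are hypothetical names, 0 plays the role of (0, 0), the spinlock is omitted, and a request beyond the current end of WAL is treated as bad, which is one possible answer to the open question above, not part of the proposal. The pseudocode leaves the "valid but not yet flushed" case implicit; here such a cell is simply left pending for the next scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* hypothetical stand-in for XLogRecPtr; 0 means (0, 0) */

/* One cell of the proposed MaxBackend-sized shared array. */
typedef struct {
    lsn_t recptr;     /* record request, 0 when there is none */
    bool  status;     /* execution result read by the backend */
    bool  signaled;   /* records that ProcSendSignal() would have been called */
} flush_request;

/* One scan by the xlog writer after it has flushed WAL up to 'flushed'.
 * 'end_of_wal' is the current insert position (assumed bound for a valid
 * request). Returns the number of backends that would be signaled. */
static int scan_flush_requests(flush_request *cells, int ncells,
                               lsn_t flushed, lsn_t end_of_wal)
{
    int signaled = 0;

    for (int i = 0; i < ncells; i++) {
        /* The proposal copies recptr under a spinlock; omitted here. */
        lsn_t req = cells[i].recptr;

        if (req == 0)
            continue;                    /* no request at all */
        if (req <= flushed)
            cells[i].status = true;      /* successfully done */
        else if (req > end_of_wal)
            cells[i].status = false;     /* bad request */
        else
            continue;                    /* valid but not flushed yet */

        cells[i].recptr = 0;             /* back to (0, 0) */
        cells[i].signaled = true;        /* ProcSendSignal(target backend) */
        signaled++;
    }
    return signaled;
}
```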

Regards,
Qingqing





Re: commit_delay, siblings

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Hackers:
> 
> I've been trying to get a test result for 8.1 that shows that we can eliminate 
> commit_delay and commit_siblings, as I believe that these settings no longer 
> have any real effect on performance.  However, the checkpointing performance 
> issues have so far prevented me from getting a good test result for this. 
> 
> Just a warning, because I might bring it up after feature freeze.

If we yank them (and I agree), I think we have to do it before feature
freeze.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: commit_delay, siblings

From
Josh Berkus
Date:
Bruce,

> > Just a warning, because I might bring it up after feature freeze.
>
> If we yank them ( and I agree) I think we have to do it before feature
> freeze.

I believe that we have consensus to yank them.   Hans says that he did 
extensive testing back as far as 7.4 and the options had no effect.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: commit_delay, siblings

From
Tatsuo Ishii
Date:
> > > Just a warning, because I might bring it up after feature freeze.
> >
> > If we yank them ( and I agree) I think we have to do it before feature
> > freeze.
> 
> I believe that we have consensus to yank them.   Hans says that he did 
> extensive testing back as far as 7.4 and the options had no effect.

My opinion is that we'd better test with at least 8.0, or even better with
current. I think I can do the testing after Jul 1 if those features
are still there. I have a dual Xeon system with a 15000RPM SCSI disk
system in my office.
--
Tatsuo Ishii


Re: commit_delay, siblings

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> If we yank them ( and I agree) I think we have to do it before feature
>>> freeze.
>> 
>> I believe that we have consensus to yank them.   Hans says that he did 
>> extensive testing back as far as 7.4 and the options had no effect.

> My opinion is, we'd better test with at least 8.0, or even better with
> current. I think I can do the testing after Jul 1 if those features
> are remained. I have a dual Xeon system with a 15000RPM SCSI disk
> system in my office.

Well, the proposal is on the table, and the implementation is pretty
obvious.  If you want to be sticky about the feature freeze rule,
someone could generate a diff to remove the variables and post it to
-patches before July 1, and then it would be fully per-rules to evaluate
it after July 1.  I vote not to require ourselves to go through that
pushup.

If Tatsuo can do some testing next week, I'm happy to hold off removing
the variables until then.
        regards, tom lane


Re: commit_delay, siblings

From
Alvaro Herrera
Date:
On Tue, Jun 28, 2005 at 10:35:43AM -0400, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >>> If we yank them ( and I agree) I think we have to do it before feature
> >>> freeze.
> >> 
> >> I believe that we have consensus to yank them.   Hans says that he did 
> >> extensive testing back as far as 7.4 and the options had no effect.
> 
> > My opinion is, we'd better test with at least 8.0, or even better with
> > current. I think I can do the testing after Jul 1 if those features
> > are remained. I have a dual Xeon system with a 15000RPM SCSI disk
> > system in my office.
> 
> Well, the proposal is on the table, and the implementation is pretty
> obvious.  If you want to be sticky about the feature freeze rule,
> someone could generate a diff to remove the variables and post it to
> -patches before July 1, and then it would be fully per-rules to evaluate
> it after July 1.

That'd be needlessly legalistic ... I propose we stick to the "spirit"
of the rules, rather than the letter.

> I vote not to require ourselves to go through that pushup.

I agree.

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Everyone is who they are, and goes down the stairs however they like" (JMSerrat)


Re: commit_delay, siblings

From
Josh Berkus
Date:
Tom,

Incidentally, I have tests in the queue.   It's just that the STP has been 
very unreliable for the last month so I've not been able to get definitive 
test results.

More important than commit_* is, of course, the WAL/CRC work for
checkpoint cost, which I'm also getting impatient to test.   Will be
setting up my own test machines today ...

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: commit_delay, siblings

From
Simon Riggs
Date:
On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote: 
> Hans, Tom,
> 
> > We have done extensive testing some time ago.
> > We could not see any difference on any platform we have tested (AIX,
> > Linux, Solaris). I don't think that there is one at all - at least not
> > on common systems.
> 
> Keen then.  Any objections to removing the GUC?   We desperately need means 
> to cut down on GUC options.

Group commit is a well-documented technique for improving performance,
but the gains only show themselves on very busy systems. It is possible
that in earlier testing any apparent value was hidden by the
BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
very nearly the highest routine in the oprofile output. That tells me
that it could now be time for group commit to show us some value, if any
exists.

DB2 and Berkeley-DB use group commit, while other RDBMSs use log writer
processes, which effectively provide the same thing. It would surprise me,
and worry me too, if we were unable to make use of such a technique.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle. We may yet see gains, albeit,
as Tom points out, possibly only on some platforms.

Best Regards, Simon Riggs



Re: commit_delay, siblings

From
"Michael Paesold"
Date:
Simon Riggs wrote:
> Group commit is a well-documented technique for improving performance,
> but the gains only show themselves on very busy systems. It is possible
> in earlier testing any apparent value was actually hidden by the
> BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
> being very nearly the highest routine on the oprofile. That tells me
> that it could now be time for group commit to show us some value, if any
> exists.
>
> DB2 and Berkeley-DB use group commit, while other rdbms use log writer
> processes which effectively provide the same thing. It would surprise me
> if we were unable to make use of such a technique, and worry me too.
>
> I would ask that we hold off on their execution, at least for the
> complete 8.1 beta performance test cycle. We may yet see gains albeit,
> as Tom points out, that benefit may only be possible on only some
> platforms.

I don't remember the details exactly, but isn't it the case that postgres has
some kind of group commit even without the commit_delay option? I.e., when
several backends are waiting to commit concurrently, the one that gets to
commit first actually writes the WAL for all waiting transactions to disk?

I remember the term "ganged wal writes" or something similar. Tom, can you 
elaborate on this? Please tell me if I am totally off track. ;-)

Best Regards,
Michael Paesold 



Re: commit_delay, siblings

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote: 
> > Hans, Tom,
> > 
> > > We have done extensive testing some time ago.
> > > We could not see any difference on any platform we have tested (AIX,
> > > Linux, Solaris). I don't think that there is one at all - at least not
> > > on common systems.
> > 
> > Keen then.  Any objections to removing the GUC?   We desperately need means 
> > to cut down on GUC options.
> 
> Group commit is a well-documented technique for improving performance,
> but the gains only show themselves on very busy systems. It is possible
> in earlier testing any apparent value was actually hidden by the
> BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
> being very nearly the highest routine on the oprofile. That tells me
> that it could now be time for group commit to show us some value, if any
> exists.
> 
> DB2 and Berkeley-DB use group commit, while other rdbms use log writer
> processes which effectively provide the same thing. It would surprise me
> if we were unable to make use of such a technique, and worry me too.
> 
> I would ask that we hold off on their execution, at least for the
> complete 8.1 beta performance test cycle. We may yet see gains albeit,
> as Tom points out, that benefit may only be possible on only some
> platforms.

Interesting.  I didn't know other databases used group commits.  Your
idea of keeping it for the 8.1 testing cycle has merit.

 


Re: commit_delay, siblings

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Group commit is a well-documented technique for improving performance,

The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO".  We should
not let theoretical arguments blind us to this.

I posted an analysis some time ago showing that under heavy load,
we already have the effect of ganged commits, without commit_delay:
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php

It's likely that there is more we can and should do, but that doesn't
mean that commit_delay is the right answer.  commit_delay doesn't do
anything to encourage ganging of writes, it just inserts an arbitrary
delay that's not synchronized to anything, and is probably an order
of magnitude too large anyway on most platforms.
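For comparison with that description, here is a simplified model of how commit_delay and commit_siblings gate the pause. It assumes the gating works as follows: a commit sleeps for commit_delay microseconds before its WAL flush only when the delay is nonzero, fsync is enabled, and at least commit_siblings other transactions are open. The fsync condition reflects an assumption about the code of that era; function and parameter names are illustrative, not PostgreSQL's.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the decision made just before a commit's WAL flush. */
static bool commit_should_delay(int commit_delay_usec, bool fsync_enabled,
                                int other_active_xacts, int commit_siblings)
{
    return commit_delay_usec > 0
        && fsync_enabled
        && other_active_xacts >= commit_siblings;
}

/* How long this backend would sleep before flushing. Nothing wakes it
 * early and nothing ties the sleep to other backends' flushes: it is a
 * fixed pause whose real length also depends on timer granularity. */
static int commit_delay_for(int commit_delay_usec, bool fsync_enabled,
                            int other_active_xacts, int commit_siblings)
{
    return commit_should_delay(commit_delay_usec, fsync_enabled,
                               other_active_xacts, commit_siblings)
        ? commit_delay_usec : 0;
}
```

Note what the model does not contain: no other backend can end the sleep early, and nothing about it depends on when a flush actually happens. That is the sense in which the delay is "not synchronized to anything".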

> I would ask that we hold off on their execution, at least for the
> complete 8.1 beta performance test cycle.

I'm willing to wait a week while Tatsuo runs some fresh tests.  I'm
not willing to wait indefinitely for evidence that I'm privately
certain will not be forthcoming.
        regards, tom lane


Re: commit_delay, siblings

From
Kenneth Marshall
Date:
On Wed, Jun 29, 2005 at 08:14:36AM +0100, Simon Riggs wrote:
> 
> Group commit is a well-documented technique for improving performance,
> but the gains only show themselves on very busy systems. It is possible
> in earlier testing any apparent value was actually hidden by the
> BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
> being very nearly the highest routine on the oprofile. That tells me
> that it could now be time for group commit to show us some value, if any
> exists.
> 
> DB2 and Berkeley-DB use group commit, while other rdbms use log writer
> processes which effectively provide the same thing. It would surprise me
> if we were unable to make use of such a technique, and worry me too.
> 
> I would ask that we hold off on their execution, at least for the
> complete 8.1 beta performance test cycle. We may yet see gains albeit,
> as Tom points out, that benefit may only be possible on only some
> platforms.
> 
> Best Regards, Simon Riggs
> 

I would like to weigh in on Simon's side on this issue. The fact that
no benefit has been seen from group commit yet may be due in part to
the current WAL fsync structure, where one page at a time is sync'd.
I recently saw a patch/test mentioned that showed dramatic
performance improvements, up to the level of "fsync = off", by writing
multiple blocks with a gather algorithm. I would hope that with a
similar patch we would begin to see the benefit of the commit_delay
GUC.
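The patch is not identified, so the following only illustrates the general gather-write idea using the POSIX writev() call: several in-memory WAL pages go to the kernel in one system call, followed by a single fsync(). The 8192-byte page size, the function names and the temporary-file path are assumptions for illustration; this is not the patch being referred to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define WAL_PAGE_SIZE 8192   /* assumed WAL page size */
#define MAX_GATHER    64

/* Write 'npages' in-memory WAL pages with one writev() and then one fsync(),
 * rather than a write()+fsync() pair per page. Returns 0 on success.
 * Short writes are treated as errors instead of being retried. */
static int flush_pages_gathered(int fd, char *const pages[], int npages)
{
    struct iovec iov[MAX_GATHER];
    size_t want = 0;

    assert(npages > 0 && npages <= MAX_GATHER);
    for (int i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = WAL_PAGE_SIZE;
        want += WAL_PAGE_SIZE;
    }

    ssize_t got = writev(fd, iov, npages);
    if (got < 0 || (size_t) got != want)
        return -1;
    return fsync(fd);
}

/* Usage: gather three pages into a temporary file; returns the resulting
 * file size, or -1 on error. */
static long demo_gathered_write(void)
{
    static char a[WAL_PAGE_SIZE], b[WAL_PAGE_SIZE];
    char path[] = "/tmp/wal_gatherXXXXXX";
    char *pages[] = { a, b, a };
    int fd = mkstemp(path);
    long size = -1;

    if (fd < 0)
        return -1;
    memset(a, 'a', sizeof a);
    memset(b, 'b', sizeof b);
    if (flush_pages_gathered(fd, pages, 3) == 0)
        size = (long) lseek(fd, 0, SEEK_END);
    close(fd);
    unlink(path);
    return size;
}
```

demo_gathered_write() should return 24576 (three 8192-byte pages). Whether fewer, larger writes make group commit more effective is exactly what would need benchmarking.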

Ken Marshall


Re: commit_delay, siblings

From
Simon Riggs
Date:
On Wed, 2005-06-29 at 10:16 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Group commit is a well-documented technique for improving performance,
> 
> The issue here is not "is group commit a good idea in the abstract?".
> It is "is the commit_delay implementation of the idea worth a dime?"
> ... and the evidence we have all points to the answer "NO".  We should
> not let theoretical arguments blind us to this.

OK, sometimes I sound too theoretical when I do my World History of
RDBMS notes, :-) ... all I meant was "let's hold off till we've measured
it".

> > I would ask that we hold off on their execution, at least for the
> > complete 8.1 beta performance test cycle.
> 
> I'm willing to wait a week while Tatsuo runs some fresh tests.  I'm
> not willing to wait indefinitely for evidence that I'm privately
> certain will not be forthcoming.

I'm inclined to agree with you, but I see no need to move quickly. The
code's been there a while now.

Best Regards, Simon Riggs



Re: commit_delay, siblings

From
Tatsuo Ishii
Date:
Hi,

> Simon Riggs <simon@2ndquadrant.com> writes:
> > Group commit is a well-documented technique for improving performance,
> 
> The issue here is not "is group commit a good idea in the abstract?".
> It is "is the commit_delay implementation of the idea worth a dime?"
> ... and the evidence we have all points to the answer "NO".  We should
> not let theoretical arguments blind us to this.
> 
> I posted an analysis some time ago showing that under heavy load,
> we already have the effect of ganged commits, without commit_delay:
> http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php
> 
> It's likely that there is more we can and should do, but that doesn't
> mean that commit_delay is the right answer.  commit_delay doesn't do
> anything to encourage ganging of writes, it just inserts an arbitrary
> delay that's not synchronized to anything, and is probably an order
> of magnitude too large anyway on most platforms.
> 
> > I would ask that we hold off on their execution, at least for the
> > complete 8.1 beta performance test cycle.
> 
> I'm willing to wait a week while Tatsuo runs some fresh tests.  I'm
> not willing to wait indefinitely for evidence that I'm privately
> certain will not be forthcoming.
> 
>             regards, tom lane

Here are the results from testings I did this morning.

Summary:
The effect of commit_delay cannot be ignored. I got almost a 3 times
performance difference between different commit_delay settings.

Details:

Xeon 2.8GHz x2, HT on, mem 2GB, Ultra 320 SCSI, 15000RPM
Redhat AS 3/kernel 2.4.21( 2.4.21-9.30AXsmp)
PostgreSQL current (July 2 12:18 JST)

FS:
/dev/cciss/c0d0p3      28G  2.1G   25G   8% /
/dev/cciss/c0d0p1     985M   28M  907M   3% /boot
/dev/cciss/c0d1p1      67G  1.7G   62G   3% /data1
/dev/cciss/c0d2p1      67G   33M   64G   1% /data2
/dev/cciss/c0d3p1      67G   33M   64G   1% /data3
none                  1.3G     0  1.3G   0% /dev/shm

OS & PostgreSQL binaries are on /. data is on /data1.

All postgresql.conf directives are set to defaults except:

max_connections = 512
shared_buffers = 10000

Benchmarking is done using pgbench. The test database was initialized
by following commands:
pgbench -i -s 100 test (10,000,000 rows in accounts table)

case 1: commit_delay = 0
$ time pgbench -N -c 128 -t 100 test (128 concurrent users)
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 47.400291 (including connections establishing)
tps = 47.509689 (excluding connections establishing)

real    4m30.065s
user    0m3.530s
sys     0m11.210s

case 2: commit_delay = 10
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 140.024294 (including connections establishing)
tps = 141.038901 (excluding connections establishing)

real    1m31.431s
user    0m2.340s
sys     0m5.850s

case 3: commit_delay = 50
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 137.207500 (including connections establishing)
tps = 138.083489 (excluding connections establishing)

real    1m33.312s
user    0m2.790s
sys     0m6.490s

case 4: commit_delay = 100
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 133.458149 (including connections establishing)
tps = 134.298841 (excluding connections establishing)

real    1m35.931s
user    0m2.750s
sys     0m7.030s

As you can see, commit_delay = 10 outperforms commit_delay = 0 by almost 3
times.
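The "3 times" figure is simply the ratio of the reported tps values. A quick check using only the numbers above (tps excluding connection establishment); note that commit_delay is specified in microseconds, so the best setting here is a 10 usec delay.

```c
#include <assert.h>
#include <stdio.h>

/* tps (excluding connection establishment) from the four pgbench runs above. */
struct run { int commit_delay_usec; double tps; };

static const struct run runs[] = {
    {   0,  47.509689 },
    {  10, 141.038901 },
    {  50, 138.083489 },
    { 100, 134.298841 },
};
#define NRUNS (sizeof runs / sizeof runs[0])

/* Throughput of run i relative to the commit_delay = 0 baseline. */
static double speedup(size_t i)
{
    return runs[i].tps / runs[0].tps;
}

static void print_speedups(void)
{
    for (size_t i = 0; i < NRUNS; i++)
        printf("commit_delay = %3d usec: %6.2f tps, %.2fx baseline\n",
               runs[i].commit_delay_usec, runs[i].tps, speedup(i));
}
```

This prints ratios of roughly 1.00, 2.97, 2.91 and 2.83 for commit_delay = 0, 10, 50 and 100. The elapsed times tell the same story: 4m30s versus 1m31s.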
--
Tatsuo Ishii