Thread: CommitDelay performance improvement

CommitDelay performance improvement

From: Tom Lane
Looking at the XLOG stuff, I notice that we already have a field
(logRec) in the per-backend PROC structures that shows whether a
transaction is currently in progress with at least one change made
(ie at least one XLOG entry written).

It would be very easy to extend the existing code so that the commit
delay is not done unless there is at least one other backend with
nonzero logRec --- or, more generally, at least N other backends with
nonzero logRec.  We cannot tell if any of them are actually nearing
their commits, but this seems better than just blindly waiting.  Larger
values of N would presumably improve the odds that at least one of them
is nearing its commit.

A further refinement, still quite cheap to implement since the info is
in the PROC struct, would be to not count backends that are blocked
waiting for locks.  These guys are less likely to be ready to commit
in the next few milliseconds than the guys who are actively running;
indeed they cannot commit until someone else has committed/aborted to
release the lock they need.
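
A minimal sketch of the test, assuming a scan over the per-backend PROC
entries (the array and field names here are illustrative guesses, not
committed code):

    /*
     * Sketch only: count other backends that have written at least one
     * XLOG entry and are not blocked waiting for a lock.  "ProcArray",
     * "ProcArraySize" and "waitLock" are placeholder names.
     */
    static int
    CountOtherActiveBackends(void)
    {
        int     count = 0;
        int     i;

        for (i = 0; i < ProcArraySize; i++)
        {
            PROC   *proc = ProcArray[i];

            if (proc == NULL || proc == MyProc)
                continue;       /* empty slot, or ourselves */
            if (proc->logRec.xrecoff == 0)
                continue;       /* no XLOG entry written yet */
            if (proc->waitLock != NULL)
                continue;       /* blocked waiting for a lock */
            count++;
        }
        return count;
    }

The pre-commit delay in RecordTransactionCommit would then be done only
when CountOtherActiveBackends() >= N.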

Comments?  What should the threshold N be ... or do we need to make
that a tunable parameter?
        regards, tom lane


Re: CommitDelay performance improvement

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Why not just set a flag in there when someone nears commit and clear
> > when they are about to commit?
> 
> Define "nearing commit", in such a way that you can specify where you
> plan to set that flag.

Is there significant time between entry of CommitTransaction() and the
fsync()?  Maybe not.



Re: CommitDelay performance improvement

From: Bruce Momjian
> Looking at the XLOG stuff, I notice that we already have a field
> (logRec) in the per-backend PROC structures that shows whether a
> transaction is currently in progress with at least one change made
> (ie at least one XLOG entry written).
> 
> It would be very easy to extend the existing code so that the commit
> delay is not done unless there is at least one other backend with
> nonzero logRec --- or, more generally, at least N other backends with
> nonzero logRec.  We cannot tell if any of them are actually nearing
> their commits, but this seems better than just blindly waiting.  Larger
> values of N would presumably improve the odds that at least one of them
> is nearing its commit.

Why not just set a flag in there when someone nears commit and clear
when they are about to commit?



Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there significant time between entry of CommitTransaction() and the
> fsync()?  Maybe not.

I doubt it.  No I/O anymore, anyway, unless the commit record happens to
overrun an xlog block boundary.
        regards, tom lane


Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Why not just set a flag in there when someone nears commit and clear
> when they are about to commit?

Define "nearing commit", in such a way that you can specify where you
plan to set that flag.
        regards, tom lane


Re: CommitDelay performance improvement

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is there significant time between entry of CommitTransaction() and the
> > fsync()?  Maybe not.
> 
> I doubt it.  No I/O anymore, anyway, unless the commit record happens to
> overrun an xlog block boundary.

That's what I was afraid of.  Since we don't write the dirty blocks to
the kernel anymore, we don't really have much happening before someone
says they are about to commit.  In the old days, we were write()'ing
those buffers, and we had some delay and kernel calls in there.

Guess that idea is dead.



Re: CommitDelay performance improvement

From: ncm@zembu.com (Nathan Myers)
On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:
> A further refinement, still quite cheap to implement since the info is
> in the PROC struct, would be to not count backends that are blocked
> waiting for locks.  These guys are less likely to be ready to commit
> in the next few milliseconds than the guys who are actively running;
> indeed they cannot commit until someone else has committed/aborted to
> release the lock they need.
> 
> Comments?  What should the threshold N be ... or do we need to make
> that a tunable parameter?

Once you make it tuneable, you're stuck with it.  You can always add
a knob later, after somebody discovers a real need.

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From: Bruce Momjian
> On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:
> > A further refinement, still quite cheap to implement since the info is
> > in the PROC struct, would be to not count backends that are blocked
> > waiting for locks.  These guys are less likely to be ready to commit
> > in the next few milliseconds than the guys who are actively running;
> > indeed they cannot commit until someone else has committed/aborted to
> > release the lock they need.
> > 
> > Comments?  What should the threshold N be ... or do we need to make
> > that a tunable parameter?
> 
> Once you make it tuneable, you're stuck with it.  You can always add
> a knob later, after somebody discovers a real need.

I wonder if Tom should implement it, but leave it at zero until people
can report that a non-zero value helps.  We already have the parameter; we
can just make it smarter and let people test it.



Re: CommitDelay performance improvement

From: Tom Lane
ncm@zembu.com (Nathan Myers) writes:
>> Comments?  What should the threshold N be ... or do we need to make
>> that a tunable parameter?

> Once you make it tuneable, you're stuck with it.  You can always add
> a knob later, after somebody discovers a real need.

If we had a good idea what the default level should be, I'd be willing
to go without a knob.  I'm thinking of a default of about 5 (ie, at
least 5 other active backends to trigger a commit delay) ... but I'm not
so confident of that that I think it needn't be tunable.  It's really
dependent on your average and peak transaction lengths, and that's
going to vary across installations, so unless we want to try to make it
self-adjusting, a knob seems like a good idea.

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.
        regards, tom lane


Re: CommitDelay performance improvement

From: Bruce Momjian
> ncm@zembu.com (Nathan Myers) writes:
> >> Comments?  What should the threshold N be ... or do we need to make
> >> that a tunable parameter?
> 
> > Once you make it tuneable, you're stuck with it.  You can always add
> > a knob later, after somebody discovers a real need.
> 
> If we had a good idea what the default level should be, I'd be willing
> to go without a knob.  I'm thinking of a default of about 5 (ie, at
> least 5 other active backends to trigger a commit delay) ... but I'm not
> so confident of that that I think it needn't be tunable.  It's really
> dependent on your average and peak transaction lengths, and that's
> going to vary across installations, so unless we want to try to make it
> self-adjusting, a knob seems like a good idea.
> 
> A self-adjusting delay might well be a great idea, BTW, but I'm trying
> to be conservative about how much complexity we should add right now.

OH, so you are saying N backends should have dirtied buffers before
doing the delay?  Hmm, that seems almost untunable to me.

Let's suppose we decide to sleep.  When we wake up, can we know that
someone else has fsync'ed for us?  And if they have, should we be more
likely to fsync() in the future?



Re: CommitDelay performance improvement

From: Bruce Momjian
> > And if they have, should we be more
> > likely to fsync() in the future?

I meant more likely to sleep().

> You mean less likely.  My thought for a self-adjusting delay was to
> ratchet the delay up a little every time it succeeds in avoiding an
> fsync, and down a little every time it fails to do so.  No change when
> we don't delay at all (because of no other active backends).  But
> testing this and making sure it behaves reasonably seems like more work
> than we should try to accomplish before 7.1.

It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
there has to be an upper bound on the sleep.  The more you delay, the
more likely you will be to find someone to fsync you.  Are we waking
processes up after we have fsync()'ed them?  If so, we can keep
increasing the delay.



Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> A self-adjusting delay might well be a great idea, BTW, but I'm trying
>> to be conservative about how much complexity we should add right now.

> OH, so you are saying N backends should have dirtied buffers before
> doing the delay?  Hmm, that seems almost untunable to me.

> Let's suppose we decide to sleep.  When we wake up, can we know that
> someone else has fsync'ed for us?

XLogFlush will find that it has nothing to do, so yes we can.
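
(Roughly, the top of XLogFlush already does the equivalent of this ---
a sketch from memory, not the verbatim source:

    void
    XLogFlush(XLogRecPtr record)
    {
        /*
         * If another backend's fsync has already pushed the flushed
         * position past our commit record, there is nothing to do.
         */
        if (XLByteLE(record, LogwrtResult.Flush))
            return;

        /* ... otherwise write and fsync XLOG up through 'record' ... */
    }

so on wakeup a backend can tell whether the delay paid off.)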

> And if they have, should we be more
> likely to fsync() in the future?

You mean less likely.  My thought for a self-adjusting delay was to
ratchet the delay up a little every time it succeeds in avoiding an
fsync, and down a little every time it fails to do so.  No change when
we don't delay at all (because of no other active backends).  But
testing this and making sure it behaves reasonably seems like more work
than we should try to accomplish before 7.1.
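
Something like this, perhaps --- pure sketch, untested, and the step
sizes and clamp bounds are arbitrary:

    if (delayed)
    {
        if (XLByteLE(myCommitRecord, LogwrtResult.Flush))
            CommitDelay += 1000;    /* another backend flushed for us */
        else
            CommitDelay -= 1000;    /* wasted wait; back off */
        /* clamp so the delay can't wander off to silly values */
        CommitDelay = Max(0, Min(CommitDelay, 100000));
    }
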
        regards, tom lane


Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
> there has to be an upper bound on the sleep.  The more you delay, the
> more likely you will be to find someone to fsync you.

Good point, and an excellent illustration of the fact that
self-adjusting algorithms aren't that easy to get right the first
time ;-)

> Are we waking processes up after we have fsync()'ed them?

Not at the moment.  That would be another good mechanism to investigate
for 7.2; but right now there's no infrastructure that would allow a
backend to discover which other ones were sleeping for fsync.
        regards, tom lane


Re: CommitDelay performance improvement

From: ncm@zembu.com (Nathan Myers)
On Fri, Feb 23, 2001 at 05:18:19PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> >> Comments?  What should the threshold N be ... or do we need to make
> >> that a tunable parameter?
> 
> > Once you make it tuneable, you're stuck with it.  You can always add
> > a knob later, after somebody discovers a real need.
> 
> If we had a good idea what the default level should be, I'd be willing
> to go without a knob.  I'm thinking of a default of about 5 (ie, at
> least 5 other active backends to trigger a commit delay) ... but I'm not
> so confident of that that I think it needn't be tunable.  It's really
> dependent on your average and peak transaction lengths, and that's
> going to vary across installations, so unless we want to try to make it
> self-adjusting, a knob seems like a good idea.
> 
> A self-adjusting delay might well be a great idea, BTW, but I'm trying
> to be conservative about how much complexity we should add right now.

When thinking about tuning N, I like to consider what are the interesting 
possible values for N:
  0: Ignore any other potential committers.
  1: The minimum possible responsiveness to other committers.
  5: Tom's guess for what might be a good choice.
  10: Harry's guess.
  ~0: Always delay.

I would rather release with N=1 than with 0, because it actually responds 
to conditions.  What N might best be, >1, probably varies on a lot of 
hard-to-guess parameters.

It seems to me that comparing various choices (and other, more interesting,
algorithms) to the N=1 case would be more productive than comparing them 
to the N=0 case, so releasing at N=1 would yield better statistics for 
actually tuning in 7.2.

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From: Bruce Momjian
> When thinking about tuning N, I like to consider what are the interesting 
> possible values for N:
> 
>   0: Ignore any other potential committers.
>   1: The minimum possible responsiveness to other committers.
>   5: Tom's guess for what might be a good choice.
>   10: Harry's guess.
>   ~0: Always delay.
> 
> I would rather release with N=1 than with 0, because it actually responds 
> to conditions.  What N might best be, >1, probably varies on a lot of 
> hard-to-guess parameters.
> 
> It seems to me that comparing various choices (and other, more interesting,
> algorithms) to the N=1 case would be more productive than comparing them 
> to the N=0 case, so releasing at N=1 would yield better statistics for 
> actually tuning in 7.2.

We don't release code because it has better tuning opportunities for
later releases.  What we can do is give people parameters where the
default is safe, and they can play and report to us.



Re: CommitDelay performance improvement

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
> > there has to be an upper bound on the sleep.  The more you delay, the
> > more likely you will be to find someone to fsync you.
> 
> Good point, and an excellent illustration of the fact that
> self-adjusting algorithms aren't that easy to get right the first
> time ;-)

I see.  I am concerned that anything done to 7.1 at this point may cause
problems with performance under certain circumstances.  Let's see what
the new code shows our testers.

> 
> > Are we waking processes up after we have fsync()'ed them?
> 
> Not at the moment.  That would be another good mechanism to investigate
> for 7.2; but right now there's no infrastructure that would allow a
> backend to discover which other ones were sleeping for fsync.

Can we put the backends to sleep waiting for a lock, and have them wake
up later?



Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Can we put the backends to sleep waiting for a lock, and have them wake
> up later?

Locks don't have timeouts.  There is no existing mechanism that will
serve this purpose; we'll have to create a new one.
        regards, tom lane


Re: CommitDelay performance improvement

From: ncm@zembu.com (Nathan Myers)
On Fri, Feb 23, 2001 at 06:37:06PM -0500, Bruce Momjian wrote:
> > When thinking about tuning N, I like to consider what are the interesting 
> > possible values for N:
> > 
> >   0: Ignore any other potential committers.
> >   1: The minimum possible responsiveness to other committers.
> >   5: Tom's guess for what might be a good choice.
> >   10: Harry's guess.
> >   ~0: Always delay.
> > 
> > I would rather release with N=1 than with 0, because it actually
> > responds to conditions. What N might best be, >1, probably varies on
> > a lot of hard-to-guess parameters.
> >
> > It seems to me that comparing various choices (and other, more
> > interesting, algorithms) to the N=1 case would be more productive
> > than comparing them to the N=0 case, so releasing at N=1 would yield
> > better statistics for actually tuning in 7.2.
>
> We don't release code because it has better tuning opportunities for
> later releases. What we can do is give people parameters where the
> default is safe, and they can play and report to us.

Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
that was nevertheless preferable to N=0.

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From: Bruce Momjian
> > > It seems to me that comparing various choices (and other, more
> > > interesting, algorithms) to the N=1 case would be more productive
> > > than comparing them to the N=0 case, so releasing at N=1 would yield
> > > better statistics for actually tuning in 7.2.
> >
> > We don't release code because it has better tuning opportunities for
> > later releases. What we can do is give people parameters where the
> > default is safe, and they can play and report to us.
> 
> Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
> that was nevertheless preferable to N=0.

I think zero delay is the conservative choice at this point, unless we
hear otherwise from testers.



Re: CommitDelay performance improvement

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Can we put the backends to sleep waiting for a lock, and have them wake
> > up later?
> 
> Locks don't have timeouts.  There is no existing mechanism that will
> serve this purpose; we'll have to create a new one.

That is what I suspected.

Having thought about it, we currently have a few options:

1) let every backend fsync on its own
2) try to delay backends so they all fsync() at the same time
3) delay fsync until after commit

Items 2 and 3 attempt to bunch up fsyncs.  Option 2 has backends waiting
to fsync() on the expectation that some other backend may commit soon. 
Option 3 may turn out to be the best solution.  No matter how smart we
make the code, we will never know for sure if someone is about to commit
and whether it is worth waiting.

My idea would be to let committing backends return "COMMIT" to the user,
and set a need_fsync flag that is guaranteed to cause an fsync within X
milliseconds.  This way, if other backends commit in the next X
millisecond, they can all use one fsync().
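
In rough outline --- everything here is hypothetical, since no such flag
or flusher process exists today:

    /* committing backend, after writing its commit record: */
    XLogCtl->need_fsync = true;         /* hypothetical shared-memory flag */
    /* ... return "COMMIT" to the client without flushing ... */

    /* separate flusher process, waking every X milliseconds: */
    for (;;)
    {
        sleep_msec(X);                  /* hypothetical helper */
        if (XLogCtl->need_fsync)
        {
            XLogCtl->need_fsync = false;
            XLogFlush(last_record_ptr); /* one fsync covers everyone */
        }
    }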

Now, I know many will complain that we are returning commit while not
having the stuff on the platter.  But consider, we only lose data from an
OS crash or hardware failure.  Do people who commit something, and then
the machine crashes 2 milliseconds after the commit, really expect the
data to be on the disk when they restart?  Maybe they do, but it seems
the benefit of grouped fsyncs() is large enough that many will say they
would rather have this option.

This was my point long ago that we could offer sub-second reliability
with no-fsync performance if we just had some process running that wrote
dirty pages and fsynced every 20 milliseconds.



Re: CommitDelay performance improvement

From: Philip Warner
At 21:31 23/02/01 -0500, Bruce Momjian wrote:
>Now, I know many will complain that we are returning commit while not
>having the stuff on the platter. 

You're definitely right there.

>Maybe they do, but it seems
>the benefit of grouped fsyncs() is large enough that many will say they
>would rather have this option.

I'd prefer to wait for a lock manager that supports timeouts and contention
notification.




Re: CommitDelay performance improvement

From: Bruce Momjian
> At 21:31 23/02/01 -0500, Bruce Momjian wrote:
> >Now, I know many will complain that we are returning commit while not
> >having the stuff on the platter. 
> 
> You're definitely right there.
> 
> >Maybe they do, but it seems
> >the benefit of grouped fsyncs() is large enough that many will say they
> >would rather have this option.
> 
> I'd prefer to wait for a lock manager that supports timeouts and contention
> notification.

I understand, and I would agree if that were going to fix the problem
completely, but it isn't.  It is just going to allow us more flexibility
in guessing who may be about to commit.



Re: CommitDelay performance improvement

From: Philip Warner
At 14:57 23/02/01 -0800, Nathan Myers wrote:
>
>When thinking about tuning N, I like to consider what are the interesting 
>possible values for N:
>

It may have been much earlier in the debate, but has anyone checked to see
what the maximum possible gains might be - or is it self-evident to people
who know the code?

Would it be worth considering creating a test case with no flush in
RecordTransactionCommit, and rely on checkpointing to flush? I realize this
is never an option in production, but is it possible to modify the code in
this way? It *should* give an upper limit on the gains that can be made by
flushing at the best possible time.




Re: CommitDelay performance improvement

From: Philip Warner
At 11:32 23/02/01 -0500, Tom Lane wrote:
>Looking at the XLOG stuff, I notice that we already have a field
>(logRec) in the per-backend PROC structures that shows whether a
>transaction is currently in progress with at least one change made
>(ie at least one XLOG entry written).

Would it be worth adding a field 'waiting for fsync since xxx', so the
second process can (a) log that it is expecting someone else to FSYNC (for
perf stats, if we want them), and (b) wait for (xxx + delta)ms/us etc?






Re: CommitDelay performance improvement

From: Philip Warner
At 23:14 23/02/01 -0500, Bruce Momjian wrote:
>
>There is one more thing.  Even though the kernel says the data is on the
>platter, it still may not be there.

This is true, but it does not mean we should say 'the disk is slightly
unreliable, so we can be too'. Also, IIRC, the last time this was
discussed, someone commented that buying expensive disks and a UPS gets you
reliability (barring a direct lightning strike) - it had something to do
with write-ordering and hardware caches. In any case, I'd hate to see DB
design decisions based closely on hardware capability. At least two of my
customers use high performance ram disks for databases - do these also
suffer from 'flush is not really flush' problems?

>Basically, I am not sure how much we lose by doing the delay after
>returning COMMIT, and I know we gain quite a bit by enabling us to group
>fsync calls.

If included, this should be an option only, and not the default option. In
fact I'd quite like to see such a feature, although I'd not only do a
'flush every X ms', but I'd also do a 'flush every X transactions' - this
way a DBA can say 'I don't mind losing the last 20 TXs in a crash'. Bear in
mind that on a fast system, 20ms is a lot of transactions.





Re: CommitDelay performance improvement

From: ncm@zembu.com (Nathan Myers)
On Fri, Feb 23, 2001 at 09:05:20PM -0500, Bruce Momjian wrote:
> > > > It seems to me that comparing various choices (and other, more
> > > > interesting, algorithms) to the N=1 case would be more productive
> > > > than comparing them to the N=0 case, so releasing at N=1 would yield
> > > > better statistics for actually tuning in 7.2.
> > >
> > > We don't release code because it has better tuning opportunities for
> > > later releases. What we can do is give people parameters where the
> > > default is safe, and they can play and report to us.
> > 
> > Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
> > that was nevertheless preferable to N=0.
> 
> I think zero delay is the conservative choice at this point, unless we
> hear otherwise from testers.

I see, I had it backwards: N=0 corresponds to "always delay", and 
N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
where M is the number of backends, or the number of backends with begun 
transactions, or something.  N=10 would be conservative (and maybe 
pointless) just because it would hardly ever trigger a delay.

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From: Bruce Momjian
> At 23:14 23/02/01 -0500, Bruce Momjian wrote:
> >
> >There is one more thing.  Even though the kernel says the data is on the
> >platter, it still may not be there.
> 
> This is true, but it does not mean we should say 'the disk is slightly
> unreliable, so we can be too'. Also, IIRC, the last time this was
> discussed, someone commented that buying expensive disks and a UPS gets you
> reliability (barring a direct lightning strike) - it had something to do
> with write-ordering and hardware caches. In any case, I'd hate to see DB
> design decisions based closely on hardware capability. At least two of my
> customers use high performance ram disks for databases - do these also
> suffer from 'flush is not really flush' problems?

Well, I am saying we are being pretty rigid here when we may be on top
of a system that is not, meaning that our rigidity is buying us little.

> 
> >Basically, I am not sure how much we lose by doing the delay after
> >returning COMMIT, and I know we gain quite a bit by enabling us to group
> >fsync calls.
> 
> If included, this should be an option only, and not the default option. In
> fact I'd quite like to see such a feature, although I'd not only do a
> 'flush every X ms', but I'd also do a 'flush every X transactions' - this
> way a DBA can say 'I don't mind losing the last 20 TXs in a crash'. Bear in
> mind that on a fast system, 20ms is a lot of transactions.

Yes, I can see this as a good option for many users.  My old complaint
was that we allowed only two very extreme options, fsync() all the time,
or fsync() never and recover from a crash.



Re: CommitDelay performance improvement

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > My idea would be to let committing backends return "COMMIT" to the user,
> > and set a need_fsync flag that is guaranteed to cause an fsync within X
> > milliseconds.  This way, if other backends commit in the next X
> > millisecond, they can all use one fsync().
> 
> Guaranteed by what?  We have no mechanism available to make an fsync
> happen while the backend is waiting for input.

We would need a separate binary that can look at shared memory and fsync
if someone requested it.  Again, nothing for 7.1.X.

> > Now, I know many will complain that we are returning commit while not
> > having the stuff on the platter.
> 
> I think that's unacceptable on its face.  A remote client may take
> action on the basis that COMMIT was returned.  If the server then
> crashes, the client is unlikely to realize this for some time (certainly
> at least one TCP timeout interval).  It won't look like a "milliseconds
> later" situation to that client.  In fact, the client might *never*
> realize there was a problem; what if it disconnects after getting the
> COMMIT?
> 
> If the dbadmin thinks he doesn't need fsync before commit, he'll likely
> be running with fsync off anyway.  For the ones who do think they need
> fsync, I don't believe that we get to rearrange the fsync to occur after
> commit.

I can see someone wanting some fsync, but not wanting to take the hit.  My
argument is that with this ability, there would be no need to turn off fsync.



Re: CommitDelay performance improvement

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> My idea would be to let committing backends return "COMMIT" to the user,
> and set a need_fsync flag that is guaranteed to cause an fsync within X
> milliseconds.  This way, if other backends commit in the next X
> millisecond, they can all use one fsync().

Guaranteed by what?  We have no mechanism available to make an fsync
happen while the backend is waiting for input.

> Now, I know many will complain that we are returning commit while not
> having the stuff on the platter.

I think that's unacceptable on its face.  A remote client may take
action on the basis that COMMIT was returned.  If the server then
crashes, the client is unlikely to realize this for some time (certainly
at least one TCP timeout interval).  It won't look like a "milliseconds
later" situation to that client.  In fact, the client might *never*
realize there was a problem; what if it disconnects after getting the
COMMIT?

If the dbadmin thinks he doesn't need fsync before commit, he'll likely
be running with fsync off anyway.  For the ones who do think they need
fsync, I don't believe that we get to rearrange the fsync to occur after
commit.
        regards, tom lane


Re: CommitDelay performance improvement

From: Tom Lane
Philip Warner <pjw@rhyme.com.au> writes:
> It may have been much earlier in the debate, but has anyone checked to see
> what the maximum possible gains might be - or is it self-evident to people
> who know the code?

fsync off provides an upper bound to the speed achievable from being
smarter about when to fsync... I doubt that fsync-once-per-checkpoint
would be much different.
        regards, tom lane


Re: CommitDelay performance improvement

From: Bruce Momjian
> At 21:31 23/02/01 -0500, Bruce Momjian wrote:
> >Now, I know many will complain that we are returning commit while not
> >having the stuff on the platter. 
> 
> You're definitely right there.
> 
> >Maybe they do, but it seems
> >the benefit of grouped fsyncs() is large enough that many will say they
> >would rather have this option.
> 
> I'd prefer to wait for a lock manager that supports timeouts and contention
> notification.
> 

There is one more thing.  Even though the kernel says the data is on the
platter, it still may not be there.  Some OS's may return from fsync
when the data is _queued_ to the disk, rather than actually waiting for
the drive return code to say it completed.  Second, some disks report
back that the data is on the disk when it is actually in the disk memory
buffer, not really on the disk.

Basically, I am not sure how much we lose by doing the delay after
returning COMMIT, and I know we gain quite a bit by enabling us to group
fsync calls.



Re: CommitDelay performance improvement

From: Tom Lane
Preliminary results from experimenting with an
N-transactions-must-be-running-to-cause-commit-delay heuristic are
attached.  It seems to be a pretty definite win.  I'm currently running
a more extensive set of cases on another machine for comparison.

The test case is pgbench, unmodified, but run at scalefactor 10
to reduce write contention on the 'branch' rows.  Postmaster
parameters are -N 100 -B 1024 in all cases.  The fsync-off (with,
of course, no commit delay either) case is shown for comparison.
"commit siblings" is the number of other backends that must be
running active (unblocked, at least one XLOG entry made) transactions
before we will do a precommit delay.

commit delay=1 is effectively commit delay=10000 (10msec) on this
hardware.  Interestingly, it seems that we can push the delay up
to two or three clock ticks without degradation, given positive N.
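
(The delay itself is just a select() timeout, along these lines:

    struct timeval delay;

    delay.tv_sec = CommitDelay / 1000000L;
    delay.tv_usec = CommitDelay % 1000000L;
    (void) select(0, NULL, NULL, NULL, &delay);

and the kernel rounds the timeout up to its scheduler tick --- 10 msec
at HZ=100 --- which is why commit delay=1 acts like commit delay=10000
here.)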

            regards, tom lane


[Attachment]

Re: CommitDelay performance improvement

From: Tom Lane
ncm@zembu.com (Nathan Myers) writes:
> I see, I had it backwards: N=0 corresponds to "always delay", and 
> N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
> not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
> where M is the number of backends, or the number of backends with begun 
> transactions, or something.  N=10 would be conservative (and maybe 
> pointless) just because it would hardly ever trigger a delay.

Why is N=1 not interesting?  That requires at least one other backend
to be in a transaction before you'll delay.  That would seem to be
the minimum useful value --- N=0 (always delay) seems clearly to be
too stupid to be useful.
        regards, tom lane


Re: CommitDelay performance improvement

From: Bruce Momjian
> Philip Warner <pjw@rhyme.com.au> writes:
> > It may have been much earler in the debate, but has anyone checked to see
> > what the maximum possible gains might be - or is it self-evident to people
> > who know the code?
> 
> fsync off provides an upper bound to the speed achievable from being
> smarter about when to fsync... I doubt that fsync-once-per-checkpoint
> would be much different.

That was my point: people should be doing fsync once per checkpoint
rather than never.



Re: CommitDelay performance improvement

From: ncm@zembu.com (Nathan Myers)
On Sat, Feb 24, 2001 at 01:07:17AM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > I see, I had it backwards: N=0 corresponds to "always delay", and 
> > N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
> > not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
> > where M is the number of backends, or the number of backends with begun 
> > transactions, or something.  N=10 would be conservative (and maybe 
> > pointless) just because it would hardly ever trigger a delay.
> 
> Why is N=1 not interesting?  That requires at least one other backend
> to be in a transaction before you'll delay.  That would seem to be
> the minimum useful value --- N=0 (always delay) seems clearly to be
> too stupid to be useful.

N=1 seems arbitrarily aggressive.  It assumes any open transaction will 
commit within a few milliseconds; otherwise the delay is wasted.  On a 
fairly busy system, it seems to me to impose a strict upper limit on 
transaction rate for any client, regardless of actual system I/O load.  
(N=0 would impose that strict upper limit even for a single client.)

Delaying isn't free, because it means that the client can't turn around 
and do even a cheap query for a while.  In a sense, when you delay you are 
charging the committer a tax to try to improve overall throughput.  If the 
delay lets you reduce I/O churn enough to increase the total bandwidth, 
then it was worthwhile; if not, you just cut system performance, and 
responsiveness to each client, for nothing.

The above suggests that maybe N should depend on recent disk I/O activity,
so you get a larger N (and thus less likely delay and more certain payoff) 
for a more lightly-loaded system.  On a system that has maxed its I/O 
bandwidth, clients will suffer delays anyhow, so they might as well 
suffer controlled delays that result in better total throughput.  On a 
lightly-loaded system there's no need, or payoff, for such throttling.

Can we measure disk system load by averaging the times taken for fsyncs?
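
As a sketch, each backend could time its own fsyncs and keep a moving
average (the average would have to live in shared memory to be shared;
names here are invented):

    #include <sys/time.h>
    #include <unistd.h>

    static double fsync_avg_usec = 0.0;

    static void
    timed_fsync(int fd)
    {
        struct timeval start, stop;
        double      elapsed;

        gettimeofday(&start, NULL);
        fsync(fd);
        gettimeofday(&stop, NULL);
        elapsed = (stop.tv_sec - start.tv_sec) * 1000000.0
            + (stop.tv_usec - start.tv_usec);
        /* exponential moving average; slow fsyncs mean a loaded disk */
        fsync_avg_usec = 0.9 * fsync_avg_usec + 0.1 * elapsed;
    }

A rising average would then argue for a smaller N, i.e. more willingness
to delay.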

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From: Tom Lane
Attached are graphs from more thorough runs of pgbench with a commit
delay that occurs only when at least N other backends are running active
transactions.

My initial try at this proved to be too noisy to tell much.  The noise
seems to be coming from WAL checkpoints that occur during a run and
push down the reported TPS value for the particular case that's running.
While we'd need to include WAL checkpoints to make an honest performance
comparison against another RDBMS, I think they are best ignored for the
purpose of figuring out what the commit-delay behavior ought to be.
Accordingly, I modified my test script to minimize the occurrence of
checkpoint activity during runs (see attached script).  There are still
some data points that are unexpectedly low compared to their neighbors;
presumably these were affected by checkpoints or other system activity.

It's not entirely clear what set of parameters is best, but it is
absolutely clear that a flat zero-commit-delay policy is NOT best.

The test conditions are postmaster options -N 100 -B 1024, pgbench scale
factor 10, pgbench -t (transactions per client) 100.  (Hence the results
for a single client rely on only 100 transactions, and are pretty noisy.
The noise level should decrease as the number of clients increases.)

Comments anyone?
        regards, tom lane

#! /bin/sh

# Expected postmaster options: -N 100 -B 1024 -c checkpoint_timeout=1800
# Recommended pgbench setup: pgbench -i -s 10 bench

for del in 0 ; do
for sib in 1 ; do
for cli in 1 10 20 30 40 50 ; do
echo "commit_delay = $del"
echo "commit_siblings = $sib"
psql -c "vacuum branches; vacuum tellers; delete from history; vacuum history; checkpoint;" bench
PGOPTIONS="-c commit_delay=$del -c commit_siblings=$sib" \
    pgbench -c $cli -t 100 -n bench
done
done
done

for del in 10000 30000 50000 100000 ; do
for sib in 1 5 10 20 ; do
for cli in 1 10 20 30 40 50 ; do
echo "commit_delay = $del"
echo "commit_siblings = $sib"
psql -c "vacuum branches; vacuum tellers; delete from history; vacuum history; checkpoint;" bench
PGOPTIONS="-c commit_delay=$del -c commit_siblings=$sib" \
    pgbench -c $cli -t 100 -n bench
done
done
done

Re: CommitDelay performance improvement

From: Philip Warner
At 00:41 25/02/01 -0500, Tom Lane wrote:
>
>Comments anyone?
>

Don't suppose you could post the original data?




Re: CommitDelay performance improvement

From: Tom Lane
Philip Warner <pjw@rhyme.com.au> writes:
> Don't suppose you could post the original data?

Sure.
        regards, tom lane

commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.996953(including connections establishing)
tps = 11.051216(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.779923(including connections establishing)
tps = 17.924390(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 17.289815(including connections establishing)
tps = 17.429343(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 17.292171(including connections establishing)
tps = 17.432905(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 17.733478(including connections establishing)
tps = 17.913251(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 18.325273(including connections establishing)
tps = 18.534556(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.449347(including connections establishing)
tps = 10.500278(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.865721(including connections establishing)
tps = 18.015078(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 17.980234(including connections establishing)
tps = 18.131986(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 18.858489(including connections establishing)
tps = 19.027436(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 19.320221(including connections establishing)
tps = 19.496999(excluding connections establishing)
commit_delay = 10000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 19.440978(including connections establishing)
tps = 19.621221(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.298701(including connections establishing)
tps = 11.357102(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.722266(including connections establishing)
tps = 19.903373(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.042737(including connections establishing)
tps = 19.214042(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 19.013869(including connections establishing)
tps = 19.185863(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.081644(including connections establishing)
tps = 20.273612(excluding connections establishing)
commit_delay = 10000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.379646(including connections establishing)
tps = 20.577183(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.896660(including connections establishing)
tps = 10.951360(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.506836(including connections establishing)
tps = 19.686328(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 18.801060(including connections establishing)
tps = 18.968530(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 19.855547(including connections establishing)
tps = 20.044110(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.557934(including connections establishing)
tps = 20.760724(excluding connections establishing)
commit_delay = 10000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.278060(including connections establishing)
tps = 20.473699(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.098777(including connections establishing)
tps = 11.155340(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.638060(including connections establishing)
tps = 18.801436(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.815520(including connections establishing)
tps = 20.003053(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.034017(including connections establishing)
tps = 20.231631(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.676088(including connections establishing)
tps = 20.879088(excluding connections establishing)
commit_delay = 10000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.692725(including connections establishing)
tps = 20.895842(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.160902(including connections establishing)
tps = 11.218247(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.831596(including connections establishing)
tps = 19.000649(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 20.239767(including connections establishing)
tps = 20.434566(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.686848(including connections establishing)
tps = 20.891519(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 21.014861(including connections establishing)
tps = 21.224443(excluding connections establishing)
commit_delay = 30000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 21.315164(including connections establishing)
tps = 21.533027(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.384356(including connections establishing)
tps = 11.444286(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.614866(including connections establishing)
tps = 18.780395(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 20.462955(including connections establishing)
tps = 20.661262(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.769457(including connections establishing)
tps = 20.975243(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 19.280678(including connections establishing)
tps = 19.457795(excluding connections establishing)
commit_delay = 30000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.852166(including connections establishing)
tps = 21.057769(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.129848(including connections establishing)
tps = 11.188346(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.154248(including connections establishing)
tps = 19.328718(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.487838(including connections establishing)
tps = 19.668323(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.387741(including connections establishing)
tps = 20.586291(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 21.187943(including connections establishing)
tps = 21.403037(excluding connections establishing)
commit_delay = 30000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.870339(including connections establishing)
tps = 21.080454(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.119876(including connections establishing)
tps = 11.177152(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.987202(including connections establishing)
tps = 19.157841(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.771415(including connections establishing)
tps = 19.957555(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.277710(including connections establishing)
tps = 20.473996(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.736168(including connections establishing)
tps = 20.942539(excluding connections establishing)
commit_delay = 30000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 18.894930(including connections establishing)
tps = 19.064049(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.006743(including connections establishing)
tps = 11.062485(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.240024(including connections establishing)
tps = 18.399169(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.817212(including connections establishing)
tps = 20.002657(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.260368(including connections establishing)
tps = 20.455821(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.928079(including connections establishing)
tps = 21.135532(excluding connections establishing)
commit_delay = 50000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 21.216875(including connections establishing)
tps = 21.431381(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.362410(including connections establishing)
tps = 11.421545(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 18.879526(including connections establishing)
tps = 19.047014(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 20.100514(including connections establishing)
tps = 20.292700(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.108420(including connections establishing)
tps = 20.326053(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.876438(including connections establishing)
tps = 21.083252(excluding connections establishing)
commit_delay = 50000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.929535(including connections establishing)
tps = 21.139167(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.037506(including connections establishing)
tps = 11.094671(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 16.197469(including connections establishing)
tps = 16.321687(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.408106(including connections establishing)
tps = 19.586455(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.628612(including connections establishing)
tps = 20.832682(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.687795(including connections establishing)
tps = 20.892172(excluding connections establishing)
commit_delay = 50000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 21.072593(including connections establishing)
tps = 21.285268(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.114714(including connections establishing)
tps = 11.172162(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.558748(including connections establishing)
tps = 19.742513(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 18.631916(including connections establishing)
tps = 18.797678(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 19.825138(including connections establishing)
tps = 20.012726(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.088452(including connections establishing)
tps = 20.280854(excluding connections establishing)
commit_delay = 50000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.297366(including connections establishing)
tps = 20.493717(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 15.439671(including connections establishing)
tps = 15.549962(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.693075(including connections establishing)
tps = 19.876400(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 18.946142(including connections establishing)
tps = 19.115107(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 18.454647(including connections establishing)
tps = 18.616867(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.280877(including connections establishing)
tps = 20.476160(excluding connections establishing)
commit_delay = 100000
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.500824(including connections establishing)
tps = 20.701014(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.952132(including connections establishing)
tps = 11.006296(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.366365(including connections establishing)
tps = 17.508544(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.543583(including connections establishing)
tps = 19.725347(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.115157(including connections establishing)
tps = 20.307981(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 20.223466(including connections establishing)
tps = 20.420063(excluding connections establishing)
commit_delay = 100000
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.148971(including connections establishing)
tps = 20.341425(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.751800(including connections establishing)
tps = 10.805719(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.248793(including connections establishing)
tps = 17.389532(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 18.971746(including connections establishing)
tps = 19.141706(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 20.250238(including connections establishing)
tps = 20.445726(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 18.616027(including connections establishing)
tps = 18.782432(excluding connections establishing)
commit_delay = 100000
commit_siblings = 10
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.101571(including connections establishing)
tps = 20.293550(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.630630(including connections establishing)
tps = 10.682598(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.308711(including connections establishing)
tps = 17.450166(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 18.041733(including connections establishing)
tps = 18.196939(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 18.610682(including connections establishing)
tps = 18.775963(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 19.522874(including connections establishing)
tps = 19.705095(excluding connections establishing)
commit_delay = 100000
commit_siblings = 20
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 20.085380(including connections establishing)
tps = 20.277826(excluding connections establishing)


Re: CommitDelay performance improvement

From
ncm@zembu.com (Nathan Myers)
Date:
On Sun, Feb 25, 2001 at 12:41:28AM -0500, Tom Lane wrote:
> Attached are graphs from more thorough runs of pgbench with a commit
> delay that occurs only when at least N other backends are running active
> transactions. ...
> It's not entirely clear what set of parameters is best, but it is
> absolutely clear that a flat zero-commit-delay policy is NOT best.
> 
> The test conditions are postmaster options -N 100 -B 1024, pgbench scale
> factor 10, pgbench -t (transactions per client) 100.  (Hence the results
> for a single client rely on only 100 transactions, and are pretty noisy.
> The noise level should decrease as the number of clients increases.)

It's hard to interpret these results.  In particular, "delay 10k, sibs 20"
(10k,20), or cyan-triangle, is almost the same as "delay 50k, sibs 1" 
(50k,1), or green X.  Those are pretty different parameters to get such
similar results.

The only really bad performers were (0), (10k,1), (100k,20).  The best
were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
Why would 30k be a magic delay, regardless of siblings?  What happened
at 40?

At low loads, it seems (100k,1) (brown +) did best by far, which seems
very odd.  Even more odd, it did pretty well at very high loads but had 
problems at intermediate loads.  

Nathan Myers
ncm@zembu.com


Re: CommitDelay performance improvement

From
Philip Warner
Date:
At 00:42 25/02/01 -0800, Nathan Myers wrote:
>
>The only really bad performers were (0), (10k,1), (100k,20).  The best
>were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
>Why would 30k be a magic delay, regardless of siblings?  What happened
>at 40?
>

I had assumed that 40 was one of the glitches - it would be good if Tom (or
someone else) could rerun the suite, to see if we see the same dip.

I agree that 30k looks like the magic delay, and probably 30k/5 would be a
good conservative choice. But now that I think about the choice of number,
I suspect it must vary with the speed of the machine and the length of the
transactions; at 20 tps, each TX is completing in around 50ms. Probably the
delay needs to be set at a value related to the average TX duration, and
since that is not really a known figure, perhaps we should go with 30% of
TX duration, with a max of 100k.

Alternatively, can PG monitor the commits/second, then set the delay to
reflect half of the average TX time (or 100ms, whichever is smaller)? Is
this too baroque?
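
Roughly, the rule proposed here would come down to something like the
sketch below (hypothetical C; PG has no commits/second monitor today,
so the input figure is assumed to exist):

static long
adaptive_commit_delay(double commits_per_sec)
{
    long avg_tx_usec;

    if (commits_per_sec <= 0.0)
        return 0;                   /* no data yet: don't delay */

    /* average transaction time, in microseconds */
    avg_tx_usec = (long) (1000000.0 / commits_per_sec);

    /* half the average TX time, capped at 100ms (100000 usec) */
    return (avg_tx_usec / 2 < 100000L) ? avg_tx_usec / 2 : 100000L;
}

At 20 commits/sec that gives 25000 usec, in the same ballpark as the
30k value that tested well above.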



RE: CommitDelay performance improvement

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane
> 
> Attached are graphs from more thorough runs of pgbench with a commit
> delay that occurs only when at least N other backends are running active
> transactions.
> 
> My initial try at this proved to be too noisy to tell much.  The noise
> seems to be coming from WAL checkpoints that occur during a run and
> push down the reported TPS value for the particular case that's running.
> While we'd need to include WAL checkpoints to make an honest performance
> comparison against another RDBMS, I think they are best ignored for the
> purpose of figuring out what the commit-delay behavior ought to be.
> Accordingly, I modified my test script to minimize the occurrence of
> checkpoint activity during runs (see attached script).  There are still
> some data points that are unexpectedly low compared to their neighbors;
> presumably these were affected by checkpoints or other system activity.
> 
> It's not entirely clear what set of parameters is best, but it is
> absolutely clear that a flat zero-commit-delay policy is NOT best.
> 
> The test conditions are postmaster options -N 100 -B 1024, pgbench scale
> factor 10, pgbench -t (transactions per client) 100.  (Hence the results
> for a single client rely on only 100 transactions, and are pretty noisy.
> The noise level should decrease as the number of clients increases.)
> 
> Comments anyone?
>

How about the case with scaling factor 1?  I.e., could your
proposal detect lock conflicts in reality?  If so, I agree with
your proposal.

BTW there seems to be a misunderstanding about CommitDelay, i.e.

   CommitDelay is completely a waste of time unless there's
   an overlap of commit.

If other backends can use the delay (CPU cycles), the delay is
never a total waste of time.

Regards,
Hiroshi Inoue


Re: CommitDelay performance improvement

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> How about the case with scaling factor 1?  I.e., could your
> proposal detect lock conflicts in reality?

The code is set up to not count backends that are waiting on locks.
That is, to do a commit delay there must be at least N other backends
that are in transactions, have written at least one XLOG entry in
their transaction (so it's not a read-only xact and will need to
write a commit record), and are not waiting on a lock.
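
In outline, the check amounts to the following (a self-contained
sketch with a simplified PROC struct and hypothetical names -- not
the actual committed source):

#include <stddef.h>

typedef struct XLogRecPtr
{
    unsigned int xlogid;            /* XLOG file #; 0/0 = no record yet */
    unsigned int xrecoff;           /* byte offset within that file */
} XLogRecPtr;

typedef struct PROC
{
    XLogRecPtr logRec;              /* first XLOG record of current xact */
    void      *waitLock;            /* non-NULL while blocked on a lock */
} PROC;

/*
 * Count other backends that have written XLOG (so they will need to
 * write a commit record) and are not blocked on a lock -- that is,
 * plausible partners for sharing an fsync.
 */
static int
count_commit_siblings(PROC *procs[], int nprocs, PROC *me)
{
    int count = 0;
    int i;

    for (i = 0; i < nprocs; i++)
    {
        PROC *p = procs[i];

        if (p == NULL || p == me)
            continue;               /* empty slot, or ourselves */
        if (p->logRec.xlogid == 0 && p->logRec.xrecoff == 0)
            continue;               /* no XLOG written: read-only xact */
        if (p->waitLock != NULL)
            continue;               /* lock wait: can't commit soon */
        count++;
    }
    return count;
}

The commit path then delays only when that count is at least
commit_siblings (and commit_delay > 0).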

Is that what you meant?

> BTW there seems to be a misunderstanding about CommitDelay, i.e.
>     CommitDelay is completely a waste of time unless there's
>     an overlap of commit.
> If other backends can use the delay (CPU cycles), the delay is
> never a total waste of time.

Good point.  In fact, if we measure only the total throughput in
transactions per second then the commit delay will not appear to be
hurting performance no matter how long it is, so long as other backends
are in the RUN state for the whole delay.  This suggests that pgbench
should also measure the average transaction time seen by any one client.
Is that a simple change?
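
Something like the following gettimeofday() accounting would do it (a
sketch; the variable and function names are invented, not pgbench's
actual ones):

#include <stdio.h>
#include <sys/time.h>

static struct timeval tx_start;
static double total_tx_usec = 0.0;
static long   tx_count = 0;

static void
tx_begin(void)                      /* call just before each transaction */
{
    gettimeofday(&tx_start, NULL);
}

static void
tx_end(void)                        /* call right after COMMIT returns */
{
    struct timeval now;

    gettimeofday(&now, NULL);
    total_tx_usec += (now.tv_sec - tx_start.tv_sec) * 1000000.0
        + (now.tv_usec - tx_start.tv_usec);
    tx_count++;
}

static void
report_latency(void)                /* at end of run, next to the TPS line */
{
    if (tx_count > 0)
        printf("average latency = %.3f ms\n",
               total_tx_usec / tx_count / 1000.0);
}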
        regards, tom lane


Re: CommitDelay performance improvement

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 00:42 25/02/01 -0800, Nathan Myers wrote:
>> The only really bad performers were (0), (10k,1), (100k,20).  The best
>> were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
>> Why would 30k be a magic delay, regardless of siblings?  What happened
>> at 40?

> I had assumed that 40 was one of the glitches - it would be good if Tom (or
> someone else) could rerun the suite, to see if we see the same dip.

Yes, I assumed the same.  I posted the script; could someone else make
the same run?  We really need more than one test case ;-)

> I agree that 30k looks like the magic delay, and probably 30/5 would be a
> good conservative choice. But now I think about the choice of number, I
> think it must vary with the speed of the machine and length of the
> transactions; at 20tps, each TX is completing in around 50ms.

Yes, I think so too.  This machine is able to do about 40 pgbench tr/sec
single-client with fsync off, so the computational load is right about
25msec per transaction.  That's presumably why 30msec looks like a good
delay number.  What interested me was that there doesn't seem to be a
very sharp peak; anything from 10 to 100 msec yields fairly comparable
results.  This is a good thing ... if there *were* a sharp peak at the
average xact length, tuning the delay parameter would be an impossible
task in real-world cases where the transactions aren't all alike.

On the data so far, I'm inclined to go with 10k/5 as the default, so as
not to risk wasting time with overly long delays on machines that are
faster than this one.  But we really need some data from other machines
before deciding.  It'd be nice to see some results with <10k delays too,
from a machine where the kernel supports better-than-10msec delay
resolution.  Where's the Alpha contingent??
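
(The delay itself is just a microsecond-argument sleep; a typical
implementation -- a sketch, not necessarily the exact PG source -- is
a select() with only a timeout.  Many kernels of this era round that
timeout up to their scheduler tick, commonly 10 msec, which is why
sub-10k settings need kernel support to mean anything:)

#include <sys/select.h>
#include <sys/time.h>

static void
commit_delay_sleep(long usec)
{
    struct timeval tv;

    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;

    /* no fds to watch; select() just waits out the timeout */
    (void) select(0, NULL, NULL, NULL, &tv);
}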
        regards, tom lane


Re: CommitDelay performance improvement

From
Tom Lane
Date:
ncm@zembu.com (Nathan Myers) writes:
> At low loads, it seems (100k,1) (brown +) did best by far, which seems
> very odd.  Even more odd, it did pretty well at very high loads but had 
> problems at intermediate loads.  

In theory, all these variants should behave exactly the same for a
single client, since there will be no commit delay in any of 'em in
that case.  I'm inclined to write off the aberrant result for 100k/1
as due to outside factors --- maybe the WAL file happened to be located
in a particularly convenient place on the disk during that run, or
some such.  Since there's only 100 transactions in that test, it wouldn't
take much to affect the result.

Likewise, the places where one mid-load datapoint is well below either
neighbor are probably due to outside factors --- either a background
WAL checkpoint or other activity on the machine, mail arrival for
instance.  I left the machine alone during the test, but I didn't bother
to shut down the usual system services.

My feeling is that this test run tells us that zero commit delay is
inferior to nonzero under these test conditions, but there's too much
noise to pick out one of the nonzero-delay parameter combinations as
being clearly better than the rest.  (BTW, I did repeat the zero-delay
series just to be sure it wasn't itself an outlier...)
        regards, tom lane


Re: CommitDelay performance improvement

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Philip Warner <pjw@rhyme.com.au> writes:
> > At 00:42 25/02/01 -0800, Nathan Myers wrote:
> >> The only really bad performers were (0), (10k,1), (100k,20).  The best
> >> were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
> >> Why would 30k be a magic delay, regardless of siblings?  What happened
> >> at 40?
> 
> > I had assumed that 40 was one of the glitches - it would be good if Tom (or
> > someone else) could rerun the suite, to see if we see the same dip.
> 
> Yes, I assumed the same.  I posted the script; could someone else make
> the same run?  We really need more than one test case ;-)
> 

I could find the script but seem to have missed your change
about commit_siblings.  Where could I get it?

Regards,
Hiroshi Inoue


Re: CommitDelay performance improvement

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> Yes, I assumed the same.  I posted the script; could someone else make
>> the same run?  We really need more than one test case ;-)

> I could find the script but seem to have missed your change
> about commit_siblings. Where could I get it ?

Er ... duh ... I didn't commit it yet.  Well, it's harmless enough
as long as commit_delay defaults to 0, so I'll go ahead and commit.
        regards, tom lane


Re: CommitDelay performance improvement

From
Tom Lane
Date:
>> I could find the script but seem to have missed your change
>> about commit_siblings. Where could I get it ?

> Er ... duh ... I didn't commit it yet.  Well, it's harmless enough
> as long as commit_delay defaults to 0, so I'll go ahead and commit.

In CVS now.

However, it might be well to wait to run tests until we tweak pgbench
to measure the average elapsed time for a transaction.  As you pointed
out earlier today, overall TPS is not the only figure of merit we need
to worry about.
        regards, tom lane


Re: CommitDelay performance improvement

From
Dominique Quatravaux
Date:
> >Basically, I am not sure how much we lose by doing the delay after
> >returning COMMIT, and I know we gain quite a bit by enabling us to group
> >fsync calls.
> 
> If included, this should be an option only, and not the default option. 
Sure, it should never become the default, because the "D" in ACID is
precisely about forbidding this kind of behaviour...

--  Dominique