Thread: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
The attached very simple patch moves the commit_delay +
commit_siblings sleep into XLogFlush, where the leader alone sleeps.
This appears to be a much more effective site for a delay.

Benchmark results, with and without a delay of 3000ms (commit_siblings
is 0 in both cases) are available from:

http://leadercommitdelay.staticloud.com

The only way this differed from my usual setup for these benchmarks
was that, as it happened, shared_buffers was set to 256MB, which would
of course have affected wal_buffers actual value, which was 8MB.

I surmise that the reason this works so well is that it increases the
rate of batching at lower client counts, without needlessly delaying
each and every follower beyond when their transaction is likely to
have committed. A more detailed analysis will have to wait for
tomorrow.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Heikki Linnakangas
Date:
On 29.05.2012 04:18, Peter Geoghegan wrote:
> The attached very simple patch moves the commit_delay +
> commit_siblings sleep into XLogFlush, where the leader alone sleeps.
> This appears to be a much more effective site for a delay.
>
> Benchmark results, with and without a delay of 3000ms (commit_siblings
> is 0 in both cases) are available from:
>
> http://leadercommitdelay.staticloud.com

Can you elaborate how to read those results? What do "insert delay test" 
and "patch, commit_delay=0" mean? Which graph should I be looking at?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 29 May 2012 07:16, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 29.05.2012 04:18, Peter Geoghegan wrote:
>> Benchmark results, with and without a delay of 3000ms (commit_siblings
>> is 0 in both cases) are available from:
>>
>> http://leadercommitdelay.staticloud.com

Sorry, I do of course mean a commit_delay of 3000 (microseconds, not
milliseconds).

> Can you elaborate how to read those results? What do "insert delay test" and
> "patch, commit_delay=0" mean? Which graph should I be looking at?

It's very simple. It's an insert.sql based workload, for the patch,
with and without commit_delay set to 3000 (and a commit_delay of 0
makes the commit_delay code inert, just as before). Setting
commit_delay appears to be far more effective with the patch.

Here is a new benchmark:

http://leadermastercommitdelay.staticloud.com/

This is more or less my usual group commit benchmark - median of 3
runs of insert.sql across a wide range of client counts. The
pgbench-tools config is attached for your reference.

On this occasion, I set shared_buffers to 1GB once more (and
consequently, wal_buffers was 16MB).

The green line should look familiar to you. With commit_delay = 0,
it's equivalent to commit_delay=0 on master. In other words, it's very
similar to a number of benchmarks that have already been performed. I
believe that the numbers will line up pretty well with my original
group commit benchmark from back in January, for example, as that was
also performed on my Thinkpad.

The effect that Jeff Janes observed on master is apparent here at low
client counts - master with commit_delay = 3000 is almost as effective
at 8 clients as it is for the patch. After that, the effect is almost
entirely eliminated, which is consistent with my earlier observations
about commit_delay post new group commit.

There obviously appears to be a further significant increase in
transaction throughput with this configuration and patch. Even the
average latency of the patch with commit_delay=3000 is a bit better
than either master with group_commit = 3000 or baseline (as I've said
baseline could have equally well been on master or with master + the
patch).

I suppose my main beef with commit_delay before, apart from the simple
fact that it wasn't actually useful to any significant degree, was
that the code didn't usefully interact with the group commit
improvements, and we didn't revisit commit_delay for 9.2 even in light
of its assumptions not necessarily continuing to hold. Perhaps we
should revisit commit_delay for 9.2 now - we never adequately answered
the question of how useful it was in light of new group commit, so
revisiting that question can be justified as closing an open item,
rather than adding new functionality.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Robert Haas
Date:
On Mon, May 28, 2012 at 9:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> The attached very simple patch moves the commit_delay +
> commit_siblings sleep into XLogFlush, where the leader alone sleeps.
> This appears to be a much more effective site for a delay.

This is a clever idea, but I think it needs some fine-tuning: as
written, this will sleep for any flush, not just a flush of a commit
record.  One idea might be to add a flag to XLogFlush indicating
whether a commit-delay sleep is permissible.  Callers other than
RecordTransactionCommit() could pass false; RecordTransactionCommit()
could pass true.

The comments need some updating, too.

Like Heikki, I find the test results that you posted pretty hard to
understand.  This is sort of a general beef I have with pgbench-tools:
there are no explanations anywhere about what the graphs actually
mean. The first three graphs seem fairly useless, and the third one
doesn't even appear to contain any data.  The fourth and fifth graphs
seem like the most useful part of the output, but the lack of an
explanation of what's being graphed really hinders understanding.
Apparently, the fourth graph is TPS vs. scale factor (at an
unspecified number of clients) and the fifth graph is TPS vs. clients
(at an unspecified scale factor).  Maybe we're just averaging across
the unspecified parameter in each case, but if so that doesn't seem
very useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 29 May 2012 17:10, Robert Haas <robertmhaas@gmail.com> wrote:
> This is a clever idea,

Thanks.

> but I think it needs some fine-tuning: as
> written, this will sleep for any flush, not just a flush of a commit
> record.  One idea might be to add a flag to XLogFlush indicating
> whether a commit-delay sleep is permissible.  Callers other than
> RecordTransactionCommit() could pass false; RecordTransactionCommit()
> could pass true.
>
> The comments need some updating, too.

Uh, yeah, I posted that at 2am, which was about an hour after I
initially had the idea. Attached patch revises the comments and
documentation in a way that I think is appropriate, though it is still
essentially the same patch. I did consider the fact that the delay
might occur from one of a number of XLogFlush() callsites that did not
previously delay. However, even if I do what you suggest, it is just
dumb luck as to whether or not that makes any difference, since those
other sites may well be exactly as delayed, since they're still going
to have to wait behind the WALWriteLock, which the leader now holds
while sleeping anyway. I imagined that the fact that those callsites
were excluded before had something to do with ameliorating historic
commit_delay problems, but those problems have already been
significantly reduced.

Why do you think that doing this for all XLogFlush() callsites might
be problematic?

> Like Heikki, I find the test results that you posted pretty hard to
> understand.  This is sort of a general beef I have with pgbench-tools:
> there are no explanations anywhere about what the graphs actually
> mean. The first three graphs seem fairly useless, and the third one
> doesn't even appear to contain any data.  The fourth and fifth graphs
> seem like the most useful part of the output, but the lack of an
> explanation of what's being graphed really hinders understanding.
> Apparently, the fourth graph is TPS vs. scale factor (at an
> unspecified number of clients) and the fifth graph is TPS vs. clients
> (at an unspecified scale factor).  Maybe we're just averaging across
> the unspecified parameter in each case, but if so that doesn't seem
> very useful.

I'm sure that Greg will be happy to consider a github pull request to
improve the tool in these areas. I myself am used to interpreting
pgbench-tools output, but I suppose it is a little confusing in
various ways. As you probably realise, I was directing your attention
towards clients-set.png . I am in the habit of hacking pgbench-tools
to output much bigger gnu-plot images.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Robert Haas
Date:
On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Why do you think that doing this for all XLogFlush() callsites might
> be problematic?

Well, consider the one in the background writer, for example.  That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more.  It already did wait.  And what about
the case where we're flushing while holding WALInsertLock because the
buffer's full?  Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Why do you think that doing this for all XLogFlush() callsites might
>> be problematic?
>
> Well, consider the one in the background writer, for example.  That's
> just a periodic flush, so I see no benefit in having it acquire the
> lock and then wait some more.  It already did wait.

No benefit for the BGWriter, no, but a benefit for the cluster as a
whole, which can see not only an improvement in transaction
throughput, but also of average and worst-case latency. The whole idea
of a group commit feature is that one backend altruistically takes
other backends with it. Besides, for the background writer or
checkpointer, that operate in terms of minutes, seconds and
milliseconds, having their transaction block an extra 2 milliseconds
is hardly a real problem. Both the BGWriter and checkpointer do an
amount of work that is adaptive per cycle anyway.

> And what about the case where we're flushing while holding WALInsertLock because the
> buffer's full?  Clearly waiting is useless in that case - nobody can
> join the group commit for exactly the same reason that we're doing the
> flush in the first place: no buffer space.

Maybe you could have the leader check that condition, and not wait
accordingly. If the buffer is full, chances are very high that there
already is a leader, either sleeping or following through with an
XLogWrite(), whose work is well underway.

The general solution that you've proposed - adding an actually_wait
bool parameter to XLogFlush() - won't work, unless, for example, the
"buffer is full" call happens to become the leader, which is generally
quite unlikely. Otherwise it's stuck waiting behind WALWriteLock along
with everyone else, on average for a duration of half (CommitDelay +
avg flushtime), while the leader sleeps and then works. If something
is only effective a small minority of the time, why bother? I doubt
that you can reasonably have the BGWriter or whatever cut in line
ahead of an ever-growing queue of backends calling XLogFlush(), if
that's what you're thinking - now all those backends have to wait an
additional period of up to however long the flush takes.

In any case, the standard for changing the exact behaviour of
commit_delay/commit_siblings ought to be quite low, because those
settings are already so heavily laden with problems that it is
somewhat doubtful that they've ever been used by someone in production
sensibly. Obviously having a delay that turns out to be in some way
suboptimal is obviously something I'd hope to avoid entirely, but I've
done a good job of considerably improving that situation. If we get
this into 9.2, perhaps a more adaptive implementation can follow in
9.3.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Simon Riggs
Date:
On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Why do you think that doing this for all XLogFlush() callsites might
>> be problematic?
>
> Well, consider the one in the background writer, for example.  That's
> just a periodic flush, so I see no benefit in having it acquire the
> lock and then wait some more.  It already did wait.  And what about
> the case where we're flushing while holding WALInsertLock because the
> buffer's full?  Clearly waiting is useless in that case - nobody can
> join the group commit for exactly the same reason that we're doing the
> flush in the first place: no buffer space.

When I read this the first time, I was in full agreement.

On closer inspection neither point is valid, though both points were
worth considering.

> Well, consider the one in the background writer, for example.  That's
> just a periodic flush, so I see no benefit in having it acquire the
> lock and then wait some more.  It already did wait.

We use XLogBackgroundFlush() not XLogFlush() from background processes.


> And what about
> the case where we're flushing while holding WALInsertLock because the
> buffer's full?  Clearly waiting is useless in that case - nobody can
> join the group commit for exactly the same reason that we're doing the
> flush in the first place: no buffer space.

We don't flush WAL in that case, we just write it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Robert Haas
Date:
On Wed, May 30, 2012 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> When I read this the first time, I was in full agreement.
>
> On closer inspection neither point is valid, though both points were
> worth considering.
>
>> Well, consider the one in the background writer, for example.  That's
>> just a periodic flush, so I see no benefit in having it acquire the
>> lock and then wait some more.  It already did wait.
>
> We use XLogBackgroundFlush() not XLogFlush() from background processes.
>
>> And what about
>> the case where we're flushing while holding WALInsertLock because the
>> buffer's full?  Clearly waiting is useless in that case - nobody can
>> join the group commit for exactly the same reason that we're doing the
>> flush in the first place: no buffer space.
>
> We don't flush WAL in that case, we just write it.

OK, but there are a lot of places where we call XLogFlush(), and it's
far from obvious that it's a win to do this in all of those cases.  At
least, nobody's done that analysis.  XLogFlush is called from:

- WriteTruncateXlogRec(), which is called from TruncateCLOG()
- SlruPhysicalWritePage()
- EndPrepare()
- RecordTransactionCommitPrepared()
- RecordTransactionCommit()
- xact_redo_commit_internal()
- CreateCheckPoint()
- RelationTruncate()
- FlushBuffer()
- write_relmap_file()

Most of those actually do look like reasonable places to try to get
grouped flushing behavior, but:

1. It seems wrong to do it in xact_redo_commit_internal().  It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock.  But maybe it's a win anyway, or just not
worth worrying about.

Another thing to think about is that if we do this across the board
rather than just for commits, then commit_delay and commit_siblings
will really be totally misnamed - they really ought to be flush_delay
and flush_siblings at that point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote:
> Most of those actually do look like reasonable places to try to get
> grouped flushing behavior, but:
>
> 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
> matter if commit_siblings>0 since there won't be any other backends
> with transaction IDs anyway, but if commit_siblings==0 then we'll
> sleep for no possible benefit.
>
> 2. Doing it in FlushBuffer() seems slightly iffy since we might be
> sitting on a buffer lock.  But maybe it's a win anyway, or just not
> worth worrying about.

Typical values of commit_delay are usually a fraction of spinning rust
latency, so I don't think that FlushBuffer() has any business really
caring.

These problems seem rather minor compared to the existing problems
with the settings. As I've already outlined, I doubt it's possible to
really remove the delays here without an over-engineered solution. In
short, I suspect it isn't worth it. We must trust DBAs to set
commit_siblings appropriately if they've set commit_delay.

> Another thing to think about is that if we do this across the board
> rather than just for commits, then commit_delay and commit_siblings
> will really be totally misnamed - they really ought to be flush_delay
> and flush_siblings at that point.

Seems reasonable. It would also have the advantage of avoiding having
the new implementation tarred with the same brush as commit_delay.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Simon Riggs
Date:
On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote:

> OK, but there are a lot of places where we call XLogFlush(), and it's
> far from obvious that it's a win to do this in all of those cases.  At
> least, nobody's done that analysis.  XLogFlush is called from:
>
> - WriteTruncateXlogRec(), which is called from TruncateCLOG()
> - SlruPhysicalWritePage()
> - EndPrepare()
> - RecordTransactionCommitPrepared()
> - RecordTransactionCommit()
> - xact_redo_commit_internal()
> - CreateCheckPoint()
> - RelationTruncate()
> - FlushBuffer()
> - write_relmap_file()
>
> Most of those actually do look like reasonable places to try to get
> grouped flushing behavior, but:
>
> 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
> matter if commit_siblings>0 since there won't be any other backends
> with transaction IDs anyway, but if commit_siblings==0 then we'll
> sleep for no possible benefit.

Agreed

> 2. Doing it in FlushBuffer() seems slightly iffy since we might be
> sitting on a buffer lock.  But maybe it's a win anyway, or just not
> worth worrying about.

Agreed.

The remaining cases aren't worth worrying about, apart from
SlruPhysicalWritePage() which happens during visibility checks and
needs to happen as quickly as possible also.

I would say the additional contention from waiting outweighs the
benefit of the wait in those 3 places, so skipping the wait is wise.

Should be implemented as

#define XLogFlush(lsn) XLogFlushInternal(lsn, true)
#define XLogFlushNoWait(lsn) XLogFlushInternal(lsn, false)

so we can drop the no wait version in without touching the other call points

> Another thing to think about is that if we do this across the board
> rather than just for commits, then commit_delay and commit_siblings
> will really be totally misnamed - they really ought to be flush_delay
> and flush_siblings at that point.

I think if we were to rename the parameters, then they should be called

group_commit_delay
group_commit_siblings

Since "Group Commit" is the accepted term for this behaviour, even
though, as you point out that the behaviour isn't just about commit.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote:
>> 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
>> matter if commit_siblings>0 since there won't be any other backends
>> with transaction IDs anyway, but if commit_siblings==0 then we'll
>> sleep for no possible benefit.
>
> Agreed
>
>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be
>> sitting on a buffer lock.  But maybe it's a win anyway, or just not
>> worth worrying about.
>
> Agreed.

As I've pointed out, we cannot meaningfully skip the wait for
auxiliary processes alone (except perhaps by having commit_siblings
set sufficiently high).

> The remaining cases aren't worth worrying about, apart from
> SlruPhysicalWritePage() which happens during visibility checks and
> needs to happen as quickly as possible also.

I'm not so sure. It says in that function:
    /*     * We must determine the largest async-commit LSN for the page. This     * is a bit tedious, but since this
entirefunction is a slow path     * anyway, it seems better to do this here than to maintain a per-page     * LSN
variable(which'd need an extra comparison in the     * transaction-commit path).     */ 

> I would say the additional contention from waiting outweighs the
> benefit of the wait in those 3 places, so skipping the wait is wise.

MinimumActiveBackends() reports the "count backends (other than
myself) that are in active transactions", so unnecessary calls will
have to occur when we have active transactions >= CommitSiblings, not
connections >= CommitSiblings as was previously the case.

What if we were to skip the wait during recovery only, by specially
setting CommitDelay to 0 in the start-up process? Would that satisfy
everyone's concerns about unhelpful delays? I'm not sure how this
might interact with hot standby.

The only concrete way I can see of avoiding a possibly useless wait
(setting commit_siblings prudently alone is a very big help) would be
semantics that are not generally acceptable for XLogFlush(): a way of
calling XLogFlush() such that we either become the leader and don't
wait OR join the queue and return immediately under the assumption
that the leader will successfully flush up to our LSN. It is true that
part of the premise of having a leader backend was that "XLogFlush is
pretty much all critical section, so if it fails we're going to PANIC
anyway".

That said, at least some of the problematic XLogFlush() call sites
named require that we verify that we've flushed up to their LSN before
control returns to them. FlushBuffer() does for sure.

> I think if we were to rename the parameters, then they should be called
>
> group_commit_delay
> group_commit_siblings
>
> Since "Group Commit" is the accepted term for this behaviour, even
> though, as you point out that the behaviour isn't just about commit.

+1. I understand that that is the precise definition of group commit
in Oracle, for example, where the feature is explicitly framed as a
trade-off between throughput and latency (any of the other things that
we might have called group commit at various times are not). The fact
that group commit implies that non-commit flushes will also be batched
seems rather academic, on reflection.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Simon Riggs
Date:
On 30 May 2012 17:19, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
>>> matter if commit_siblings>0 since there won't be any other backends
>>> with transaction IDs anyway, but if commit_siblings==0 then we'll
>>> sleep for no possible benefit.
>>
>> Agreed

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.


>>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be
>>> sitting on a buffer lock.  But maybe it's a win anyway, or just not
>>> worth worrying about.
>>
>> Agreed.
>
> As I've pointed out, we cannot meaningfully skip the wait for
> auxiliary processes alone (except perhaps by having commit_siblings
> set sufficiently high).
>
>> The remaining cases aren't worth worrying about, apart from
>> SlruPhysicalWritePage() which happens during visibility checks and
>> needs to happen as quickly as possible also.
>
> I'm not so sure. It says in that function:
>
>                /*
>                 * We must determine the largest async-commit LSN for the page. This
>                 * is a bit tedious, but since this entire function is a slow path
>                 * anyway, it seems better to do this here than to maintain a per-page
>                 * LSN variable (which'd need an extra comparison in the
>                 * transaction-commit path).
>                 */
>
>> I would say the additional contention from waiting outweighs the
>> benefit of the wait in those 3 places, so skipping the wait is wise.
>
> MinimumActiveBackends() reports the "count backends (other than
> myself) that are in active transactions", so unnecessary calls will
> have to occur when we have active transactions >= CommitSiblings, not
> connections >= CommitSiblings as was previously the case.
>
> What if we were to skip the wait during recovery only, by specially
> setting CommitDelay to 0 in the start-up process? Would that satisfy
> everyone's concerns about unhelpful delays? I'm not sure how this
> might interact with hot standby.

Hmm, that was a good idea, but as of my comments above, that isn't
required or useful.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 31 May 2012 11:19, Simon Riggs <simon@2ndquadrant.com> wrote:
> I've looked at this more closely now and I can see that the call to
> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
> actually flush WAL, so whether we delay or not is completely
> irrelevant.
>
> So un-agreed. No change required to patch there.

So, does that clear up the question of it being acceptable to add a
delay to every existing XLogFlush() call site? I think so.

Aside from the outstanding question of what to rename
commit_delay/commit_siblings to, and how we might want to reframe
those settings in the docs, I think that's everything.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Robert Haas
Date:
On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I've looked at this more closely now and I can see that the call to
> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
> actually flush WAL, so whether we delay or not is completely
> irrelevant.
>
> So un-agreed. No change required to patch there.

I think Peter's suggestion of forcibly setting the delay to 0 in the
startup process is a good one, though.  It's one line of code, and if
it isn't strictly necessary today, it still seems like good
future-proofing.

I am not very happy about the idea of renaming commit_* to
group_commit_*.  It's basically a cosmetic renaming, and breaking
existing configuration files for cosmetic purposes does not seem
warranted to me, especially when the old and new names are so close.
I certainly don't think we can do that in 9.2, now that beta1 has
already shipped.  Modifying the default contents of postgresql.conf
after we've shipped beta has been a historical no-no for reasons that
escape me at the moment, but IIRC they're not stupid reasons.

Frankly, I think this whole thing should be pushed to 9.3.  The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them.  We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Simon Riggs
Date:
On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I've looked at this more closely now and I can see that the call to
>> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
>> actually flush WAL, so whether we delay or not is completely
>> irrelevant.
>>
>> So un-agreed. No change required to patch there.
>
> I think Peter's suggestion of forcibly setting the delay to 0 in the
> startup process is a good one, though.  It's one line of code, and if
> it isn't strictly necessary today, it still seems like good
> future-proofing.

Adding a line that does nothing is not a good idea. The Startup
process flushes very, very few WAL messages, so the setting is
irrelevant.

> I am not very happy about the idea of renaming commit_* to
> group_commit_*.  It's basically a cosmetic renaming, and breaking
> existing configuration files for cosmetic purposes does not seem
> warranted to me, especially when the old and new names are so close.
> I certainly don't think we can do that in 9.2, now that beta1 has
> already shipped.  Modifying the default contents of postgresql.conf
> after we've shipped beta has been a historical no-no for reasons that
> escape me at the moment, but IIRC they're not stupid reasons.
>
> Frankly, I think this whole thing should be pushed to 9.3.  The
> commit_delay and commit_siblings knobs suck, but they've sucked for a
> long time, and it won't kill anybody to wait another release cycle to
> fix them.  We have plenty of more important things queued up for 9.3
> already, and I don't believe there's any compelling reason to think
> that this particular thing needs preferential treatment.

No problem with pushing a variable rename through to 9.3. To be
honest, I don't care whether we rename them or not.

What matters is that we have a patch that provides a massive
performance gain in write performance in just a few lines of code, and
that should be committed to 9.2.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Frankly, I think this whole thing should be pushed to 9.3.  The
> commit_delay and commit_siblings knobs suck, but they've sucked for a
> long time, and it won't kill anybody to wait another release cycle to
> fix them.  We have plenty of more important things queued up for 9.3
> already, and I don't believe there's any compelling reason to think
> that this particular thing needs preferential treatment.

Why do you think that? Those knobs are now quite ineffective, though
we never even considered that when the group commit delay patch was
committed.  The entire body of research and commentary that exists on
commit_delay has been invalidated for 9.2. If that isn't something
that needs to be addressed before release, I don't know what is. The
fact that the patch can sometimes double transaction throughput for an
absolutely trivial change, moving 2 lines of code, is also a good
reason to not bump this for another year.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Robert Haas
Date:
On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Frankly, I think this whole thing should be pushed to 9.3.  The
>> commit_delay and commit_siblings knobs suck, but they've sucked for a
>> long time, and it won't kill anybody to wait another release cycle to
>> fix them.  We have plenty of more important things queued up for 9.3
>> already, and I don't believe there's any compelling reason to think
>> that this particular thing needs preferential treatment.
>
> Why do you think that? Those knobs are now quite ineffective, though
> we never even considered that when the group commit delay patch was
> committed.  The entire body of research and commentary that exists on
> commit_delay has been invalidated for 9.2. If that isn't something
> that needs to be addressed before release, I don't know what is. The
> fact that the patch can sometimes double transaction throughput for an
> absolutely trivial change, moving 2 lines of code, is also a good
> reason to not bump this for another year.

Fixing regressions before release is essential; improving performance
is not - especially when the improvement relates to a little-used
feature that you were proposing to get rid of two weeks ago.  It can't
simultaneously be so unimportant that we should remove it altogether
and so important that it's must-fix-before-release, and if one test
can completely overturn your view of which category this falls into,
that seems like a reason for taking some more time to think it over
and, perhaps, run more tests.  We don't have a lot of latitude to
maneuver at this point - anything we do now is going to go straight
out into the wild.  Caution is appropriate.

However, rather than arguing about it, let's see if anyone else has an opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Simon Riggs <simon@2ndQuadrant.com> writes:
> On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:
>> Frankly, I think this whole thing should be pushed to 9.3.

> What matters is that we have a patch that provides a massive
> performance gain in write performance in just a few lines of code, and
> that should be committed to 9.2.

I agree with Robert on this.  This patch hasn't had *nearly* enough
testing to justify cramming it into 9.2 at this point.  AFAIK the
claim of "massive performance gain" is based on a single test case run
by a single person, which doesn't even give me any confidence that it
doesn't break anything, much less that it's a win across the board.

If we want to finish the beta cycle in a reasonable time period and get
back to actual development, we have to refrain from adding more
possibly-destabilizing development work to 9.2.  And that is what
this is.

Add it to the upcoming CF, please.
        regards, tom lane


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Simon Riggs
Date:
On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Frankly, I think this whole thing should be pushed to 9.3.
>
>> What matters is that we have a patch that provides a massive
>> performance gain in write performance in just a few lines of code, and
>> that should be committed to 9.2.
>
> I agree with Robert on this.  This patch hasn't had *nearly* enough
> testing to justify cramming it into 9.2 at this point.  AFAIK the
> claim of "massive performance gain" is based on a single test case run
> by a single person, which doesn't even give me any confidence that it
> doesn't break anything, much less that it's a win across the board.

I agree with you. You would be mistaken if you thought that I think
Peter's laptop was sufficient proof for anyone to commit something and
I've already said exactly that to him.

My description of "massive performance gain" is appropriate based on
the measurements so far.

> If we want to finish the beta cycle in a reasonable time period and get
> back to actual development, we have to refrain from adding more
> possibly-destabilizing development work to 9.2.  And that is what
> this is.

In what way is it possibly destabilising? I see nothing in the patch
to merit that claim, so presumably you haven't read the patch yet?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Simon Riggs <simon@2ndQuadrant.com> writes:
> On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If we want to finish the beta cycle in a reasonable time period and get
>> back to actual development, we have to refrain from adding more
>> possibly-destabilizing development work to 9.2. �And that is what
>> this is.

> In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it could
be destabilizing to that.  It needs proper review and testing, and the
next CF is the right environment for that to happen.
        regards, tom lane


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 31 May 2012 14:58, Robert Haas <robertmhaas@gmail.com> wrote:
> Fixing regressions before release is essential; improving performance
> is not - especially when the improvement relates to a little-used
> feature that you were proposing to get rid of two weeks ago.

Yes, the fact that I wanted to get rid of commit_delay is well
established - I called for its deprecation in a dedicated thread, and
during my talk at pgCon. Bruce's confusion as to how that interacted
with what I've been calling "new group commit" was actually what
crystallised my position here: it is trying, for the most part, to do
the same thing as new group commit, but in an entirely orthogonal way.
Bruce's confusion actually reflected the confusion of the code. So I'm
in a sense removing the overlap between commit_delay used to do but
now but shouldn't try to do anymore (make commits coincide, giving
good benchmark results) and what new group commit now does, while
preserving commit_delay's ability to trade off latency for throughput.

I didn't have an answer to the question of how we might continue to
offer a throughput/latency trade-off to users before, but knew that
with 9.2, commit_delay was totally ineffective anyway. The realisation
that it could be made effective by working with rather than against
new group commit changed my mind.

> It can't simultaneously be so unimportant that we should remove it altogether
> and so important that it's must-fix-before-release, and if one test
> can completely overturn your view of which category this falls into,
> that seems like a reason for taking some more time to think it over
> and, perhaps, run more tests.  We don't have a lot of latitude to
> maneuver at this point - anything we do now is going to go straight
> out into the wild.  Caution is appropriate.

The patch can be justified as a way of removing the tension between
new group commit and commit_delay. Removing commit_delay would also do
this, but then there'd be no way to make the aforementioned trade-off
that we previously offered. I suspect that if it restored the peaks
and valleys of commit_delay's changes to throughput in 9.1, over and
above a new group commit baseline, this would be more readily
accepted. I hope the patch isn't being punished for being effective.
Yes, it does offer a large boost to performance, but that happens to
be incidental, unlikely though that sounds.

You've called this a clever idea. I actually don't agree. I was fairly
surprised that no one noticed this earlier. It is rather obviously the
case that a delay that hopes to maximise the batching of commits at
the expense of latency should occur only in a single leader backend
that will proceed with the flush for the batch, and not within each
and every backend as it commits.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> In what way is it possibly destabilising?
>
> I'm prepared to believe that it only affects performance, but it could
> be destabilizing to that.  It needs proper review and testing, and the
> next CF is the right environment for that to happen.

It couldn't possibly be as destabilising to performance as
commit_delay was in 9.1.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
On 31 May 2012 16:26, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> In what way is it possibly destabilising?
>>
>> I'm prepared to believe that it only affects performance, but it could
>> be destabilizing to that.  It needs proper review and testing, and the
>> next CF is the right environment for that to happen.
>
> It couldn't possibly be as destabilising to performance as
> commit_delay was in 9.1.

Furthermore, it couldn't possibly affect performance in any way unless
commit_delay is set. I've just moved the delay site so that its only
executed by the group commit leader. The leader would execute the code
anyway, but now the followers don't.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From
Peter Geoghegan
Date:
This is a benchmark I performed on the same hardware, for tpc-b.sql,
with a commit_delay of 0 and 4000 for the patch:

http://tpcbdelay.staticloud.com/

There is a rather large improvement in throughput here. Robert
previously complained that our group commit implementation didn't do
very well on that benchmark. This seems to make all the difference.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services