Thread: Group commit and commit delay/siblings

Group commit and commit delay/siblings

From
Rob Wultsch
Date:
Manual: http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
Recent discussion:
http://www.facebook.com/notes/mysql-at-facebook/group-commit-in-postgresql/465781235932

It is my understanding that group commit in PG works without
commit_delay or commit_siblings being enabled. For many people coming
from other databases, the existence of these GUCs seems to suggest
that group commit does not work without them being enabled.

Are these settings useful, and if so, how should they be tuned?
If they are generally not useful, should they be removed?

--
Rob Wultsch
wultsch@gmail.com

Re: Group commit and commit delay/siblings

From
Jignesh Shah
Date:
On Mon, Dec 6, 2010 at 4:40 AM, Rob Wultsch <wultsch@gmail.com> wrote:
> Manual: http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
> Recent discussion:
> http://www.facebook.com/notes/mysql-at-facebook/group-commit-in-postgresql/465781235932
>
> It is my understanding that group commit in PG works without
> commit_delay or commit_siblings being enabled. For many people coming
> from other databases, the existence of these GUCs seems to suggest
> that group commit does not work without them being enabled.
>
> Are these settings useful, and if so, how should they be tuned?
> If they are generally not useful, should they be removed?
>
> --
> Rob Wultsch
> wultsch@gmail.com
>

Hey Rob,

I think I can explain this with the bit that I understand.

When you commit a transaction, it syncs the WAL buffer to disk. An
optimization is applied here: before it syncs, it finds all the WAL
records in the buffer that are complete and ready to sync, absorbs all
of their syncs into one, and does the commit for all of them together.
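
To make that absorption concrete, here is a tiny single-threaded toy
model (not PostgreSQL source; the variable names are made up for the
example): each committer checks whether an earlier flush already
covered its commit record, and any flush it does perform covers
everything written so far, so five commits end up needing only one
fsync.

    /* toy_group_commit.c -- simplified model of sync absorption */
    #include <stdio.h>

    static long wal_written = 0;  /* end of last record placed in the buffer */
    static long wal_flushed = 0;  /* how far the WAL has been fsync'd to disk */
    static int  fsync_calls = 0;

    static void
    commit(long rec_end)
    {
        /* An earlier flush already covered our record: someone else's
         * fsync committed us for free. */
        if (rec_end <= wal_flushed)
            return;

        /* Otherwise flush everything completed so far, not just our own
         * record, so later committers can skip their own fsync. */
        wal_flushed = wal_written;
        fsync_calls++;
    }

    int
    main(void)
    {
        long rec_end[5];
        int  i;

        for (i = 0; i < 5; i++)     /* five backends write commit records */
            rec_end[i] = ++wal_written;
        for (i = 0; i < 5; i++)     /* ...then each one commits */
            commit(rec_end[i]);

        printf("5 commits, %d fsync call(s)\n", fsync_calls);  /* prints 1 */
        return 0;
    }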

What the GUC parameter commit_delay adds is that before it syncs, it
sleeps for that period and then does the sync. (NOTE: the transaction
is not committed until it syncs, so this adds that latency to every
transaction.) The benefit is that when the number of transactions is
high, someone who commits while it is sleeping will also sync its
records, so when it wakes up it does not have to do its own sync, or if
it does, that sync helps others.

The commit_siblings = 5 setting basically ensures that it sleeps only
when at least that many backends are active. This, I think, is a very
expensive check, and I would rather make commit_siblings=0 (which the
current code does not support; it only allows a minimum of 1). The
check is expensive irrespective of the setting. But anyway, here is the
real kicker: in all the tests I did with recent versions 8.4 and 9.0,
the default behavior handles the load well enough that one does not
have to use commit_delay at all, since when the load is very high all
of them are basically in the sync phase and the desired thing happens
anyway.

In fact, using commit_delay actually adds the cost of doing the
commit_siblings check and can hurt performance by increasing CPU
consumption. Doing the commit_siblings check for every transaction is a
killer, since it does not return once the minimum number of backends
has been met but goes through every backend to calculate the total
before comparing it with the minimum. This is probably why most people
see a drop in performance when using commit_delay compared to the
default.

Anyway, right now I would recommend sticking with the default and not
really using it. It does the sync absorption well when you have a lot
of users (though it is not perfect).

Regards,
Jignesh

Re: Group commit and commit delay/siblings

From
Rob Wultsch
Date:
On Sun, Dec 5, 2010 at 7:30 PM, Jignesh Shah <jkshah@gmail.com> wrote:
> The commit_siblings = 5 setting basically ensures that it sleeps only
> when at least that many backends are active. This, I think, is a very
> expensive check, and I would rather make commit_siblings=0 (which the
> current code does not support; it only allows a minimum of 1). The
> check is expensive irrespective of the setting. But anyway, here is
> the real kicker: in all the tests I did with recent versions 8.4 and
> 9.0, the default behavior handles the load well enough that one does
> not have to use commit_delay at all, since when the load is very high
> all of them are basically in the sync phase and the desired thing
> happens anyway.
>
> In fact, using commit_delay actually adds the cost of doing the
> commit_siblings check and can hurt performance by increasing CPU
> consumption. Doing the commit_siblings check for every transaction is
> a killer, since it does not return once the minimum number of backends
> has been met but goes through every backend to calculate the total
> before comparing it with the minimum. This is probably why most people
> see a drop in performance when using commit_delay compared to the
> default.
>
> Anyway, right now I would recommend sticking with the default and not
> really using it. It does the sync absorption well when you have a lot
> of users (though it is not perfect).

Sounds like this setting should go away unless there is a very good
reason to keep it.


--
Rob Wultsch
wultsch@gmail.com

Re: Group commit and commit delay/siblings

From
Jignesh Shah
Date:
On Mon, Dec 6, 2010 at 10:47 AM, Rob Wultsch <wultsch@gmail.com> wrote:
> On Sun, Dec 5, 2010 at 7:30 PM, Jignesh Shah <jkshah@gmail.com> wrote:
>> The commit_siblings = 5 setting basically ensures that it sleeps only
>> when at least that many backends are active. This, I think, is a very
>> expensive check, and I would rather make commit_siblings=0 (which the
>> current code does not support; it only allows a minimum of 1). The
>> check is expensive irrespective of the setting. But anyway, here is
>> the real kicker: in all the tests I did with recent versions 8.4 and
>> 9.0, the default behavior handles the load well enough that one does
>> not have to use commit_delay at all, since when the load is very high
>> all of them are basically in the sync phase and the desired thing
>> happens anyway.
>>
>> In fact, using commit_delay actually adds the cost of doing the
>> commit_siblings check and can hurt performance by increasing CPU
>> consumption. Doing the commit_siblings check for every transaction is
>> a killer, since it does not return once the minimum number of
>> backends has been met but goes through every backend to calculate the
>> total before comparing it with the minimum. This is probably why most
>> people see a drop in performance when using commit_delay compared to
>> the default.
>>
>> Anyway, right now I would recommend sticking with the default and not
>> really using it. It does the sync absorption well when you have a lot
>> of users (though it is not perfect).
>
> Sounds like this setting should go away unless there is a very good
> reason to keep it.
>
>
> --
> Rob Wultsch
> wultsch@gmail.com
>

I would say commit_siblings should go away, but maybe keep commit_delay
for a while. The advantage of keeping commit_delay is that it produces
a rhythmic write pattern which can be used to control writes to the
WAL. It is debatable, but I have used it a couple of times to control
WAL writes.

To me, commit_siblings is expensive under heavy load with many users
and should be killed.

My 2 cents.
Regards,
Jignesh

Re: Group commit and commit delay/siblings

From
Greg Smith
Date:
Jignesh Shah wrote:
> The commit_siblings = 5 setting basically ensures that it sleeps only
> when at least that many backends are active. This, I think, is a very
> expensive check, and I would rather make commit_siblings=0 (which the
> current code does not support; it only allows a minimum of 1)

I just posted a message to the Facebook group sorting out the confusion
in terminology there.

The code Jignesh is alluding to does this:

        if (CommitDelay > 0 && enableFsync &&
            CountActiveBackends() >= CommitSiblings)
            pg_usleep(CommitDelay);

And the expensive part of the overhead beyond the delay itself is
CountActiveBackends(), which iterates over the entire procArray
structure.  Note that it doesn't bother acquiring ProcArrayLock for
that, as some small inaccuracy isn't really a problem for what it's
using the number for.  And it ignores backends waiting on a lock too, as
unlikely to commit in the near future.
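
For reference, a rough standalone model of what that counting loop does
(the ToyProc struct and its fields are invented for illustration; the
real code walks the shared procArray and also skips the calling backend
and idle sessions):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct
    {
        bool in_transaction;    /* backend has an active transaction */
        bool waiting_on_lock;   /* backend is blocked waiting for a lock */
    } ToyProc;

    /* Walk every entry and return the total count of active, unblocked
     * backends.  The whole array is always scanned, even though the
     * caller only wants to know "are there at least CommitSiblings?" --
     * that is the cost being discussed here. */
    static int
    count_active_backends(const ToyProc *procs, size_t nprocs)
    {
        int    count = 0;
        size_t i;

        for (i = 0; i < nprocs; i++)
        {
            if (!procs[i].in_transaction)
                continue;       /* idle: not about to commit */
            if (procs[i].waiting_on_lock)
                continue;       /* won't commit until someone else does */
            count++;
        }
        return count;
    }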

The siblings count is the only thing that keeps this delay from kicking
in on every single commit when the feature is turned on, which it is by
default.  I fear that a reworking in the direction Jignesh is suggesting
here, where that check was removed, would cripple situations where only
a single process was trying to get commits accomplished.

As for why this somewhat weird feature hasn't been removed yet, it's
mainly because we have some benchmarks from Jignesh proving its value in
the hands of an expert.  If you have a system with a really high
transaction rate, where you can expect that the server is constantly
busy and commits are being cached (and subsequently written to physical
disk asynchronously), a brief pause after each commit helps chunk
commits into the write cache as more efficient blocks.  It seems a
little counter-intuitive, but it does seem to work.

The number of people who are actually in that position is very few
though, so for the most part this parameter is just a magnet for people
to set incorrectly because they don't understand it.  With this
additional insight from Jignesh clearing up some of the questions I had
about this, I'm tempted to pull commit_siblings altogether, make
commit_delay default to 0, and update the docs to say something
suggesting "this will slow down every commit you make; only increase it
if you have a high commit rate system where that's necessary to get
better commit chunking".

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Group commit and commit delay/siblings

From
Jignesh Shah
Date:
On Mon, Dec 6, 2010 at 12:35 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Jignesh Shah wrote:
>>
>> The commit_siblings = 5 setting basically ensures that it sleeps only
>> when at least that many backends are active. This, I think, is a very
>> expensive check, and I would rather make commit_siblings=0 (which the
>> current code does not support; it only allows a minimum of 1)
>
> I just posted a message to the Facebook group sorting out the confusion in
> terminology there.
>
> The code Jignesh is alluding to does this:
>
>       if (CommitDelay > 0 && enableFsync &&
>           CountActiveBackends() >= CommitSiblings)
>           pg_usleep(CommitDelay);
>
> And the expensive part of the overhead beyond the delay itself is
> CountActiveBackends(), which iterates over the entire procArray structure.
>  Note that it doesn't bother acquiring ProcArrayLock for that, as some small
> inaccuracy isn't really a problem for what it's using the number for.  And
> it ignores backends waiting on a lock too, as unlikely to commit in the near
> future.
>
> The siblings count is the only thing that keeps this delay from kicking in
> on every single commit when the feature is turned on, which it is by
> default.  I fear that a reworking in the direction Jignesh is suggesting
> here, where that check was removed, would cripple situations where only a
> single process was trying to get commits accomplished.
> As for why this somewhat weird feature hasn't been removed yet, it's mainly
> because we have some benchmarks from Jignesh proving its value in the hands
> of an expert.  If you have a system with a really high transaction
> rate, where you can expect that the server is constantly busy and
> commits are being cached (and subsequently written to physical disk
> asynchronously), a brief pause after each commit helps chunk commits
> into the write cache as more efficient blocks.  It seems a little
> counter-intuitive, but it does seem to work.
>
> The number of people who are actually in that position is very few though,
> so for the most part this parameter is just a magnet for people to set
> incorrectly because they don't understand it.  With this additional insight
> from Jignesh clearing up some of the questions I had about this, I'm tempted
> to pull commit_siblings altogether, make commit_delay default to 0, and
> update the docs to say something suggesting "this will slow down every
> commit you make; only increase it if you have a high commit rate system
> where that's necessary to get better commit chunking".
>
> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services and Support        www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>
>

Yes, I agree with the plan here: take out commit_siblings, default
commit_delay to zero (which is already the case), and add a warning in
the docs that it will slow down your commits (on an individual basis).

The only reason I still want to keep commit_delay is to make it act as
a controller, or drummer if you will (if you have read the book
"Theory of Constraints" by Dr. Eli Goldratt).

Thanks.
Regards,
Jignesh

Re: Group commit and commit delay/siblings

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> And the expensive part of the overhead beyond the delay itself is
> CountActiveBackends(), which iterates over the entire procArray
> structure.

I could have sworn we'd refactored that to something like
    bool ThereAreAtLeastNActiveBackends(int n)
which could drop out of the loop as soon as it'd established what we
really need to know.  In general it's unclear that this'd really save
much, since in a large fraction of executions the answer would be
"no", and then you can't drop out of the loop early, or at least not
very early.  But it clearly wins when n == 0 since then you can just
return true on sight.
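
For illustration, the early-exit shape being described, again as a
standalone toy rather than the real procarray.c code (the refactoring
posted later in this thread does the equivalent against the actual
ProcArray):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct
    {
        bool in_transaction;
        bool waiting_on_lock;
    } ToyProc;

    static bool
    at_least_n_active_backends(const ToyProc *procs, size_t nprocs, int n)
    {
        int    count = 0;
        size_t i;

        if (n == 0)
            return true;        /* answer known on sight, no scan at all */

        for (i = 0; i < nprocs; i++)
        {
            if (!procs[i].in_transaction || procs[i].waiting_on_lock)
                continue;
            if (++count >= n)
                return true;    /* stop as soon as the threshold is met */
        }
        return false;           /* had to scan everything to learn "no" */
    }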

> As for why this somewhat weird feature hasn't been removed yet, it's
> mainly because we have some benchmarks from Jignesh proving its value in
> the hands of an expert.

Removal has been proposed several times, but as long as it's off by
default, it's fairly harmless to leave it there.  I rather expect
it'll stay as it is until someone proposes something that actually works
better.  In particular I see no advantage in simply deleting some of the
parameters to the existing code.  I'd suggest that we just improve the
coding so that we don't scan ProcArray at all when commit_siblings is 0.

(I do agree with improving the docs to warn people away from assuming
this is a knob to frob mindlessly.)

            regards, tom lane

Re: Group commit and commit delay/siblings

From
Jignesh Shah
Date:
On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <greg@2ndquadrant.com> writes:
>> And the expensive part of the overhead beyond the delay itself is
>> CountActiveBackends(), which iterates over the entire procArray
>> structure.
>
> I could have sworn we'd refactored that to something like
>        bool ThereAreAtLeastNActiveBackends(int n)
> which could drop out of the loop as soon as it'd established what we
> really need to know.  In general it's unclear that this'd really save
> much, since in a large fraction of executions the answer would be
> "no", and then you can't drop out of the loop early, or at least not
> very early.  But it clearly wins when n == 0 since then you can just
> return true on sight.
>
>> As for why this somewhat weird feature hasn't been removed yet, it's
>> mainly because we have some benchmarks from Jignesh proving its value in
>> the hands of an expert.
>
> Removal has been proposed several times, but as long as it's off by
> default, it's fairly harmless to leave it there.  I rather expect
> it'll stay as it is until someone proposes something that actually works
> better.  In particular I see no advantage in simply deleting some of the
> parameters to the existing code.  I'd suggest that we just improve the
> coding so that we don't scan ProcArray at all when commit_siblings is 0.
>
> (I do agree with improving the docs to warn people away from assuming
> this is a knob to frob mindlessly.)
>
>                        regards, tom lane
>

In that case I propose that we support commit_siblings=0, which is not
currently supported; the minimum value for commit_siblings is currently
1. If we support commit_siblings=0 then it should short-circuit that
function call, which is often what I do in my tests with commit_delay.

Thanks.
Regards,
Jignesh

Re: Group commit and commit delay/siblings

From
Greg Smith
Date:
Jignesh Shah wrote:
> On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> I could have sworn we'd refactored that to something like
>>        bool ThereAreAtLeastNActiveBackends(int n)
>> which could drop out of the loop as soon as it'd established what we
>> really need to know...I'd suggest that we just improve the
>> coding so that we don't scan ProcArray at all when commit_siblings is 0.
>>
>> (I do agree with improving the docs to warn people away from assuming
>> this is a knob to frob mindlessly.)
>>
> In that case I propose that we support commit_siblings=0, which is not
> currently supported; the minimum value for commit_siblings is currently
> 1. If we support commit_siblings=0 then it should short-circuit that
> function call, which is often what I do in my tests with commit_delay.
>

Everybody should be happy now:  the attached patch refactors the code to
exit as soon as the siblings count is reached, short-circuits with no
scanning of ProcArray if the minimum is 0, and allows setting the
siblings to 0 to enable that shortcut:

postgres# select name,setting,min_val,max_val from pg_settings where
name='commit_siblings';
      name       | setting | min_val | max_val
-----------------+---------+---------+---------
 commit_siblings | 5       | 0       | 1000

It also makes it clear in the docs that a) group commit happens even
without this setting being touched, and b) it's unlikely to actually
help anyone.  Those are the two parts that seem to confuse people
whenever this comes up.  Thanks to Rob and the rest of the Facebook
commentators for helping highlight exactly what was wrong with the way
those were written.  (It almost makes up for the slight distaste I get
from seeing "Greg likes MySQL at Facebook" on my Wall after joining in
that discussion)

I can't rebuild the docs on the system I wrote this on at the moment; I
hope I didn't break anything with my edits but didn't test that yet.

I'll add this into the next CommitFest so we don't forget about it, but
of course Jignesh is welcome to try this out at his convenience to see if
I've kept the behavior he wants while improving its downside.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1ca51ef..f1d3ca2 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 1683,1699 ****
        </indexterm>
        <listitem>
         <para>
!         Time delay between writing a commit record to the WAL buffer
!         and flushing the buffer out to disk, in microseconds. A
!         nonzero delay can allow multiple transactions to be committed
!         with only one <function>fsync()</function> system call, if
          system load is high enough that additional transactions become
          ready to commit within the given interval. But the delay is
          just wasted if no other transactions become ready to
          commit. Therefore, the delay is only performed if at least
          <varname>commit_siblings</varname> other transactions are
          active at the instant that a server process has written its
!         commit record. The default is zero (no delay).
         </para>
        </listitem>
       </varlistentry>
--- 1683,1706 ----
        </indexterm>
        <listitem>
         <para>
!         When the commit data for a transaction is flushed to disk, any
!         additional commits ready at that time are also flushed out.
!         <varname>commit_delay</varname> adds a time delay, set in
!         microseconds, before writing some commit records to the WAL
!         buffer and flushing the buffer out to disks. A nonzero delay
!         can allow more transactions to be committed with only one call
!         to the active <varname>wal_sync_method</varname>, if
          system load is high enough that additional transactions become
          ready to commit within the given interval. But the delay is
          just wasted if no other transactions become ready to
          commit. Therefore, the delay is only performed if at least
          <varname>commit_siblings</varname> other transactions are
          active at the instant that a server process has written its
!         commit record. The default is zero (no delay).  Since
!         all pending commit data flushes are written at every flush
!         regardless of this setting, it is rare that adding delay to
!         that by increasing this parameter will actually improve commit
!         performance.
         </para>
        </listitem>
       </varlistentry>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d2e2e11..79c9c0d 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** RecordTransactionCommit(void)
*** 1052,1058 ****
           * fewer than CommitSiblings other backends with active transactions.
           */
          if (CommitDelay > 0 && enableFsync &&
!             CountActiveBackends() >= CommitSiblings)
              pg_usleep(CommitDelay);

          XLogFlush(XactLastRecEnd);
--- 1052,1058 ----
           * fewer than CommitSiblings other backends with active transactions.
           */
          if (CommitDelay > 0 && enableFsync &&
!             MinimumActiveBackends(CommitSiblings))
              pg_usleep(CommitDelay);

          XLogFlush(XactLastRecEnd);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 6e7a6db..a4114b4 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*************** CancelVirtualTransaction(VirtualTransact
*** 1937,1956 ****
  }

  /*
!  * CountActiveBackends --- count backends (other than myself) that are in
!  *        active transactions.  This is used as a heuristic to decide if
!  *        a pre-XLOG-flush delay is worthwhile during commit.
   *
   * Do not count backends that are blocked waiting for locks, since they are
   * not going to get to run until someone else commits.
   */
! int
! CountActiveBackends(void)
  {
      ProcArrayStruct *arrayP = procArray;
      int            count = 0;
      int            index;

      /*
       * Note: for speed, we don't acquire ProcArrayLock.  This is a little bit
       * bogus, but since we are only testing fields for zero or nonzero, it
--- 1937,1961 ----
  }

  /*
!  * MinimumActiveBackends --- count backends (other than myself) that are
!  *        in active transactions.  Return true if the count exceeds the
!  *        minimum threshold passed.  This is used as a heuristic to decide if
!  *        a pre-XLOG-flush delay is worthwhile during commit.
   *
   * Do not count backends that are blocked waiting for locks, since they are
   * not going to get to run until someone else commits.
   */
! bool
! MinimumActiveBackends(int min)
  {
      ProcArrayStruct *arrayP = procArray;
      int            count = 0;
      int            index;

+     /* Quick short-circuit if no minimum is specified */
+     if (min == 0)
+         return true;
+
      /*
       * Note: for speed, we don't acquire ProcArrayLock.  This is a little bit
       * bogus, but since we are only testing fields for zero or nonzero, it
*************** CountActiveBackends(void)
*** 1983,1991 ****
          if (proc->waitLock != NULL)
              continue;            /* do not count if blocked on a lock */
          count++;
      }

!     return count;
  }

  /*
--- 1988,1998 ----
          if (proc->waitLock != NULL)
              continue;            /* do not count if blocked on a lock */
          count++;
+         if (count >= min)
+             break;
      }

!     return count >= min;
  }

  /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index dd19fe9..942acb9 100644
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesI
*** 1816,1822 ****
              NULL
          },
          &CommitSiblings,
!         5, 1, 1000, NULL, NULL
      },

      {
--- 1816,1822 ----
              NULL
          },
          &CommitSiblings,
!         5, 0, 1000, NULL, NULL
      },

      {
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 959033e..c182dcd 100644
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
*************** extern VirtualTransactionId *GetCurrentV
*** 61,67 ****
  extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid);
  extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode);

! extern int    CountActiveBackends(void);
  extern int    CountDBBackends(Oid databaseid);
  extern void CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending);
  extern int    CountUserBackends(Oid roleid);
--- 61,67 ----
  extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid);
  extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode);

! extern bool    MinimumActiveBackends(int min);
  extern int    CountDBBackends(Oid databaseid);
  extern void CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending);
  extern int    CountUserBackends(Oid roleid);

Re: Group commit and commit delay/siblings

From
Simon Riggs
Date:
On Mon, 2010-12-06 at 23:52 -0500, Greg Smith wrote:
> Jignesh Shah wrote:
> > On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >> I could have sworn we'd refactored that to something like
> >>        bool ThereAreAtLeastNActiveBackends(int n)
> >> which could drop out of the loop as soon as it'd established what we
> >> really need to know...I'd suggest that we just improve the
> >> coding so that we don't scan ProcArray at all when commit_siblings is 0.
> >>
> >> (I do agree with improving the docs to warn people away from assuming
> >> this is a knob to frob mindlessly.)
> >>
> > In that case I propose that we support commit_siblings=0, which is not
> > currently supported; the minimum value for commit_siblings is currently
> > 1. If we support commit_siblings=0 then it should short-circuit that
> > function call, which is often what I do in my tests with commit_delay.
> >
>
> Everybody should be happy now:  the attached patch refactors the code to
> exit as soon as the siblings count is reached, short-circuits with no
> scanning of ProcArray if the minimum is 0, and allows setting the
> siblings to 0 to enable that shortcut:

Minor patch, no downsides. Docs checked. Committed.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services