Thread: auto-sizing wal_buffers

auto-sizing wal_buffers

From
Robert Haas
Date:
On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
> setting, it would for the most part become an autotuned parameter.  That
> would make it 0.75 to 1MB at the standard anemic Linux default kernel
> parameters.  Maybe more than some would like, but dropping shared_buffers
> from 24MB to 23MB to keep this from being ridiculously undersized is
> probably a win.  That percentage would reach 16MB by the time shared_buffers
> was increased to 533MB, which also seems about right to me.  On a really bad
> setup (brief pause to flip off Apple) with only 4MB to work with total,
> you'd end up with wal_buffers between 64 and 128K, so very close to the
> status quo.
>
> Code that up, and we could probably even remove the parameter as a tunable
> altogether.  Very few would see a downside relative to any sensible
> configuration under the current situation, and many people would notice
> better automagic performance with one less parameter to tweak.  Given the
> recent investigations about the serious downsides of tiny wal_buffers values
> on new Linux kernels when using open_datasync, a touch more aggression about
> this setting seems particularly appropriate to consider now.  That's been
> swapped out as the default, but it's still possible people will switch to
> it.

Would anyone like to argue vigorously for or against the above proposal?

I'll start: I think this is a good idea.  I don't have a strong
opinion on whether the exact details of Greg proposes above are
precisely optimal, but I think they're in the right ballpark.
Furthermore, we already have other things that are tuned in somewhat
similar ways (e.g. the size of the fsync request queue defaults to the
number of shared buffers) so there's precedent for it.  It's one less
parameter that you have to set to make things just work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Magnus Hagander
Date:
On Thu, Jan 13, 2011 at 23:19, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
>> setting, it would for the most part become an autotuned parameter.  That
>> would make it 0.75 to 1MB at the standard anemic Linux default kernel
>> parameters.  Maybe more than some would like, but dropping shared_buffers
>> from 24MB to 23MB to keep this from being ridiculously undersized is
>> probably a win.  That percentage would reach 16MB by the time shared_buffers
>> was increased to 533MB, which also seems about right to me.  On a really bad
>> setup (brief pause to flip off Apple) with only 4MB to work with total,
>> you'd end up with wal_buffers between 64 and 128K, so very close to the
>> status quo.
>>
>> Code that up, and we could probably even remove the parameter as a tunable
>> altogether.  Very few would see a downside relative to any sensible
>> configuration under the current situation, and many people would notice
>> better automagic performance with one less parameter to tweak.  Given the
>> recent investigations about the serious downsides of tiny wal_buffers values
>> on new Linux kernels when using open_datasync, a touch more aggression about
>> this setting seems particularly appropriate to consider now.  That's been
>> swapped out as the default, but it's still possible people will switch to
>> it.
>
> Would anyone like to argue vigorously for or against the above proposal?
>
> I'll start: I think this is a good idea.  I don't have a strong
> opinion on whether the exact details of Greg proposes above are
> precisely optimal, but I think they're in the right ballpark.
> Furthermore, we already have other things that are tuned in somewhat
> similar ways (e.g. the size of the fsync request queue defaults to the
> number of shared buffers) so there's precedent for it.  It's one less
> parameter that you have to set to make things just work.

+1, I like the idea. Would it still be there to override if necessary?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote:
> +1, I like the idea. Would it still be there to override if necessary?

Depends what people want to do.  We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether.  I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Josh Berkus
Date:
> Depends what people want to do.  We could make the default "0kB", and
> define that to mean "auto-tune", or we could remove the parameter
> altogether.  I think I was envisioning the latter, but if people are
> hesitant to do that we could do the former instead.

Unfortunately, we might still need a manual parameter for override
because of the interaction between wal_buffers and
synchronous_commit=off, since it sets the max size of the unflushed data
buffer.  Discuss?

And the "auto" setting should be -1, not 0kB.  We use -1 for "use
default" for several other GUCs.

Other than that, I think Greg's numbers are fine, and strongly support
having one less thing to tune.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Thu, Jan 13, 2011 at 6:02 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> Depends what people want to do.  We could make the default "0kB", and
>> define that to mean "auto-tune", or we could remove the parameter
>> altogether.  I think I was envisioning the latter, but if people are
>> hesitant to do that we could do the former instead.
>
> Unfortunately, we might still need a manual parameter for override
> because of the interaction between wal_buffers and
> synchronous_commit=off, since it sets the max size of the unflushed data
> buffer.  Discuss?

Do we have any evidence there's actually a problem in that case, or
that a larger value of wal_buffers solves it?  I mean, the background
writer is going to start a background flush as quickly as it can...

> And the "auto" setting should be -1, not 0kB.  We use -1 for "use
> default" for several other GUCs.

No can do.  Gotta have things in the same units.

> Other than that, I think Greg's numbers are fine, and strongly support
> having one less thing to tune.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Josh Berkus
Date:
Robert,

>> Unfortunately, we might still need a manual parameter for override
>> because of the interaction between wal_buffers and
>> synchronous_commit=off, since it sets the max size of the unflushed data
>> buffer.  Discuss?
> 
> Do we have any evidence there's actually a problem in that case, or
> that a larger value of wal_buffers solves it?  I mean, the background
> writer is going to start a background flush as quickly as it can...

I don't think anyone has done any testing.  However, the setting is
there and some users might be convinced that they need it.

>> And the "auto" setting should be -1, not 0kB.  We use -1 for "use
>> default" for several other GUCs.
> 
> No can do.  Gotta have things in the same units.

That's certainly not true with, for example, log_temp_files.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: auto-sizing wal_buffers

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Would anyone like to argue vigorously for or against the above
> proposal?
Greg's numbers look reasonable to me, and there's nobody I'd trust
more to come up with reasonable numbers for this.  One less tunable
is a good thing, especially since this designed to scale from
someone slapping it on his laptop for a first quick try, all the way
up to industrial strength production environments.  I guess a manual
override doesn't bother me too much, but I am a bit dubious of its
value, and there is value in keeping the GUC count down....
-Kevin


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> +1, I like the idea. Would it still be there to override if necessary?

> Depends what people want to do.  We could make the default "0kB", and
> define that to mean "auto-tune", or we could remove the parameter
> altogether.  I think I was envisioning the latter, but if people are
> hesitant to do that we could do the former instead.

I think we need to keep the override capability until the autotune
algorithm has proven itself in the field for a couple of years.

I agree with Josh that a negative value should be used to select the
autotune method.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Greg Smith
Date:
Tom Lane wrote:
> I think we need to keep the override capability until the autotune
> algorithm has proven itself in the field for a couple of years.
>
> I agree with Josh that a negative value should be used to select the
> autotune method.
>

Agreed on both fronts.  Attached patch does the magic.  Also available
in branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git

By changing only shared_buffers I get the following quite reasonable
automatic behavior:

$ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM
pg_settings WHERE name IN ('wal_buffers','shared_buffers')"
      name      | unit | boot_val | setting | current_setting
----------------+------+----------+---------+-----------------
 shared_buffers | 8kB  | 1024     | 3072    | 24MB
 wal_buffers    | 8kB  | -1       | 96      | 768kB

 shared_buffers | 8kB  | 1024     | 4096    | 32MB
 wal_buffers    | 8kB  | -1       | 128     | 1MB

 shared_buffers | 8kB  | 1024     | 16384   | 128MB
 wal_buffers    | 8kB  | -1       | 512     | 4MB

 shared_buffers | 8kB  | 1024     | 131072  | 1GB
 wal_buffers    | 8kB  | -1       | 2048    | 16MB

 shared_buffers | 8kB  | 1024     | 262144  | 2GB
 wal_buffers    | 8kB  | -1       | 2048    | 16MB

If you've set it to the auto-tuning behavior, you don't see that setting
of -1 in the SHOW output; you see the value it's actually been set to.
The only way to know that was set automatically is to look at boot_val
as I've shown here.  I consider this what admins would prefer, as the
easy way to expose the value that was used.  I would understand if
people considered it a little odd though.  Since you can't change it
without a postgresql.conf edit and a server start anyway, and it's
tersely documented in the sample postgresql.conf what -1 does, I don't
see this being a problem for anyone in the field.

To try and clear up some of the confusion around how the earlier
documentation suggests larger values of this aren't needed, I added the
following updated description of how this has been observed to work for
admins in practice:

!         Since the data is written out to disk at every transaction commit,
!         the setting many only need to be be large enough to hold the
amount
!         of WAL data generated by one typical transaction.  Larger values,
!         typically at least a few megabytes, can improve write performance
!         on a busy server where many clients are committing at once.
!         Extremely large settings are unlikely to provide additional
benefit.

And to make this easy as possible to apply if I got this right, here's
some proposed commit text:

Automatically set wal_buffers to be proportional
to the size of shared_buffers.  Make it 1/32
as large when the auto-tuned behavior, which
is the default and set with a value of -1,
is used.  The previous default of 64kB is still
enforced as a minimum value.  The maximum
automatic value is limited to 16MB.

(Note that this not exactly what I put in my own commit message if you
grab from my repo, that had a typo)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e2a2c5..c3f5632 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 1638,1649 ****
        </indexterm>
        <listitem>
         <para>
!         The amount of memory used in shared memory for WAL data.  The
!         default is 64 kilobytes (<literal>64kB</>).  The setting need only
!         be large enough to hold the amount of WAL data generated by one
!         typical transaction, since the data is written out to disk at
!         every transaction commit.  This parameter can only be set at server
!         start.
         </para>

         <para>
--- 1638,1659 ----
        </indexterm>
        <listitem>
         <para>
!         The amount of shared memory used for storing WAL data.  The
!         default setting of -1 adjusts this automatically based on the size
!         of <varname>shared_buffers</varname>, making it 1/32 (about 3%) of
!         the size of that normally larger shared memory block.  Automatically
!         set values are limited to a maximum of 16 megabytes
!         (<literal>16MB</>), sufficient to hold one WAL segment worth of data.
!         The smallest allowable setting is 64 kilobytes (<literal>64kB</>).
!        </para>
!
!        <para>
!         Since the data is written out to disk at every transaction commit,
!         the setting many only need to be be large enough to hold the amount
!         of WAL data generated by one typical transaction.  Larger values,
!         typically at least a few megabytes, can improve write performance
!         on a busy server where many clients are committing at once.
!         Extremely large settings are unlikely to provide additional benefit.
         </para>

         <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b49b933..060e627 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 68,74 ****
  /* User-settable parameters */
  int            CheckPointSegments = 3;
  int            wal_keep_segments = 0;
! int            XLOGbuffers = 8;
  int            XLogArchiveTimeout = 0;
  bool        XLogArchiveMode = false;
  char       *XLogArchiveCommand = NULL;
--- 68,75 ----
  /* User-settable parameters */
  int            CheckPointSegments = 3;
  int            wal_keep_segments = 0;
! int            XLOGbuffers = -1;
! int            XLOGbuffersMin = 8;
  int            XLogArchiveTimeout = 0;
  bool        XLogArchiveMode = false;
  char       *XLogArchiveCommand = NULL;
*************** GetSystemIdentifier(void)
*** 4779,4789 ****
--- 4780,4812 ----
  /*
   * Initialization of shared memory for XLOG
   */
+
+ void XLOGTuneNumBuffers(void)
+ {
+     /*
+      * If automatic setting was requested, use about 3% as much memory as
+      * requested for the buffer cache.  Clamp the automatic maximum to the
+      * size of one 16MB XLOG segment, while still allowing a larger manual
+      * setting.
+      */
+     if (XLOGbuffers == -1)
+         {
+         XLOGbuffers = NBuffers / 32;
+         if (XLOGbuffers > 2048)
+             XLOGbuffers = 2048;
+         }
+     /* Enforce a 64KB minimum */
+     if (XLOGbuffers < XLOGbuffersMin)
+         XLOGbuffers = XLOGbuffersMin;
+ }
+
  Size
  XLOGShmemSize(void)
  {
      Size        size;

+     XLOGTuneNumBuffers();
+
      /* XLogCtl */
      size = sizeof(XLogCtlData);
      /* xlblocks array */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 942acb9..2421460 100644
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesI
*** 1766,1772 ****
              GUC_UNIT_XBLOCKS
          },
          &XLOGbuffers,
!         8, 4, INT_MAX, NULL, NULL
      },

      {
--- 1766,1772 ----
              GUC_UNIT_XBLOCKS
          },
          &XLOGbuffers,
!         -1, -1, INT_MAX, NULL, NULL
      },

      {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f436b83..6c6f9a9 100644
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 162,168 ****
                      #   fsync_writethrough
                      #   open_sync
  #full_page_writes = on            # recover from partial page writes
! #wal_buffers = 64kB            # min 32kB
                      # (change requires restart)
  #wal_writer_delay = 200ms        # 1-10000 milliseconds

--- 162,168 ----
                      #   fsync_writethrough
                      #   open_sync
  #full_page_writes = on            # recover from partial page writes
! #wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
                      # (change requires restart)
  #wal_writer_delay = 200ms        # 1-10000 milliseconds

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e9d8d15..ed7a32a 100644
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
*************** extern void GetXLogReceiptTime(Timestamp
*** 293,298 ****
--- 293,299 ----

  extern void UpdateControlFile(void);
  extern uint64 GetSystemIdentifier(void);
+ extern void XLOGTuneNumBuffers(void);
  extern Size XLOGShmemSize(void);
  extern void XLOGShmemInit(void);
  extern void BootStrapXLOG(void);

Re: auto-sizing wal_buffers

From
Greg Smith
Date:
Kevin Grittner wrote:
> I guess a manual override doesn't bother me too much, but I am a bit dubious of its
> value, and there is value in keeping the GUC count down...

It's a risk-reward thing really.  The reward for removing it is that a 
few lines of code and a small section of the documentation go away.  
It's not very big.  The risk seems low, but it's not zero.  Let's say 
this goes in, we get to 9.2 or later, and a survey suggests that no one 
has needed to ever set wal_buffers when deploying 9.1.  At that point I 
think everyone would feel much better considering to nuke it 
altogether.  I just looked at the code again when developing the patch, 
and there's really not enough benefit to removing it to worry about 
taking any risk right now.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: auto-sizing wal_buffers

From
Fujii Masao
Date:
On Sat, Jan 15, 2011 at 3:51 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Agreed on both fronts.  Attached patch does the magic.  Also available in
> branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git

+int            XLOGbuffersMin = 8;

XLOGbuffersMin is a fixed value. I think that defining it as a macro
rather than a variable seems better.

+        if (XLOGbuffers > 2048)
+            XLOGbuffers = 2048;

Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
better.

+#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers

Typo: s/32kB/64kB

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: auto-sizing wal_buffers

From
Greg Smith
Date:
Fujii Masao wrote:
> +int            XLOGbuffersMin = 8;
>
> XLOGbuffersMin is a fixed value. I think that defining it as a macro
> rather than a variable seems better.
>
> +        if (XLOGbuffers > 2048)
> +            XLOGbuffers = 2048;
>
> Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
> better.
>
> +#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
>
> Typo: s/32kB/64kB
>

Thanks, I've fixed all these issues and attached a new full patch,
pushed to github, etc.  Tests give same results back, and it's nice that
it scale to reasonable behavior if someone changes their XLOG segment size.

It should be possible to set the value back to the older minimum value
of 32kB too.  That's doesn't actually seem to work though; when I try it
I get:

$ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM
pg_settings WHERE name IN ('wal_buffers','shared_buffers')"
      name      | unit | boot_val | setting | current_setting
----------------+------+----------+---------+-----------------
 shared_buffers | 8kB  | 1024     | 131072  | 1GB
 wal_buffers    | 8kB  | -1       | 8       | 64kB

Where I was expecting that setting to be "4" instead for 32kB.  So
there's probably some minor bug left in where I inserted this into the
initialization sequence.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e2a2c5..1c13f20 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 1638,1649 ****
        </indexterm>
        <listitem>
         <para>
!         The amount of memory used in shared memory for WAL data.  The
!         default is 64 kilobytes (<literal>64kB</>).  The setting need only
!         be large enough to hold the amount of WAL data generated by one
!         typical transaction, since the data is written out to disk at
!         every transaction commit.  This parameter can only be set at server
!         start.
         </para>

         <para>
--- 1638,1659 ----
        </indexterm>
        <listitem>
         <para>
!         The amount of shared memory used for storing WAL data.  The
!         default setting of -1 adjusts this automatically based on the size
!         of <varname>shared_buffers</varname>, making it 1/32 (about 3%) of
!         the size of that normally larger shared memory block.  Automatically
!         set values are limited to a maximum sufficient to hold one WAL
!         segment worth of data, normally 16 megabytes (<literal>16MB</>).
!         The smallest allowable setting is 32 kilobytes (<literal>32kB</>).
!        </para>
!
!        <para>
!         Since the data is written out to disk at every transaction commit,
!         the setting many only need to be be large enough to hold the amount
!         of WAL data generated by one typical transaction.  Larger values,
!         typically at least a few megabytes, can improve write performance
!         on a busy server where many clients are committing at once.
!         Extremely large settings are unlikely to provide additional benefit.
         </para>

         <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..d057773 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 69,75 ****
  /* User-settable parameters */
  int            CheckPointSegments = 3;
  int            wal_keep_segments = 0;
! int            XLOGbuffers = 8;
  int            XLogArchiveTimeout = 0;
  bool        XLogArchiveMode = false;
  char       *XLogArchiveCommand = NULL;
--- 69,76 ----
  /* User-settable parameters */
  int            CheckPointSegments = 3;
  int            wal_keep_segments = 0;
! int            XLOGbuffers = -1;
! int            XLOGbuffersMin = 8;
  int            XLogArchiveTimeout = 0;
  bool        XLogArchiveMode = false;
  char       *XLogArchiveCommand = NULL;
*************** GetSystemIdentifier(void)
*** 4780,4790 ****
--- 4781,4819 ----
  /*
   * Initialization of shared memory for XLOG
   */
+
+ void XLOGTuneNumBuffers(void)
+ {
+     int one_segment = XLOG_SEG_SIZE / XLOG_BLCKSZ;
+
+     /*
+      * If automatic setting was requested, use about 3% as much memory as
+      * requested for the buffer cache.  Clamp the automatic maximum to the
+      * size of one XLOG segment, while still allowing a larger manual
+      * setting.  Don't go below the default setting in earlier versions:
+      * twice the size of the minimum, which at default build options is 64kB.
+      */
+     if (XLOGbuffers == -1)  /* set automatically */
+         {
+         XLOGbuffers = NBuffers / 32;
+         if (XLOGbuffers > one_segment)
+             XLOGbuffers = one_segment;
+         if (XLOGbuffers < (XLOGbuffersMin * 2))
+             XLOGbuffers = XLOGbuffersMin * 2;
+         }
+
+     /* Enforce a 32KB minimum in every case */
+     if (XLOGbuffers < XLOGbuffersMin)
+         XLOGbuffers = XLOGbuffersMin;
+ }
+
  Size
  XLOGShmemSize(void)
  {
      Size        size;

+     XLOGTuneNumBuffers();
+
      /* XLogCtl */
      size = sizeof(XLogCtlData);
      /* xlblocks array */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e4dea31..7c014cb 100644
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesI
*** 1765,1771 ****
              GUC_UNIT_XBLOCKS
          },
          &XLOGbuffers,
!         8, 4, INT_MAX, NULL, NULL
      },

      {
--- 1765,1771 ----
              GUC_UNIT_XBLOCKS
          },
          &XLOGbuffers,
!         -1, -1, INT_MAX, NULL, NULL
      },

      {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f436b83..6c6f9a9 100644
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 162,168 ****
                      #   fsync_writethrough
                      #   open_sync
  #full_page_writes = on            # recover from partial page writes
! #wal_buffers = 64kB            # min 32kB
                      # (change requires restart)
  #wal_writer_delay = 200ms        # 1-10000 milliseconds

--- 162,168 ----
                      #   fsync_writethrough
                      #   open_sync
  #full_page_writes = on            # recover from partial page writes
! #wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
                      # (change requires restart)
  #wal_writer_delay = 200ms        # 1-10000 milliseconds

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 74d3427..b475867 100644
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
*************** typedef struct XLogRecord
*** 85,90 ****
--- 85,93 ----
   */
  #define XLR_BKP_REMOVABLE        0x01

+ /* Minimum setting used for a lower bound on wal_buffers */
+ #define XLOG_BUFFER_MIN            4
+
  /* Sync methods */
  #define SYNC_METHOD_FSYNC        0
  #define SYNC_METHOD_FDATASYNC    1
*************** extern void GetXLogReceiptTime(Timestamp
*** 293,298 ****
--- 296,302 ----

  extern void UpdateControlFile(void);
  extern uint64 GetSystemIdentifier(void);
+ extern void XLOGTuneNumBuffers(void);
  extern Size XLOGShmemSize(void);
  extern void XLOGShmemInit(void);
  extern void BootStrapXLOG(void);

Re: auto-sizing wal_buffers

From
Josh Berkus
Date:
On 1/14/11 10:51 PM, Greg Smith wrote:
> 
> !         Since the data is written out to disk at every transaction
> commit,
> !         the setting many only need to be be large enough to hold the
> amount
> !         of WAL data generated by one typical transaction.  Larger values,
> !         typically at least a few megabytes, can improve write performance
> !         on a busy server where many clients are committing at once.
> !         Extremely large settings are unlikely to provide additional
> benefit.

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?Certainly there have been no test results to
showany.
 

If we don't know, keep it vague, but otherwise I suggest:

"Settings larger than the size of a single WAL segment (16MB by default)
are unlikely to produce any benefit."

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I think we can be more specific on that last sentence; is there even any
> *theoretical* benefit to settings above 16MB, the size of a WAL segment?

IIRC there's a forced fsync at WAL segment switch, so no.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Marti Raudsepp
Date:
On Sun, Jan 16, 2011 at 00:34, Josh Berkus <josh@agliodbs.com> wrote:
> I think we can be more specific on that last sentence; is there even any
> *theoretical* benefit to settings above 16MB, the size of a WAL segment?
>  Certainly there have been no test results to show any.

I don't know if it's applicable to real workloads in any way, but it
did make a measurable difference in one of my tests.

Back when benchmarking different wal_sync_methods, I found that when
doing massive INSERTs from generate_series, the INSERT time kept
improving even after increasing wal_buffers from 16MB to 32, 64 and
128MB; especially with wal_sync_method=open_datasync. The total
INSERT+COMMIT time remained constant, however.

More details here:
http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php

Regards,
Marti


Re: auto-sizing wal_buffers

From
Fujii Masao
Date:
On Sun, Jan 16, 2011 at 1:52 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Fujii Masao wrote:
>>
>> +int                    XLOGbuffersMin = 8;
>>
>> XLOGbuffersMin is a fixed value. I think that defining it as a macro
>> rather than a variable seems better.
>>
>> +               if (XLOGbuffers > 2048)
>> +                       XLOGbuffers = 2048;
>>
>> Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
>> better.
>>
>> +#wal_buffers = -1                      # min 32kB, -1 sets based on
>> shared_buffers
>>
>> Typo: s/32kB/64kB
>>
>
> Thanks, I've fixed all these issues and attached a new full patch, pushed to
> github, etc.  Tests give same results back, and it's nice that it scale to
> reasonable behavior if someone changes their XLOG segment size.

Thanks for the update.

+/* Minimum setting used for a lower bound on wal_buffers */
+#define XLOG_BUFFER_MIN            4

Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin?
XLOG_BUFFER_MIN is not used anywhere for now.

+        if (XLOGbuffers < (XLOGbuffersMin * 2))
+            XLOGbuffers = XLOGbuffersMin * 2;
+        }

Why is the minimum value 64kB only when wal_buffers is set to
-1? This seems confusing for users.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: auto-sizing wal_buffers

From
Fujii Masao
Date:
On Sun, Jan 16, 2011 at 7:34 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I think we can be more specific on that last sentence; is there even any
> *theoretical* benefit to settings above 16MB, the size of a WAL segment?
>  Certainly there have been no test results to show any.

If the workload generates 16MB or more WAL for wal_writer_delay,
16MB or more of wal_buffers would be effective. In that case,
wal_buffers is likely to be filled up with unwritten WAL, then you have
to write buffers while holding WALInsert lock. This is obviously not
good.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: auto-sizing wal_buffers

From
Jeff Janes
Date:
On Sat, Jan 15, 2011 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 1/14/11 10:51 PM, Greg Smith wrote:
>>
>> !         Since the data is written out to disk at every transaction
>> commit,
>> !         the setting many only need to be be large enough to hold the
>> amount
>> !         of WAL data generated by one typical transaction.  Larger values,
>> !         typically at least a few megabytes, can improve write performance
>> !         on a busy server where many clients are committing at once.
>> !         Extremely large settings are unlikely to provide additional
>> benefit.
>
> I think we can be more specific on that last sentence; is there even any
> *theoretical* benefit to settings above 16MB, the size of a WAL segment?

I would turn it around and ask if there is any theoretical reason it
would not benefit?
(And if so, can they be cured soon?)


>  Certainly there have been no test results to show any.

Did the tests show steady improvement up to 16MB and then suddenly
hit a wall?  (And in which case, were they recompiled at a larger segment
size and repeated?)  Or did improvement just peter out because 16MB is really
quite a bit and there was just no need for it to be larger independent
of segment size?

Cheers,

Jeff


Re: auto-sizing wal_buffers

From
Jeff Janes
Date:
On Sun, Jan 16, 2011 at 9:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I think we can be more specific on that last sentence; is there even any
>> *theoretical* benefit to settings above 16MB, the size of a WAL segment?
>
> IIRC there's a forced fsync at WAL segment switch, so no.

However other backends can still do WAL inserts while that fsync
takes place,  as long as they can find available buffers to write into.
So that should not be too limiting--a larger wal_buffers make it more
likely they will find available buffers.

However if the background writer does not keep up under bulk loading
conditions, then the end of segment fsync will probably happen via
AdvanceXLInsertBuffer, which will be sitting on the WALInsertLock.  So
that is obviously bad news.

Cheers,

Jeff


Re: auto-sizing wal_buffers

From
Greg Smith
Date:
Fujii Masao wrote:
> +/* Minimum setting used for a lower bound on wal_buffers */
> +#define XLOG_BUFFER_MIN            4
>
> Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin?
> XLOG_BUFFER_MIN is not used anywhere for now.
>   

That's a typo; will fix.

> +        if (XLOGbuffers < (XLOGbuffersMin * 2))
> +            XLOGbuffers = XLOGbuffersMin * 2;
> +        }
>
> Why is the minimum value 64kB only when wal_buffers is set to
> -1? This seems confusing for users.
>   

That's because the current default on older versions is 64kB.  Since the 
automatic selection is going to be the new default, I hope, I don't want 
it to be possible it will pick a number smaller than the default of 
older versions.  So the automatic lower limit is 64kB, while the actual 
manually set lower limit remains 32kB, as before.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: auto-sizing wal_buffers

From
Greg Smith
Date:
Josh Berkus wrote:
> I think we can be more specific on that last sentence; is there even any
> *theoretical* benefit to settings above 16MB, the size of a WAL segment?
>  Certainly there have been no test results to show any.
>   

There was the set Marti just reminded about.  The old wording suggested 
big enough to fix a single transaction was big enough, and that let to 
many people being confused and setting this parameter way too low.  
Since it's possible I'm wrong about 16MB being the upper limit, I didn't 
want the wording to specifically rule out people testing that size to 
see what happens.  We believe  there's never any advantage due to the 
forced wal segment switch, but having test results to the contrary 
floating around keeps me from being too aggressive in how the wording 
there goes.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: auto-sizing wal_buffers

From
Fujii Masao
Date:
On Tue, Jan 18, 2011 at 8:50 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> Why is the minimum value 64kB only when wal_buffers is set to
>> -1? This seems confusing for users.
>>
>
> That's because the current default on older versions is 64kB.  Since the
> automatic selection is going to be the new default, I hope, I don't want it
> to be possible it will pick a number smaller than the default of older
> versions.  So the automatic lower limit is 64kB, while the actual manually
> set lower limit remains 32kB, as before.

It would be helpful to explain that as the source code comment. Also
in the document.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 15, 2011 at 11:52 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Where I was expecting that setting to be "4" instead for 32kB.  So there's
> probably some minor bug left in where I inserted this into the
> initialization sequence.

So I started taking a look at this patch tonight with an eye to
committing it, but ended up having to whack it around fairly hard to
fix the problem described above: the problem is, I believe, that the
initialization code in question doesn't run in every backend.  My
first thought was "gee, it's a bad idea for us to be manipulating the
value of the GUC variable directly, I should use an assign_hook".
But of course that turns out to be a bad idea, because there's no
guarantee that wal_buffers will be initialized after shared_buffers.
So I put it back the way you had it, and jiggered things so that the
copy of the value that's actually used for memory allocation gets
stored in XLogCtl.  That made it possible to define a show_hook that
DTRT, but that wasn't entirely straightforward either: the show_hook
has to deliver the finished string, not just a replacement integer for
guc.c to format.  So I exposed the relevant formatting logic from
guc.c as a separate function (duplicating that code didn't seem
smart), and now it all seems to work.  See attached, which also
includes some other rewriting of code and docs that I hope constitute
improvements.

Barring screams of agony^W^W^Whelpful suggestions for how to code this
more neatly, I'll go ahead and commit this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: auto-sizing wal_buffers

From
Jaime Casanova
Date:
On Sat, Jan 22, 2011 at 12:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 15, 2011 at 11:52 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>> Where I was expecting that setting to be "4" instead for 32kB.  So there's
>> probably some minor bug left in where I inserted this into the
>> initialization sequence.
>
> So I exposed the relevant formatting logic from guc.c as a separate function

i have read this very breafly, so not much comment... just a few questions...

why is this better than using XLOG_BUFFER_MIN? (the same for the 8
buffers assigned just above of it)

+    else if (XLOGbuffers < 4)
+        XLOGbuffers = 4;


also this
+    Assert(XLOGbuffers > 0);
maybe should be       Assert(XLOGbuffers >= XLOG_BUFFER_MIN);


while you move the code, why didn't you keep this comment?
-                    /*
-                     * Use int64 arithmetic to avoid overflows in units
-                     * conversion.
-                     */



--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 1:30 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> why is this better than using XLOG_BUFFER_MIN? (the same for the 8
> buffers assigned just above of it)
>
> +       else if (XLOGbuffers < 4)
> +               XLOGbuffers = 4;

Oh, good point.  Woops.

> also this
> +       Assert(XLOGbuffers > 0);
> maybe should be
>        Assert(XLOGbuffers >= XLOG_BUFFER_MIN);

I think that's slightly less clear about the point of the assertion,
which is to make sure we're at least allocating something.

> while you move the code, why didn't you keep this comment?
> -                                       /*
> -                                        * Use int64 arithmetic to avoid overflows in units
> -                                        * conversion.
> -                                        */

Because I suck.  Will fix.

Thanks for the fast and detailed review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Barring screams of agony^W^W^Whelpful suggestions for how to code this
> more neatly, I'll go ahead and commit this.

The show_hook sucks, and is unnecessary, as is exposing the format
function.  Just set the GUC variable to the correct value.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Barring screams of agony^W^W^Whelpful suggestions for how to code this
>> more neatly, I'll go ahead and commit this.

> The show_hook sucks, and is unnecessary, as is exposing the format
> function.  Just set the GUC variable to the correct value.

Um: that was probably too brief.  "Just set" means "use SetConfigOption".
As penance, I've committed the corrected patch.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Um: that was probably too brief.

Yep.

> "Just set" means "use SetConfigOption".

OK.

> As penance, I've committed the corrected patch.

Which is obviously derived from my version rather than Greg's, without credit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>> As penance, I've committed the corrected patch.

> Which is obviously derived from my version rather than Greg's, without credit.

Apologies --- I thought I'd ended up reverting basically all your
changes, so I just credited him.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> "Just set" means "use SetConfigOption".
>
> OK.

*reads patch as committed*

This is certainly shorter than I wrote, which is good, but it strikes
me that the fundamental problem here is that the API for an assign
hook is fundamentally different for strings than it is for other data
types.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> This is certainly shorter than I wrote, which is good, but it strikes
> me that the fundamental problem here is that the API for an assign
> hook is fundamentally different for strings than it is for other data
> types.

I agree that that's annoying, but given that strings are pass-by-ref
while the other GUC variable types are pass-by-value, it's not really
very easy to make them alike.

In any case, it's not too relevant to this patch, because an assign hook
cannot solve this problem.  As someone (I think you) pointed out
upthread, an assign hook would only be useful if we were sure
wal_buffers would in fact be assigned to by the config file, and that
that would happen after shared_buffers acquired its final value.  Since
we can't assume either thing, the right way to approach it is to have an
internal action that assigns a fresh value to wal_buffers after all the
configuration processing is complete.  Greg had the right design but
didn't know how to change a GUC setting properly.  There are a bunch of
other hacks^Wfeatures that work similarly --- look around for
SetConfigOption calls.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 9:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This is certainly shorter than I wrote, which is good, but it strikes
>> me that the fundamental problem here is that the API for an assign
>> hook is fundamentally different for strings than it is for other data
>> types.
>
> I agree that that's annoying, but given that strings are pass-by-ref
> while the other GUC variable types are pass-by-value, it's not really
> very easy to make them alike.
>
> In any case, it's not too relevant to this patch, because an assign hook
> cannot solve this problem.  As someone (I think you) pointed out
> upthread, an assign hook would only be useful if we were sure
> wal_buffers would in fact be assigned to by the config file, and that
> that would happen after shared_buffers acquired its final value.  Since
> we can't assume either thing, the right way to approach it is to have an
> internal action that assigns a fresh value to wal_buffers after all the
> configuration processing is complete.  Greg had the right design but
> didn't know how to change a GUC setting properly.  There are a bunch of
> other hacks^Wfeatures that work similarly --- look around for
> SetConfigOption calls.

I'm going with hacks.  Any API that requires you to print to a string
so you can turn around and immediately convert it back to an integer
is not too swift.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm going with hacks.  Any API that requires you to print to a string
> so you can turn around and immediately convert it back to an integer
> is not too swift.

Oh, you're complaining about SetConfigOption, not the assign hooks.

Not sure if it's really worth refactoring that.  The problem is that
there are lots and lots and lots of places that need to call that
*without* any dependency on what the datatype of the target GUC option
really is.  There are a small number where we do know the type and
conversion to a string is just overhead.  If any of those were
performance-critical it might be worth worrying about, but this one
certainly isn't.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 9:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm going with hacks.  Any API that requires you to print to a string
>> so you can turn around and immediately convert it back to an integer
>> is not too swift.
>
> Oh, you're complaining about SetConfigOption, not the assign hooks.

I was actually complaining about the latter, and then switched gears
to the former.  I'm an equal-opportunity complainer today, I guess...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>> Oh, you're complaining about SetConfigOption, not the assign hooks.

> I was actually complaining about the latter, and then switched gears
> to the former.  I'm an equal-opportunity complainer today, I guess...

It does strike me that we could provide SetConfigOptionInt,
SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers
who'd prefer to pass values in those formats.  They'd still do sprintf
internally, but this would make the call sites a bit cleaner.

In a quick tally, though, I see only a couple of potential callers for
SetConfigOptionInt, perhaps a dozen for SetConfigOptionBool, and none at
all for SetConfigOptionReal.  Hence not sure it's worth the trouble.
        regards, tom lane


Re: auto-sizing wal_buffers

From
Robert Haas
Date:
On Sat, Jan 22, 2011 at 9:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> Oh, you're complaining about SetConfigOption, not the assign hooks.
>
>> I was actually complaining about the latter, and then switched gears
>> to the former.  I'm an equal-opportunity complainer today, I guess...
>
> It does strike me that we could provide SetConfigOptionInt,
> SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers
> who'd prefer to pass values in those formats.  They'd still do sprintf
> internally, but this would make the call sites a bit cleaner.

Why do we need to double the conversion in the first place?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: auto-sizing wal_buffers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jan 22, 2011 at 9:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It does strike me that we could provide SetConfigOptionInt,
>> SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers
>> who'd prefer to pass values in those formats. �They'd still do sprintf
>> internally, but this would make the call sites a bit cleaner.

> Why do we need to double the conversion in the first place?

Because most of the processing in set_config_option is independent of
the type of the GUC variable.  Maybe it could be refactored, but I don't
think it would come out prettier, nor faster.  Again, the important code
paths are starting from string values anyway --- I don't think we should
contort the design of guc.c to serve a small minority of callers at the
expense of complicating the normal cases.
        regards, tom lane