Thread: auto-sizing wal_buffers
On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote: > If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto > setting, it would for the most part become an autotuned parameter. That > would make it 0.75 to 1MB at the standard anemic Linux default kernel > parameters. Maybe more than some would like, but dropping shared_buffers > from 24MB to 23MB to keep this from being ridiculously undersized is > probably a win. That percentage would reach 16MB by the time shared_buffers > was increased to 533MB, which also seems about right to me. On a really bad > setup (brief pause to flip off Apple) with only 4MB to work with total, > you'd end up with wal_buffers between 64 and 128K, so very close to the > status quo. > > Code that up, and we could probably even remove the parameter as a tunable > altogether. Very few would see a downside relative to any sensible > configuration under the current situation, and many people would notice > better automagic performance with one less parameter to tweak. Given the > recent investigations about the serious downsides of tiny wal_buffers values > on new Linux kernels when using open_datasync, a touch more aggression about > this setting seems particularly appropriate to consider now. That's been > swapped out as the default, but it's still possible people will switch to > it. Would anyone like to argue vigorously for or against the above proposal? I'll start: I think this is a good idea. I don't have a strong opinion on whether the exact details of Greg proposes above are precisely optimal, but I think they're in the right ballpark. Furthermore, we already have other things that are tuned in somewhat similar ways (e.g. the size of the fsync request queue defaults to the number of shared buffers) so there's precedent for it. It's one less parameter that you have to set to make things just work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 13, 2011 at 23:19, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto >> setting, it would for the most part become an autotuned parameter. That >> would make it 0.75 to 1MB at the standard anemic Linux default kernel >> parameters. Maybe more than some would like, but dropping shared_buffers >> from 24MB to 23MB to keep this from being ridiculously undersized is >> probably a win. That percentage would reach 16MB by the time shared_buffers >> was increased to 533MB, which also seems about right to me. On a really bad >> setup (brief pause to flip off Apple) with only 4MB to work with total, >> you'd end up with wal_buffers between 64 and 128K, so very close to the >> status quo. >> >> Code that up, and we could probably even remove the parameter as a tunable >> altogether. Very few would see a downside relative to any sensible >> configuration under the current situation, and many people would notice >> better automagic performance with one less parameter to tweak. Given the >> recent investigations about the serious downsides of tiny wal_buffers values >> on new Linux kernels when using open_datasync, a touch more aggression about >> this setting seems particularly appropriate to consider now. That's been >> swapped out as the default, but it's still possible people will switch to >> it. > > Would anyone like to argue vigorously for or against the above proposal? > > I'll start: I think this is a good idea. I don't have a strong > opinion on whether the exact details of Greg proposes above are > precisely optimal, but I think they're in the right ballpark. > Furthermore, we already have other things that are tuned in somewhat > similar ways (e.g. the size of the fsync request queue defaults to the > number of shared buffers) so there's precedent for it. It's one less > parameter that you have to set to make things just work. +1, I like the idea. Would it still be there to override if necessary? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote: > +1, I like the idea. Would it still be there to override if necessary? Depends what people want to do. We could make the default "0kB", and define that to mean "auto-tune", or we could remove the parameter altogether. I think I was envisioning the latter, but if people are hesitant to do that we could do the former instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Depends what people want to do. We could make the default "0kB", and > define that to mean "auto-tune", or we could remove the parameter > altogether. I think I was envisioning the latter, but if people are > hesitant to do that we could do the former instead. Unfortunately, we might still need a manual parameter for override because of the interaction between wal_buffers and synchronous_commit=off, since it sets the max size of the unflushed data buffer. Discuss? And the "auto" setting should be -1, not 0kB. We use -1 for "use default" for several other GUCs. Other than that, I think Greg's numbers are fine, and strongly support having one less thing to tune. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, Jan 13, 2011 at 6:02 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> Depends what people want to do. We could make the default "0kB", and >> define that to mean "auto-tune", or we could remove the parameter >> altogether. I think I was envisioning the latter, but if people are >> hesitant to do that we could do the former instead. > > Unfortunately, we might still need a manual parameter for override > because of the interaction between wal_buffers and > synchronous_commit=off, since it sets the max size of the unflushed data > buffer. Discuss? Do we have any evidence there's actually a problem in that case, or that a larger value of wal_buffers solves it? I mean, the background writer is going to start a background flush as quickly as it can... > And the "auto" setting should be -1, not 0kB. We use -1 for "use > default" for several other GUCs. No can do. Gotta have things in the same units. > Other than that, I think Greg's numbers are fine, and strongly support > having one less thing to tune. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert, >> Unfortunately, we might still need a manual parameter for override >> because of the interaction between wal_buffers and >> synchronous_commit=off, since it sets the max size of the unflushed data >> buffer. Discuss? > > Do we have any evidence there's actually a problem in that case, or > that a larger value of wal_buffers solves it? I mean, the background > writer is going to start a background flush as quickly as it can... I don't think anyone has done any testing. However, the setting is there and some users might be convinced that they need it. >> And the "auto" setting should be -1, not 0kB. We use -1 for "use >> default" for several other GUCs. > > No can do. Gotta have things in the same units. That's certainly not true with, for example, log_temp_files. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Robert Haas <robertmhaas@gmail.com> wrote: > Would anyone like to argue vigorously for or against the above > proposal? Greg's numbers look reasonable to me, and there's nobody I'd trust more to come up with reasonable numbers for this. One less tunable is a good thing, especially since this designed to scale from someone slapping it on his laptop for a first quick try, all the way up to industrial strength production environments. I guess a manual override doesn't bother me too much, but I am a bit dubious of its value, and there is value in keeping the GUC count down.... -Kevin
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote: >> +1, I like the idea. Would it still be there to override if necessary? > Depends what people want to do. We could make the default "0kB", and > define that to mean "auto-tune", or we could remove the parameter > altogether. I think I was envisioning the latter, but if people are > hesitant to do that we could do the former instead. I think we need to keep the override capability until the autotune algorithm has proven itself in the field for a couple of years. I agree with Josh that a negative value should be used to select the autotune method. regards, tom lane
Tom Lane wrote: > I think we need to keep the override capability until the autotune > algorithm has proven itself in the field for a couple of years. > > I agree with Josh that a negative value should be used to select the > autotune method. > Agreed on both fronts. Attached patch does the magic. Also available in branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git By changing only shared_buffers I get the following quite reasonable automatic behavior: $ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM pg_settings WHERE name IN ('wal_buffers','shared_buffers')" name | unit | boot_val | setting | current_setting ----------------+------+----------+---------+----------------- shared_buffers | 8kB | 1024 | 3072 | 24MB wal_buffers | 8kB | -1 | 96 | 768kB shared_buffers | 8kB | 1024 | 4096 | 32MB wal_buffers | 8kB | -1 | 128 | 1MB shared_buffers | 8kB | 1024 | 16384 | 128MB wal_buffers | 8kB | -1 | 512 | 4MB shared_buffers | 8kB | 1024 | 131072 | 1GB wal_buffers | 8kB | -1 | 2048 | 16MB shared_buffers | 8kB | 1024 | 262144 | 2GB wal_buffers | 8kB | -1 | 2048 | 16MB If you've set it to the auto-tuning behavior, you don't see that setting of -1 in the SHOW output; you see the value it's actually been set to. The only way to know that was set automatically is to look at boot_val as I've shown here. I consider this what admins would prefer, as the easy way to expose the value that was used. I would understand if people considered it a little odd though. Since you can't change it without a postgresql.conf edit and a server start anyway, and it's tersely documented in the sample postgresql.conf what -1 does, I don't see this being a problem for anyone in the field. To try and clear up some of the confusion around how the earlier documentation suggests larger values of this aren't needed, I added the following updated description of how this has been observed to work for admins in practice: ! Since the data is written out to disk at every transaction commit, ! the setting many only need to be be large enough to hold the amount ! of WAL data generated by one typical transaction. Larger values, ! typically at least a few megabytes, can improve write performance ! on a busy server where many clients are committing at once. ! Extremely large settings are unlikely to provide additional benefit. And to make this easy as possible to apply if I got this right, here's some proposed commit text: Automatically set wal_buffers to be proportional to the size of shared_buffers. Make it 1/32 as large when the auto-tuned behavior, which is the default and set with a value of -1, is used. The previous default of 64kB is still enforced as a minimum value. The maximum automatic value is limited to 16MB. (Note that this not exactly what I put in my own commit message if you grab from my repo, that had a typo) -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8e2a2c5..c3f5632 100644 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** SET ENABLE_SEQSCAN TO OFF; *** 1638,1649 **** </indexterm> <listitem> <para> ! The amount of memory used in shared memory for WAL data. The ! default is 64 kilobytes (<literal>64kB</>). The setting need only ! be large enough to hold the amount of WAL data generated by one ! typical transaction, since the data is written out to disk at ! every transaction commit. This parameter can only be set at server ! start. </para> <para> --- 1638,1659 ---- </indexterm> <listitem> <para> ! The amount of shared memory used for storing WAL data. The ! default setting of -1 adjusts this automatically based on the size ! of <varname>shared_buffers</varname>, making it 1/32 (about 3%) of ! the size of that normally larger shared memory block. Automatically ! set values are limited to a maximum of 16 megabytes ! (<literal>16MB</>), sufficient to hold one WAL segment worth of data. ! The smallest allowable setting is 64 kilobytes (<literal>64kB</>). ! </para> ! ! <para> ! Since the data is written out to disk at every transaction commit, ! the setting many only need to be be large enough to hold the amount ! of WAL data generated by one typical transaction. Larger values, ! typically at least a few megabytes, can improve write performance ! on a busy server where many clients are committing at once. ! Extremely large settings are unlikely to provide additional benefit. </para> <para> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b49b933..060e627 100644 *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 68,74 **** /* User-settable parameters */ int CheckPointSegments = 3; int wal_keep_segments = 0; ! int XLOGbuffers = 8; int XLogArchiveTimeout = 0; bool XLogArchiveMode = false; char *XLogArchiveCommand = NULL; --- 68,75 ---- /* User-settable parameters */ int CheckPointSegments = 3; int wal_keep_segments = 0; ! int XLOGbuffers = -1; ! int XLOGbuffersMin = 8; int XLogArchiveTimeout = 0; bool XLogArchiveMode = false; char *XLogArchiveCommand = NULL; *************** GetSystemIdentifier(void) *** 4779,4789 **** --- 4780,4812 ---- /* * Initialization of shared memory for XLOG */ + + void XLOGTuneNumBuffers(void) + { + /* + * If automatic setting was requested, use about 3% as much memory as + * requested for the buffer cache. Clamp the automatic maximum to the + * size of one 16MB XLOG segment, while still allowing a larger manual + * setting. + */ + if (XLOGbuffers == -1) + { + XLOGbuffers = NBuffers / 32; + if (XLOGbuffers > 2048) + XLOGbuffers = 2048; + } + /* Enforce a 64KB minimum */ + if (XLOGbuffers < XLOGbuffersMin) + XLOGbuffers = XLOGbuffersMin; + } + Size XLOGShmemSize(void) { Size size; + XLOGTuneNumBuffers(); + /* XLogCtl */ size = sizeof(XLogCtlData); /* xlblocks array */ diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 942acb9..2421460 100644 *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** static struct config_int ConfigureNamesI *** 1766,1772 **** GUC_UNIT_XBLOCKS }, &XLOGbuffers, ! 8, 4, INT_MAX, NULL, NULL }, { --- 1766,1772 ---- GUC_UNIT_XBLOCKS }, &XLOGbuffers, ! -1, -1, INT_MAX, NULL, NULL }, { diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index f436b83..6c6f9a9 100644 *** a/src/backend/utils/misc/postgresql.conf.sample --- b/src/backend/utils/misc/postgresql.conf.sample *************** *** 162,168 **** # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes ! #wal_buffers = 64kB # min 32kB # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds --- 162,168 ---- # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes ! #wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index e9d8d15..ed7a32a 100644 *** a/src/include/access/xlog.h --- b/src/include/access/xlog.h *************** extern void GetXLogReceiptTime(Timestamp *** 293,298 **** --- 293,299 ---- extern void UpdateControlFile(void); extern uint64 GetSystemIdentifier(void); + extern void XLOGTuneNumBuffers(void); extern Size XLOGShmemSize(void); extern void XLOGShmemInit(void); extern void BootStrapXLOG(void);
Kevin Grittner wrote: > I guess a manual override doesn't bother me too much, but I am a bit dubious of its > value, and there is value in keeping the GUC count down... It's a risk-reward thing really. The reward for removing it is that a few lines of code and a small section of the documentation go away. It's not very big. The risk seems low, but it's not zero. Let's say this goes in, we get to 9.2 or later, and a survey suggests that no one has needed to ever set wal_buffers when deploying 9.1. At that point I think everyone would feel much better considering to nuke it altogether. I just looked at the code again when developing the patch, and there's really not enough benefit to removing it to worry about taking any risk right now. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Sat, Jan 15, 2011 at 3:51 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Agreed on both fronts. Attached patch does the magic. Also available in > branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git +int XLOGbuffersMin = 8; XLOGbuffersMin is a fixed value. I think that defining it as a macro rather than a variable seems better. + if (XLOGbuffers > 2048) + XLOGbuffers = 2048; Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems better. +#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers Typo: s/32kB/64kB Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > +int XLOGbuffersMin = 8; > > XLOGbuffersMin is a fixed value. I think that defining it as a macro > rather than a variable seems better. > > + if (XLOGbuffers > 2048) > + XLOGbuffers = 2048; > > Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems > better. > > +#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers > > Typo: s/32kB/64kB > Thanks, I've fixed all these issues and attached a new full patch, pushed to github, etc. Tests give same results back, and it's nice that it scale to reasonable behavior if someone changes their XLOG segment size. It should be possible to set the value back to the older minimum value of 32kB too. That's doesn't actually seem to work though; when I try it I get: $ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM pg_settings WHERE name IN ('wal_buffers','shared_buffers')" name | unit | boot_val | setting | current_setting ----------------+------+----------+---------+----------------- shared_buffers | 8kB | 1024 | 131072 | 1GB wal_buffers | 8kB | -1 | 8 | 64kB Where I was expecting that setting to be "4" instead for 32kB. So there's probably some minor bug left in where I inserted this into the initialization sequence. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8e2a2c5..1c13f20 100644 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** SET ENABLE_SEQSCAN TO OFF; *** 1638,1649 **** </indexterm> <listitem> <para> ! The amount of memory used in shared memory for WAL data. The ! default is 64 kilobytes (<literal>64kB</>). The setting need only ! be large enough to hold the amount of WAL data generated by one ! typical transaction, since the data is written out to disk at ! every transaction commit. This parameter can only be set at server ! start. </para> <para> --- 1638,1659 ---- </indexterm> <listitem> <para> ! The amount of shared memory used for storing WAL data. The ! default setting of -1 adjusts this automatically based on the size ! of <varname>shared_buffers</varname>, making it 1/32 (about 3%) of ! the size of that normally larger shared memory block. Automatically ! set values are limited to a maximum sufficient to hold one WAL ! segment worth of data, normally 16 megabytes (<literal>16MB</>). ! The smallest allowable setting is 32 kilobytes (<literal>32kB</>). ! </para> ! ! <para> ! Since the data is written out to disk at every transaction commit, ! the setting many only need to be be large enough to hold the amount ! of WAL data generated by one typical transaction. Larger values, ! typically at least a few megabytes, can improve write performance ! on a busy server where many clients are committing at once. ! Extremely large settings are unlikely to provide additional benefit. </para> <para> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5b6a230..d057773 100644 *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 69,75 **** /* User-settable parameters */ int CheckPointSegments = 3; int wal_keep_segments = 0; ! int XLOGbuffers = 8; int XLogArchiveTimeout = 0; bool XLogArchiveMode = false; char *XLogArchiveCommand = NULL; --- 69,76 ---- /* User-settable parameters */ int CheckPointSegments = 3; int wal_keep_segments = 0; ! int XLOGbuffers = -1; ! int XLOGbuffersMin = 8; int XLogArchiveTimeout = 0; bool XLogArchiveMode = false; char *XLogArchiveCommand = NULL; *************** GetSystemIdentifier(void) *** 4780,4790 **** --- 4781,4819 ---- /* * Initialization of shared memory for XLOG */ + + void XLOGTuneNumBuffers(void) + { + int one_segment = XLOG_SEG_SIZE / XLOG_BLCKSZ; + + /* + * If automatic setting was requested, use about 3% as much memory as + * requested for the buffer cache. Clamp the automatic maximum to the + * size of one XLOG segment, while still allowing a larger manual + * setting. Don't go below the default setting in earlier versions: + * twice the size of the minimum, which at default build options is 64kB. + */ + if (XLOGbuffers == -1) /* set automatically */ + { + XLOGbuffers = NBuffers / 32; + if (XLOGbuffers > one_segment) + XLOGbuffers = one_segment; + if (XLOGbuffers < (XLOGbuffersMin * 2)) + XLOGbuffers = XLOGbuffersMin * 2; + } + + /* Enforce a 32KB minimum in every case */ + if (XLOGbuffers < XLOGbuffersMin) + XLOGbuffers = XLOGbuffersMin; + } + Size XLOGShmemSize(void) { Size size; + XLOGTuneNumBuffers(); + /* XLogCtl */ size = sizeof(XLogCtlData); /* xlblocks array */ diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index e4dea31..7c014cb 100644 *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** static struct config_int ConfigureNamesI *** 1765,1771 **** GUC_UNIT_XBLOCKS }, &XLOGbuffers, ! 8, 4, INT_MAX, NULL, NULL }, { --- 1765,1771 ---- GUC_UNIT_XBLOCKS }, &XLOGbuffers, ! -1, -1, INT_MAX, NULL, NULL }, { diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index f436b83..6c6f9a9 100644 *** a/src/backend/utils/misc/postgresql.conf.sample --- b/src/backend/utils/misc/postgresql.conf.sample *************** *** 162,168 **** # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes ! #wal_buffers = 64kB # min 32kB # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds --- 162,168 ---- # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes ! #wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 74d3427..b475867 100644 *** a/src/include/access/xlog.h --- b/src/include/access/xlog.h *************** typedef struct XLogRecord *** 85,90 **** --- 85,93 ---- */ #define XLR_BKP_REMOVABLE 0x01 + /* Minimum setting used for a lower bound on wal_buffers */ + #define XLOG_BUFFER_MIN 4 + /* Sync methods */ #define SYNC_METHOD_FSYNC 0 #define SYNC_METHOD_FDATASYNC 1 *************** extern void GetXLogReceiptTime(Timestamp *** 293,298 **** --- 296,302 ---- extern void UpdateControlFile(void); extern uint64 GetSystemIdentifier(void); + extern void XLOGTuneNumBuffers(void); extern Size XLOGShmemSize(void); extern void XLOGShmemInit(void); extern void BootStrapXLOG(void);
On 1/14/11 10:51 PM, Greg Smith wrote: > > ! Since the data is written out to disk at every transaction > commit, > ! the setting many only need to be be large enough to hold the > amount > ! of WAL data generated by one typical transaction. Larger values, > ! typically at least a few megabytes, can improve write performance > ! on a busy server where many clients are committing at once. > ! Extremely large settings are unlikely to provide additional > benefit. I think we can be more specific on that last sentence; is there even any *theoretical* benefit to settings above 16MB, the size of a WAL segment?Certainly there have been no test results to showany. If we don't know, keep it vague, but otherwise I suggest: "Settings larger than the size of a single WAL segment (16MB by default) are unlikely to produce any benefit." -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I think we can be more specific on that last sentence; is there even any > *theoretical* benefit to settings above 16MB, the size of a WAL segment? IIRC there's a forced fsync at WAL segment switch, so no. regards, tom lane
On Sun, Jan 16, 2011 at 00:34, Josh Berkus <josh@agliodbs.com> wrote: > I think we can be more specific on that last sentence; is there even any > *theoretical* benefit to settings above 16MB, the size of a WAL segment? > Certainly there have been no test results to show any. I don't know if it's applicable to real workloads in any way, but it did make a measurable difference in one of my tests. Back when benchmarking different wal_sync_methods, I found that when doing massive INSERTs from generate_series, the INSERT time kept improving even after increasing wal_buffers from 16MB to 32, 64 and 128MB; especially with wal_sync_method=open_datasync. The total INSERT+COMMIT time remained constant, however. More details here: http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php Regards, Marti
On Sun, Jan 16, 2011 at 1:52 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Fujii Masao wrote: >> >> +int XLOGbuffersMin = 8; >> >> XLOGbuffersMin is a fixed value. I think that defining it as a macro >> rather than a variable seems better. >> >> + if (XLOGbuffers > 2048) >> + XLOGbuffers = 2048; >> >> Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems >> better. >> >> +#wal_buffers = -1 # min 32kB, -1 sets based on >> shared_buffers >> >> Typo: s/32kB/64kB >> > > Thanks, I've fixed all these issues and attached a new full patch, pushed to > github, etc. Tests give same results back, and it's nice that it scale to > reasonable behavior if someone changes their XLOG segment size. Thanks for the update. +/* Minimum setting used for a lower bound on wal_buffers */ +#define XLOG_BUFFER_MIN 4 Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin? XLOG_BUFFER_MIN is not used anywhere for now. + if (XLOGbuffers < (XLOGbuffersMin * 2)) + XLOGbuffers = XLOGbuffersMin * 2; + } Why is the minimum value 64kB only when wal_buffers is set to -1? This seems confusing for users. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sun, Jan 16, 2011 at 7:34 AM, Josh Berkus <josh@agliodbs.com> wrote: > I think we can be more specific on that last sentence; is there even any > *theoretical* benefit to settings above 16MB, the size of a WAL segment? > Certainly there have been no test results to show any. If the workload generates 16MB or more WAL for wal_writer_delay, 16MB or more of wal_buffers would be effective. In that case, wal_buffers is likely to be filled up with unwritten WAL, then you have to write buffers while holding WALInsert lock. This is obviously not good. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sat, Jan 15, 2011 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 1/14/11 10:51 PM, Greg Smith wrote: >> >> ! Since the data is written out to disk at every transaction >> commit, >> ! the setting many only need to be be large enough to hold the >> amount >> ! of WAL data generated by one typical transaction. Larger values, >> ! typically at least a few megabytes, can improve write performance >> ! on a busy server where many clients are committing at once. >> ! Extremely large settings are unlikely to provide additional >> benefit. > > I think we can be more specific on that last sentence; is there even any > *theoretical* benefit to settings above 16MB, the size of a WAL segment? I would turn it around and ask if there is any theoretical reason it would not benefit? (And if so, can they be cured soon?) > Certainly there have been no test results to show any. Did the tests show steady improvement up to 16MB and then suddenly hit a wall? (And in which case, were they recompiled at a larger segment size and repeated?) Or did improvement just peter out because 16MB is really quite a bit and there was just no need for it to be larger independent of segment size? Cheers, Jeff
On Sun, Jan 16, 2011 at 9:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> I think we can be more specific on that last sentence; is there even any >> *theoretical* benefit to settings above 16MB, the size of a WAL segment? > > IIRC there's a forced fsync at WAL segment switch, so no. However other backends can still do WAL inserts while that fsync takes place, as long as they can find available buffers to write into. So that should not be too limiting--a larger wal_buffers make it more likely they will find available buffers. However if the background writer does not keep up under bulk loading conditions, then the end of segment fsync will probably happen via AdvanceXLInsertBuffer, which will be sitting on the WALInsertLock. So that is obviously bad news. Cheers, Jeff
Fujii Masao wrote: > +/* Minimum setting used for a lower bound on wal_buffers */ > +#define XLOG_BUFFER_MIN 4 > > Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin? > XLOG_BUFFER_MIN is not used anywhere for now. > That's a typo; will fix. > + if (XLOGbuffers < (XLOGbuffersMin * 2)) > + XLOGbuffers = XLOGbuffersMin * 2; > + } > > Why is the minimum value 64kB only when wal_buffers is set to > -1? This seems confusing for users. > That's because the current default on older versions is 64kB. Since the automatic selection is going to be the new default, I hope, I don't want it to be possible it will pick a number smaller than the default of older versions. So the automatic lower limit is 64kB, while the actual manually set lower limit remains 32kB, as before. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Josh Berkus wrote: > I think we can be more specific on that last sentence; is there even any > *theoretical* benefit to settings above 16MB, the size of a WAL segment? > Certainly there have been no test results to show any. > There was the set Marti just reminded about. The old wording suggested big enough to fix a single transaction was big enough, and that let to many people being confused and setting this parameter way too low. Since it's possible I'm wrong about 16MB being the upper limit, I didn't want the wording to specifically rule out people testing that size to see what happens. We believe there's never any advantage due to the forced wal segment switch, but having test results to the contrary floating around keeps me from being too aggressive in how the wording there goes. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Tue, Jan 18, 2011 at 8:50 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> Why is the minimum value 64kB only when wal_buffers is set to >> -1? This seems confusing for users. >> > > That's because the current default on older versions is 64kB. Since the > automatic selection is going to be the new default, I hope, I don't want it > to be possible it will pick a number smaller than the default of older > versions. So the automatic lower limit is 64kB, while the actual manually > set lower limit remains 32kB, as before. It would be helpful to explain that as the source code comment. Also in the document. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sat, Jan 15, 2011 at 11:52 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Where I was expecting that setting to be "4" instead for 32kB. So there's > probably some minor bug left in where I inserted this into the > initialization sequence. So I started taking a look at this patch tonight with an eye to committing it, but ended up having to whack it around fairly hard to fix the problem described above: the problem is, I believe, that the initialization code in question doesn't run in every backend. My first thought was "gee, it's a bad idea for us to be manipulating the value of the GUC variable directly, I should use an assign_hook". But of course that turns out to be a bad idea, because there's no guarantee that wal_buffers will be initialized after shared_buffers. So I put it back the way you had it, and jiggered things so that the copy of the value that's actually used for memory allocation gets stored in XLogCtl. That made it possible to define a show_hook that DTRT, but that wasn't entirely straightforward either: the show_hook has to deliver the finished string, not just a replacement integer for guc.c to format. So I exposed the relevant formatting logic from guc.c as a separate function (duplicating that code didn't seem smart), and now it all seems to work. See attached, which also includes some other rewriting of code and docs that I hope constitute improvements. Barring screams of agony^W^W^Whelpful suggestions for how to code this more neatly, I'll go ahead and commit this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Sat, Jan 22, 2011 at 12:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jan 15, 2011 at 11:52 AM, Greg Smith <greg@2ndquadrant.com> wrote: >> Where I was expecting that setting to be "4" instead for 32kB. So there's >> probably some minor bug left in where I inserted this into the >> initialization sequence. > > So I exposed the relevant formatting logic from guc.c as a separate function i have read this very breafly, so not much comment... just a few questions... why is this better than using XLOG_BUFFER_MIN? (the same for the 8 buffers assigned just above of it) + else if (XLOGbuffers < 4) + XLOGbuffers = 4; also this + Assert(XLOGbuffers > 0); maybe should be Assert(XLOGbuffers >= XLOG_BUFFER_MIN); while you move the code, why didn't you keep this comment? - /* - * Use int64 arithmetic to avoid overflows in units - * conversion. - */ -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sat, Jan 22, 2011 at 1:30 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > why is this better than using XLOG_BUFFER_MIN? (the same for the 8 > buffers assigned just above of it) > > + else if (XLOGbuffers < 4) > + XLOGbuffers = 4; Oh, good point. Woops. > also this > + Assert(XLOGbuffers > 0); > maybe should be > Assert(XLOGbuffers >= XLOG_BUFFER_MIN); I think that's slightly less clear about the point of the assertion, which is to make sure we're at least allocating something. > while you move the code, why didn't you keep this comment? > - /* > - * Use int64 arithmetic to avoid overflows in units > - * conversion. > - */ Because I suck. Will fix. Thanks for the fast and detailed review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Barring screams of agony^W^W^Whelpful suggestions for how to code this > more neatly, I'll go ahead and commit this. The show_hook sucks, and is unnecessary, as is exposing the format function. Just set the GUC variable to the correct value. regards, tom lane
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Barring screams of agony^W^W^Whelpful suggestions for how to code this >> more neatly, I'll go ahead and commit this. > The show_hook sucks, and is unnecessary, as is exposing the format > function. Just set the GUC variable to the correct value. Um: that was probably too brief. "Just set" means "use SetConfigOption". As penance, I've committed the corrected patch. regards, tom lane
On Sat, Jan 22, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Um: that was probably too brief. Yep. > "Just set" means "use SetConfigOption". OK. > As penance, I've committed the corrected patch. Which is obviously derived from my version rather than Greg's, without credit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> As penance, I've committed the corrected patch. > Which is obviously derived from my version rather than Greg's, without credit. Apologies --- I thought I'd ended up reverting basically all your changes, so I just credited him. regards, tom lane
On Sat, Jan 22, 2011 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> "Just set" means "use SetConfigOption". > > OK. *reads patch as committed* This is certainly shorter than I wrote, which is good, but it strikes me that the fundamental problem here is that the API for an assign hook is fundamentally different for strings than it is for other data types. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > This is certainly shorter than I wrote, which is good, but it strikes > me that the fundamental problem here is that the API for an assign > hook is fundamentally different for strings than it is for other data > types. I agree that that's annoying, but given that strings are pass-by-ref while the other GUC variable types are pass-by-value, it's not really very easy to make them alike. In any case, it's not too relevant to this patch, because an assign hook cannot solve this problem. As someone (I think you) pointed out upthread, an assign hook would only be useful if we were sure wal_buffers would in fact be assigned to by the config file, and that that would happen after shared_buffers acquired its final value. Since we can't assume either thing, the right way to approach it is to have an internal action that assigns a fresh value to wal_buffers after all the configuration processing is complete. Greg had the right design but didn't know how to change a GUC setting properly. There are a bunch of other hacks^Wfeatures that work similarly --- look around for SetConfigOption calls. regards, tom lane
On Sat, Jan 22, 2011 at 9:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> This is certainly shorter than I wrote, which is good, but it strikes >> me that the fundamental problem here is that the API for an assign >> hook is fundamentally different for strings than it is for other data >> types. > > I agree that that's annoying, but given that strings are pass-by-ref > while the other GUC variable types are pass-by-value, it's not really > very easy to make them alike. > > In any case, it's not too relevant to this patch, because an assign hook > cannot solve this problem. As someone (I think you) pointed out > upthread, an assign hook would only be useful if we were sure > wal_buffers would in fact be assigned to by the config file, and that > that would happen after shared_buffers acquired its final value. Since > we can't assume either thing, the right way to approach it is to have an > internal action that assigns a fresh value to wal_buffers after all the > configuration processing is complete. Greg had the right design but > didn't know how to change a GUC setting properly. There are a bunch of > other hacks^Wfeatures that work similarly --- look around for > SetConfigOption calls. I'm going with hacks. Any API that requires you to print to a string so you can turn around and immediately convert it back to an integer is not too swift. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I'm going with hacks. Any API that requires you to print to a string > so you can turn around and immediately convert it back to an integer > is not too swift. Oh, you're complaining about SetConfigOption, not the assign hooks. Not sure if it's really worth refactoring that. The problem is that there are lots and lots and lots of places that need to call that *without* any dependency on what the datatype of the target GUC option really is. There are a small number where we do know the type and conversion to a string is just overhead. If any of those were performance-critical it might be worth worrying about, but this one certainly isn't. regards, tom lane
On Sat, Jan 22, 2011 at 9:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I'm going with hacks. Any API that requires you to print to a string >> so you can turn around and immediately convert it back to an integer >> is not too swift. > > Oh, you're complaining about SetConfigOption, not the assign hooks. I was actually complaining about the latter, and then switched gears to the former. I'm an equal-opportunity complainer today, I guess... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> Oh, you're complaining about SetConfigOption, not the assign hooks. > I was actually complaining about the latter, and then switched gears > to the former. I'm an equal-opportunity complainer today, I guess... It does strike me that we could provide SetConfigOptionInt, SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers who'd prefer to pass values in those formats. They'd still do sprintf internally, but this would make the call sites a bit cleaner. In a quick tally, though, I see only a couple of potential callers for SetConfigOptionInt, perhaps a dozen for SetConfigOptionBool, and none at all for SetConfigOptionReal. Hence not sure it's worth the trouble. regards, tom lane
On Sat, Jan 22, 2011 at 9:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> Oh, you're complaining about SetConfigOption, not the assign hooks. > >> I was actually complaining about the latter, and then switched gears >> to the former. I'm an equal-opportunity complainer today, I guess... > > It does strike me that we could provide SetConfigOptionInt, > SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers > who'd prefer to pass values in those formats. They'd still do sprintf > internally, but this would make the call sites a bit cleaner. Why do we need to double the conversion in the first place? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jan 22, 2011 at 9:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It does strike me that we could provide SetConfigOptionInt, >> SetConfigOptionBool, and SetConfigOptionReal for the benefit of callers >> who'd prefer to pass values in those formats. �They'd still do sprintf >> internally, but this would make the call sites a bit cleaner. > Why do we need to double the conversion in the first place? Because most of the processing in set_config_option is independent of the type of the GUC variable. Maybe it could be refactored, but I don't think it would come out prettier, nor faster. Again, the important code paths are starting from string values anyway --- I don't think we should contort the design of guc.c to serve a small minority of callers at the expense of complicating the normal cases. regards, tom lane