Thread: configure option for XLOG_BLCKSZ
Hi all, I saw that a patch was committed that exposed a configure switch for BLCKSZ. I was hoping that I could do the same for XLOG_BLCKSZ. I think I got the configure.in, sgml, pg_config_manual.h, and pg_config.h.in changes correct. Regards, Mark
"Mark Wong" <markwkm@gmail.com> writes: > I saw a that a patch was committed that exposed a configure switch for > BLCKSZ. I was hoping that I could do that same for XLOG_BLCKSZ. Well, we certainly *could*, but what's the use-case really? The case for varying BLCKSZ is marginal already, and I've seen none at all for varying XLOG_BLCKSZ. Why do we need to make it easier than "edit pg_config_manual.h"? regards, tom lane
Tom Lane wrote: > "Mark Wong" <markwkm@gmail.com> writes: >> I saw a that a patch was committed that exposed a configure switch for >> BLCKSZ. I was hoping that I could do that same for XLOG_BLCKSZ. > > Well, we certainly *could*, but what's the use-case really? The case > for varying BLCKSZ is marginal already, and I've seen none at all for > varying XLOG_BLCKSZ. Why do we need to make it easier than "edit > pg_config_manual.h"? The use case I could see is performance testing, but I would concur that it doesn't take much to modify pg_config_manual.h. Thinking about it, this might actually be a foot gun. A new PostgreSQL user downloads the source and thinks to himself, "Hey, my hard disk is formatted with a 4k block size". Then all of a sudden he has a PostgreSQL build that is incompatible with everything else. Sincerely, Joshua D. Drake
On Fri, May 2, 2008 at 12:04 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > > Tom Lane wrote: > > > "Mark Wong" <markwkm@gmail.com> writes: > > > > > I saw a that a patch was committed that exposed a configure switch for > > > BLCKSZ. I was hoping that I could do that same for XLOG_BLCKSZ. > > > > > > > Well, we certainly *could*, but what's the use-case really? The case > > for varying BLCKSZ is marginal already, and I've seen none at all for > > varying XLOG_BLCKSZ. Why do we need to make it easier than "edit > > pg_config_manual.h"? > > > > The use case I could see is for performance testing but I would concur that > it doesn't take much to modify pg_config_manual.h. In thinking about it, > this might actually be a foot gun. You have a new pg guy, download source > and think to himself..., "Hey I have a 4k block size as formatted on my hard > disk". Then all of a sudden they have an incompatible PostgreSQL with > everything else. As someone who has tested varying both of those parameters, it feels awkward to have a configure option for one and not the other. I mildly prefer having both as configure options because that is easier to script, but I feel more strongly that BLCKSZ and XLOG_BLCKSZ should be set the same way, whether that is both as configure options or both in pg_config_manual.h. Having to change them in different ways makes automating testing a tad more work. So my case is just for ease of testing. Regards, Mark
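For illustration, the kind of scripted test matrix Mark describes might look like the sketch below. It assumes the switch names that appear later in this thread (--with-blocksize from the earlier commit, and --with-wal-blocksize as eventually applied), both taking a size in kB. The function name, the size lists, and the install prefix layout are hypothetical, and the script only prints the configure commands rather than running them.

```shell
#!/bin/sh
# Sketch only: print one configure invocation per (BLCKSZ, XLOG_BLCKSZ) pair.
# The size lists and the prefix layout are illustrative assumptions.

build_matrix() {
    prefix_root=$1
    for blcksz in 4 8 16; do            # data block size in kB
        for wal_blcksz in 4 8 16; do    # WAL block size in kB
            echo "./configure --prefix=$prefix_root/b${blcksz}-w${wal_blcksz}" \
                 "--with-blocksize=$blcksz --with-wal-blocksize=$wal_blcksz"
        done
    done
}

build_matrix /opt/pgtest
```

With both sizes exposed the same way, the loop body is a single command line per combination; if one of them lived only in pg_config_manual.h, the script would also need to patch that header between builds.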
"Mark Wong" <markwkm@gmail.com> writes: > As someone who has tested varying both those parameters it feels > awkward to have a configure option for one and not the other, or vice > versa. I have slightly stronger feelings for having them both as > configure options because it's easier to script, but feel a little > more strongly about having BLCKSZ and XLOG_BLCKSZ both as either > configure options or in pg_config_manual.h. To have them such that > one needs to change them in different manners makes a tad more work in > automating testing. So my case is just for ease of testing. Well, that's a fair point. Another issue though is whether it makes sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in the default case. They are both the unit of I/O and it's not clear why you'd want different units. Mark, has your testing shown any indication that they really ought to be separately configurable? I could see having the same configure switch set both of 'em. regards, tom lane
On Fri, May 2, 2008 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Mark Wong" <markwkm@gmail.com> writes: > > > As someone who has tested varying both those parameters it feels > > awkward to have a configure option for one and not the other, or vice > > versa. I have slightly stronger feelings for having them both as > > configure options because it's easier to script, but feel a little > > more strongly about having BLCKSZ and XLOG_BLCKSZ both as either > > configure options or in pg_config_manual.h. To have them such that > > one needs to change them in different manners makes a tad more work in > > automating testing. So my case is just for ease of testing. > > Well, that's a fair point. Another issue though is whether it makes > sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in > the default case. They are both the unit of I/O and it's not clear > why you'd want different units. Mark, has your testing shown any > indication that they really ought to be separately configurable? > I could see having the same configure switch set both of 'em. I still believe it makes sense to have them separated. I did have some data, which has since been destroyed, that suggested there were some system characterization differences for OLTP workloads with PostgreSQL. Let's hope those disks get delivered to Portland soon. :) Regards, Mark
"Mark Wong" <markwkm@gmail.com> writes: > I still believe it makes sense to have them separated. I did have > some data, which has since been destroyed, that suggested there were > some system characterization differences for OLTP workloads with > PostgreSQL. Let's hope those disks get delivered to Portland soon. :) Fair enough. It's not that much more code to have another configure switch --- will go do that. If we are allowing blocksize and relation seg size to have configure switches, seems that symmetry would demand that XLOG_SEG_SIZE be configurable as well. Thoughts? regards, tom lane
On Fri, 2 May 2008 09:12:32 -0700 "Mark Wong" <markwkm@gmail.com> wrote: > I still believe it makes sense to have them separated. I did have > some data, which has since been destroyed, that suggested there were > some system characterization differences for OLTP workloads with > PostgreSQL. Let's hope those disks get delivered to Portland soon. :) I have those disks. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, 2 May 2008, Tom Lane wrote: > The case for varying BLCKSZ is marginal already, and I've seen none at > all for varying XLOG_BLCKSZ. I recall someone on the performance list who felt it useful to increase XLOG_BLCKSZ to support a high-write environment with WAL shipping, just to make sending the files over the network more efficient. Can't seem to find a reference in the archives though. If you look at things like the giant Sun system tests, significant tuning went into getting all the block sizes to line up better with the underlying hardware. I would not be surprised to discover that sort of install gains a bit from slinging WAL files around in larger chunks as well. They're already using small values for commit_delay just to get the typical WAL write to be in larger blocks. As PostgreSQL makes its way into higher throughput environments, it wouldn't surprise me to discover more of these situations where switching WAL segments every 16MB turns into a bottleneck. Right now, it may only be a few people in the world, but saying "that's big enough" for an allocation of anything usually turns out wrong if you wait long enough. One real concern I have with making this easier to adjust is that I'd hate to let people pick any old block size with the default wal_sync_method, only to have them later discover they can't turn on any direct I/O write method because they botched the alignment restrictions. > Another issue though is whether it makes sense for XLOG_BLCKSZ to be > different from BLCKSZ at all, at least in the default case. They are > both the unit of I/O and it's not clear why you'd want different units. There are lots of people who use completely different physical or logical disk setups for the WAL disk than the regular database. That's going to get even more varied moving forward as SSD starts getting used more, since those devices have a very different set of block size optimization characteristics compared with traditional RAID setups.
They prefer smaller blocks to match the underlying flash better, and you don't pay as much of a penalty for writing that way because lining up with the spinning disk isn't important. Someone who put one of DB/WAL on SSD and the other on traditional disk might end up with very different DB/WAL block sizes to match. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, May 2, 2008 at 9:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Mark Wong" <markwkm@gmail.com> writes: > > > I still believe it makes sense to have them separated. I did have > > some data, which has since been destroyed, that suggested there were > > some system characterization differences for OLTP workloads with > > PostgreSQL. Let's hope those disks get delivered to Portland soon. :) > > Fair enough. It's not that much more code to have another configure > switch --- will go do that. > > If we are allowing blocksize and relation seg size to have configure > switches, seems that symmetry would demand that XLOG_SEG_SIZE be > configurable as well. Thoughts? I don't have a feel for this one, but when we get the disks set up we can certainly test to see what effects it has. :) Regards, Mark
"Mark Wong" <markwkm@gmail.com> writes: > I saw a that a patch was committed that exposed a configure switch for > BLCKSZ. I was hoping that I could do that same for XLOG_BLCKSZ. I > think I got the configure.in, sgml, pg_config_manual.h, and > pg_config.h.in changes correct. Applied with minor changes: * I thought it better to call the switch --with-wal-blocksize than --with-xlog-blocksize. Although we've not been terribly consistent about it, there is more user-facing documentation that calls it WAL than XLOG. * I added a --with-wal-segsize switch as well. It's not totally clear what the allowed ranges of the settings should be. The method of using a shell "case" to verify the setting validity is kinda klugy, but I couldn't offhand think of a direct test for "is this a power of 2" at the shell level, so it seems we need to be restrictive. regards, tom lane
On Fri, 2008-05-02 at 12:28 -0400, Greg Smith wrote: > As PostgreSQL makes its way into higher throughput environments, it > wouldn't surprise me to discover more of these situations where switching > WAL segments every 16MB turns into a bottleneck. We already hit that issue and fixed it early in the 8.3 cycle. It was more of a problem than the checkpoint issue because it caused hard lock-outs while the file switches occurred. It didn't show up unless you looked at the very detailed transaction result data because on fast systems we are file switching every few seconds. Not seen any gains from varying the WAL file size since then... -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes: > We already hit that issue and fixed it early in the 8.3 cycle. It was > more of a problem than the checkpoint issue because it caused hard > lock-outs while the file switches occurred. It didn't show up unless you > looked at the very detailed transaction result data because on fast > systems we are file switching every few seconds. > Not seen any gains from varying the WAL file size since then... I think the use-case for varying the WAL segment size is unrelated to performance of the master server, but would instead be concerned with adjusting the granularity of WAL log shipping. regards, tom lane
On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > > Not seen any gains from varying the WAL file size since then... > > I think the use-case for varying the WAL segment size is unrelated to > performance of the master server, but would instead be concerned with > adjusting the granularity of WAL log shipping. *nod* I have heard this argument several times. Simon: there was a discussion about this topic in Prato last year. Since WAL logfiles are mostly binary data, they can't be compressed much, so a smaller logfile size on a lightly used system would save a noticeable amount of bandwidth (and CPU cycles for compression). Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Andreas 'ads' Scherbaum wrote: > On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote: > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > > Not seen any gains from varying the WAL file size since then... > > > > I think the use-case for varying the WAL segment size is unrelated to > > performance of the master server, but would instead be concerned with > > adjusting the granularity of WAL log shipping. > > *nod* I heard this argument several times. Simon: there was a discussion > about this topic in Prato last year. Since WAL logfiles are usually > binary stuff, the files can't be compressed much so a smaller logfile > size on a not-so-much-used system would save a noticeable amount of > bandwith (and cpu cycles for compression). Seems the stuff to zero out the unused segment tail would be more useful here. Kevin sent me the source file some time ago -- he didn't want to upload them to pgfoundry because he was missing a Makefile. I built one for him, but last time I looked he hadn't uploaded anything. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, 5 May 2008 11:09:32 -0400 Alvaro Herrera wrote: > Andreas 'ads' Scherbaum wrote: > > On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote: > > > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > > > > Not seen any gains from varying the WAL file size since then... > > > > > > I think the use-case for varying the WAL segment size is unrelated to > > > performance of the master server, but would instead be concerned with > > > adjusting the granularity of WAL log shipping. > > > > *nod* I heard this argument several times. Simon: there was a discussion > > about this topic in Prato last year. Since WAL logfiles are usually > > binary stuff, the files can't be compressed much so a smaller logfile > > size on a not-so-much-used system would save a noticeable amount of > > bandwith (and cpu cycles for compression). > > Seems the stuff to zero out the unused segment tail would be more useful > here. Yeah, that was the original question, if I remember correctly. If the WAL logfile is zeroed out just before PostgreSQL starts using it and PG only needs a small part of the file, the remaining zeroes are easily compressible. Useful for PITR and good for backups/rsync/scp. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
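To make the compressibility point concrete, here is a rough sketch that compares how 1 MB of zeroes and 1 MB of random bytes shrink under gzip. It is only an approximation under stated assumptions: real WAL records are not random data, so the random case overstates how poorly a used segment compresses, and the helper name is hypothetical. It assumes gzip, dd, and /dev/urandom are available.

```shell
#!/bin/sh
# Sketch: 1 MB of zeroes stands in for a zeroed, unused segment tail;
# 1 MB of random bytes stands in for data that barely compresses.

compressed_size() {
    # $1 = input device; prints the gzip'ed size in bytes of 1 MB read from it
    dd if="$1" bs=1024 count=1024 2>/dev/null | gzip -c | wc -c | tr -d ' '
}

zero_bytes=$(compressed_size /dev/zero)
random_bytes=$(compressed_size /dev/urandom)

echo "zeroed tail: $zero_bytes bytes after gzip"
echo "random data: $random_bytes bytes after gzip"
```

The gap between the two numbers is the saving being discussed: a mostly unused segment costs little to ship once its tail is zeroed, whatever the segment size.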
Alvaro Herrera <alvherre@commandprompt.com> writes: >> On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote: >>> I think the use-case for varying the WAL segment size is unrelated to >>> performance of the master server, but would instead be concerned with >>> adjusting the granularity of WAL log shipping. > Seems the stuff to zero out the unused segment tail would be more useful > here. Well, that's also useful, but it hardly seems like a substitute for picking a more optimal segment size in the first place. regards, tom lane
On Mon, 2008-05-05 at 13:06 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote: > >>> I think the use-case for varying the WAL segment size is unrelated to > >>> performance of the master server, but would instead be concerned with > >>> adjusting the granularity of WAL log shipping. > > > Seems the stuff to zero out the unused segment tail would be more useful > > here. > > Well, that's also useful, but it hardly seems like a substitute for > picking a more optimal segment size in the first place. I can't imagine having separately compiled executables depending upon the write rate of different applications. What would you do if the write rate increases over time (like it usually does)? How would you manage a server farm like that? There's no practical answer there, just a great way to introduce instability where there previously wasn't any. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com