Thread: configure option for XLOG_BLCKSZ

configure option for XLOG_BLCKSZ

From

"Mark Wong"

Date:

02 May 2008, 02:01:41

Hi all,

I saw that a patch was committed that exposed a configure switch for
BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.  I
think I got the configure.in, sgml, pg_config_manual.h, and
pg_config.h.in changes correct.

Regards,
Mark

Attachment

pgsql-config-xlog-blocksize.patch

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

02 May 2008, 02:49:08

"Mark Wong" <markwkm@gmail.com> writes:
> I saw that a patch was committed that exposed a configure switch for
> BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.

Well, we certainly *could*, but what's the use-case really?  The case
for varying BLCKSZ is marginal already, and I've seen none at all for
varying XLOG_BLCKSZ.  Why do we need to make it easier than "edit
pg_config_manual.h"?

            regards, tom lane

Re: configure option for XLOG_BLCKSZ

From

"Joshua D. Drake"

Date:

02 May 2008, 04:04:48

Tom Lane wrote:
> "Mark Wong" <markwkm@gmail.com> writes:
>> I saw that a patch was committed that exposed a configure switch for
>> BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.
>
> Well, we certainly *could*, but what's the use-case really?  The case
> for varying BLCKSZ is marginal already, and I've seen none at all for
> varying XLOG_BLCKSZ.  Why do we need to make it easier than "edit
> pg_config_manual.h"?

The use case I could see is for performance testing, but I would concur
that it doesn't take much to modify pg_config_manual.h. In thinking
about it, this might actually be a foot gun. Picture a new pg user who
downloads the source and thinks to himself, "Hey, my hard disk is
formatted with a 4k block size." Then all of a sudden he has a
PostgreSQL build that is incompatible with everything else.

Sincerely,

Joshua D. Drake

>
>             regards, tom lane
>

Re: configure option for XLOG_BLCKSZ

From

"Mark Wong"

Date:

02 May 2008, 12:45:31

On Fri, May 2, 2008 at 12:04 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> Tom Lane wrote:
>
> > "Mark Wong" <markwkm@gmail.com> writes:
> >
> > > I saw that a patch was committed that exposed a configure switch for
> > > BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.
> > >
> >
> > Well, we certainly *could*, but what's the use-case really?  The case
> > for varying BLCKSZ is marginal already, and I've seen none at all for
> > varying XLOG_BLCKSZ.  Why do we need to make it easier than "edit
> > pg_config_manual.h"?
> >
>
>  The use case I could see is for performance testing, but I would concur
> that it doesn't take much to modify pg_config_manual.h. In thinking about
> it, this might actually be a foot gun. Picture a new pg user who downloads
> the source and thinks to himself, "Hey, my hard disk is formatted with a
> 4k block size." Then all of a sudden he has a PostgreSQL build that is
> incompatible with everything else.

As someone who has tested varying both of those parameters, it feels
awkward to have a configure option for one and not the other.  I have a
slight preference for having them both as configure options because
that is easier to script, but I feel more strongly that BLCKSZ and
XLOG_BLCKSZ should be set the same way, whether as configure options
or in pg_config_manual.h.  Having to change them by different methods
makes automating testing a tad more work.  So my case is just for ease
of testing.

Regards,
Mark
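
The scripting argument above can be made concrete with a minimal sketch of a build matrix, assuming both sizes are exposed as configure switches. The switch name --with-wal-blocksize is the one settled on later in this thread; --with-blocksize as the name of the BLCKSZ switch, the kilobyte units, the candidate sizes, and the /opt/pgtest prefix layout are assumptions made for illustration only.

```python
from itertools import product

# Candidate sizes in kilobytes; these particular values are illustrative only.
DATA_BLOCK_KB = [4, 8, 16]
WAL_BLOCK_KB = [4, 8, 16]


def configure_command(block_kb, wal_block_kb, prefix_root="/opt/pgtest"):
    """Return the argv for one ./configure run (the prefix layout is hypothetical)."""
    prefix = f"{prefix_root}/blk{block_kb}-wal{wal_block_kb}"
    return [
        "./configure",
        f"--prefix={prefix}",
        f"--with-blocksize={block_kb}",          # assumed switch name for BLCKSZ
        f"--with-wal-blocksize={wal_block_kb}",  # switch name chosen later in this thread
    ]


def build_matrix(block_sizes=DATA_BLOCK_KB, wal_block_sizes=WAL_BLOCK_KB):
    """One configure command per (BLCKSZ, XLOG_BLCKSZ) combination."""
    return [configure_command(b, w) for b, w in product(block_sizes, wal_block_sizes)]


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch has no side effects.
    for argv in build_matrix():
        print(" ".join(argv))
```

If one of the two sizes could only be changed by editing pg_config_manual.h, the same loop would also need a step that patches the header before each build, which is the extra work being described.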

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

02 May 2008, 12:58:02

"Mark Wong" <markwkm@gmail.com> writes:
> As someone who has tested varying both of those parameters, it feels
> awkward to have a configure option for one and not the other.  I have a
> slight preference for having them both as configure options because
> that is easier to script, but I feel more strongly that BLCKSZ and
> XLOG_BLCKSZ should be set the same way, whether as configure options
> or in pg_config_manual.h.  Having to change them by different methods
> makes automating testing a tad more work.  So my case is just for ease
> of testing.

Well, that's a fair point.  Another issue though is whether it makes
sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in
the default case.  They are both the unit of I/O and it's not clear
why you'd want different units.  Mark, has your testing shown any
indication that they really ought to be separately configurable?
I could see having the same configure switch set both of 'em.

            regards, tom lane

Re: configure option for XLOG_BLCKSZ

From

"Mark Wong"

Date:

02 May 2008, 13:12:38

On Fri, May 2, 2008 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Mark Wong" <markwkm@gmail.com> writes:
>
> > As someone who has tested varying both of those parameters, it feels
> > awkward to have a configure option for one and not the other.  I have a
> > slight preference for having them both as configure options because
> > that is easier to script, but I feel more strongly that BLCKSZ and
> > XLOG_BLCKSZ should be set the same way, whether as configure options
> > or in pg_config_manual.h.  Having to change them by different methods
> > makes automating testing a tad more work.  So my case is just for ease
> > of testing.
>
>  Well, that's a fair point.  Another issue though is whether it makes
>  sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in
>  the default case.  They are both the unit of I/O and it's not clear
>  why you'd want different units.  Mark, has your testing shown any
>  indication that they really ought to be separately configurable?
>  I could see having the same configure switch set both of 'em.

I still believe it makes sense to have them separated.  I did have
some data, which has since been destroyed, that suggested there were
some system characterization differences for OLTP workloads with
PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)

Regards,
Mark

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

02 May 2008, 13:16:47

"Mark Wong" <markwkm@gmail.com> writes:
> I still believe it makes sense to have them separated.  I did have
> some data, which has since been destroyed, that suggested there were
> some system characterization differences for OLTP workloads with
> PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)

Fair enough.  It's not that much more code to have another configure
switch --- will go do that.

If we are allowing blocksize and relation seg size to have configure
switches, seems that symmetry would demand that XLOG_SEG_SIZE be
configurable as well.  Thoughts?

            regards, tom lane

Re: configure option for XLOG_BLCKSZ

From

"Joshua D. Drake"

Date:

02 May 2008, 13:29:08

On Fri, 2 May 2008 09:12:32 -0700
"Mark Wong" <markwkm@gmail.com> wrote:


> I still believe it makes sense to have them separated.  I did have
> some data, which has since been destroyed, that suggested there were
> some system characterization differences for OLTP workloads with
> PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)

I have those disks.

Joshua D. Drake


>
> Regards,
> Mark
>


--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: configure option for XLOG_BLCKSZ

From

Greg Smith

Date:

02 May 2008, 13:29:15

On Fri, 2 May 2008, Tom Lane wrote:

> The case for varying BLCKSZ is marginal already, and I've seen none at
> all for varying XLOG_BLCKSZ.

I recall someone on the performance list who felt it useful to increase
XLOG_BLCKSZ to support a high-write environment with WAL shipping, just to
make sending the files over the network more efficient.  Can't seem to
find a reference in the archives though.

If you look at things like the giant Sun system tests, there was
significant tuning getting all the block sizes to line up better with the
underlying hardware.  I would not be surprised to discover that sort of
install gains a bit from slinging WAL files around in larger chunks as
well.  They're already using small values for commit_delay just to get the
typical WAL write to be in larger blocks.

As PostgreSQL makes its way into higher throughput environments, it
wouldn't surprise me to discover more of these situations where switching
WAL segments every 16MB turns into a bottleneck.  Right now, it may only
be a few people in the world, but saying "that's big enough" for an
allocation of anything usually turns out wrong if you wait long enough.

One real concern I have with making this easier to adjust is that I'd hate
to let people pick any old block size with the default wal_sync_method,
only to have them later discover they can't turn on any direct I/O write
method because they botched the alignment restrictions.

> Another issue though is whether it makes sense for XLOG_BLCKSZ to be
> different from BLCKSZ at all, at least in the default case.  They are
> both the unit of I/O and it's not clear why you'd want different units.

There are lots of people who use completely different physical or logical
disk setups for the WAL disk than the regular database.  That's going to
get even more varied moving forward as SSD starts getting used more, since
those devices have a very different set of block size optimization
characteristics compared with traditional RAID setups.  They prefer
smaller blocks to match the underlying flash better, and you don't pay as
much of a penalty for writing that way because lining up with the spinning
disk isn't important.  Someone who put one of DB/WAL on SSD and the other
on traditional disk might end up with very different DB/WAL block sizes to
match.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: configure option for XLOG_BLCKSZ

From

"Mark Wong"

Date:

02 May 2008, 15:34:23

On Fri, May 2, 2008 at 9:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Mark Wong" <markwkm@gmail.com> writes:
>
> > I still believe it makes sense to have them separated.  I did have
> > some data, which has since been destroyed, that suggested there were
> > some system characterization differences for OLTP workloads with
> > PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)
>
>  Fair enough.  It's not that much more code to have another configure
>  switch --- will go do that.
>
>  If we are allowing blocksize and relation seg size to have configure
>  switches, seems that symmetry would demand that XLOG_SEG_SIZE be
>  configurable as well.  Thoughts?

I don't have a feel for this one, but when we get the disks set up we
can certainly test to see what effects it has. :)

Regards,
Mark

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

02 May 2008, 16:57:35

"Mark Wong" <markwkm@gmail.com> writes:
> I saw that a patch was committed that exposed a configure switch for
> BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.  I
> think I got the configure.in, sgml, pg_config_manual.h, and
> pg_config.h.in changes correct.

Applied with minor changes:

* I thought it better to call the switch --with-wal-blocksize than
--with-xlog-blocksize.  Although we've not been terribly consistent
about it, there is more user-facing documentation that calls it WAL
than XLOG.

* I added a --with-wal-segsize switch as well.

It's not totally clear what the allowed ranges of the settings should
be.  The method of using a shell "case" to verify the setting validity
is kinda klugy, but I couldn't offhand think of a direct test for
"is this a power of 2" at the shell level, so it seems we need to be
restrictive.

            regards, tom lane
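
For reference, the "is this a power of 2" test mentioned above is a one-liner in any language with integer bit operations. The following is a minimal Python sketch, not the shell "case" approach used in the committed configure script. The helper names and the range bounds passed to them are hypothetical, since the thread leaves the allowed ranges open.

```python
def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    # n & (n - 1) clears the lowest set bit; the result is zero only
    # when that bit was the only one set.
    return n > 0 and (n & (n - 1)) == 0


def check_size(name, value, lowest, highest):
    """Return value if it is a power of 2 within [lowest, highest], else raise."""
    if not is_power_of_two(value) or not lowest <= value <= highest:
        raise ValueError(
            f"{name}={value}: expected a power of 2 between {lowest} and {highest}"
        )
    return value


if __name__ == "__main__":
    # The bounds 1 and 64 here are placeholders, not the ranges configure enforces.
    print(check_size("wal-blocksize", 8, 1, 64))
```

An explicit list of accepted values, as in the shell "case", has the same effect whenever the range is small; the bit test only becomes attractive when the range is wide.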

Re: configure option for XLOG_BLCKSZ

From

Simon Riggs

Date:

03 May 2008, 08:40:45

On Fri, 2008-05-02 at 12:28 -0400, Greg Smith wrote:

> As PostgreSQL makes its way into higher throughput environments, it
> wouldn't surprise me to discover more of these situations where switching
> WAL segments every 16MB turns into a bottleneck.

We already hit that issue and fixed it early in the 8.3 cycle. It was
more of a problem than the checkpoint issue because it caused hard
lock-outs while the file switches occurred. It didn't show up unless you
looked at the very detailed transaction result data because on fast
systems we are file switching every few seconds.

Not seen any gains from varying the WAL file size since then...

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
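
The "every few seconds" figure follows from simple arithmetic: the interval between segment switches is the segment size divided by the WAL write rate. A minimal sketch, where the 4 MB/s write rate and the alternative 64MB segment size are assumed values for illustration and not measurements from this thread:

```python
def seconds_per_segment(segment_mb, wal_rate_mb_per_s):
    """Time to fill one WAL segment at a steady WAL write rate."""
    return segment_mb / wal_rate_mb_per_s


if __name__ == "__main__":
    assumed_rate = 4.0  # MB/s of WAL; an assumed figure, not a measurement
    for segment_mb in (16, 64):
        interval = seconds_per_segment(segment_mb, assumed_rate)
        print(f"{segment_mb}MB segment at {assumed_rate} MB/s: switch every {interval:.1f} s")
```

At that assumed rate a 16MB segment fills in 4 seconds, and a segment four times larger switches a quarter as often.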

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

03 May 2008, 14:15:10

Simon Riggs <simon@2ndquadrant.com> writes:
> We already hit that issue and fixed it early in the 8.3 cycle. It was
> more of a problem than the checkpoint issue because it caused hard
> lock-outs while the file switches occurred. It didn't show up unless you
> looked at the very detailed transaction result data because on fast
> systems we are file switching every few seconds.

> Not seen any gains from varying the WAL file size since then...

I think the use-case for varying the WAL segment size is unrelated to
performance of the master server, but would instead be concerned with
adjusting the granularity of WAL log shipping.

            regards, tom lane

Re: configure option for XLOG_BLCKSZ

From

Andreas 'ads' Scherbaum

Date:

03 May 2008, 14:26:01

On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote:

> Simon Riggs <simon@2ndquadrant.com> writes:
>
> > Not seen any gains from varying the WAL file size since then...
>
> I think the use-case for varying the WAL segment size is unrelated to
> performance of the master server, but would instead be concerned with
> adjusting the granularity of WAL log shipping.

*nod* I heard this argument several times. Simon: there was a discussion
about this topic in Prato last year. Since WAL logfiles are usually
binary stuff, the files can't be compressed much so a smaller logfile
size on a not-so-much-used system would save a noticeable amount of
bandwidth (and cpu cycles for compression).

Kind regards

--
                Andreas 'ads' Scherbaum
German PostgreSQL User Group

Re: configure option for XLOG_BLCKSZ

From

Alvaro Herrera

Date:

05 May 2008, 12:09:54

Andreas 'ads' Scherbaum wrote:
> On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote:
>
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >
> > > Not seen any gains from varying the WAL file size since then...
> >
> > I think the use-case for varying the WAL segment size is unrelated to
> > performance of the master server, but would instead be concerned with
> > adjusting the granularity of WAL log shipping.
>
> *nod* I heard this argument several times. Simon: there was a discussion
> about this topic in Prato last year. Since WAL logfiles are usually
> binary stuff, the files can't be compressed much so a smaller logfile
> size on a not-so-much-used system would save a noticeable amount of
> bandwidth (and cpu cycles for compression).

Seems the stuff to zero out the unused segment tail would be more useful
here.

Kevin sent me the source files some time ago -- he didn't want to upload
them to pgfoundry because he was missing a Makefile.  I built one for
him, but last time I looked he hadn't uploaded anything.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: configure option for XLOG_BLCKSZ

From

Andreas 'ads' Scherbaum

Date:

05 May 2008, 12:58:07

On Mon, 5 May 2008 11:09:32 -0400 Alvaro Herrera wrote:

> Andreas 'ads' Scherbaum wrote:
> > On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote:
> >
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > >
> > > > Not seen any gains from varying the WAL file size since then...
> > >
> > > I think the use-case for varying the WAL segment size is unrelated to
> > > performance of the master server, but would instead be concerned with
> > > adjusting the granularity of WAL log shipping.
> >
> > *nod* I heard this argument several times. Simon: there was a discussion
> > about this topic in Prato last year. Since WAL logfiles are usually
> > binary stuff, the files can't be compressed much so a smaller logfile
> > size on a not-so-much-used system would save a noticeable amount of
> > bandwidth (and cpu cycles for compression).
>
> Seems the stuff to zero out the unused segment tail would be more useful
> here.

Yeah, that was the original question, if I remember correctly.
If the WAL logfile is zeroed out just before PG starts using it and PG
only needs a small part of this logfile, the remaining zeroes are easily
compressible. Useful for PITR and good for backups/rsync/scp.


Kind regards

--
                Andreas 'ads' Scherbaum
German PostgreSQL User Group
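
The effect of a zeroed tail on compression can be illustrated with a small experiment. This is a sketch under stated assumptions: random bytes stand in for both the WAL records and the stale contents of an unzeroed tail, which is a worst case for compression, and zlib stands in for whatever compressor an archiving script would use. The function names are hypothetical.

```python
import os
import zlib

SEGMENT_BYTES = 16 * 1024 * 1024  # the 16MB segment size discussed in the thread


def make_segment(used_bytes, zero_tail, size=SEGMENT_BYTES):
    """Build a fake segment: used_bytes of incompressible data, then the tail."""
    head = os.urandom(used_bytes)  # stand-in for real WAL records
    tail_len = size - used_bytes
    # A zeroed tail versus leftover data (modelled as random bytes).
    tail = bytes(tail_len) if zero_tail else os.urandom(tail_len)
    return head + tail


def compressed_size(data):
    """Size in bytes after zlib compression at the default level."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    used = 1024 * 1024  # assume only 1MB of the segment was actually written
    for zero_tail in (True, False):
        size = compressed_size(make_segment(used, zero_tail))
        print(f"zero_tail={zero_tail}: {size} bytes compressed")
```

With the tail zeroed, the compressed size stays close to the amount of WAL actually written; with leftover data in the tail, the file barely shrinks at all.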

Re: configure option for XLOG_BLCKSZ

From

Tom Lane

Date:

05 May 2008, 14:06:32

Alvaro Herrera <alvherre@commandprompt.com> writes:
>> On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote:
>>> I think the use-case for varying the WAL segment size is unrelated to
>>> performance of the master server, but would instead be concerned with
>>> adjusting the granularity of WAL log shipping.

> Seems the stuff to zero out the unused segment tail would be more useful
> here.

Well, that's also useful, but it hardly seems like a substitute for
picking a more optimal segment size in the first place.

            regards, tom lane

Re: configure option for XLOG_BLCKSZ

From

Simon Riggs

Date:

05 May 2008, 14:50:57

On Mon, 2008-05-05 at 13:06 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> On Sat, 03 May 2008 13:14:35 -0400 Tom Lane wrote:
> >>> I think the use-case for varying the WAL segment size is unrelated to
> >>> performance of the master server, but would instead be concerned with
> >>> adjusting the granularity of WAL log shipping.
>
> > Seems the stuff to zero out the unused segment tail would be more useful
> > here.
>
> Well, that's also useful, but it hardly seems like a substitute for
> picking a more optimal segment size in the first place.

I can't imagine having separately compiled executables depending upon
the write rate of different applications. What would you do if the write
rate increases over time (like it usually does)? How would you manage a
server farm like that? There's no practical answer there, just a great
way to introduce instability where there previously wasn't any.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com