Re: configure option for XLOG_BLCKSZ - Mailing list pgsql-patches

From Greg Smith
Subject Re: configure option for XLOG_BLCKSZ
Msg-id Pine.GSO.4.64.0805020952170.24797@westnet.com
In response to Re: configure option for XLOG_BLCKSZ  (Tom Lane <tgl@sss.pgh.pa.us>)
On Fri, 2 May 2008, Tom Lane wrote:

> The case for varying BLCKSZ is marginal already, and I've seen none at
> all for varying XLOG_BLCKSZ.

I recall someone on the performance list who found it useful to increase
XLOG_BLCKSZ in a high-write environment with WAL shipping, just to make
sending the files over the network more efficient.  I can't seem to find
a reference in the archives, though.

If you look at things like the giant Sun system tests, there was
significant tuning getting all the block sizes to line up better with the
underlying hardware.  I would not be surprised to discover that sort of
install gains a bit from slinging WAL files around in larger chunks as
well.  They're already using small values for commit_delay just to get the
typical WAL write to be in larger blocks.

As PostgreSQL makes its way into higher throughput environments, it
wouldn't surprise me to discover more of these situations where switching
WAL segments every 16MB turns into a bottleneck.  Right now, it may only
be a few people in the world, but saying "that's big enough" for an
allocation of anything usually turns out wrong if you wait long enough.
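
To put rough numbers on that, here is a back-of-the-envelope sketch of how
quickly a segment fills at a sustained WAL write rate.  The 16MB segment
size is the figure above; the write rates are made-up values for
illustration, not measurements from any real system.

```python
SEGMENT_MB = 16  # WAL segment size discussed above


def seconds_per_segment(wal_mb_per_sec, segment_mb=SEGMENT_MB):
    """Return how long one WAL segment lasts at a sustained write rate."""
    if wal_mb_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return segment_mb / wal_mb_per_sec


# Hypothetical write rates, for illustration only.
for rate in (1, 16, 64):
    secs = seconds_per_segment(rate)
    print(f"{rate:>3} MB/s -> new segment every {secs:.2f} s")
```

The faster the WAL stream, the more often a segment switch happens, which
is where a fixed size could start to hurt.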

One real concern I have with making this easier to adjust is that I'd hate
to let people pick any old block size with the default wal_sync_method,
only to have them later discover they can't turn on any direct I/O write
method because they botched the alignment restrictions.
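
A minimal sketch of the kind of sanity check that would catch this early.
The alignment values (512 and 4096 bytes) are assumptions for
illustration; the real requirement depends on the OS, filesystem, and
device.  Requiring a power of two is also an assumption here, chosen as the
conservative option.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and the rest."""
    return n > 0 and (n & (n - 1)) == 0


def direct_io_friendly(block_size, alignment=4096):
    """Check a candidate block size against an assumed alignment.

    Passes only if the size is a power of two and a whole multiple of
    `alignment`, so block-sized writes at block-sized offsets stay aligned.
    """
    return is_power_of_two(block_size) and block_size % alignment == 0


# Candidate sizes in bytes; 6000 and 12288 are deliberately awkward picks.
for size in (2048, 6000, 8192, 12288):
    print(size, direct_io_friendly(size))
```

A size that passes with a 512 byte alignment can still fail on a device
that wants 4096, which is exactly the surprise worth avoiding.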

> Another issue though is whether it makes sense for XLOG_BLCKSZ to be
> different from BLCKSZ at all, at least in the default case.  They are
> both the unit of I/O and it's not clear why you'd want different units.

There are lots of people who use completely different physical or logical
disk setups for the WAL disk than the regular database.  That's going to
get even more varied moving forward as SSD starts getting used more, since
those devices have a very different set of block size optimization
characteristics compared with traditional RAID setups.  They prefer
smaller blocks to match the underlying flash better, and you don't pay as
much of a penalty for writing that way because lining up with the spinning
disk isn't important.  Someone who put one of DB/WAL on SSD and the other
on traditional disk might end up with very different DB/WAL block sizes to
match.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
