Re: increasing the default WAL segment size - Mailing list pgsql-hackers

From Robert Haas
Subject Re: increasing the default WAL segment size
Date
Msg-id CA+TgmoYza7XdrsX7ok3vWYqYbOesv9Gn8c=W6c0DesHj66NzXg@mail.gmail.com
In response to Re: increasing the default WAL segment size  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: increasing the default WAL segment size  (Magnus Hagander <magnus@hagander.net>)
Re: increasing the default WAL segment size  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:
>> Furthermore, there is an enforced, synchronous fsync at the end of
>> every segment, which actually does hurt performance on write-heavy
>> workloads.[2] Of course, if that were the only reason to consider
>> increasing the segment size, it would probably make more sense to just
>> try to push that extra fsync into the background, but that's not
>> really the case.  From what I hear, the gigantic number of files is a
>> bigger pain point.
>
> I think we should fully describe the problem before finding a solution.

Sure, that's usually a good idea.  I attempted to outline all of the
possible issues of which I am aware in my original email, but of
course you may know of considerations which I overlooked.

> This is too big a change to just tweak a value without discussing the
> actual issue.

Again, I tried to make sure I was discussing the actual issues in my
original email.  In brief: having to run archive_command multiple
times per second imposes very tight latency requirements on it;
directories with hundreds of thousands or millions of files are hard
to manage; enforced synchronous fsyncs at the end of each segment hurt
performance.
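
To put rough numbers on the first two points, here is a small
back-of-the-envelope sketch. The WAL rate (64MB/s) and the amount of
retained WAL (4TB) are figures assumed purely for illustration, not
measurements from any real system, and the function names are
hypothetical.

```python
# Illustrative arithmetic only; the WAL rate and retention figures below
# are assumed values, not measurements.
MB = 1024 * 1024


def segments_per_second(wal_bytes_per_sec, seg_size):
    """How often a segment fills, i.e. how often archive_command must run."""
    return wal_bytes_per_sec / seg_size


def segment_file_count(total_wal_bytes, seg_size):
    """Number of segment files needed to hold a given volume of WAL."""
    return -(-total_wal_bytes // seg_size)  # ceiling division


wal_rate = 64 * MB        # assumed: 64MB of WAL generated per second
retained = 4 * 1024**4    # assumed: 4TB of retained WAL

for seg_size in (16 * MB, 64 * MB):
    print(
        f"{seg_size // MB}MB segments: "
        f"{segments_per_second(wal_rate, seg_size):.1f} archive calls/sec, "
        f"{segment_file_count(retained, seg_size)} files"
    )
```

Under those assumptions, 16MB segments mean four archive_command runs
per second and 262144 files, while 64MB segments mean one run per
second and 65536 files.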

> And if the problem is as described, how can a change of x4 be enough
> to make it worth the pain of change? I think you're already admitting
> it can't be worth it by discussing initdb configuration.

I guess it depends on how much pain of change you think there will be.
I would expect a change from 16MB -> 64MB to be fairly painless, but
(1) it might break tools that aren't designed to cope with differing
segment sizes and (2) it will increase disk utilization for people who
have such low-velocity systems that they never end up with more than 2
WAL segments, and now those segments are bigger.  If you know of other
impacts or have reason to believe those problems will be serious,
please fill in the details.

Despite the fact that initdb configuration has dominated this thread,
I mentioned it only in the very last sentence of my email and only as
a possibility.  I believe that a 4x change will be good enough for the
majority of people for whom this is currently a pain point.  However,
yes, I do believe that there are some people for whom it won't be
sufficient.  And I believe that as we continue to enhance PostgreSQL
to support higher and higher transaction rates, the number of people
who need an extra-large WAL segment size will increase.  As I see it,
there are four options here:

1. Do nothing.  So far, I don't see anybody arguing for that.

2. Change the default to 64MB and call it good.  This idea seems to
have considerable support.

3. Allow initdb-time configurability but keep the default at 16MB.  I
don't see any support for this.  There is clearly support for
configurability, but I don't see anyone arguing that the current
default is preferable, unless that is what you are arguing.

4. Change the default to 64MB and also allow initdb-time
configurability.  This option also appears to enjoy substantial
support, perhaps more than #2.  Magnus seemed to be arguing that this
is preferable to #2, because then it's easier for people to change the
setting back if someone discovers a case where the higher default is a
problem; Tom, on the other hand, seems to think this is overkill.

Personally, I believe option #4 is for the best.  I believe that the
great majority of users will be better off with 64MB than with 16MB,
but I like the idea of allowing for smaller values (for people with
really low-velocity instances) and larger ones (for people with really
high-velocity instances).

> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.

This seems like an odd comment because the whole way we address WAL
positions is based on the fact that segments are fixed size, as I
would have thought you would know better than I.  The file that
contains a particular byte of WAL is based on lsn/XLOG_SEG_SIZE and
the position within the file is lsn%XLOG_SEG_SIZE.  Making files
variable-size would vastly complicate this addressing scheme and maybe
hurt performance in the process.  I can't see any compelling reason to
go there.
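
As a minimal sketch of that addressing scheme, treating an LSN as a
plain integer byte position: XLOG_SEG_SIZE is the name used above,
while the locate() helper and the sample LSN are hypothetical and
exist only for illustration.

```python
# Sketch of fixed-size segment addressing. XLOG_SEG_SIZE comes from the
# text above; locate() and the sample LSN are hypothetical.
XLOG_SEG_SIZE = 16 * 1024 * 1024  # the current 16MB default


def locate(lsn, seg_size=XLOG_SEG_SIZE):
    """Return (segment number, byte offset within that segment)."""
    return lsn // seg_size, lsn % seg_size


lsn = 0x16B3748  # arbitrary example position

seg, off = locate(lsn)
print(f"16MB segments: segment {seg}, offset {off:#x}")

seg, off = locate(lsn, 64 * 1024 * 1024)
print(f"64MB segments: segment {seg}, offset {off:#x}")
```

Both lookups are a single division and a single modulo because every
segment has the same size.  With variable-length files, finding the
file for a given position would instead need some kind of lookup of
where each file starts.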

> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

Yep.  Or at least, any discussion of changing the default XLOG block
size would be completely separate from the issues raised here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


