Re: [HACKERS] increasing the default WAL segment size - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] increasing the default WAL segment size
Msg-id CA+TgmoYJJei4b_rifB2uMLEcfs5W6UPWGVE-gBsRsUwMSiqtiw@mail.gmail.com
In response to Re: [HACKERS] increasing the default WAL segment size  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] increasing the default WAL segment size  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Tue, Jan 3, 2017 at 8:59 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 January 2017 at 13:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 2 January 2017 at 21:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>
>>>> It's not clear from the thread that there is consensus that this feature is desired. In particular, the
>>>> performance aspects of changing segment size from a C constant to a variable are in question. Someone with access to
>>>> large hardware should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift,
>>>> which IMHO would also solve some sanity-checking issues.
>>>
>>> Overall, Robert has made a good case. The only discussion now is about
>>> the knock-on effects it causes.
>>>
>>> One concern that has only barely been discussed is the effect of
>>> zero-ing new WAL files. That is a linear effect and will adversely
>>> affect performance as WAL segment size increases.
>>>
>>
>> Sorry, but I am not able to understand why this is a problem.  The
>> bigger the WAL segment, the fewer the files.  So IIUC,
>> it can only have an impact if zero-ing two 16MB files is cheaper than
>> zero-ing one 32MB file.  Is that your theory, or do you have something
>> else in mind?
>
> The issue I see is that at present no backend needs to do more than
> 16MB of zeroing at one time, so the impact on response time is
> reduced. If we start doing zeroing in larger chunks then the impact on
> response times will increase. So instead of regular blips we have one
> large blip, less often. I think the latter will be worse, but I would welcome
> measurements that show that performance is smooth and regular with
> large file sizes.

Yeah.  I don't think there's any way to get around the fact that there
will be bigger latency spikes in some cases with larger WAL files.  I
think the question is whether they'll be common enough or serious
enough to worry about.  For example, in a quick test on my laptop,
zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k
count=2048" takes about 11 milliseconds, and zero-filling a 64
megabyte file with a count of 8192 increases the time to almost 50
milliseconds.  That's something, but I wouldn't rate it as concerning.
There are a lot of things that can cause latency changes multiple
orders of magnitude larger than that, so worrying about that one in
particular would seem to me to be fairly pointless.  However, that's
also a measurement on an unloaded system with an SSD, and the impact
may be a lot more on a big system with lots of concurrent
activity, and if the process that does the write also has to do an
fsync, that will increase the cost considerably, too.
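
A rough Python analogue of the dd test above, for anyone who wants to repeat it with and without an fsync. This is a sketch under stated assumptions, not how PostgreSQL itself creates segments: `zero_fill` is a hypothetical helper, it assumes a writable temporary directory, and "16 MB" means 2048 blocks of 8 kB, as in the dd invocation. Without `do_fsync` the timing mostly reflects copying into the OS page cache, just as with plain dd, which does not fsync unless asked.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8 * 1024  # matches dd's bs=8k


def zero_fill(path: str, size_bytes: int, do_fsync: bool = False) -> float:
    """Write size_bytes of zeros to path in 8 kB blocks; return elapsed seconds."""
    block = bytes(BLOCK_SIZE)
    count = size_bytes // BLOCK_SIZE  # 2048 blocks for 16 MB, 8192 for 64 MB
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        if do_fsync:
            f.flush()             # push Python's buffer to the OS first
            os.fsync(f.fileno())  # then force the data to stable storage
    return time.perf_counter() - start  # includes the close() at end of "with"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for mb in (16, 64):
            for sync in (False, True):
                path = os.path.join(d, f"seg_{mb}mb")
                secs = zero_fill(path, mb * 1024 * 1024, do_fsync=sync)
                print(f"{mb} MB, fsync={sync}: {secs * 1000:.1f} ms")
```

Comparing the fsync=True rows against the fsync=False rows gives a feel for how much of the cost comes from forcing the data to disk rather than from the zeroing itself.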

But the flip side is that it's wrong to imagine that there's no harm
in leaving the situation as it is.  Even my MacBook Pro can crank out
about 2.7 WAL segments/second on "pgbench -c 16 -j 16 -T 60".  I think
a decent server with a few more CPU cores than my laptop has could do
4-5 times that.  So we shouldn't imagine that the costs of spewing out
a bajillion segment files are being paid only at the very high end.
Even somebody running PostgreSQL on a low-end virtual machine might
find it difficult to write an archive_command that can keep up if the
system is under continuous load.  Of course, as Stephen pointed out,
there are toolkits that can do it and you should probably be using one
of those anyway for other reasons, but nevertheless spitting out
almost 3 WAL segments per second even on a laptop gives a whole new
meaning to the term "continuous archiving".
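
The archiving arithmetic, as a back-of-the-envelope sketch: it turns the measured 2.7 segments/second at 16 MB per segment into a WAL byte rate, then asks how many files per second an archive_command would have to handle, and how much time it gets per file, at another segment size. The 5x multiplier for a bigger server is the rough "4-5 times" estimate above, not a measurement, and `archive_load` is a hypothetical name.

```python
MB = 1024 * 1024


def archive_load(segments_per_sec: float, seg_size_mb: int, new_seg_size_mb: int):
    """Return (files/sec, seconds available per file) after changing the
    segment size, assuming the WAL byte rate stays the same."""
    wal_bytes_per_sec = segments_per_sec * seg_size_mb * MB
    files_per_sec = wal_bytes_per_sec / (new_seg_size_mb * MB)
    return files_per_sec, 1.0 / files_per_sec


if __name__ == "__main__":
    laptop_rate = 2.7              # segments/second from the pgbench run
    server_rate = laptop_rate * 5  # rough estimate for a bigger server
    for label, rate in (("laptop", laptop_rate), ("server", server_rate)):
        for size in (16, 64):
            fps, budget = archive_load(rate, 16, size)
            print(f"{label}, {size} MB segments: {fps:.2f} files/s, "
                  f"{budget * 1000:.0f} ms per archive_command call")
```

With these inputs, 64 MB segments cut the file rate by a factor of four; for example, 13.5 files/s at 16 MB becomes about 3.4 files/s at 64 MB, which gives each archive_command call four times as long to finish.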

Another point to consider is that a bigger WAL segment size can
actually *improve* latency because every segment switch triggers an
immediate fsync, and every backend in the system ends up waiting for
it to finish.  We should probably eventually try to push those flushes
into the background, and the zeroing work as well.  My impression
(possibly incorrect?) is that we expect to settle into a routine where
zeroing new segments is relatively uncommon because we reuse old
segment files, but the forced end-of-segment flushes never go away.
So it's possible we might actually come out ahead on latency with this
change, at least sometimes.
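
A toy model of that expectation, to make the trade-off concrete. It is a deliberate simplification and not PostgreSQL's actual recycling logic: `simulate` and `SegmentStats` are hypothetical, and it assumes that every segment filled before a checkpoint becomes reusable once that checkpoint completes. Each filled segment costs one forced flush; a new file is zero-filled only when no old segment is available for reuse.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    forced_flushes: int = 0  # one forced flush each time a segment fills up
    zero_fills: int = 0      # segments that needed a freshly zero-filled file
    recycled: int = 0        # segments that reused an old file instead


def simulate(total_segments: int, segments_per_checkpoint: int) -> SegmentStats:
    """Toy model: once a checkpoint completes, the segments filled before it
    can be reused, so zero-filling happens only until the pool is primed."""
    stats = SegmentStats()
    free_pool = 0         # old segment files available for reuse
    since_checkpoint = 0  # segments filled since the last checkpoint
    for _ in range(total_segments):
        if free_pool > 0:
            free_pool -= 1
            stats.recycled += 1
        else:
            stats.zero_fills += 1
        stats.forced_flushes += 1  # the end-of-segment flush happens either way
        since_checkpoint += 1
        if since_checkpoint == segments_per_checkpoint:
            free_pool += since_checkpoint  # simplification: all become reusable
            since_checkpoint = 0
    return stats


if __name__ == "__main__":
    # Same 10 GB of WAL, 1 GB between checkpoints, two segment sizes.
    for size_mb in (16, 64):
        s = simulate(10 * 1024 // size_mb, 1024 // size_mb)
        print(f"{size_mb} MB: flushes={s.forced_flushes}, "
              f"zero-filled MB={s.zero_fills * size_mb}")
```

With these inputs the total zero-filled volume is the same 1024 MB for both sizes, while the number of forced end-of-segment flushes drops from 640 to 160.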

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


