Thread: increasing the default WAL segment size

increasing the default WAL segment size

From
Robert Haas
Date:
Hi,

I'd like to propose that we increase the default WAL segment size,
which is currently 16MB.  It was first set to that value in commit
47937403676d913c0e740eec6b85113865c6c8ab in October of 1999; prior to
that, it was 64MB.  Between 1999 and now, there have been three
significant changes that make me think it might be time to rethink
this value:

1. Transaction rates are vastly higher these days.  In 1999, I think
we were still limited to ~2^32 transactions during the entire lifetime
of the server; transaction ID wraparound hadn't been invented yet.[1]
Today, some installations do that many write transactions in under a
week.  The practical consequence of this is that WAL files fill up in
extremely short periods of time. Some users generate multiple
terabytes of WAL per day, which means they are generating - and very
likely archiving - WAL files at a rate of more than one per second!
That poses multiple problems. For example, if your archive command
happens to involve ssh, you might run into trouble because of this
sort of thing:

[rhaas pgsql]$ /usr/bin/time ssh hydra true
        1.57 real         0.00 user         0.00 sys

Also, your operating system's implementation of directories and the
commands to work with them (like ls) don't necessarily scale well to
tens or hundreds of thousands of archived files.
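To put rough numbers on the rate problem above, here is a minimal back-of-the-envelope sketch. It assumes binary units (1 TB = 2**40 bytes), WAL generated evenly through the day, a hypothetical 2 TB/day workload standing in for "multiple terabytes", and an archive_command that runs once per segment, serially, paying roughly the 1.57 s ssh startup measured above on every file.

```python
# Back-of-the-envelope model of WAL archiving load (illustrative assumptions:
# binary units, evenly spread WAL, one serial archive_command call per segment).
SECONDS_PER_DAY = 86_400
MB = 2**20
TB = 2**40


def segments_per_day(wal_bytes_per_day: int, segment_bytes: int) -> float:
    """Number of WAL segment files produced per day."""
    return wal_bytes_per_day / segment_bytes


def archiver_utilization(wal_bytes_per_day: int, segment_bytes: int,
                         per_file_seconds: float) -> float:
    """Fraction of wall-clock time a serial archiver spends on fixed per-file
    overhead alone; 1.0 or more means it can never catch up."""
    files_per_second = segments_per_day(wal_bytes_per_day, segment_bytes) / SECONDS_PER_DAY
    return files_per_second * per_file_seconds


if __name__ == "__main__":
    for seg_mb in (16, 64):
        n = segments_per_day(2 * TB, seg_mb * MB)
        u = archiver_utilization(2 * TB, seg_mb * MB, 1.57)
        print(f"{seg_mb}MB segments: {n:,.0f} files/day, "
              f"{n / SECONDS_PER_DAY:.2f} files/s, ssh-overhead utilization {u:.2f}")
```

Under these assumptions, 16MB segments mean 131,072 files per day (about 1.5 per second), and ssh startup alone would need about 2.4 seconds of work per wall-clock second; 64MB segments bring that to about 0.6. Real archivers can run in parallel or reuse connections, so treat this only as an illustration of why per-file overhead matters.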

Furthermore, there is an enforced, synchronous fsync at the end of
every segment, which actually does hurt performance on write-heavy
workloads.[2] Of course, if that were the only reason to consider
increasing the segment size, it would probably make more sense to just
try to push that extra fsync into the background, but that's not
really the case.  From what I hear, the gigantic number of files is a
bigger pain point.

2. Disks are a bit larger these days.  In the worst case, we waste
just under twice as much space as whatever the segment size is: you
might need 1 byte from the oldest segment you're keeping and 1 byte
from the newest segment that you are keeping, but not the remaining
contents of either file.  In 1999, trying to limit disk wastage to
<32MB probably seemed reasonable, but today that's very little disk
space.  I think at that time typical hard drive sizes were around 10
GB, whereas today they are around 1 TB.[3] I'm not sure whether the
size of the sort of high-performance storage that is likely to be
used for pg_xlog has grown as fast as hard drives generally, but even
so it seems pretty clear to me that trying to limit disk wastage to
32MB is excessively conservative on modern hardware.
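The worst-case wastage in point 2 is simple arithmetic. A small sketch, assuming binary units and the disk sizes given above (about 10 GB in 1999, about 1 TB today):

```python
MB = 2**20
GB = 2**30
TB = 2**40


def worst_case_waste(segment_bytes: int) -> int:
    """Bytes kept on disk but not needed, in the worst case.

    Only the last byte of the oldest retained segment and the first byte
    of the newest one are needed, yet both whole files must be kept.
    """
    return 2 * segment_bytes - 2


def waste_fraction(segment_bytes: int, disk_bytes: int) -> float:
    """Worst-case waste as a fraction of the whole disk."""
    return worst_case_waste(segment_bytes) / disk_bytes


if __name__ == "__main__":
    print(f"16MB segments, 10GB disk: {waste_fraction(16 * MB, 10 * GB):.3%}")
    print(f"64MB segments, 1TB disk:  {waste_fraction(64 * MB, 1 * TB):.3%}")
```

With these assumptions the worst case is about 0.31% of a 10 GB disk with 16MB segments, versus about 0.012% of a 1 TB disk with 64MB segments.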

3. archive_timeout is no longer a frequently used option.  Obviously,
if you are frequently archiving partial segments, you don't want the
segment size to be too large, because if it is, each forced segment
switch potentially wastes a large amount of space (and bandwidth).
But given streaming replication and pg_receivexlog, the use case for
archiving partial segments is, at least according to my understanding,
a lot narrower than it used to be.  So, I think we don't have to worry
as much about keeping forced segment switches cheap as we did during
the 8.x series.
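The cost of forced segment switches scales directly with the segment size, because every switch ships a full-size file. A rough sketch, assuming some WAL is written during every archive_timeout interval (so each interval ends in a switch) and that files are copied uncompressed; a compressing archive_command would shrink the mostly empty tail of each file considerably.

```python
# Daily archive volume caused purely by archive_timeout-forced switches on a
# lightly loaded server (assumption: every interval writes at least a little
# WAL, so every interval ends in a switch).
SECONDS_PER_DAY = 86_400
MB = 2**20
GB = 2**30


def forced_switch_bytes_per_day(archive_timeout_s: int, segment_bytes: int) -> int:
    switches = SECONDS_PER_DAY // archive_timeout_s
    # Each switched-out segment is archived at its full fixed size,
    # no matter how little of it was actually used.
    return switches * segment_bytes


if __name__ == "__main__":
    for seg_mb in (16, 64):
        gb = forced_switch_bytes_per_day(60, seg_mb * MB) / GB
        print(f"archive_timeout=60s, {seg_mb}MB segments: {gb:.1f} GB/day")
```

With a 60-second timeout that is 1,440 switches per day: about 22.5 GB/day of archived files at 16MB, and about 90 GB/day at 64MB.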

Considering those three factors, I think we should consider pushing
the default value up somewhat higher for v10.  Reverting to the 64MB
size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
sounds pretty reasonable.  Users with really high transaction rates
might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
practical for small installs given our default of max_wal_size = 1GB.
Possibly it would make sense for this to be configurable at initdb
time instead of requiring a recompile; we probably don't save any
significant number of cycles by compiling this into the server.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] I believe at that time we consumed an XID even for a read-only
transaction, too; today, we can do 2^32 read transactions in a few
hours.
[2] Amit did some benchmarking on this, I believe, but I don't have
the numbers handy.
[3] https://commons.wikimedia.org/wiki/File:Hard_drive_capacity_over_time.png



Re: increasing the default WAL segment size

From
Claudio Freire
Date:
On Wed, Aug 24, 2016 at 10:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 1. Transaction rates are vastly higher these days.  In 1999, I think
> we were still limited to ~2^32 transactions during the entire lifetime
> of the server; transaction ID wraparound hadn't been invented yet.[1]
> Today, some installations do that many write transactions in under a
> week.  The practical consequence of this is that WAL files fill up in
> extremely short periods of time. Some users generate multiple
> terabytes of WAL per day, which means they are generating - and very
> likely archiving - WAL files at a rate of more than one per second!
> That poses multiple problems. For example, if your archive command
> happens to involve ssh, you might run into trouble because of this
> sort of thing:
>
> [rhaas pgsql]$ /usr/bin/time ssh hydra true
>         1.57 real         0.00 user         0.00 sys
...
> Considering those three factors, I think we should consider pushing
> the default value up somewhat higher for v10.  Reverting to the 64MB
> size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable.  Users with really high transaction rates
> might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
> practical for small installs given our default of max_wal_size = 1GB.
> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile; we probably don't save any
> significant number of cycles by compiling this into the server.

FWIW, +1

We're already hurt by the small segments due to a similar phenomenon
as the ssh case: TCP slow start. Designing the archive/recovery
command to work around TCP slow start is quite complex, and bigger
segments would just be a better thing.

Not to mention that bigger segments compress better.
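The TCP slow start point can be illustrated with an idealized model: a fresh connection starts with a small congestion window that doubles each round trip, so many small transfers on new connections pay the ramp-up repeatedly. This sketch assumes a 1460-byte MSS and an initial window of 10 segments, and ignores the handshake, losses, delayed ACKs and receive-window limits.

```python
import math
from typing import Optional

MB = 2**20


def slow_start_rtts(file_bytes: int, mss: int = 1460, init_cwnd: int = 10,
                    max_cwnd: Optional[int] = None) -> int:
    """Round trips needed to send file_bytes on a brand-new connection.

    Idealized slow start: the congestion window (in MSS-sized packets)
    starts at init_cwnd and doubles every round trip, optionally capped
    at max_cwnd. Handshake, losses and delayed ACKs are ignored.
    """
    packets = math.ceil(file_bytes / mss)
    cwnd, sent, rtts = init_cwnd, 0, 0
    while sent < packets:
        sent += cwnd      # one full window goes out per round trip
        rtts += 1
        cwnd *= 2         # slow start doubles the window each RTT
        if max_cwnd is not None:
            cwnd = min(cwnd, max_cwnd)
    return rtts


if __name__ == "__main__":
    print("4 x 16MB, fresh connection each:", 4 * slow_start_rtts(16 * MB), "RTTs")
    print("1 x 64MB, one fresh connection: ", slow_start_rtts(64 * MB), "RTTs")
```

In this model, four 16MB files on fresh connections take 4 x 11 = 44 round trips, while one 64MB file takes 13. Reusing one long-lived connection attacks the same cost from the other side.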



Re: increasing the default WAL segment size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.

That seems like a reasonable thing to consider ...

> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile;

... but I think this is just folly.  You'd have to do major amounts
of work to keep, eg, slave servers on the same page as the master
about what the segment size is.  Better to keep treating it like
BLCKSZ, as a fixed parameter of a build.  (There's a reason why we
keep this number in pg_control.)
        regards, tom lane



Re: increasing the default WAL segment size

From
"Tsunakawa, Takayuki"
Date:
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Considering those three factors, I think we should consider pushing the
> default value up somewhat higher for v10.  Reverting to the 64MB size that
> we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable.

+1
The other downside is that the response time of transactions may degrade when they have to wait for a new WAL segment
to be created.  That might show up as occasional slow transactions or a higher maximum response time, which is a mystery to users.  Maybe
it's time to use posix_fallocate() to create WAL segments.
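A minimal sketch of the posix_fallocate() idea, written in Python for illustration (PostgreSQL itself is C, and this is not its segment-creation code). It reserves a segment-sized file up front and falls back to writing zeros where os.posix_fallocate is unavailable (it is Unix-only) or unsupported by the filesystem. Whether preallocation actually removes the latency depends on the filesystem; some still do extra metadata work on the first write into a preallocated range.

```python
import os
import tempfile

SEGMENT_BYTES = 16 * 2**20  # current default WAL segment size


def preallocate_segment(path: str, size: int = SEGMENT_BYTES) -> str:
    """Create a new segment-sized file with its space reserved up front.

    Returns which method was used, so callers can tell whether the
    fast path was available.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        try:
            # Unix-only; may raise OSError (e.g. EOPNOTSUPP/EINVAL) on
            # filesystems that do not support it.
            os.posix_fallocate(fd, 0, size)
            method = "posix_fallocate"
        except (AttributeError, OSError):
            # Fallback: write zeros through the whole file.
            chunk = b"\0" * (1 << 20)
            written = 0
            while written < size:
                written += os.write(fd, chunk[: size - written])
            method = "zero-fill"
        os.fsync(fd)  # make the size and allocation durable before use
    finally:
        os.close(fd)
    return method


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "000000010000000000000001")
        method = preallocate_segment(path)
        print(method, os.path.getsize(path))
```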
 


> Possibly it would make sense for this to be configurable at initdb time
> instead of requiring a recompile; we probably don't save any significant
> number of cycles by compiling this into the server.

+1

> 3. archive_timeout is no longer a frequently used option.  Obviously, if
> you are frequently archiving partial segments, you don't want the segment
> size to be too large, because if it is, each forced segment switch
> potentially wastes a large amount of space (and bandwidth).
> But given streaming replication and pg_receivexlog, the use case for
> archiving partial segments is, at least according to my understanding, a
> lot narrower than it used to be.  So, I think we don't have to worry as
> much about keeping forced segment switches cheap as we did during the 8.x
> series.

I'm not sure about this.  I know of users (I can't say how many) who use continuous archiving with archive_command and archive_timeout
for backups and don't want to use streaming replication, because the system is not worth the cost and trouble of HA.  I
heard from a few users that they were surprised to learn that PostgreSQL generates WAL even when no update
transaction is happening.  Is this still true?
 

Regards
Takayuki Tsunakawa


Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
> > Possibly it would make sense for this to be configurable at initdb
> > time instead of requiring a recompile;
> 
> ... but I think this is just folly.  You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.

Don't think it'd actually be all that complicated, we already verify
the compatibility of some things.  But I'm doubtful it's worth it, and
I'm also rather doubtful that it's actually without overhead.

Andres



Re: increasing the default WAL segment size

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.

> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things.  But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.

My point is basically that it'll introduce failure modes that we don't
currently concern ourselves with.  Yes, you can do configure
--with-wal-segsize, but it's on your own head whether the resulting build
will interoperate with anything else --- and I'm quite sure nobody tests,
eg, walsender or walreceiver to see if they fail sanely in such cases.
I don't think we'd get to take such a laissez-faire position with respect
to an initdb option.
        regards, tom lane



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... but I think this is just folly.  You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.

I said an initdb-time parameter, meaning not capable of being changed
within the lifetime of the cluster.  So I don't see how the slave
servers would get out of sync?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> > Possibly it would make sense for this to be configurable at initdb
>> > time instead of requiring a recompile;
>>
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.
>
> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things.  But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.

Really?  Where do you think the overhead would come from?  What sort
of test would you run to try to detect it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 11:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>>> ... but I think this is just folly.  You'd have to do major amounts
>>> of work to keep, eg, slave servers on the same page as the master
>>> about what the segment size is.
>
>> Don't think it'd actually be all that complicated, we already verify
>> the compatibility of some things.  But I'm doubtful it's worth it, and
>> I'm also rather doubtful that it's actually without overhead.
>
> My point is basically that it'll introduce failure modes that we don't
> currently concern ourselves with.  Yes, you can do configure
> --with-wal-segsize, but it's on your own head whether the resulting build
> will interoperate with anything else --- and I'm quite sure nobody tests,
> eg, walsender or walreceiver to see if they fail sanely in such cases.
> I don't think we'd get to take such a laissez-faire position with respect
> to an initdb option.

I am really confused by this.  If you connect a slave to a master
other than the one that you cloned to create the slave, of course
that's going to fail.  But if the slave is cloned from the master,
then the segment size is going to match.  It seems like the only thing
we need to do to make this work is make sure to get the segment size
from the control file rather than anywhere else, which doesn't seem
very difficult.  What am I missing?
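A hypothetical sketch of the rule described here: each server takes the segment size from its own control file, and a standby refuses a primary whose control data disagrees, with a readable error rather than misreading WAL. ControlFile, IncompatiblePrimary and check_streaming_compat are invented names, not PostgreSQL structures or functions.

```python
from dataclasses import dataclass

MB = 2**20


@dataclass(frozen=True)
class ControlFile:
    """Hypothetical subset of per-cluster control data fixed at initdb time."""
    system_identifier: int
    wal_segment_bytes: int


class IncompatiblePrimary(Exception):
    pass


def check_streaming_compat(local: ControlFile, primary: ControlFile) -> None:
    """Raise with a readable message instead of misreading the primary's WAL."""
    if local.system_identifier != primary.system_identifier:
        raise IncompatiblePrimary(
            f"primary is a different cluster (system identifier "
            f"{primary.system_identifier}, expected {local.system_identifier})")
    if local.wal_segment_bytes != primary.wal_segment_bytes:
        raise IncompatiblePrimary(
            f"WAL segment size mismatch: primary uses "
            f"{primary.wal_segment_bytes // MB}MB, this standby uses "
            f"{local.wal_segment_bytes // MB}MB")


if __name__ == "__main__":
    primary = ControlFile(system_identifier=1234, wal_segment_bytes=64 * MB)
    # A standby cloned from the primary copies its control file verbatim.
    standby = ControlFile(primary.system_identifier, primary.wal_segment_bytes)
    check_streaming_compat(standby, primary)  # passes silently
    print("cloned standby accepted")
```

Because a standby made by copying the primary also copies its control file, the segment-size check can only fail when the standby is pointed at the wrong primary, and in that case the system-identifier check would normally fire first.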

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.

> I said an initdb-time parameter, meaning not capable of being changed
> within the lifetime of the cluster.  So I don't see how the slave
> servers would get out of sync?

The point is that that now becomes something to worry about.  I do not
think I have to exhibit a live bug within five minutes' thought before
saying that it's a risk area.  It's something that we simply have not
worried about before, and IME that generally means there's some squishy
things there.
        regards, tom lane



Re: increasing the default WAL segment size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What am I missing?

Maybe nothing.  But I'll point out that of the things that can currently
be configured at initdb time, such as LC_COLLATE, there is not one single
one that matters to walsender/walreceiver.  If you think there is zero
risk involved in introducing a parameter that will matter at that level,
you have a different concept of risk than I do.

If you'd presented some positive reason why we ought to be taking some
risk here, I'd be on board.  But you haven't really.  The current default
value for this parameter is nearly old enough to vote; how is it that
we suddenly need to make it easily configurable?  Let's just change
the value and be happy.
        regards, tom lane



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
> > and I'm also rather doubtful that it's actually without overhead.
> 
> Really?  Where do you think the overhead would come from?

ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
doing a lot of %). Some of that happens with exclusive lwlocks held, and
some even with a spinlock held IIRC. Making that variable won't be
free. Whether it's actually measurable - hard to say. I do remember
Heikki fighting hard to simplify some parts of the critical code during
xlog scalability stuff, and that that even involved moving minor amounts
of math out of critical sections.

> What sort of test would you run to try to detect it?

Xlog scalability tests (parallel copy, parallel inserts...), and
decoding speed (pg_xlogdump --stats?)
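The concern above is that division and modulo by a compile-time constant are cheap (for a power-of-two unsigned constant the compiler can emit a shift and a mask), while the same operations on a run-time variable generally are not. One common mitigation, assumed here rather than taken from the thread, is to restrict the size to a power of two and precompute a shift and a mask once. Python cannot show the speed difference; this sketch only demonstrates the equivalence such C code would rely on. Whether it removes a measurable cost would still need the benchmarks listed above.

```python
# Precomputed shift/mask replacements for // and % by a power-of-two
# segment size (hypothetical helper, not PostgreSQL code).
MB = 2**20


class SegmentMath:
    def __init__(self, segment_bytes: int):
        # x & (x - 1) == 0 exactly when x is a power of two (x > 0).
        if segment_bytes <= 0 or segment_bytes & (segment_bytes - 1):
            raise ValueError("segment size must be a positive power of two")
        self.segment_bytes = segment_bytes
        self.shift = segment_bytes.bit_length() - 1  # log2(segment_bytes)
        self.mask = segment_bytes - 1

    def segment_number(self, lsn: int) -> int:
        """Which segment a WAL position falls in (lsn // segment_bytes)."""
        return lsn >> self.shift

    def segment_offset(self, lsn: int) -> int:
        """Byte offset within that segment (lsn % segment_bytes)."""
        return lsn & self.mask


if __name__ == "__main__":
    sm = SegmentMath(64 * MB)
    lsn = 0x16B374D848
    print(hex(sm.segment_number(lsn)), hex(sm.segment_offset(lsn)))
```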



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What am I missing?
>
> Maybe nothing.  But I'll point out that of the things that can currently
> be configured at initdb time, such as LC_COLLATE, there is not one single
> one that matters to walsender/walreceiver.  If you think there is zero
> risk involved in introducing a parameter that will matter at that level,
> you have a different concept of risk than I do.
>
> If you'd presented some positive reason why we ought to be taking some
> risk here, I'd be on board.  But you haven't really.  The current default
> value for this parameter is nearly old enough to vote; how is it that
> we suddenly need to make it easily configurable?  Let's just change
> the value and be happy.

I certainly think that's a good first cut.  As I said before, I think
that increasing the value from 16MB to 64MB won't really hurt people
with mostly-default configurations.  max_wal_size=1GB currently means
64 16-MB segments; if it starts meaning 16 64-MB segments, I don't
think that will have much impact on people one way or the other.
Meanwhile, we'll significantly help people who are currently
generating painfully large but not totally insane numbers of WAL
segments.  Someone who is currently generating 32,768 WAL segments per
day - about one every 2.6 seconds - will have a significantly easier
time if they start generating 8,192 WAL segments per day - about one
every 10.5 seconds - instead.  It's just much easier for a reasonably
simple archive command to keep up, "ls" doesn't have as many directory
entries to sort, etc.

However, for people who have really high velocity systems - say
300,000 WAL segments per day - a fourfold increase in the segment size
only gets them down to 75,000 WAL segments per day, which is still
pretty nuts.  High tens of thousands of segments per day is, surely,
easier to manage than low hundreds of thousands, but it still puts
really tight requirements on how fast your archive_command has to run.
On that kind of system, you really want a segment size of maybe 1GB.
In this example that gets you down to ~4700 WAL files per day, or
about one every 18 seconds.  But 1GB is clearly too large to be the
default.
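The segment counts in the two paragraphs above follow from keeping the daily WAL volume fixed and dividing by the new segment size. A small sketch that reproduces them, assuming binary units (16MB = 16 x 2**20 bytes, 1GB = 2**30 bytes):

```python
SECONDS_PER_DAY = 86_400
MB = 2**20
GB = 2**30


def rescale(segments_per_day: float, old_bytes: int, new_bytes: int) -> float:
    """Segments per day after changing segment size, for the same WAL volume."""
    return segments_per_day * old_bytes / new_bytes


def seconds_between(segments_per_day: float) -> float:
    """Average interval between segment completions."""
    return SECONDS_PER_DAY / segments_per_day


if __name__ == "__main__":
    # max_wal_size = 1GB expressed in segments
    print(GB // (16 * MB), GB // (64 * MB))  # 64 16
    for per_day in (32_768, 300_000):
        for new in (64 * MB, GB):
            n = rescale(per_day, 16 * MB, new)
            print(f"{per_day:,}/day at 16MB -> {n:,.1f}/day at {new // MB}MB, "
                  f"one every {seconds_between(n):.1f} s")
```

For 300,000 segments per day this gives 75,000 at 64MB and 4,687.5 (the "~4700" above, one every ~18.4 s) at 1GB.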

I think we're going to run into this issue more and more as people
start running PostgreSQL on larger databases.  In current releases,
the cost of wraparound autovacuums can easily be the limiting factor
here: the I/O cost is proportional to the XID burn rate multiplied by
the entire size of the database.  So mostly read-only databases or
databases that only take batch loads can be fine even if they are
really big, but it's hard to scale databases that do lots of
transaction processing beyond a certain size because you just end up
running continuous wraparound vacuums and eventually you can't even do
that fast enough.  The freeze map changes in 9.6 should help with this
problem, though, at least for databases that have hot spots rather
than uniform access, which is of course very common. I think the
result of that is likely to be that people try to scale up PostgreSQL
to larger databases than ever before.  New techniques for indexing
large amounts of data (like BRIN) and for querying it (like parallel
query, especially once we support having the driving scan be a bitmap
heap scan) are going to encourage people in that direction, too.
You're asking why we suddenly need to make this configurable as if it
were a surprising need, but I think it would be more surprising if
scaling up didn't create some new needs.  I can't think of any reason
why a 100TB database and a 100MB database should both want to use the
same WAL segment size, and I think we want to support both of those
things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
>> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
>> > and I'm also rather doubtful that it's actually without overhead.
>>
>> Really?  Where do you think the overhead would come from?
>
> ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> doing a lot of %). Some of that happens with exclusive lwlocks held, and
> some even with a spinlock held IIRC. Making that variable won't be
> free. Whether it's actually measurable - hard to say. I do remember
> Heikki fighting hard to simplify some parts of the critical code during
> xlog scalability stuff, and that that even involved moving minor amounts
> of math out of critical sections.

OK, that's helpful context.

>> What sort of test would you run to try to detect it?
>
> Xlog scalability tests (parallel copy, parallel inserts...), and
> decoding speed (pg_xlogdump --stats?)

Thanks; that's helpful, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-08-25 00:28:58 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> >> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
> >> > and I'm also rather doubtful that it's actually without overhead.
> >>
> >> Really?  Where do you think the overhead would come from?
> >
> > ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> > doing a lot of %). Some of that happens with exclusive lwlocks held, and
> > some even with a spinlock held IIRC. Making that variable won't be
> > free. Whether it's actually measurable - hard to say. I do remember
> > Heikki fighting hard to simplify some parts of the critical code during
> > xlog scalability stuff, and that that even involved moving minor amounts
> > of math out of critical sections.
> 
> OK, that's helpful context.
> 
> >> What sort of test would you run to try to detect it?
> >
> > Xlog scalability tests (parallel copy, parallel inserts...), and
> > decoding speed (pg_xlogdump --stats?)
> 
> Thanks; that's helpful, too.

FWIW, I'm also doubtful that investing time into making this initdb
configurable is a good use of time: The number of users that'll adjust
initdb time parameters is going to be fairly small.



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund <andres@anarazel.de> wrote:
> FWIW, I'm also doubtful that investing time into making this initdb
> configurable is a good use of time: The number of users that'll adjust
> initdb time parameters is going to be fairly small.

I have to admit that I was skeptical about the idea of doing anything
about this at all the first few times it came up.  16MB ought to be
good enough for anyone!  However, the time between beatings has now
gotten short enough that the bruises don't have time to heal before
the next beating arrives from a completely different customer.  I try
not to hold my views so firmly as to be impervious to contrary
evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Wolfgang Wilhelm
Date:
Hello hackers,

I'm no PG hacker, so maybe I'm completely wrong; sorry if I have wasted your time. I'll try to make the best of Tom Lane's comment.

What would happen if there's a database on a server with an initdb (or whatever) parameter -with-wal-size=64MB and later someone decides to make it the master in a replicated system and has a slave without that parameter? Would the slave work with the "different" WAL size of the master? How could it be guaranteed that in such a scenario the replication either works correctly or fails with a meaningful error message?

But in general I think a more flexible WAL size is a good idea.
To answer Andres: You have found one of the (few?) users to adjust initdb parameters.

Regards


Robert Haas <robertmhaas@gmail.com> wrote on Thursday, 25 August 2016, 6:43:

> On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund <andres@anarazel.de> wrote:
>> FWIW, I'm also doubtful that investing time into making this initdb
>> configurable is a good use of time: The number of users that'll adjust
>> initdb time parameters is going to be fairly small.
>
> I have to admit that I was skeptical about the idea of doing anything
> about this at all the first few times it came up.  16MB ought to be
> good enough for anyone!  However, the time between beatings has now
> gotten short enough that the bruises don't have time to heal before
> the next beating arrives from a completely different customer.  I try
> not to hold my views so firmly as to be impervious to contrary
> evidence.
>
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Re: increasing the default WAL segment size

From
Bruce Momjian
Date:
On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
> > time instead of requiring a recompile; we probably don't save any
> > significant number of cycles by compiling this into the server.
> 
> FWIW, +1
> 
> We're already hurt by the small segments due to a similar phenomenon
> as the ssh case: TCP slow start. Designing the archive/recovery
> command to work around TCP slow start is quite complex, and bigger
> segments would just be a better thing.
> 
> Not to mention that bigger segments compress better.

This would be a good time to rename the pg_xlog and pg_clog directories too.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 1:04 AM, Wolfgang Wilhelm
<wolfgang20121964@yahoo.de> wrote:
> What would happen if there's a database on a server with initdb (or
> whatever) parameter -with-wal-size=64MB and later someone decides to make it
> the master in a replicated system and has a slave without that parameter?
> Would the slave work with the "different" wal size of the master? How could
> be guaranteed that in such a scenario the replication either works correctly
> or fails with a meaningful error message?

You make reference to an "initdb (or whatever) parameter" but actually
there is a big difference between the "initdb" case and the "whatever"
case.  If the parameter is fixed at initdb time, then the master and
the slave will definitely agree: the slave had to be created by
copying the master, and that means the control file that contains the
size was also copied.  Neither can have been changed afterwards.
That's what an initdb-time parameter means.  On the other hand, if the
parameter is, say, a GUC, then you would have exactly the kinds of
problems that you are talking about here.  I am not keen to solve any
of those problems, which is why I am not proposing to go any further
than an initdb-time parameter.

> But in general I think a more flexible WAL size is a good idea.
> To answer Andres: You have found one of the (few?) users to adjust initdb
> parameters.

Good to know, thanks.

In further defense of the idea that making this more configurable
isn't nuts, it's worth noting that the history here is:

* When Vadim originally added XLogSegSize in
30659d43eb73272e20f2eb1d785a07ba3b553ed8 (September 1999), it was a
constant.
* In c3c09be34b6b0d7892f1087a23fc6eb93f3c4f04 (February 2004), this
became configurable via pg_config_manual.h.
* In cf9f6c8d8e9df28f3fbe1850ca7f042b2c01252e (May 2008), Tom made
this configurable via configure.

So there's a well-established history of making this gradually easier
for users to change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Magnus Hagander
Date:


On Thu, Aug 25, 2016 at 5:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... but I think this is just folly.  You'd have to do major amounts
>>> of work to keep, eg, slave servers on the same page as the master
>>> about what the segment size is.
>
>> I said an initdb-time parameter, meaning not capable of being changed
>> within the lifetime of the cluster.  So I don't see how the slave
>> servers would get out of sync?
>
> The point is that that now becomes something to worry about.  I do not
> think I have to exhibit a live bug within five minutes' thought before
> saying that it's a risk area.  It's something that we simply have not
> worried about before, and IME that generally means there's some squishy
> things there.

If we ignore the possible performance implications (which we shouldn't, of course, but for the sake of argument), I think having it as a configurable parameter in initdb would make it *less* of something to worry about.

Because it comes with the cluster during replication. I think it's more likely that you accidentally end up with two instances compiled with different values than that you get an issue from this.

That said, I think it also has to be a *very* bad pain point for somebody to care about changing it if it requires recompilation. The vast majority of users run the packaged versions, and they don't want to run anything else. So you will have whatever the RPMs or the DEBs or installers pick for you. Anything that is a ./configure-time option is something we should expect almost nobody to change.

Changing the default will of course help/hurt those as well. But if we change the default to something high and say "hey those of you who just run it on a smaller system should recompile with a different --configure", we are being *very* user-unfriendly. Or the other way around.

That doesn't mean we shouldn't change the default. We just need to be a lot more careful about what we change it to if it's ./configure to reset it.
 
--

Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> Meanwhile, we'll significantly help people who are currently
> generating painfully large but not totally insane numbers of WAL
> segments.  Someone who is currently generating 32,768 WAL segments per
> day - about one every 2.6 seconds - will have a significantly easier
> time if they start generating 8,192 WAL segments per day - about one
> every 10.5 seconds - instead.  It's just much easier for a reasonably
> simple archive command to keep up, "ls" doesn't have as many directory
> entries to sort, etc.
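The rates in the quoted paragraph follow from simple division. A minimal Python sketch, assuming WAL is generated at a steady rate and ignoring forced segment switches such as archive_timeout (the helper name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB = 1024 * 1024


def seconds_per_segment(wal_bytes_per_day: int, segment_bytes: int) -> float:
    """Average gap between completed segments, i.e. between archive_command runs."""
    return SECONDS_PER_DAY * segment_bytes / wal_bytes_per_day


# 32,768 segments/day of 16MB each is 512GB of WAL per day.
wal_per_day = 32_768 * 16 * MB

for seg_mb in (16, 64):
    seg_bytes = seg_mb * MB
    count = wal_per_day // seg_bytes
    gap = seconds_per_segment(wal_per_day, seg_bytes)
    print(f"{seg_mb}MB: {count} segments/day, one every {gap:.1f}s")
```

The same 512GB/day of WAL produces 32,768 files at 16MB (one every ~2.6s) but 8,192 at 64MB (one every ~10.5s); the per-file overhead of the archive command is what scales with the file count.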

I'm generally on-board with increasing the WAL segment size, and I can
see the point that we might want to make it more easily configurable as
it's valuable to set it differently on a small database vs. a large
database, but I take exception with the notion that a "simple archive
command" is ever appropriate.  Heikki's excellent talk at PGCon '15
(iirc) goes over why our archive command example is about as terrible as
you can get and that's primarily because it's just a simple 'cp'.

archive_command needs to be doing things like fsync'ing the WAL file
after it's been copied away, probably fsync'ing the directory the WAL
file has been copied into, returning the correct exit code to PG, etc.
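Those duties can be sketched in a few lines. This is a minimal Python illustration for POSIX systems only; the function name, the ".tmp" suffix, and the archive directory are hypothetical, and it is not the code of any particular backup tool. It copies to a temporary name, fsyncs the file, renames it into place, fsyncs the directory, refuses to overwrite an already-archived segment, and reports every outcome as an exit code, since PostgreSQL treats only exit status 0 as success.

```python
import os
import shutil


def archive_wal(src_path: str, dest_dir: str) -> int:
    """Copy one WAL segment into dest_dir durably; return a process exit code."""
    fname = os.path.basename(src_path)
    dest_path = os.path.join(dest_dir, fname)
    tmp_path = dest_path + ".tmp"  # hypothetical temp-name convention

    # Never overwrite an already-archived segment: report failure instead.
    if os.path.exists(dest_path):
        return 1
    try:
        # Copy under a temporary name so a crash can't leave a partial file
        # under the real segment name.
        with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # file contents to stable storage
        os.rename(tmp_path, dest_path)  # atomic within one POSIX filesystem
        # fsync the directory so the new directory entry is durable too.
        dir_fd = os.open(dest_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        # Clean up any partial temp file; non-zero makes PostgreSQL retry later.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        return 1
    return 0
```

A wrapper script would call `sys.exit(archive_wal(sys.argv[1], sys.argv[2]))` and be configured as something like `archive_command = 'python3 archive_wal.py %p /mnt/wal_archive'` (script name and path hypothetical). It still pays process startup on every segment, which is the cost the dedicated tools avoid with long-running connections.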

Thankfully, there are backup/WAL archive utilities which do this
correctly and are even built to handle a large rate of WAL files for
high-transaction systems (including keeping a long-running ssh/TCP
connection open to address the startup costs of both).  Switching to 64MB would still be
nice to simply reduce the number of files you have to deal with, and I'm
all for it for that reason, but the ssh/TCP startup cost reasons aren't
good ones for the switch as people shouldn't be using a "simple" command
anyway and the good tools for WAL archiving have already addressed those
issues.

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 9:48 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Meanwhile, we'll significantly help people who are currently
>> generating painfully large but not totally insane numbers of WAL
>> segments.  Someone who is currently generating 32,768 WAL segments per
>> day - about one every 2.6 seconds - will have a significantly easier
>> time if they start generating 8,192 WAL segments per day - about one
>> every 10.5 seconds - instead.  It's just much easier for a reasonably
>> simple archive command to keep up, "ls" doesn't have as many directory
>> entries to sort, etc.
>
> I'm generally on-board with increasing the WAL segment size, and I can
> see the point that we might want to make it more easily configurable as
> it's valuable to set it differently on a small database vs. a large
> database, but I take exception with the notion that a "simple archive
> command" is ever appropriate.

My point wasn't really that archive_command should actually be simple.
My point was that if it's being run multiple times per second, there
are additional challenges that wouldn't arise if it were being run
only every 5-10 seconds.

I guess I should have said "simpler" rather than "reasonably simple",
because there's nothing simple about setting archive_command properly.
I mean, it could only actually be simple if somebody had a good
backup tool that provided an archive_command that you could just drop
in place.  But I'm sure if somebody had such a tool, they'd take every
opportunity to bring it up, so we doubtless would have heard about it
by now.  Right?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 9:48 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> Meanwhile, we'll significantly help people who are currently
> >> generating painfully large but not totally insane numbers of WAL
> >> segments.  Someone who is currently generating 32,768 WAL segments per
> >> day - about one every 2.6 seconds - will have a significantly easier
> >> time if they start generating 8,192 WAL segments per day - about one
> >> every 10.5 seconds - instead.  It's just much easier for a reasonably
> >> simple archive command to keep up, "ls" doesn't have as many directory
> >> entries to sort, etc.
> >
> > I'm generally on-board with increasing the WAL segment size, and I can
> > see the point that we might want to make it more easily configurable as
> > it's valuable to set it differently on a small database vs. a large
> > database, but I take exception with the notion that a "simple archive
> > command" is ever appropriate.
>
> My point wasn't really that archive_command should actually be simple.
> My point was that if it's being run multiple times per second, there
> are additional challenges that wouldn't arise if it were being run
> only every 5-10 seconds.

My point was that the concerns about TCP/ssh startup costs, which were
part of your point #1 in your initial justification for the change,
have been addressed through tooling.

> I guess I should have said "simpler" rather than "reasonably simple",
> because there's nothing simple about setting archive_command properly.

Agreed.

> I mean, it could only actually be simple if somebody had a good
> backup tool that provided an archive_command that you could just drop
> in place.  But I'm sure if somebody had such a tool, they'd take every
> opportunity to bring it up, so we doubtless would have heard about it
> by now.  Right?  :-)

Thankfully there are actually multiple good open-source and freely
available tools that address this issue (albeit through different
mechanisms).

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Bruce Momjian
Date:
On Wed, Aug 24, 2016 at 08:52:20PM -0700, Andres Freund wrote:
> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> > On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
> > > and I'm also rather doubtful that it's actually without overhead.
> > 
> > Really?  Where do you think the overhead would come from?
> 
> ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> doing a lot of %). Some of that happens with exclusive lwlocks held, and
> some even with a spinlock held IIRC. Making that variable won't be
> free. Whether it's actually measurable - hard to say. I do remember
> Heikki fighting hard to simplify some parts of the critical code during
> xlog scalability stuff, and that that even involved moving minor amounts
> of math out of critical sections.

I think Robert made a good case that high-volume servers might want a
larger WAL segment size, but as Andres pointed out, there are
performance concerns.  Those might be minimized by requiring the segment
size to be a power-of-two (2^x) multiple of 16MB.
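A power-of-two restriction (presumably what is meant here) speaks directly to the arithmetic Andres describes: when the segment size is a power of two, `lsn / size` and `lsn % size` reduce to a shift and a mask, which is what a C compiler typically emits anyway for a compile-time constant like today's 16MB. A small Python sketch of that identity (function names are hypothetical; this is not PostgreSQL's code):

```python
def split_lsn_divmod(lsn: int, seg_size: int) -> tuple[int, int]:
    """Any segment size: needs a real division and modulo at runtime."""
    return lsn // seg_size, lsn % seg_size


def split_lsn_shift(lsn: int, seg_size: int) -> tuple[int, int]:
    """Power-of-two segment size: the same answer from one shift and one mask."""
    if seg_size <= 0 or seg_size & (seg_size - 1):
        raise ValueError("segment size must be a power of two")
    shift = seg_size.bit_length() - 1  # log2(seg_size), e.g. 24 for 16MB
    return lsn >> shift, lsn & (seg_size - 1)


MB = 1024 * 1024
for size_mb in (16, 32, 64, 128):
    size = size_mb * MB
    for lsn in (0, 1, size - 1, size, 0x1_6B3F2A10, 2**48 + 12345):
        assert split_lsn_divmod(lsn, size) == split_lsn_shift(lsn, size)
```

In Python both versions cost about the same; the point is that the identity holds only for power-of-two sizes, which would let C code with a runtime-configurable size keep the cheap shift/mask form (with the shift amount stored once at startup).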

Another issue is that many users are coming from database products that
have significant performance hits in switching WAL files so they might
be tempted to set very high segment sizes in inappropriate cases.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 10:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> My point wasn't really that archive_command should actually be simple.
>> My point was that if it's being run multiple times per second, there
>> are additional challenges that wouldn't arise if it were being run
>> only every 5-10 seconds.
>
> My point was that the concerns about TCP/ssh startup costs, which were
> part of your point #1 in your initial justification for the change,
> have been addressed through tooling.

It's good to know that some tool sets have addressed that, but I'm
pretty certain that not every tool set has done so, probably not even
all of the ones in common use.  Anyway, I think the requirements we
impose on archive_command today are just crazy.  All other things
being equal, changes that make it easier to write a decent one are
IMHO going in the right direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 10:39 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Another issue is that many users are coming from database products that
> have significant performance hits in switching WAL files so they might
> be tempted to set very high segment sizes in inappropriate cases.

Well, we have some hit there, too.  It may be smaller, but it's
certainly not zero.

I'm generally in favor of preventing people from setting ridiculous
values for settings; we shouldn't let somebody set the WAL segment
size to 8kB or something silly like that.  But it's more important to
enable legitimate uses than it is to prohibit inappropriate uses.  If
a particular value of a particular setting may be legitimately useful
to some users, we should allow it, even if some other user might
choose that value under false assumptions.

In short, let's eschew nannyism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 9:34 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Because it comes with the cluster during replication. I think it's more
> likely that you accidentally end up with two instances compiled with
> different values than that you get an issue from this.

I hadn't thought about it that way, but I think you're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:

> Furthermore, there is an enforced, synchronous fsync at the end of
> every segment, which actually does hurt performance on write-heavy
> workloads.[2] Of course, if that were the only reason to consider
> increasing the segment size, it would probably make more sense to just
> try to push that extra fsync into the background, but that's not
> really the case.  From what I hear, the gigantic number of files is a
> bigger pain point.

I think we should fully describe the problem before finding a solution.

This is too big a change to just tweak a value without discussing the
actual issue.

And if the problem is as described, how can a change of x4 be enough
to make it worth the pain of change? I think you're already admitting
it can't be worth it by discussing initdb configuration.

If we do have the pain of change, should we also consider making WAL
files variable length? What do we gain by having the files all the
same size? ISTM better to have WAL files that vary in length up to 1GB
in size.

(This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
is, right?)

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
> >> My point wasn't really that archive_command should actually be simple.
> >> My point was that if it's being run multiple times per second, there
> >> are additional challenges that wouldn't arise if it were being run
> >> only every 5-10 seconds.
> >
> > My point was that the concerns about TCP/ssh startup costs, which were
> > part of your point #1 in your initial justification for the change,
> > have been addressed through tooling.
>
> It's good to know that some tool sets have addressed that, but I'm
> pretty certain that not every tool set has done so, probably not even
> all of the ones in common use.  Anyway, I think the requirements we
> impose on archive_command today are just crazy.  All other things
> being equal, changes that make it easier to write a decent one are
> IMHO going in the right direction.

Agreed, but, unfortunately, this isn't an "all other things being equal"
case, or we wouldn't be having this discussion.  Increasing the WAL
segment size means it'll be longer before archive_command is called
which means there's a larger amount of potential data loss for users who
are using it without any other archiving/replication solution, along
with the other concerns about it possibly resulting in a higher disk
space cost.

I agree that increasing it makes sense and that 64MB is a good number,
but I wouldn't want to go much higher than that.  That doesn't
completely solve the TCP/SSH start-up cost penalty as there will be
environments where that is still expensive even with 64MB WAL segments,
but it will certainly be reduced.

To try to summarize, I don't think we should be trying to solve the
TCP/SSH start-up penalty issue for all users by encouraging them to
increase the WAL segment size, at least not without covering the
trade-offs.  That isn't to say we shouldn't change the default, I agree
that we should, but I believe we should keep it a reasonably
conservative change and if we make it user-configurable then we need
to be sure to document the trade-offs.

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Claudio Freire
Date:
On Thu, Aug 25, 2016 at 12:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Furthermore, there is an enforced, synchronous fsync at the end of
>> every segment, which actually does hurt performance on write-heavy
>> workloads.[2] Of course, if that were the only reason to consider
>> increasing the segment size, it would probably make more sense to just
>> try to push that extra fsync into the background, but that's not
>> really the case.  From what I hear, the gigantic number of files is a
>> bigger pain point.
>
> I think we should fully describe the problem before finding a solution.
>
> This is too big a change to just tweak a value without discussing the
> actual issue.
>
> And if the problem is as described, how can a change of x4 be enough
> to make it worth the pain of change? I think you're already admitting
> it can't be worth it by discussing initdb configuration.
>
> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.
>
> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

Avoiding variable sizes does avoid some failure modes on the
filesystem side in the face of crashes/power loss.

So making them variable size, while possible, wouldn't be simple at
all (it would involve figuring out how filesystems behave when a
crash happens while a file is changing in size).



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:
>> Furthermore, there is an enforced, synchronous fsync at the end of
>> every segment, which actually does hurt performance on write-heavy
>> workloads.[2] Of course, if that were the only reason to consider
>> increasing the segment size, it would probably make more sense to just
>> try to push that extra fsync into the background, but that's not
>> really the case.  From what I hear, the gigantic number of files is a
>> bigger pain point.
>
> I think we should fully describe the problem before finding a solution.

Sure, that's usually a good idea.  I attempted to outline all of the
possible issues of which I am aware in my original email, but of
course you may know of considerations which I overlooked.

> This is too big a change to just tweak a value without discussing the
> actual issue.

Again, I tried to make sure I was discussing the actual issues in my
original email.  In brief: having to run archive_command multiple
times per second imposes very tight latency requirements on it;
directories with hundreds of thousands or millions of files are hard
to manage; enforced synchronous fsyncs at the end of each segment hurt
performance.

> And if the problem is as described, how can a change of x4 be enough
> to make it worth the pain of change? I think you're already admitting
> it can't be worth it by discussing initdb configuration.

I guess it depends on how much pain of change you think there will be.
I would expect a change from 16MB -> 64MB to be fairly painless, but
(1) it might break tools that aren't designed to cope with differing
segment sizes and (2) it will increase disk utilization for people who
have such low velocity systems that they never end up with more than 2
WAL segments, and now those segments are bigger.  If you know of other
impacts or have reason to believe those problems will be serious,
please fill in the details.

Despite the fact that initdb configuration has dominated this thread,
I mentioned it only in the very last sentence of my email and only as
a possibility.  I believe that a 4x change will be good enough for the
majority of people for whom this is currently a pain point.  However,
yes, I do believe that there are some people for whom it won't be
sufficient.  And I believe that as we continue to enhance PostgreSQL
to support higher and higher transaction rates, the number of people
who need an extra-large WAL segment size will increase.  As I see it,
there are four options here:

1. Do nothing.  So far, I don't see anybody arguing for that.

2. Change the default to 64MB and call it good.  This idea seems to
have considerable support.

3. Allow initdb-time configurability but keep the default at 16MB.  I
don't see any support for this.  There is clearly support for
configurability, but I don't see anyone arguing that the current
default is preferable, unless that is what you are arguing.

4. Change the default to 64MB and also allow initdb-time
configurability.  This option also appears to enjoy substantial
support, perhaps more than #2.  Magnus seemed to be arguing that this
is preferable to #2, because then it's easier for people to change the
setting back if someone discovers a case where the higher default is a
problem; Tom, on the other hand, seems to think this is overkill.

Personally, I believe option #4 is for the best.  I believe that the
great majority of users will be better off with 64MB than with 16MB,
but I like the idea of allowing for smaller values (for people with
really low-velocity instances) and larger ones (for people with really
high-velocity instances).

> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.

This seems like an odd comment because the whole way we address WAL
positions is based on the fact that segments are fixed size, as I
would have thought you would know better than I.  The file that
contains a particular byte of WAL is based on lsn/XLOG_SEG_SIZE and
the position within the file is lsn%XLOG_SEG_SIZE.  Making files
variable-size would vastly complicate this addressing scheme and maybe
hurt performance in the process.  I can't see any compelling reason to
go there.
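The fixed-size addressing can be made concrete. A minimal Python sketch, assuming the standard 24-hex-digit segment naming (timeline, then the segment number split into 4GB "xlogid" units, modelled on what PostgreSQL's `XLogFileName` macro does); helper names are hypothetical. With fixed-size segments, both the file and the offset are pure arithmetic on the LSN; variable-length segments would need some lookup structure instead.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # today's compile-time default (16MB)


def wal_position(lsn: int, seg_size: int = XLOG_SEG_SIZE) -> tuple[int, int]:
    """Segment number (lsn / seg_size) and offset within it (lsn % seg_size)."""
    return lsn // seg_size, lsn % seg_size


def wal_file_name(timeline: int, segno: int, seg_size: int = XLOG_SEG_SIZE) -> str:
    """24 hex digits: timeline, then segno split into 4GB 'xlogid' units."""
    segs_per_xlogid = 0x100000000 // seg_size  # 256 segments per 4GB at 16MB
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


lsn = 0x1_6B3F2A10  # the LSN usually written as 1/6B3F2A10
for size in (16 * 1024 * 1024, 64 * 1024 * 1024):
    segno, offset = wal_position(lsn, size)
    print(size // (1024 * 1024), "MB:", wal_file_name(1, segno, size), hex(offset))
# 16 MB: 00000001000000010000006B 0x3f2a10
# 64 MB: 00000001000000010000001A 0x33f2a10
```

The same LSN lands in a different file at a different offset once the segment size changes, which is also why tools that parse segment names need to know the size in use.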

> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

Yep.  Or at least, any discussion of changing the default XLOG block
size would be completely separate from the issues raised here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Magnus Hagander
Date:


On Thu, Aug 25, 2016 at 6:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Furthermore, there is an enforced, synchronous fsync at the end of
>>> every segment, which actually does hurt performance on write-heavy
>>> workloads.[2] Of course, if that were the only reason to consider
>>> increasing the segment size, it would probably make more sense to just
>>> try to push that extra fsync into the background, but that's not
>>> really the case.  From what I hear, the gigantic number of files is a
>>> bigger pain point.
>>
>> I think we should fully describe the problem before finding a solution.
>
> Sure, that's usually a good idea.  I attempted to outline all of the
> possible issues of which I am aware in my original email, but of
> course you may know of considerations which I overlooked.
>
>> This is too big a change to just tweak a value without discussing the
>> actual issue.
>
> Again, I tried to make sure I was discussing the actual issues in my
> original email.  In brief: having to run archive_command multiple
> times per second imposes very tight latency requirements on it;
> directories with hundreds of thousands or millions of files are hard
> to manage; enforced synchronous fsyncs at the end of each segment hurt
> performance.
>
>> And if the problem is as described, how can a change of x4 be enough
>> to make it worth the pain of change? I think you're already admitting
>> it can't be worth it by discussing initdb configuration.
>
> I guess it depends on how much pain of change you think there will be.
> I would expect a change from 16MB -> 64MB to be fairly painless, but
> (1) it might break tools that aren't designed to cope with differing
> segment sizes and (2) it will increase disk utilization for people who
> have such low velocity systems that they never end up with more than 2
> WAL segments, and now those segments are bigger.  If you know of other
> impacts or have reason to believe those problems will be serious,
> please fill in the details.
>
> Despite the fact that initdb configuration has dominated this thread,
> I mentioned it only in the very last sentence of my email and only as
> a possibility.  I believe that a 4x change will be good enough for the
> majority of people for whom this is currently a pain point.  However,
> yes, I do believe that there are some people for whom it won't be
> sufficient.  And I believe that as we continue to enhance PostgreSQL
> to support higher and higher transaction rates, the number of people
> who need an extra-large WAL segment size will increase.  As I see it,
> there are four options here:
>
> 1. Do nothing.  So far, I don't see anybody arguing for that.
>
> 2. Change the default to 64MB and call it good.  This idea seems to
> have considerable support.
>
> 3. Allow initdb-time configurability but keep the default at 16MB.  I
> don't see any support for this.  There is clearly support for
> configurability, but I don't see anyone arguing that the current
> default is preferable, unless that is what you are arguing.
>
> 4. Change the default to 64MB and also allow initdb-time
> configurability.  This option also appears to enjoy substantial
> support, perhaps more than #2.  Magnus seemed to be arguing that this
> is preferable to #2, because then it's easier for people to change the
> setting back if someone discovers a case where the higher default is a
> problem; Tom, on the other hand, seems to think this is overkill.
>
> Personally, I believe option #4 is for the best.  I believe that the
> great majority of users will be better off with 64MB than with 16MB,
> but I like the idea of allowing for smaller values (for people with
> really low-velocity instances) and larger ones (for people with really
> high-velocity instances).

I was not arguing for #4 over #2, at least not strongly. I think #2 is fine, and I think #4 is fine. #4 allows a way out, but it's not *that* important unless we go *beyond* 64MB.

I was mainly arguing that we can't claim "it has a configure switch so it's kinda configurable" as a way out. If we want it configurable *at all*, it should be an initdb switch. If we are confident in our defaults, it doesn't have to be.

I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried at all about the master/slave sync risk, per my previous statement. But if it does have performance implications, per Andres's suggestion, then making it configurable at initdb time probably comes with a cost that's not worth paying.


--

Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 1:05 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> 1. Do nothing.  So far, I don't see anybody arguing for that.
>>
>> 2. Change the default to 64MB and call it good.  This idea seems to
>> have considerable support.
>>
>> 3. Allow initdb-time configurability but keep the default at 16MB.  I
>> don't see any support for this.  There is clearly support for
>> configurability, but I don't see anyone arguing that the current
>> default is preferable, unless that is what you are arguing.
>>
>> 4. Change the default to 64MB and also allow initdb-time
>> configurability.  This option also appears to enjoy substantial
>> support, perhaps more than #2.  Magnus seemed to be arguing that this
>> is preferable to #2, because then it's easier for people to change the
>> setting back if someone discovers a case where the higher default is a
>> problem; Tom, on the other hand, seems to think this is overkill.
>
> I was not arguing for #4 over #2, at least not strongly. I think #2 is fine,
> and I think #4 is fine. #4 allows a way out, but it's not *that* important
> unless we go *beyond* 64MB.

OK, thanks for clarifying.  I can't see going beyond 64MB by default
when we're shipping max_wal_size=1GB.  In another 20 years when
PB-size thumb drives are commonplace we might reconsider.

> I was mainly arguing that we can't claim "it has a configure switch so it's
> kinda configurable" as a way out. If we want it configurable *at all*, it
> should be an initdb switch. If we are confident in our defaults, it doesn't
> have to be.
>
> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
> at all about the master/slave sync risk, per my previous statement.
> But if it does have performance implications, per Andres's suggestion, then
> making it configurable at initdb time probably comes with a cost that's not
> worth paying.

At this point it's hard to judge, because we don't have any idea what
the cost might be.  I guess if we want to pursue this approach,
somebody will have to code it up and benchmark it.  But what I'm
inclined to do for starters is put together a patch to go from 16MB ->
64MB.  Committing that early this cycle will give us time to
reconsider if that turns out to be painful for reasons we haven't
thought of yet.  And give tool authors time to make adjustments, if
any are needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Josh Berkus
Date:
On 08/25/2016 01:12 PM, Robert Haas wrote:
>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>> at all about the master/slave sync risk, per my previous statement.
>> But if it does have performance implications, per Andres's suggestion, then
>> making it configurable at initdb time probably comes with a cost that's not
>> worth paying.
> At this point it's hard to judge, because we don't have any idea what
> the cost might be.  I guess if we want to pursue this approach,
> somebody will have to code it up and benchmark it.  But what I'm
> inclined to do for starters is put together a patch to go from 16MB ->
> 64MB.  Committing that early this cycle will give us time to
> reconsider if that turns out to be painful for reasons we haven't
> thought of yet.  And give tool authors time to make adjustments, if
> any are needed.

The one thing I'd be worried about with the increase in size is folks
using PostgreSQL for very small databases.  If your database is only
30MB or so in size, the increase in size of the WAL will be pretty
significant (+144MB for the base 3 WAL segments).  I'm not sure this is
a real problem which users will notice (in today's scales, 144MB ain't
much), but if it turns out to be, it would be nice to have a way to
switch it back *just for them* without recompiling.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 08/25/2016 01:12 PM, Robert Haas wrote:
>>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>>> at all about the master/slave sync risk, per my previous statement.
>>> But if it does have performance implications, per Andres's suggestion, then
>>> making it configurable at initdb time probably comes with a cost that's not
>>> worth paying.
>> At this point it's hard to judge, because we don't have any idea what
>> the cost might be.  I guess if we want to pursue this approach,
>> somebody will have to code it up and benchmark it.  But what I'm
>> inclined to do for starters is put together a patch to go from 16MB ->
>> 64MB.  Committing that early this cycle will give us time to
>> reconsider if that turns out to be painful for reasons we haven't
>> thought of yet.  And give tool authors time to make adjustments, if
>> any are needed.
>
> The one thing I'd be worried about with the increase in size is folks
> using PostgreSQL for very small databases.  If your database is only
> 30MB or so in size, the increase in size of the WAL will be pretty
> significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> a real problem which users will notice (in today's scales, 144MB ain't
> much), but if it turns out to be, it would be nice to have a way to
> switch it back *just for them* without recompiling.

I think you may be forgetting that "the base 3 WAL segments" is no
longer the default configuration.  checkpoint_segments=3 is history;
we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
not 3.
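The disk-space figures Josh and Robert trade here are easy to check. A minimal Python sketch of the arithmetic (the helper name is hypothetical; it only counts whole segments and ignores how PostgreSQL recycles or removes them):

```python
MB = 1024 * 1024
GB = 1024 * MB


def segments_in(budget_bytes: int, seg_bytes: int) -> int:
    """Number of whole WAL segments that fit in a size budget such as max_wal_size."""
    return budget_bytes // seg_bytes


max_wal_size = 1 * GB  # the current default
for seg_mb in (16, 64):
    print(f"{seg_mb}MB segments: max_wal_size=1GB is {segments_in(max_wal_size, seg_mb * MB)} segments")

# Josh's estimate: the old 3-segment baseline grows by 3 * (64MB - 16MB).
extra_bytes = 3 * (64 - 16) * MB
print(f"3 segments grow by {extra_bytes // MB}MB")
```

A 1GB budget is 64 segments at 16MB but only 16 at 64MB, so the ceiling is unchanged; the +144MB figure applies only to the old three-segment floor.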

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Magnus Hagander
Date:
On Thu, Aug 25, 2016 at 7:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 08/25/2016 01:12 PM, Robert Haas wrote:
>>>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>>>> at all about the master/slave sync risk, per my previous statement.
>>>> But if it does have performance implications, per Andres's suggestion, then
>>>> making it configurable at initdb time probably comes with a cost that's not
>>>> worth paying.
>>> At this point it's hard to judge, because we don't have any idea what
>>> the cost might be.  I guess if we want to pursue this approach,
>>> somebody will have to code it up and benchmark it.  But what I'm
>>> inclined to do for starters is put together a patch to go from 16MB ->
>>> 64MB.  Committing that early this cycle will give us time to
>>> reconsider if that turns out to be painful for reasons we haven't
>>> thought of yet.  And give tool authors time to make adjustments, if
>>> any are needed.
>>
>> The one thing I'd be worried about with the increase in size is folks
>> using PostgreSQL for very small databases.  If your database is only
>> 30MB or so in size, the increase in size of the WAL will be pretty
>> significant (+144MB for the base 3 WAL segments).  I'm not sure this is
>> a real problem which users will notice (in today's scales, 144MB ain't
>> much), but if it turns out to be, it would be nice to have a way to
>> switch it back *just for them* without recompiling.
>
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.


And obviously that'd be 16 files if we increase the WAL segment size. So the actual maximum size doesn't change, except that you can currently set max_wal_size to something lower than 64MB. If we change, the minimum value would become 64MB, I assume.
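The segment arithmetic in this exchange can be checked directly. A minimal Python sketch; `segments_in` is a hypothetical helper, and it assumes sizes are exact multiples of the segment size, as the defaults quoted here are:

```python
MB = 1024 * 1024

def segments_in(size_bytes: int, seg_size_bytes: int) -> int:
    """Whole WAL segments that fit in size_bytes (hypothetical helper)."""
    return size_bytes // seg_size_bytes

max_wal_size = 1024 * MB  # default max_wal_size = 1GB

for seg_mb in (16, 64):
    print(f"{seg_mb}MB segments: max_wal_size holds "
          f"{segments_in(max_wal_size, seg_mb * MB)} files")

# Josh's estimate: three segments, each growing from 16MB to 64MB.
extra_mb = 3 * (64 - 16)
print(f"extra space for 3 segments: {extra_mb}MB")
```

The 1GB ceiling is the same either way; only the number of files changes.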


--

Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-08-25 13:45:29 -0400, Robert Haas wrote:
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.

Well, but min_wal_size still is 48MB. So sure, if you consistently have
a high WAL throughput, it'll be bigger. But otherwise pg_xlog will
shrink again.



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 1:50 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-25 13:45:29 -0400, Robert Haas wrote:
>> I think you may be forgetting that "the base 3 WAL segments" is no
>> longer the default configuration.  checkpoint_segments=3 is history;
>> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
>> not 3.
>
> Well, but min_wal_size still is 48MB. So sure, if you consistently have
> a high WAL throughput, it'll be bigger. But otherwise pg_xlog will
> shrink again.

Hmm, yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus <josh@agliodbs.com> wrote:

> > The one thing I'd be worried about with the increase in size is folks
> > using PostgreSQL for very small databases.  If your database is only
> > 30MB or so in size, the increase in size of the WAL will be pretty
> > significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> > a real problem which users will notice (in today's scales, 144MB ain't
> > much), but if it turns out to be, it would be nice to have a way to
> > switch it back *just for them* without recompiling.
> 
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.

I think the relevant one for that case is the minimum, though:

#min_wal_size = 80MB

which corresponds to 5 segments.  I suppose the default value for this
minimum would change to some multiple of 64MB.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 2:49 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> > The one thing I'd be worried about with the increase in size is folks
>> > using PostgreSQL for very small databases.  If your database is only
>> > 30MB or so in size, the increase in size of the WAL will be pretty
>> > significant (+144MB for the base 3 WAL segments).  I'm not sure this is
>> > a real problem which users will notice (in today's scales, 144MB ain't
>> > much), but if it turns out to be, it would be nice to have a way to
>> > switch it back *just for them* without recompiling.
>>
>> I think you may be forgetting that "the base 3 WAL segments" is no
>> longer the default configuration.  checkpoint_segments=3 is history;
>> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
>> not 3.
>
> I think the relevant one for that case is the minimum, though:
>
> #min_wal_size = 80MB
>
> which corresponds to 5 segments.  I suppose the default value for this
> minimum would change to some multiple of 64MB.

Yeah, Andres made the same point, although it looks like he
erroneously stated that the minimum was 48MB whereas you have it as
80MB, which seems to be the actual value.  I assume we would have to
raise that to either 128MB or 192MB, which does feel like a bit of a
hefty increase.  It doesn't matter if you're going to make extensive
use of the cluster, but if somebody's spinning up hundreds of clusters
each of which has very little activity it might not be an altogether
welcome change.
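Rounding the current 80MB default up to whole 64MB segments gives the 128MB figure; 192MB corresponds to keeping three 64MB segments. A minimal Python sketch, assuming the new default is simply rounded up to whole segments (the helper names are hypothetical):

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive a and b."""
    return -(-a // b)

def min_wal_in_segments(min_wal_size_mb: int, seg_mb: int) -> tuple[int, int]:
    """Return (segments, MB) after rounding min_wal_size up to whole segments."""
    segs = ceil_div(min_wal_size_mb, seg_mb)
    return segs, segs * seg_mb

print(min_wal_in_segments(80, 16))  # current default: (5, 80)
print(min_wal_in_segments(80, 64))  # rounded up with 64MB segments: (2, 128)
```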

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 2:49 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > I think the relevant one for that case is the minimum, though:
> >
> > #min_wal_size = 80MB
> >
> > which corresponds to 5 segments.  I suppose the default value for this
> > minimum would change to some multiple of 64MB.
> 
> Yeah, Andres made the same point, although it looks like he
> erroneously stated that the minimum was 48MB whereas you have it as
> 80MB, which seems to be the actual value.  I assume we would have to
> raise that to either 128MB or 192MB, which does feel like a bit of a
> hefty increase.  It doesn't matter if you're going to make extensive
> use of the cluster, but if somebody's spinning up hundreds of clusters
> each of which has very little activity it might not be an altogether
> welcome change.

Yeah, and it's also related to the point Josh Berkus was making about
clusters with little activity.

Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
has a hardcoded minimum of 2, but I couldn't find an explanation for it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Bruce Momjian
Date:
On Thu, Aug 25, 2016 at 04:21:33PM +0100, Simon Riggs wrote:
> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.
> 
> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

I think having WAL use variable length files would add complexity for
recycling WAL files.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Aug 25, 2016 at 3:21 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Yeah, and it's also related to the point Josh Berkus was making about
> clusters with little activity.

Right.

> Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
> has a hardcoded minimum of 2, but I couldn't find an explanation for it.

Well, I think that when you overrun the end of one segment, you're
never going to be able to wrap around to the start of the same
segment; you're going to get sucked into needing another file.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 3:21 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
> > has a hardcoded minimum of 2, but I couldn't find an explanation for it.
> 
> Well, I think that when you overrun the end of one segment, you're
> never going to be able to wrap around to the start of the same
> segment; you're going to get sucked into needing another file.

Sure, but that's a transient situation; after a couple of checkpoints,
the old segment can be removed without any danger, leaving only the
active segment.  [thinks]  Ah, on reflection, there's no way that this
buys anything: it is always critical to have enough disk space to have
one more segment to switch to.  So even if you're on tight disk
constraints, you cannot afford to allocate space for a single segment
only, because if you only have that and the need comes to create the
next one to switch to, you will just not have the space.

If we were to use the WAL space in a way different than the POSIX
file interface, we could probably do better.  But that seems too
onerous.

I suppose the only option is to keep the minimum at 2.  I don't see any
point in forcing the minimum to be more than that, however.
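Alvaro's conclusion can be put in numbers: because there must always be room for the next segment to switch to, a quiet cluster keeps at least two segments in this model, so with 64MB segments the floor is 128MB even if the minimum is set to one segment. A simplified Python model, not PostgreSQL code; the names are hypothetical, and the sketch clamps a too-small request up to the floor, whereas the real GUC machinery rejects out-of-range values:

```python
HARDCODED_MIN_SEGMENTS = 2  # the guc.c minimum mentioned in the thread

def smallest_wal_footprint_mb(requested_segments: int, seg_mb: int) -> int:
    """Approximate smallest pg_xlog size, in MB, for a min_wal_size of
    requested_segments: the active segment plus one to switch to, so
    requests below the floor are raised to it (a modeling choice)."""
    segments = max(HARDCODED_MIN_SEGMENTS, requested_segments)
    return segments * seg_mb

for seg_mb in (16, 64):
    print(f"{seg_mb}MB segments, asking for 1 segment: "
          f"{smallest_wal_footprint_mb(1, seg_mb)}MB")
```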

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Peter Geoghegan
Date:
On Wed, Aug 24, 2016 at 6:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 3. archive_timeout is no longer a frequently used option.  Obviously,
> if you are frequently archiving partial segments, you don't want the
> segment size to be too large, because if it is, each forced segment
> switch potentially wastes a large amount of space (and bandwidth).
> But given streaming replication and pg_receivexlog, the use case for
> archiving partial segments is, at least according to my understanding,
> a lot narrower than it used to be.  So, I think we don't have to worry
> as much about keeping forced segment switches cheap as we did during
> the 8.x series.

Heroku uses archive_timeout. It is considered important, because S3
archives are more reliable than EBS storage. We want to cap how much
time can pass before WAL is shipped to S3, to some degree. It's weird
to talk about degrees of durability, since we tend to assume that it's
either/or, but distinctions like that start to matter when you have an
enormous number of databases. S3 has an extremely good track record,
reliability-wise.

We're not too concerned about the overhead of all of this, I think,
because WAL segments consist of zeroes at the end when archive_timeout
is applied (at least from 9.4 on). We compress the WAL segments, and
many zeroes compress very well.

I admit that I haven't looked at it in much detail, but that is my
current understanding.
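The claim that zero padding costs little after compression can be illustrated with Python's zlib. A minimal sketch, assuming a 16MB segment in which only the first 1MB (a hypothetical figure) holds WAL before a forced switch, modeled pessimistically as incompressible random bytes:

```python
import random
import zlib

SEG_SIZE = 16 * 1024 * 1024  # 16MB segment
USED = 1 * 1024 * 1024       # hypothetical: 1MB of WAL before the forced switch

# Stand-in for WAL content: pseudo-random bytes. This is pessimistic,
# since real WAL records usually compress somewhat.
payload = random.Random(42).randbytes(USED)
segment = payload + bytes(SEG_SIZE - USED)  # remainder is zero-filled

compressed = zlib.compress(segment, 6)
print(f"raw: {len(segment)} bytes, compressed: {len(compressed)} bytes")
```

The 15MB of zeros shrinks to a few kilobytes, so the compressed file is barely larger than the WAL actually written.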

-- 
Peter Geoghegan



Re: increasing the default WAL segment size

From
Gavin Flower
Date:
On 26/08/16 05:43, Josh Berkus wrote:
> On 08/25/2016 01:12 PM, Robert Haas wrote:
>>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>>>> at all about the risk of master/slave sync thing, per previous statement.
>>>> But if it does have performance implications, per Andres suggestion, then
>>>> making it configurable at initdb time probably comes with a cost that's not
>>>> worth paying.
>> At this point it's hard to judge, because we don't have any idea what
>> the cost might be.  I guess if we want to pursue this approach,
>> somebody will have to code it up and benchmark it.  But what I'm
>> inclined to do for starters is put together a patch to go from 16MB ->
>> 64MB.  Committing that early this cycle will give us time to
>> reconsider if that turns out to be painful for reasons we haven't
>> thought of yet.  And give tool authors time to make adjustments, if
>> any are needed.
> The one thing I'd be worried about with the increase in size is folks
> using PostgreSQL for very small databases.  If your database is only
> 30MB or so in size, the increase in size of the WAL will be pretty
> significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> a real problem which users will notice (in today's scales, 144MB ain't
> much), but if it turns out to be, it would be nice to have a way to
> switch it back *just for them* without recompiling.
>
Let such folk use Microsoft Access???  <Ducks & runs away very fast!>


More seriously:
Surely most such people would be using very old hardware & not likely to 
be upgrading to the most recent version of pg in the near future?  And 
for the ones using modern hardware: either they have enough resources 
not to notice, or very probably will know enough to hunt round for a way 
to reduce the WAL size - I strongly suspect.

Currently, I'm not supporting pg in any production environment, and am using
it for testing & keeping up-to-date with pg.  So it would affect me -
however, I have enough resources so it is no problem in practice.



Cheers,
Gavin




Re: increasing the default WAL segment size

From
Michael Paquier
Date:
On Thu, Aug 25, 2016 at 10:25 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
>> > time instead of requiring a recompile; we probably don't save any
>> > significant number of cycles by compiling this into the server.
>>
>> FWIW, +1
>>
>> We're already hurt by the small segments due to a similar phenomenon
>> as the ssh case: TCP slow start. Designing the archive/recovery
>> command to work around TCP slow start is quite complex, and bigger
>> segments would just be a better thing.
>>
>> Not to mention that bigger segments compress better.
>
> This would be a good time to rename the pg_xlog and pg_clog directories too.

That would be excellent timing to do so. The first CF is close by,
and such a change would be better at the beginning of the development
cycle.
-- 
Michael



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:25 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
> >> > time instead of requiring a recompile; we probably don't save any
> >> > significant number of cycles by compiling this into the server.
> >>
> >> FWIW, +1
> >>
> >> We're already hurt by the small segments due to a similar phenomenon
> >> as the ssh case: TCP slow start. Designing the archive/recovery
> >> command to work around TCP slow start is quite complex, and bigger
> >> segments would just be a better thing.
> >>
> >> Not to mention that bigger segments compress better.
> >
> > This would be a good time to rename the pg_xlog and pg_clog directories too.
>
> That would be excellent timing to do so. The first CF is close by,
> and such a change would be better at the beginning of the development
> cycle.

If we're going to be renaming things, we might wish to consider further
changes, such as putting everything that's temporary & not WAL-logged
into "pgsql_tmp" directories, so we don't need lists of "directories to
exclude" in things like the pg_basebackup-related code.

We should really have an independent thread about this though, as it's
not what Robert's asking about here.

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Amit Kapila
Date:
On Thu, Aug 25, 2016 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 25 August 2016 at 02:31, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Furthermore, there is an enforced, synchronous fsync at the end of
>>> every segment, which actually does hurt performance on write-heavy
>>> workloads.[2] Of course, if that were the only reason to consider
>>> increasing the segment size, it would probably make more sense to just
>>> try to push that extra fsync into the background, but that's not
>>> really the case.  From what I hear, the gigantic number of files is a
>>> bigger pain point.
>>
>> I think we should fully describe the problem before finding a solution.
>
> Sure, that's usually a good idea.  I attempted to outline all of the
> possible issues of which I am aware in my original email, but of
> course you may know of considerations which I overlooked.
>
>> This is too big a change to just tweak a value without discussing the
>> actual issue.
>
> Again, I tried to make sure I was discussing the actual issues in my
> original email.  In brief: having to run archive_command multiple
> times per second imposes very tight latency requirements on it;
> directories with hundreds of thousands or millions of files are hard
> to manage; enforced synchronous fsyncs at the end of each segment hurt
> performance.
>
>> And if the problem is as described, how can a change of x4 be enough
>> to make it worth the pain of change? I think you're already admitting
>> it can't be worth it by discussing initdb configuration.
>
> I guess it depends on how much pain of change you think there will be.
> I would expect a change from 16MB -> 64MB to be fairly painless, but
> (1) it might break tools that aren't designed to cope with differing
> segment sizes and (2) it will increase disk utilization for people who
> have such low velocity systems that they never end up with more than 2
> WAL segments, and now those segments are bigger.  If you know of other
> impacts or have reason to believe those problems will be serious,
> please fill in the details.
>
> Despite the fact that initdb configuration has dominated this thread,
> I mentioned it only in the very last sentence of my email and only as
> a possibility.  I believe that a 4x change will be good enough for the
> majority of people for whom this is currently a pain point.  However,
> yes, I do believe that there are some people for whom it won't be
> sufficient.  And I believe that as we continue to enhance PostgreSQL
> to support higher and higher transaction rates, the number of people
> who need an extra-large WAL segment size will increase.  As I see it,
> there are four options here:
>
> 1. Do nothing.  So far, I don't see anybody arguing for that.
>
> 2. Change the default to 64MB and call it good.  This idea seems to
> have considerable support.
>
> 3. Allow initdb-time configurability but keep the default at 16MB.  I
> don't see any support for this.  There is clearly support for
> configurability, but I don't see anyone arguing that the current
> default is preferable, unless that is what you are arguing.
>
> 4. Change the default to 64MB and also allow initdb-time
> configurability.  This option also appears to enjoy substantial
> support, perhaps more than #2.  Magnus seemed to be arguing that this
> is preferable to #2, because then it's easier for people to change the
> setting back if someone discovers a case where the higher default is a
> problem; Tom, on the other hand, seems to think this is overkill.
>

If we change the default to 64MB, then I think it won't be possible to use
old databases as-is, because we store the segment size in pg_control (I
think one will get the error below [1] for old databases if we just change
the default and don't do anything else).  Do you have a way to address it,
or do you think it is okay?

[1] -
FATAL:  database files are incompatible with server
DETAIL:  The database cluster was initialized with XLOG_SEG_SIZE
16777216, but the server was compiled with XLOG_SEG_SIZE 67108864.
HINT:  It looks like you need to recompile or initdb.
LOG:  database system is shut down
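A simplified model of the check behind this error, written in Python for illustration only. PostgreSQL's real check is C code that runs when the server reads pg_control at startup; the names below are hypothetical:

```python
MB = 1024 * 1024

class IncompatibleClusterError(Exception):
    """Raised when pg_control disagrees with the server build (illustrative)."""

def check_seg_size(control_seg_size: int, compiled_seg_size: int) -> None:
    """The segment size recorded in pg_control at initdb time must equal
    the value compiled into the server binary."""
    if control_seg_size != compiled_seg_size:
        raise IncompatibleClusterError(
            "database files are incompatible with server: the database "
            f"cluster was initialized with XLOG_SEG_SIZE {control_seg_size}, "
            f"but the server was compiled with XLOG_SEG_SIZE {compiled_seg_size}")

check_seg_size(16 * MB, 16 * MB)  # matching sizes: no error
```

Relaxing this comparison for clusters coming from an older major version is the "error check" discussed later in the thread.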

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: increasing the default WAL segment size

From
Michael Paquier
Date:
On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If we change the default to 64MB, then I think it won't be possible to use
> old databases as-is, because we store the segment size in pg_control (I
> think one will get the error below [1] for old databases if we just change
> the default and don't do anything else).  Do you have a way to address it,
> or do you think it is okay?

Those would still be able to work with ./configure
--with-wal-segsize=16, so that's not really an issue.
-- 
Michael



Re: increasing the default WAL segment size

From
Alvaro Herrera
Date:
Gavin Flower wrote:
> On 26/08/16 05:43, Josh Berkus wrote:

> >The one thing I'd be worried about with the increase in size is folks
> >using PostgreSQL for very small databases.  If your database is only
> >30MB or so in size, the increase in size of the WAL will be pretty
> >significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> >a real problem which users will notice (in today's scales, 144MB ain't
> >much), but if it turns out to be, it would be nice to have a way to
> >switch it back *just for them* without recompiling.
> >
> Let such folk use Microsoft Access???  <Ducks & runs away very fast!>
> 
> 
> More seriously:
> Surely most such people would be using very old hardware & not likely to be
> upgrading to the most recent version of pg in the near future?  And for the
> ones using modern hardware: either they have enough resources not to notice,
> or very probably will know enough to hunt round for a way to reduce the WAL
> size - I strongly suspect.

I've seen people with unusual environments, such as running Pg in some
embedded platform with minimal resources, where they were baffled that
Postgres used so much disk space on files that were barely written to
and never read.  It wasn't a question of there being "large" drives to
buy, but one of not wanting to have a drive in the first place.  Now, I
grant that this was a few years ago already and disk tech (SSDs) has
changed that world; maybe that argument doesn't apply anymore.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Amit Kapila
Date:
On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> If we change the default to 64MB, then I think it won't be possible to use
>> old databases as-is, because we store the segment size in pg_control (I
>> think one will get the error below [1] for old databases if we just change
>> the default and don't do anything else).  Do you have a way to address it,
>> or do you think it is okay?
>
> Those would still be able to work with ./configure
> --with-wal-segsize=16, so that's not really an issue.
>

Right, but do we need to suggest that users do so?  The question/point was
that if we ship the server with a default value of 64MB, then it won't be
able to start an old database.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: increasing the default WAL segment size

From
Michael Paquier
Date:
On Fri, Aug 26, 2016 at 12:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> If we change the default to 64MB, then I think it won't be possible to use
>>> old databases as-is, because we store the segment size in pg_control (I
>>> think one will get the error below [1] for old databases if we just change
>>> the default and don't do anything else).  Do you have a way to address it,
>>> or do you think it is okay?
>>
>> Those would still be able to work with ./configure
>> --with-wal-segsize=16, so that's not really an issue.
>>
>
> Right, but do we need to suggest that users do so?  The question/point was
> that if we ship the server with a default value of 64MB, then it won't be
> able to start an old database.

Right, pg_upgrade could be made smarter by enforcing a conversion with
a dedicated option: we could get away with filling the existing segments
with zeros and adding an XLOG switch record at the end of each former
16MB segment, once converted to 64MB. That would still be better than
converting each page LSN :(
-- 
Michael



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-08-26 13:07:09 +0900, Michael Paquier wrote:
> On Fri, Aug 26, 2016 at 12:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>> If we change the default to 64MB, then I think it won't be possible to use
> >>> old databases as-is, because we store the segment size in pg_control (I
> >>> think one will get the error below [1] for old databases if we just change
> >>> the default and don't do anything else).  Do you have a way to address it,
> >>> or do you think it is okay?
> >>
> >> Those would still be able to work with ./configure
> >> --with-wal-segsize=16, so that's not really an issue.
> >>
> >
> > Right, but do we need to suggest that users do so?  The question/point was
> > that if we ship the server with a default value of 64MB, then it won't be
> > able to start an old database.
> 
> Right, pg_upgrade could be made smarter by enforcing a conversion with
> a dedicated option: we could get away with filling the existing segments
> with zeros and adding an XLOG switch record at the end of each former
> 16MB segment, once converted to 64MB. That would still be better than
> converting each page LSN :(

Maybe I'm missing something here - but why would we need to do any of
that? The WAL already isn't compatible between versions, and we don't
reuse the old server's WAL anyway? Isn't all that's needed relaxing some
error check?



Re: increasing the default WAL segment size

From
Eduardo Morras
Date:
On Wed, 24 Aug 2016 21:31:35 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> Hi,
> 
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.  It was first set to that value in commit
> 47937403676d913c0e740eec6b85113865c6c8ab in October of 1999; prior to
> that, it was 64MB.  Between 1999 and now, there have been three
> significant changes that make me think it might be time to rethink
> this value:
<snip>
> 
> Thoughts?

Pardon my ignorance, but could the block size affect this WAL size increase?

In the past (I haven't tried it with versions newer than 9) you could change the disk
block page size from 8KB to 16KB or 32KB (or 4KB) by modifying BLCKSZ 8192
in src/include/pg_config.h and recompiling. (There are some mails from
1999-2002 about this topic.)

> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


---   ---
Eduardo Morras <emorrasg@yahoo.es>



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Aug 26, 2016 at 12:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Maybe I'm missing something here - but why would we need to do any of
> that? The WAL already isn't compatible between versions, and we don't
> reuse the old server's WAL anyway? Isn't all that's needed relaxing some
> error check?

Yeah.  If this change is made in a new major version - and how else
would we do it? - it doesn't introduce any incompatibility that
wouldn't be present already.  pg_upgrade doesn't (and can't) migrate
WAL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Aug 26, 2016 at 4:45 AM, Eduardo Morras <emorrasg@yahoo.es> wrote:
> Pardon my ignorance, but could the block size affect this WAL size increase?
>
> In the past (I haven't tried it with versions newer than 9) you could change
> the disk block page size from 8KB to 16KB or 32KB (or 4KB) by modifying
> BLCKSZ 8192 in src/include/pg_config.h and recompiling. (There are some
> mails from 1999-2002 about this topic.)

Yeah, I think that's still supposed to work although I don't know
whether anyone has tried it lately.  It is a separate topic from this
issue, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 8/24/16 9:31 PM, Robert Haas wrote:
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.

While the discussion about the best default value is ongoing, maybe we
should at least *allow* some larger sizes, for testing out.  Currently,
configure says "Allowed values are 1,2,4,8,16,32,64.".  What might be a
good new upper limit?
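configure's current list is exactly the powers of two from 1 to 64 (MB), so allowing larger sizes amounts to moving the upper bound. A minimal Python sketch of that rule; the function is hypothetical, and 1024 is used only as an example upper limit:

```python
def valid_wal_segsize_mb(value: int, upper_mb: int = 64) -> bool:
    """True if value is a power of two between 1 and upper_mb inclusive."""
    # Parentheses matter: in Python, == binds tighter than &.
    return 1 <= value <= upper_mb and (value & (value - 1)) == 0

current = [v for v in range(1, 2049) if valid_wal_segsize_mb(v)]
print(current)  # [1, 2, 4, 8, 16, 32, 64]

proposed = [v for v in range(1, 2049) if valid_wal_segsize_mb(v, upper_mb=1024)]
print(proposed)
```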

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 8/24/16 9:31 PM, Robert Haas wrote:
>> I'd like to propose that we increase the default WAL segment size,
>> which is currently 16MB.
>
> While the discussion about the best default value is ongoing, maybe we
> should at least *allow* some larger sizes, for testing out.  Currently,
> configure says "Allowed values are 1,2,4,8,16,32,64.".  What might be a
> good new upper limit?

1GB?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-09-20 16:05:44 -0400, Robert Haas wrote:
> On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 8/24/16 9:31 PM, Robert Haas wrote:
> >> I'd like to propose that we increase the default WAL segment size,
> >> which is currently 16MB.
> >
> > While the discussion about the best default value is ongoing, maybe we
> > should at least *allow* some larger sizes, for testing out.  Currently,
> > configure says "Allowed values are 1,2,4,8,16,32,64.".  What might be a
> > good new upper limit?

I'm doubtful it's worth increasing this.


> 1GB?

That sounds way too big to me. WAL file allocation would trigger pretty
massive IO storms during zeroing, max_wal_size is going to be hard to
tune, the amount of dirty data during bulk loads is going to be very
hard to control.  If somebody wants to do something like this they
had better be well informed enough to override a #define.

Andres



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Sep 20, 2016 at 4:09 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-20 16:05:44 -0400, Robert Haas wrote:
>> On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>> > On 8/24/16 9:31 PM, Robert Haas wrote:
>> >> I'd like to propose that we increase the default WAL segment size,
>> >> which is currently 16MB.
>> >
>> > While the discussion about the best default value is ongoing, maybe we
>> > should at least *allow* some larger sizes, for testing out.  Currently,
>> > configure says "Allowed values are 1,2,4,8,16,32,64.".  What might be a
>> > good new upper limit?
>
> I'm doubtful it's worth increasing this.
>
>> 1GB?
>
> That sounds way too big to me. WAL file allocation would trigger pretty
> massive IO storms during zeroing, max_wal_size is going to be hard to
> tune, the amount of dirty data during bulk loads is going to be very
> hard to control.  If somebody wants to do something like this they
> had better be well informed enough to override a #define.

EnterpriseDB has customers generating multiple TB of WAL per day.
Even with a 1GB segment size, some of them will fill multiple files
per minute.  At the current limit of 64MB, a few of them would still
fill more than one file per second.  That is not sane.
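The rates follow from simple division. A minimal Python sketch, assuming a hypothetical customer producing 6TiB of WAL per day and a perfectly even rate (real workloads are burstier):

```python
TiB = 1024 ** 4
MiB = 1024 ** 2
SECONDS_PER_DAY = 86_400

def segments_per_second(wal_bytes_per_day: float, seg_bytes: int) -> float:
    """Average rate at which WAL segments fill (and need archiving)."""
    return wal_bytes_per_day / SECONDS_PER_DAY / seg_bytes

daily = 6 * TiB  # hypothetical: 6TiB of WAL per day
for seg_mib in (16, 64, 1024):
    rate = segments_per_second(daily, seg_mib * MiB)
    print(f"{seg_mib:>5}MB segments: {rate:.2f}/s, {rate * 60:.1f}/min")
```

At this rate, 64MB segments still fill more than one per second, and 1GB segments fill several per minute.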

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-09-20 16:18:02 -0400, Robert Haas wrote:
> On Tue, Sep 20, 2016 at 4:09 PM, Andres Freund <andres@anarazel.de> wrote:
> > That sounds way too big to me. WAL file allocation would trigger pretty
> > massive IO storms during zeroing, max_wal_size is going to be hard to
> > tune, and the amount of dirty data during bulk loads is going to be very
> > hard to control.  If somebody wants to do something like this they had
> > better be well informed enough to override a #define.
> 
> EnterpriseDB has customers generating multiple TB of WAL per day.

Sure, that's kind of common.


> Even with a 1GB segment size, some of them will fill multiple files
> per minute.  At the current limit of 64MB, a few of them would still
> fill more than one file per second.  That is not sane.

I doubt generating much larger files actually helps a lot there. I bet
you a patch review that 1GB files are going to regress in pretty much
every situation; especially when taking latency into account.
I think what's actually needed for that is:
- make it easier to implement archiving via streaming WAL; i.e. make pg_receivexlog actually usable
- make archiving parallel
- decouple WAL write & fsyncing granularity from segment size

Requiring a non-default compile time or even just cluster creation time
option for tuning isn't something worth expending energy on imo.

Andres



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Sep 20, 2016 at 4:25 PM, Andres Freund <andres@anarazel.de> wrote:
>> Even with a 1GB segment size, some of them will fill multiple files
>> per minute.  At the current limit of 64MB, a few of them would still
>> fill more than one file per second.  That is not sane.
>
> I doubt generating much larger files actually helps a lot there. I bet
> you a patch review that 1GB files are going to regress in pretty much
> every situation; especially when taking latency into account.

Well, you have a point: let's find out.  Suppose we create a cluster
that generates WAL very quickly, and then try different WAL segment
sizes and see what works out best.  Maybe something like: create N
relatively small tables, with 100 or so tuples in each one.  Have N
backends, each assigned one of those tables, and it just updates all
the rows over and over in a tight loop.  Or feel free to suggest
something else.

> I think what's actually needed for that is:
> - make it easier to implement archiving via streaming WAL; i.e. make
>   pg_receivexlog actually usable
> - make archiving parallel
> - decouple WAL write & fsyncing granularity from segment size
>
> Requiring a non-default compile time or even just cluster creation time
> option for tuning isn't something worth expending energy on imo.

I don't agree.  The latency requirements on an archive_command when
you're churning out 16MB files multiple times per second are insanely
tight, and saying that we shouldn't increase the size because it's
better to go redesign a bunch of other things that will eventually
*maybe* remove the need for archive_command does not seem like a
reasonable response.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-09-20 16:32:46 -0400, Robert Haas wrote:
> > Requiring a non-default compile time or even just cluster creation time
> > option for tuning isn't something worth expending energy on imo.
> 
> I don't agree.  The latency requirements on an archive_command when
> you're churning out 16MB files multiple times per second are insanely
> tight, and saying that we shouldn't increase the size because it's
> better to go redesign a bunch of other things that will eventually
> *maybe* remove the need for archive_command does not seem like a
> reasonable response.

Oh, I'm on board with increasing the default size a bit. A different
default size isn't a non-default compile time option anymore though, and
I don't think 1GB is a reasonable default.

Running multiple archive_commands concurrently - pretty easy to
implement - isn't the same as removing the need for archive command. I'm
pretty sure that continuously, and if necessary concurrently, archiving a
bunch of 64MB files is going to work better than irregularly
creating / transferring 1GB files.


Andres



Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Sep 20, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-20 16:32:46 -0400, Robert Haas wrote:
>> > Requiring a non-default compile time or even just cluster creation time
>> > option for tuning isn't something worth expending energy on imo.
>>
>> I don't agree.  The latency requirements on an archive_command when
>> you're churning out 16MB files multiple times per second are insanely
>> tight, and saying that we shouldn't increase the size because it's
>> better to go redesign a bunch of other things that will eventually
>> *maybe* remove the need for archive_command does not seem like a
>> reasonable response.
>
> Oh, I'm on board with increasing the default size a bit. A different
> default size isn't a non-default compile time option anymore though, and
> I don't think 1GB is a reasonable default.

But that's not the question.  What Peter said was: "maybe we should at
least *allow* some larger sizes, for testing out".  I see very little
merit in restricting the values that people can set via configure.
That just makes life difficult.  If a user picks a setting that
doesn't perform well, oops.

> Running multiple archive_commands concurrently - pretty easy to
> implement - isn't the same as removing the need for archive command. I'm
> pretty sure that continuously, and if necessary concurrently, archiving a
> bunch of 64MB files is going to work better than irregularly
> creating / transferring 1GB files.

I'm not trying to block you from implementing parallel archiving, but
right now we don't have it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 9/21/16 8:12 AM, Robert Haas wrote:
>> Oh, I'm on board with increasing the default size a bit. A different
>> default size isn't a non-default compile time option anymore though, and
>> I don't think 1GB is a reasonable default.
> But that's not the question.  What Peter said was: "maybe we should at
> least *allow* some larger sizes, for testing out".  I see very little
> merit in restricting the values that people can set via configure.
> That just makes life difficult.  If a user picks a setting that
> doesn't perform well, oops.

Right.  If we think that a larger size can have some performance benefit
and we think that 64MB might be a good new default (as was the initial
suggestion), then we should surely allow at least, say, 128 and 256 to be
tried out.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello all,

Please find attached a patch to make the WAL segment size configurable at initdb time.

The attached patch removes the --with-wal-segsize configure option and adds a new initdb option --wal-segsize. The initdb module passes the --wal-segsize value in an environment variable, which is used to overwrite the GUC value wal_segment_size and set the internal variables XLogSegSize and XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not changed, but I have increased the maximum size to 1GB.

Since XLOG_SEG_SIZE is now variable, it could not be used directly in src/bin modules and a few macros, so a few changes had to be made:
  - In guc.c, remove GUC_UNIT_XSEGS, which used XLOG_SEG_SIZE, and introduce show functions for the GUCs that used that unit (min_wal_size and max_wal_size).
  - For pg_basebackup, add a new replication command SHOW_WAL_SEGSZ to fetch wal_segment_size in bytes.
  - pg_controldata, pg_resetxlog, and pg_rewind fetch xlog_seg_size from the ControlFile.
  - Since pg_xlogdump reads the WAL files, it uses the file size to determine xlog_seg_size.
  - In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled with random data, and written to a temporary file to check for any write/fsync error before performing the tests. Since this does not affect the actual performance results, XLOG_SEG_SIZE in this module is replaced with the default value (16MB).

Please note that the documentation is not updated in this patch.

Feedback and suggestions are welcome.
--

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello all,

On Mon, Dec 19, 2016 at 3:14 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> Hello all,
>
> Please find attached a patch to make the WAL segment size configurable at initdb time.
>
> The attached patch removes the --with-wal-segsize configure option and adds a new initdb option --wal-segsize. The initdb module passes the --wal-segsize value in an environment variable, which is used to overwrite the GUC value wal_segment_size and set the internal variables XLogSegSize and XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not changed, but I have increased the maximum size to 1GB.
>
> Since XLOG_SEG_SIZE is now variable, it could not be used directly in src/bin modules and a few macros, so a few changes had to be made:
>   - In guc.c, remove GUC_UNIT_XSEGS, which used XLOG_SEG_SIZE, and introduce show functions for the GUCs that used that unit (min_wal_size and max_wal_size).
>   - For pg_basebackup, add a new replication command SHOW_WAL_SEGSZ to fetch wal_segment_size in bytes.
>   - pg_controldata, pg_resetxlog, and pg_rewind fetch xlog_seg_size from the ControlFile.
>   - Since pg_xlogdump reads the WAL files, it uses the file size to determine xlog_seg_size.
>   - In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled with random data, and written to a temporary file to check for any write/fsync error before performing the tests. Since this does not affect the actual performance results, XLOG_SEG_SIZE in this module is replaced with the default value (16MB).
>
> Please note that the documentation is not updated in this patch.
>
> Feedback and suggestions are welcome.

This patch has been added to the commitfest (https://commitfest.postgresql.org/12/921/).

After further testing, I found that the pg_standby contrib module does not work with these changes. I will fix it in the next version of the patch. Comments on the current patch are welcome.


Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2016-12-19 15:14:50 +0530, Beena Emerson wrote:
> The attached patch removes the --with-wal-segsize configure option and adds a
> new initdb option --wal-segsize. The initdb module passes the --wal-segsize
> value in an environment variable, which is used to overwrite the GUC value
> wal_segment_size and set the internal variables XLogSegSize and
> XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
> changed, but I have increased the maximum size to 1GB.
>
> Since XLOG_SEG_SIZE is now variable, it could not be used directly in
> src/bin modules and a few macros, so a few changes had to be made:

I do think this has the potential for negative performance
implications. Consider code like
                        /* skip over the page header */
                        if (CurrPos % XLogSegSize == 0)
                        {
                                CurrPos += SizeOfXLogLongPHD;
                                currpos += SizeOfXLogLongPHD;
                        }
                        else
right now that's doable in an efficient manner, because XLogSegSize is
constant. If it's a variable and the compiler doesn't even know it's a
power-of-two, it'll have to do a full "div" - and that's quite easily
noticeable in a lot of cases.

Now it could entirely be that the costs of this will be swamped by
everything else, but I'd not want to rely on it.

I think we need tests with concurrent large-file copies. And then also
look at the profile to see whether the relevant places become new
hotspots (not that we introduce something that's just hidden for now).


We might be able to do a bit better, efficiency-wise, by storing
XLogSegSize as a "shift factor". I.e. the 16M setting would be 24
(i.e. XLogSegSize would be defined as 1 << 24).
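
Andres's shift-factor idea could be sketched roughly like this. The sketch assumes a hypothetical global wal_seg_shift that holds log2 of the segment size (24 for 16MB) and is set once at startup. The names are illustrative, not PostgreSQL's. Division and modulo by the segment size become a shift and a mask, so the compiler never has to emit a "div", even though the size is no longer a compile-time constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: log2 of the WAL segment size, e.g. 24 for 16MB. */
static int wal_seg_shift = 24;

/* Segment size in bytes, derived from the shift factor. */
static inline uint64_t wal_seg_size(void)
{
    return UINT64_C(1) << wal_seg_shift;
}

/* Segment number containing WAL byte position pos (pos / segsize). */
static inline uint64_t wal_segno(uint64_t pos)
{
    return pos >> wal_seg_shift;
}

/* Offset of pos within its segment (pos % segsize). */
static inline uint64_t wal_seg_offset(uint64_t pos)
{
    return pos & (wal_seg_size() - 1);
}

/* True if pos is exactly at the start of a segment. */
static inline bool wal_at_seg_start(uint64_t pos)
{
    return wal_seg_offset(pos) == 0;
}
```

Each use still loads wal_seg_shift from memory, but that is typically much cheaper than a full division.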

Greetings,

Andres Freund



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello Andres, 

On Tue, Dec 20, 2016 at 1:58 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-12-19 15:14:50 +0530, Beena Emerson wrote:
>> The attached patch removes the --with-wal-segsize configure option and adds a
>> new initdb option --wal-segsize. The initdb module passes the --wal-segsize
>> value in an environment variable, which is used to overwrite the GUC value
>> wal_segment_size and set the internal variables XLogSegSize and
>> XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
>> changed, but I have increased the maximum size to 1GB.
>>
>> Since XLOG_SEG_SIZE is now variable, it could not be used directly in
>> src/bin modules and a few macros, so a few changes had to be made:
>
> I do think this has the potential for negative performance
> implications. Consider code like
>                         /* skip over the page header */
>                         if (CurrPos % XLogSegSize == 0)
>                         {
>                                 CurrPos += SizeOfXLogLongPHD;
>                                 currpos += SizeOfXLogLongPHD;
>                         }
>                         else
> right now that's doable in an efficient manner, because XLogSegSize is
> constant. If it's a variable and the compiler doesn't even know it's a
> power-of-two, it'll have to do a full "div" - and that's quite easily
> noticeable in a lot of cases.
>
> Now it could entirely be that the costs of this will be swamped by
> everything else, but I'd not want to rely on it.
>
> I think we need tests with concurrent large-file copies. And then also
> look at the profile to see whether the relevant places become new
> hotspots (not that we introduce something that's just hidden for now).
>
> We might be able to do a bit better, efficiency-wise, by storing
> XLogSegSize as a "shift factor". I.e. the 16M setting would be 24
> (i.e. XLogSegSize would be defined as 1 << 24).

Thank you.
I did not realize we could face such problems. I will perform the tests to check the performance and make the required changes.


--

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 3:28 AM, Andres Freund <andres@anarazel.de> wrote:
> I do think this has the potential for negative performance
> implications. Consider code like
>                         /* skip over the page header */
>                         if (CurrPos % XLogSegSize == 0)
>                         {
>                                 CurrPos += SizeOfXLogLongPHD;
>                                 currpos += SizeOfXLogLongPHD;
>                         }
>                         else
> right now that's doable in an efficient manner, because XLogSegSize is
> constant. If it's a variable and the compiler doesn't even know it's a
> power-of-two, it'll have to do a full "div" - and that's quite easily
> noticeable in a lot of cases.
>
> Now it could entirely be that the costs of this will be swamped by
> everything else, but I'd not want to rely on it.

We could use the GUC assign hook to compute a mask and a shift, so
that this could be written as (CurrPos & mask_variable) == 0.  That
would avoid the division instruction, though not the memory access.  I
hope this is all in the noise, though.  I know this code is hot but
I think it'll be hard to construct a test case where the bottleneck is
anything other than the speed at which the disk can absorb bytes.  I
suppose we could set fsync=off and put the whole cluster on a RAMDISK
to avoid those bottlenecks, but of course no real storage system
behaves like that.
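
Robert's variant could be sketched as an assign-hook-style function that recomputes a mask and a shift whenever the segment size is set. The hot path then tests (CurrPos & mask) == 0 instead of CurrPos % XLogSegSize == 0. This is only a sketch. The following are all hypothetical placeholders:

- assign_wal_segment_size() and skip_segment_header()
- the two globals
- the 40-byte SKETCH_LONG_PHD_SIZE

None of them is PostgreSQL's GUC machinery or its real SizeOfXLogLongPHD. The sketch also leaves out the short-header else branch of the quoted code.

```c
#include <stdint.h>

/* Hypothetical globals, recomputed whenever the segment size changes. */
static uint64_t wal_seg_mask;   /* segsize - 1 */
static int      wal_seg_shift;  /* log2(segsize) */

/* Placeholder long page header size; not PostgreSQL's real value. */
#define SKETCH_LONG_PHD_SIZE 40

/* Assign-hook-style setup. newval must already be validated as a power
 * of two; otherwise the mask below would be meaningless. */
static void assign_wal_segment_size(uint64_t newval)
{
    int shift = 0;

    while ((UINT64_C(1) << shift) < newval)
        shift++;

    wal_seg_shift = shift;
    wal_seg_mask = newval - 1;
}

/* Mirrors the quoted hot path: the modulo test becomes a bitwise AND. */
static uint64_t skip_segment_header(uint64_t curr_pos)
{
    if ((curr_pos & wal_seg_mask) == 0)
        curr_pos += SKETCH_LONG_PHD_SIZE;
    return curr_pos;
}
```

As noted above, this removes the division instruction but not the memory access to wal_seg_mask.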

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
On 2016-12-20 08:10:29 -0500, Robert Haas wrote:
> We could use the GUC assign hook to compute a mask and a shift, so
> that this could be written as (CurrPos & mask_variable) == 0.  That
> would avoid the division instruction, though not the memory access.

I suspect that'd be fine.


> I hope this is all in the noise, though.

Could very well be.


> I know this code is hot but I think it'll be hard to construct a
> test case where the bottleneck is anything other than the speed at
> which the disk can absorb bytes.

I don't think that's really true. Heikki's WAL changes made a *BIG*
difference. And pretty small changes in xlog.c can make noticeable
throughput differences both in single and multi-threaded
workloads. E.g. witnessed by the fact that the crc computation used to
be a major bottleneck (and the crc32c instruction still shows up
noticeably in profiles).  SSDs have become fast enough that it's
increasingly hard to saturate them.

Andres



Re: [HACKERS] increasing the default WAL segment size

From
Tomas Vondra
Date:
On 12/20/2016 02:19 PM, Andres Freund wrote:
> On 2016-12-20 08:10:29 -0500, Robert Haas wrote:
>> We could use the GUC assign hook to compute a mask and a shift, so
>> that this could be written as (CurrPos & mask_variable) == 0.  That
>> would avoid the division instruction, though not the memory access.
>
> I suspect that'd be fine.
>
>
>> I hope this is all in the noise, though.
>
> Could very well be.
>
>
>> I know this code is hot but I think it'll be hard to construct a
>> test case where the bottleneck is anything other than the speed at
>> which the disk can absorb bytes.
>
> I don't think that's really true. Heikki's WAL changes made a *BIG*
> difference. And pretty small changes in xlog.c can make noticeable
> throughput differences both in single and multi-threaded
> workloads. E.g. witnessed by the fact that the crc computation used to
> be a major bottleneck (and the crc32c instruction still shows up
> noticeably in profiles).  SSDs have become fast enough that it's
> increasingly hard to saturate them.
>

It's not just SSDs. RAID controllers with write cache (which is 
typically just DDR3 memory anyway) have about the same effect even with 
spinning rust.

So yes, this might make a measurable difference.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Jim Nasby
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

General comments:
There was some discussion about the impact of this on small installs, particularly around min_wal_size. The concern was
that changing the default segment size to 64MB would significantly increase min_wal_size in terms of bytes. The default
value for min_wal_size is 5 segments, so 16MB->64MB would mean going from 80MB to 320MB. IMHO if you're worried about
that then just initdb with a smaller segment size. There are probably a number of other changes a small environment wants
to make besides that. Perhaps it'd be worth making DEFAULT_XLOG_SEG_SIZE a configure option to better support that.

It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance
aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware
should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would
also solve some sanity-checking issues.

+     * initdb passes the WAL segment size in an environment variable. We don't
+     * bother doing any sanity checking, we already check in initdb that the
+     * user gives a sane value.

That doesn't seem like a good idea to me. If anything, the backend should sanity-check and initdb just rely on that.
Perhaps this is how other initdb options work, but it still seems bogus. In particular, verifying the size is a power of
2 seems important, as failing that would probably be ReallyBad(tm).

The patch also blindly trusts the value read from the control file; I'm not sure if that's standard procedure or not,
but ISTM it'd be worth sanity-checking that as well.

The patch leaves the base GUC units for min_wal_size and max_wal_size as the # of segments. I'm not sure if that's a
great idea.

+ * convert_unit
+ *
+ * This takes the value in kbytes and then returns value in user-readable format

This function needs a more specific name, such as pretty_print_kb().

+        /* Check if wal_segment_size is in the power of 2 */
+        for (i = 0;; i++, pow2 = pow(2, i))
+            if (pow2 >= wal_segment_size)
+                break;
+
+        if (wal_segment_size != 1 && pow2 > wal_segment_size)
+        {
+            fprintf(stderr, _("%s: WAL segment size must be in the power of 2\n"), progname);
+            exit(1);
+        }

IMHO it'd be better to use the n & (n-1) check detailed at [3].

Actually, there must be other places that need to check this, so it'd be nice to just create a function that
verifies a number is a power of 2.
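
The n & (n - 1) check Jim refers to clears the lowest set bit, so the result is zero only when at most one bit was set. Excluding zero then leaves exactly the powers of two. The sketch below shows such a shared helper plus a hypothetical initdb-side validator. The function names are assumptions for illustration. So is the 1MB to 1GB range, taken from the old configure minimum and the patch's new maximum.

```c
#include <stdbool.h>
#include <stdint.h>

/* True iff n has exactly one bit set: n & (n - 1) clears the lowest set
 * bit, so it is zero only for 0 and powers of two. */
static bool is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical initdb-side check for a --wal-segsize value given in MB:
 * must lie in an assumed 1..1024 range and be a power of two. */
static bool valid_wal_segsize_mb(long mb)
{
    return mb >= 1 && mb <= 1024 && is_power_of_two((uint64_t) mb);
}
```

This also avoids the pow() loop in the patch entirely, so no floating-point math is involved.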
 

+    if (log_fname != NULL)
+        XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo);
+

Please add a comment about why XLogFromFileName has to be delayed.

/*
+ * DEFAULT_XLOG_SEG_SIZE is the size of a single WAL file.  This must be a power
+ * of 2 and larger than XLOG_BLCKSZ (preferably, a great deal larger than
+ * XLOG_BLCKSZ).
+ *
+ * Changing DEFAULT_XLOG_SEG_SIZE requires an initdb.
+ */
+#define DEFAULT_XLOG_SEG_SIZE    (16*1024*1024)

That comment isn't really accurate. It would be more useful to explain that DEFAULT_XLOG_SEG_SIZE is the default size
of a WAL segment used by initdb if a different value isn't specified.

1: https://www.postgresql.org/message-id/20161220082847.7t3t6utvxb6m5tfe%40alap3.anarazel.de
2: https://www.postgresql.org/message-id/CA%2BTgmoZTgnL25o68uPBTS6BD37ojD-1y-N88PkP57FzKqwcmmQ%40mail.gmail.com
3: http://stackoverflow.com/questions/108318/whats-the-simplest-way-to-test-whether-a-number-is-a-power-of-2-in-c

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Tue, Jan 3, 2017 at 6:23 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> +               /* Check if wal_segment_size is in the power of 2 */
> +               for (i = 0;; i++, pow2 = pow(2, i))
> +                       if (pow2 >= wal_segment_size)
> +                               break;
> +
> +               if (wal_segment_size != 1 && pow2 > wal_segment_size)
> +               {
> +                       fprintf(stderr, _("%s: WAL segment size must be in the power of 2\n"), progname);
> +                       exit(1);
> +               }

I recall that pow(x, 2) and x * x usually result in the same assembly
code, but pow() can never be more optimal than a simple
multiplication. So I'd think that it is wiser to avoid it in this code
path. Documentation is missing for the new replication command
SHOW_WAL_SEGSZ. Actually, why not just have an equivalent of the SQL
SHOW command and be able to query any parameter value?
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 2 January 2017 at 21:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance
> aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware
> should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would
> also solve some sanity-checking issues.

Overall, Robert has made a good case. The only discussion now is about
the knock-on effects it causes.

One concern that has only barely been discussed is the effect of
zero-ing new WAL files. That is a linear effect and will adversely
affect performance as WAL segment size increases. (The already stated
fsync problem is also a linear effect but that reduces with WAL
segment size, hence the need for a trade-off and hence why
variable-size is preferable).

If we wish this feature to get committed ISTM that we should examine
server performance with a large fixed WAL segment size, so we can
measure the effects of this, particularly with regard to the poor user
that gets to add a new WAL file. ISTM that may reveal more work is
needed to be handed off to the WALWriter process (or other
issues/solutions).

Once we have that information we can consider whether to apply this
patch, so until then, -1 to apply this, though I am hopeful that this
can be applied in this release.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Amit Kapila
Date:
On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 2 January 2017 at 21:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
>> It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance
>> aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware
>> should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would
>> also solve some sanity-checking issues.
>
> Overall, Robert has made a good case. The only discussion now is about
> the knock-on effects it causes.
>
> One concern that has only barely been discussed is the effect of
> zero-ing new WAL files. That is a linear effect and will adversely
> affect performance as WAL segment size increases.
>

Sorry, but I am not able to understand why this is a problem.  The
bigger the WAL segment size, the fewer the files.  So IIUC, it can
only have an impact if zero-ing two 16MB files is cheaper than
zero-ing one 32MB file.  Is that your theory, or do you have something
else in mind?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 3 January 2017 at 13:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 2 January 2017 at 21:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>
>>> It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance
>>> aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware
>>> should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would
>>> also solve some sanity-checking issues.
>>
>> Overall, Robert has made a good case. The only discussion now is about
>> the knock-on effects it causes.
>>
>> One concern that has only barely been discussed is the effect of
>> zero-ing new WAL files. That is a linear effect and will adversely
>> affect performance as WAL segment size increases.
>>
>
> Sorry, but I am not able to understand why this is a problem.  The
> bigger the WAL segment size, the fewer the files.  So IIUC, it can
> only have an impact if zero-ing two 16MB files is cheaper than
> zero-ing one 32MB file.  Is that your theory, or do you have something
> else in mind?

The issue I see is that at present no backend needs to do more than
16MB of zeroing at one time, so the impact on response time is
reduced. If we start doing zeroing in larger chunks then the impact on
response times will increase. So instead of regular blips we have one
large blip, less often. I think the latter will be worse, but welcome
measurements that show that performance is smooth and regular with
large file sizes.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Jan 3, 2017 at 8:59 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 January 2017 at 13:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 2 January 2017 at 21:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>
>>>> It's not clear from the thread that there is consensus that this feature is desired. In particular, the
>>>> performance aspects of changing segment size from a C constant to a variable are in question. Someone with access to
>>>> large hardware should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift,
>>>> which IMHO would also solve some sanity-checking issues.
>>>
>>> Overall, Robert has made a good case. The only discussion now is about
>>> the knock-on effects it causes.
>>>
>>> One concern that has only barely been discussed is the effect of
>>> zero-ing new WAL files. That is a linear effect and will adversely
>>> affect performance as WAL segment size increases.
>>>
>>
>> Sorry, but I am not able to understand why this is a problem.  The
>> bigger the WAL segment size, the fewer the files.  So IIUC, it can
>> only have an impact if zero-ing two 16MB files is cheaper than
>> zero-ing one 32MB file.  Is that your theory, or do you have something
>> else in mind?
>
> The issue I see is that at present no backend needs to do more than
> 16MB of zeroing at one time, so the impact on response time is
> reduced. If we start doing zeroing in larger chunks then the impact on
> response times will increase. So instead of regular blips we have one
> large blip, less often. I think the latter will be worse, but welcome
> measurements that show that performance is smooth and regular with
> large file sizes.

Yeah.  I don't think there's any way to get around the fact that there
will be bigger latency spikes in some cases with larger WAL files.  I
think the question is whether they'll be common enough or serious
enough to worry about.  For example, in a quick test on my laptop,
zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k
count=2048" takes about 11 milliseconds, and zero-filling a 64
megabyte file with a count of 8192 increases the time to almost 50
milliseconds.  That's something, but I wouldn't rate it as concerning.
There are a lot of things that can cause latency changes multiple
orders of magnitude larger than that, so worrying about that one in
particular would seem to me to be fairly pointless.  However, that's
also a measurement on an unloaded system with an SSD, and the impact
may be a lot more on a big system with lots of concurrent
activity, and if the process that does the write also has to do an
fsync, that will increase the cost considerably, too.
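
To repeat this measurement from C rather than with dd, one could use a rough sketch like the one below. It writes a file of zeros in 8kB blocks, optionally fsyncs it, and returns the elapsed time. zero_fill_segment() is a hypothetical helper, not a PostgreSQL function. Its timings will depend heavily on the storage, the filesystem, and concurrent load. The plain dd command above does not fsync, so it corresponds to do_fsync = false. Passing true also includes the cost of flushing to disk, which is the extra cost mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write segsize bytes of zeros to path in blksz-sized
 * writes (like dd bs=8k), optionally fsync, and return the elapsed
 * wall-clock seconds, or -1.0 on any error. */
static double zero_fill_segment(const char *path, size_t segsize,
                                size_t blksz, bool do_fsync)
{
    static char zeros[8192];    /* static storage is zero-initialized */
    struct timespec start, end;
    int fd;

    if (blksz == 0 || blksz > sizeof(zeros) || segsize % blksz != 0)
        return -1.0;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t done = 0; done < segsize; done += blksz)
    {
        if (write(fd, zeros, blksz) != (ssize_t) blksz)
        {
            close(fd);
            return -1.0;
        }
    }
    if (do_fsync && fsync(fd) != 0)
    {
        close(fd);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (close(fd) != 0)
        return -1.0;

    return (double) (end.tv_sec - start.tv_sec) +
           (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, zero_fill_segment("walseg.tmp", 16 * 1024 * 1024, 8192, false) roughly corresponds to the 16MB dd run. Passing 64 * 1024 * 1024 instead corresponds to the 64MB one.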

But the flip side is that it's wrong to imagine that there's no harm
in leaving the situation as it is.  Even my MacBook Pro can crank out
about 2.7 WAL segments/second on "pgbench -c 16 -j 16 -T 60".  I think
a decent server with a few more CPU cores than my laptop has could do
4-5 times that.  So we shouldn't imagine that the costs of spewing out
a bajillion segment files are being paid only at the very high end.
Even somebody running PostgreSQL on a low-end virtual machine might
find it difficult to write an archive_command that can keep up if the
system is under continuous load.  Of course, as Stephen pointed out,
there are toolkits that can do it and you should probably be using one
of those anyway for other reasons, but nevertheless spitting out
almost 3 WAL segments per second even on a laptop gives a whole new
meaning to the term "continuous archiving".

Another point to consider is that a bigger WAL segment size can
actually *improve* latency because every segment switch triggers an
immediate fsync, and every backend in the system ends up waiting for
it to finish.  We should probably eventually try to push those flushes
into the background, and the zeroing work as well.  My impression
(possibly incorrect?) is that we expect to settle into a routine where
zeroing new segments is relatively uncommon because we reuse old
segment files, but the forced end-of-segment flushes never go away.
So it's possible we might actually come out ahead on latency with this
change, at least sometimes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 3 January 2017 at 15:44, Robert Haas <robertmhaas@gmail.com> wrote:

> Yeah.  I don't think there's any way to get around the fact that there
> will be bigger latency spikes in some cases with larger WAL files.

One way would be for the WALwriter to zerofill new files ahead of
time, thus avoiding the latency spike.
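One way Simon's suggestion could look, as a sketch only: a background process builds the next segment under a temporary name, zero-fills and fsyncs it, and renames it into place only when it is complete. `prefill_segment` and the `.prefill` suffix are hypothetical; this is not how the WAL writer is actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for a background process: build the next segment
 * under a temporary name, zero-fill and fsync it, then rename it into
 * place so no reader ever sees a partially initialized segment.
 * seg_bytes is assumed to be a multiple of 8192. Returns 0 on success. */
static int prefill_segment(const char *final_path, size_t seg_bytes)
{
    static const char zbuf[8192];
    char tmp_path[1024];
    int fd;
    int n;

    n = snprintf(tmp_path, sizeof(tmp_path), "%s.prefill", final_path);
    if (n < 0 || (size_t) n >= sizeof(tmp_path))
        return -1;
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (size_t done = 0; done < seg_bytes; done += sizeof(zbuf))
    {
        if (write(fd, zbuf, sizeof(zbuf)) != (ssize_t) sizeof(zbuf))
            goto fail;
    }
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0)
    {
        unlink(tmp_path);
        return -1;
    }
    /* POSIX rename() atomically replaces final_path if it already exists;
     * a real implementation would have to decide whether that is allowed. */
    if (rename(tmp_path, final_path) != 0)
    {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

A fully durable version would also fsync the containing directory after the rename.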

> For example, in a quick test on my laptop,
> zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k
> count=2048" takes about 11 milliseconds, and zero-filling a 64
> megabyte file with a count of 8192 increases the time to almost 50
> milliseconds.  That's something, but I wouldn't rate it as concerning.

I would rate that as concerning, especially if we allow much larger sizes.

> But the flip side is that it's wrong to imagine that there's no harm
> in leaving the situation as it is.

The case for change has been made; the only discussion is what's in
the new patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Jan 3, 2017 at 11:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 January 2017 at 15:44, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yeah.  I don't think there's any way to get around the fact that there
>> will be bigger latency spikes in some cases with larger WAL files.
>
> One way would be for the WALwriter to zerofill new files ahead of
> time, thus avoiding the latency spike.

Sure, we could do that.  I think it's an independent improvement,
though: it is beneficial with or without this patch.

>> For example, in a quick test on my laptop,
>> zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k
>> count=2048" takes about 11 milliseconds, and zero-filling a 64
>> megabyte file with a count of 8192 increases the time to almost 50
>> milliseconds.  That's something, but I wouldn't rate it as concerning.
>
> I would rate that as concerning, especially if we allow much larger sizes.

I don't really understand the concern.  If we allow large sizes but
they are not the default, people can make a throughput-vs-latency
trade-off when choosing a value for their installation.  Those kinds of
trade-offs are common and unavoidable.  If we raise the default, then
it's more of a concern, but I'm not sure those numbers are big enough
to worry about.  I'm not sure how to decide which numbers are big
enough to worry about, either.

I guess we need some test results showing what happens with this patch
in the real world before we go further.  I agree that there's a
possible downside to raising the segment size, but my suspicion is
that the results are going to be better, not worse, because of
reducing the number of end-of-segment fsyncs.  There's no point
worrying too much about how we're going to mitigate the negative
impact until we know for sure that there is one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 3 January 2017 at 16:24, Robert Haas <robertmhaas@gmail.com> wrote:
> On Jan 3, 2017 at 11:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 3 January 2017 at 15:44, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Yeah.  I don't think there's any way to get around the fact that there
>>> will be bigger latency spikes in some cases with larger WAL files.
>>
>> One way would be for the WALwriter to zerofill new files ahead of
>> time, thus avoiding the latency spike.
>
> Sure, we could do that.  I think it's an independent improvement,
> though: it is beneficial with or without this patch.

The latency spike problem is exacerbated by increasing file size, so I
think if we are allowing people to increase file size in this release
then we should fix the knock-on problem it causes in this release
also. If we don't fix it as part of this patch I would consider it an
open item.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Jan 3, 2017 at 3:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 January 2017 at 16:24, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Jan 3, 2017 at 11:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 3 January 2017 at 15:44, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Yeah.  I don't think there's any way to get around the fact that there
>>>> will be bigger latency spikes in some cases with larger WAL files.
>>>
>>> One way would be for the WALwriter to zerofill new files ahead of
>>> time, thus avoiding the latency spike.
>>
>> Sure, we could do that.  I think it's an independent improvement,
>> though: it is beneficial with or without this patch.
>
> The latency spike problem is exacerbated by increasing file size, so I
> think if we are allowing people to increase file size in this release
> then we should fix the knock-on problem it causes in this release
> also. If we don't fix it as part of this patch I would consider it an
> open item.

I think I'd like to see some benchmark results before forming an
opinion on whether that's a must-fix issue.  I'm not sure I believe
that allowing a larger WAL segment size is going to make things worse
more than it makes them better.  I think that should be tested, not
assumed true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 3 January 2017 at 21:33, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 3, 2017 at 3:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 3 January 2017 at 16:24, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Jan 3, 2017 at 11:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> On 3 January 2017 at 15:44, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> Yeah.  I don't think there's any way to get around the fact that there
>>>>> will be bigger latency spikes in some cases with larger WAL files.
>>>>
>>>> One way would be for the WALwriter to zerofill new files ahead of
>>>> time, thus avoiding the latency spike.
>>>
>>> Sure, we could do that.  I think it's an independent improvement,
>>> though: it is beneficial with or without this patch.
>>
>> The latency spike problem is exacerbated by increasing file size, so I
>> think if we are allowing people to increase file size in this release
>> then we should fix the knock-on problem it causes in this release
>> also. If we don't fix it as part of this patch I would consider it an
>> open item.
>
> I think I'd like to see some benchmark results before forming an
> opinion on whether that's a must-fix issue.  I'm not sure I believe
> that allowing a larger WAL segment size is going to make things worse
> more than it makes them better.  I think that should be tested, not
> assumed true.

Strange response. Nothing has been assumed. I asked for tests and you
provided measurements.

I suggest we fix just the problem as the fastest way forwards.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Jan 4, 2017 at 3:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Strange response. Nothing has been assumed. I asked for tests and you
> provided measurements.

Sure, of zero-filling a file with dd.  But I also pointed out that in
a real PostgreSQL cluster, the change could actually *reduce* latency.

> I suggest we fix just the problem as the fastest way forwards.

If you want to do the work, sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Simon Riggs
Date:
On 4 January 2017 at 13:57, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 4, 2017 at 3:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Strange response. Nothing has been assumed. I asked for tests and you
>> provided measurements.
>
> Sure, of zero-filling a file with dd.  But I also pointed out that in
> a real PostgreSQL cluster, the change could actually *reduce* latency.

I think we are talking at cross purposes. We agree that the main
change is useful, but it causes another problem which I can't see how
you can characterize as reduced latency, based upon your own
measurements.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Jan 4, 2017 at 9:47 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 4 January 2017 at 13:57, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 4, 2017 at 3:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Strange response. Nothing has been assumed. I asked for tests and you
>>> provided measurements.
>>
>> Sure, of zero-filling a file with dd.  But I also pointed out that in
>> a real PostgreSQL cluster, the change could actually *reduce* latency.
>
> I think we are talking at cross purposes. We agree that the main
> change is useful, but it causes another problem which I can't see how
> you can characterize as reduced latency, based upon your own
> measurements.

Zero-filling files will take longer if the files are bigger.  That
will increase latency.  But we will also have fewer forced
end-of-segment syncs.  That will reduce latency.  Which effect is
bigger?
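The question can be framed as a toy model per GiB of WAL written: each segment switch forces one fsync, and segments that cannot be recycled must also be zero-filled. This is a sketch under those assumptions; the function names are hypothetical, and the fsync cost and recycling ratio are inputs that would have to be measured.

```c
/* Forced end-of-segment fsyncs per GiB of WAL.
 * Assumes seg_mib divides 1024. */
static unsigned switches_per_gib(unsigned seg_mib)
{
    return 1024u / seg_mib;
}

/* Total stall time per GiB, in ms. fresh_fraction is the share of
 * segments that are zero-filled rather than recycled (0.0 to 1.0). */
static double stall_ms_per_gib(unsigned seg_mib, double fsync_ms,
                               double zero_fill_ms_per_seg,
                               double fresh_fraction)
{
    return switches_per_gib(seg_mib) *
           (fsync_ms + fresh_fraction * zero_fill_ms_per_seg);
}
```

Plugging in the dd figures from earlier in the thread (about 11 ms per 16 MB, almost 50 ms per 64 MB) with every segment freshly zero-filled gives 64 × 11 = 704 ms versus 16 × 50 = 800 ms per GiB, which is comparable. What changes is that each individual stall is roughly four times longer, while the number of forced fsyncs drops by a factor of four. Which term dominates depends on the fsync cost and the recycling ratio.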

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Thu, Jan 5, 2017 at 12:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 4, 2017 at 9:47 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 4 January 2017 at 13:57, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Jan 4, 2017 at 3:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Strange response. Nothing has been assumed. I asked for tests and you
>>>> provided measurements.
>>>
>>> Sure, of zero-filling a file with dd.  But I also pointed out that in
>>> a real PostgreSQL cluster, the change could actually *reduce* latency.
>>
>> I think we are talking at cross purposes. We agree that the main
>> change is useful, but it causes another problem which I can't see how
>> you can characterize as reduced latency, based upon your own
>> measurements.
>
> Zero-filling files will take longer if the files are bigger.  That
> will increase latency.  But we will also have fewer forced
> end-of-segment syncs.  That will reduce latency.  Which effect is
> bigger?

It depends on whether the environment is CPU-bound or I/O-bound. If the
CPU is at its limit, zero-filling takes time. If I/O is the bottleneck,
fsync() takes longer to complete.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
David Rowley
Date:
On 4 January 2017 at 01:16, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 3, 2017 at 6:23 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> +               /* Check if wal_segment_size is in the power of 2 */
>> +               for (i = 0;; i++, pow2 = pow(2, i))
>> +                       if (pow2 >= wal_segment_size)
>> +                               break;
>> +
>> +               if (wal_segment_size != 1 && pow2 > wal_segment_size)
>> +               {
>> +                       fprintf(stderr, _("%s: WAL segment size must be in the power of 2\n"), progname);
>> +                       exit(1);
>> +               }
>
> I recall that pow(x, 2) and x * x usually result in the same assembly
> code, but pow() can never be more efficient than a simple
> multiplication. So I'd think that it is wiser to avoid it in this code
> path. Documentation is missing for the new replication command
> SHOW_WAL_SEG. Actually, why not just have an equivalent of the SQL
> command and be able to query parameter values?

This would probably be nicer written using a bitwise trick to ensure
that no bits below the highest set bit are set. If it's a power of 2,
then subtracting 1 sets all of the less significant bits to 1 and clears
the one bit that was set, so binary ANDing the two should give 0, i.e.
they have no bits in common.

Something like:

/* ensure segment size is a power of 2 */
if ((wal_segment_size & (wal_segment_size - 1)) != 0)
{
    fprintf(stderr, _("%s: WAL segment size must be a power of 2\n"), progname);
    exit(1);
}
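The same test wrapped as a reusable helper, as a minimal sketch (the name is hypothetical). Note that 0 also satisfies `(n & (n - 1)) == 0`, so it has to be rejected explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* True iff x is a power of 2. x & (x - 1) clears the lowest set bit, so
 * the result is zero only when exactly one bit was set. Zero must be
 * excluded separately, because 0 & (0 - 1) is also 0. */
static bool is_power_of_2_u32(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```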

There's a similar trick in bitmapset.c for RIGHTMOST_ONE, so it looks
like we already make assumptions about two's complement arithmetic.
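For comparison, a RIGHTMOST_ONE-style trick isolates the lowest set bit with `x & -x`. Doing it on an unsigned type sidesteps the two's-complement question, because C defines unsigned arithmetic modulo 2^N. A sketch with a hypothetical name:

```c
#include <stdint.h>

/* Isolate the lowest set bit of x (0 if x == 0). 0u - x wraps modulo
 * 2^32 for uint32_t, so no signed representation is assumed. */
static uint32_t rightmost_one_u32(uint32_t x)
{
    return x & (uint32_t) (0u - x);
}
```

With this, a nonzero x is a power of 2 exactly when `rightmost_one_u32(x) == x`.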

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Thank you for your review.

On Tue, Jan 3, 2017 at 2:53 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> General comments:
> There was some discussion about the impact of this on small installs, particularly around min_wal_size. The concern was that changing the default segment size to 64MB would significantly increase min_wal_size in terms of bytes. The default value for min_wal_size is 5 segments, so 16MB->64MB would mean going from 80MB to 320MB. IMHO if you're worried about that then just initdb with a smaller segment size. There are probably a number of other changes a small environment wants to make besides that. Perhaps it'd be worth making DEFAULT_XLOG_SEG_SIZE a configure option to better support that.

The patch maintains the current XLOG_SEG_SIZE of 16MB as the default. Only the capability to change its value has been moved from configure to initdb.
 

> It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would also solve some sanity-checking issues.
>
> +        * initdb passes the WAL segment size in an environment variable. We don't
> +        * bother doing any sanity checking, we already check in initdb that the
> +        * user gives a sane value.
>
> That doesn't seem like a good idea to me. If anything, the backend should sanity-check and initdb just rely on that. Perhaps this is how other initdb options work, but it still seems bogus. In particular, verifying the size is a power of 2 seems important, as failing that would probably be ReallyBad(tm).
>
> The patch also blindly trusts the value read from the control file; I'm not sure if that's standard procedure or not, but ISTM it'd be worth sanity-checking that as well.

There is a CRC check to detect errors in the file. I think all the ControlFile values are used directly and not re-verified.
 

> The patch leaves the base GUC units for min_wal_size and max_wal_size as the # of segments. I'm not sure if that's a great idea.

I think we can leave it as is. This is used in CalculateCheckpointSegments and in XLOGfileslop to calculate the segment numbers.


> + * convert_unit
> + *
> + * This takes the value in kbytes and then returns value in user-readable format
>
> This function needs a more specific name, such as pretty_print_kb().

I agree pretty_print_kb would have been a better name for this function. However, I have realised that using the show hook and this function is not suitable, and I have found a better way of handling the removal of GUC_UNIT_XSEGS that no longer needs this function: use GUC_UNIT_KB, and convert the value in bytes to a WAL segment count in the assign hook instead. The next version of the patch will use this.
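A minimal sketch of the kind of conversion such an assign hook might perform, assuming the setting arrives in kB as GUC_UNIT_KB implies. `kb_to_segments` is hypothetical and does not use the real GUC API.

```c
#include <stdint.h>

/* Hypothetical helper an assign hook might call: the setting is kept in
 * kB and converted to a whole number of WAL segments for the current
 * segment size, rounding down. Assumes seg_bytes >= 1024. */
static int kb_to_segments(int value_kb, uint32_t seg_bytes)
{
    int seg_kb = (int) (seg_bytes / 1024);

    return value_kb / seg_kb;
}
```

With the user-visible value kept in kB, Jim's min_wal_size example stays at 80MB whatever the segment size: that is 5 segments at 16 MB but only 1 at 64 MB after rounding down, so a real hook would probably need to decide on a minimum segment count.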

 

> +               /* Check if wal_segment_size is in the power of 2 */
> +               for (i = 0;; i++, pow2 = pow(2, i))
> +                       if (pow2 >= wal_segment_size)
> +                               break;
> +
> +               if (wal_segment_size != 1 && pow2 > wal_segment_size)
> +               {
> +                       fprintf(stderr, _("%s: WAL segment size must be in the power of 2\n"), progname);
> +                       exit(1);
> +               }
>
> IMHO it'd be better to use the n & (n-1) check detailed at [3].

Yes, I had come across it too. I will incorporate this in the next version of the patch.
 

> Actually, there's got to be other places that need to check this, so it'd be nice to just create a function that verifies a number is a power of 2.
>
> +       if (log_fname != NULL)
> +               XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo);
> +
>
> Please add a comment about why XLogFromFileName has to be delayed.

Oh yes!
 

>  /*
> + * DEFAULT_XLOG_SEG_SIZE is the size of a single WAL file.  This must be a power
> + * of 2 and larger than XLOG_BLCKSZ (preferably, a great deal larger than
> + * XLOG_BLCKSZ).
> + *
> + * Changing DEFAULT_XLOG_SEG_SIZE requires an initdb.
> + */
> +#define DEFAULT_XLOG_SEG_SIZE  (16*1024*1024)
>
> That comment isn't really accurate. It would be more useful to explain that DEFAULT_XLOG_SEG_SIZE is the default size of a WAL segment used by initdb if a different value isn't specified.

I will correct this comment


The new version of the patch will be posted soon.
 

Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 3, 2017 at 6:23 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> +               /* Check if wal_segment_size is in the power of 2 */
>> +               for (i = 0;; i++, pow2 = pow(2, i))
>> +                       if (pow2 >= wal_segment_size)
>> +                               break;
>> +
>> +               if (wal_segment_size != 1 && pow2 > wal_segment_size)
>> +               {
>> +                       fprintf(stderr, _("%s: WAL segment size must be in the power of 2\n"), progname);
>> +                       exit(1);
>> +               }

> I recall that pow(x, 2) and x * x usually result in the same assembly
> code, but pow() can never be more efficient than a simple
> multiplication. So I'd think that it is wiser to avoid it in this code
> path. Documentation is missing for the new replication command
> SHOW_WAL_SEG.

As mentioned earlier, the documentation is not fully updated yet.
 
> Actually, why not just have an equivalent of the SQL
> command and be able to query parameter values?

This patch only needed the wal_segment_size and hence I made this specific command. 
How often and why would we need other parameter values in the replication connection?
Making it a more general command to fetch any parameter can be a separate topic. If it gets consensus, maybe it could be done and used here.


Thank you, 

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 6:39 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> This patch only needed the wal_segment_size and hence I made this specific
> command.
> How often and why would we need other parameter values in the replication
> connection?
> Making it a more general command to fetch any parameter can be a separate
> topic. If it gets consensus, maybe it could be done and used here.

I think the idea of supporting SHOW here is better than adding a
special-purpose command just for the WAL size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Thu, Jan 5, 2017 at 8:39 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>> Actually, why not just have an equivalent of the SQL
>> command and be able to query parameter values?
>
> This patch only needed the wal_segment_size and hence I made this specific
> command.
> How often and why would we need other parameter values in the replication
> connection?
> Making it a more general command to fetch any parameter can be a separate
> topic. If it gets consensus, maybe it could be done and used here.

I concur that for this patch it may not be necessary. But let's not
paint ourselves into a corner when designing things. Being able to query
the value of parameters is something that I think is actually useful
for cases where custom GUCs are loaded through the server's
shared_preload_libraries to do validation checks (one case is a
logical decoder on the backend, with a streaming receiver as the client
expecting the logical decoder to do a minimum). This can allow a
client to do checks using only a replication stream. Another case that
I have in mind is utilities like pg_rewind: we have been discussing
removing the need for a superuser when querying the target server.
Having such a command would allow, for example, pg_rewind to do a
'SHOW full_page_writes' without the need for an extra connection.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Fri, Jan 6, 2017 at 11:36 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Thu, Jan 5, 2017 at 8:39 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>> On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>> Actually, why not just have an equivalent of the SQL
>>> command and be able to query parameter values?
>>
>> This patch only needed the wal_segment_size and hence I made this specific
>> command.
>> How often and why would we need other parameter values in the replication
>> connection?
>> Making it a more general command to fetch any parameter can be a separate
>> topic. If it gets consensus, maybe it could be done and used here.

> I concur that for this patch it may not be necessary. But let's not
> paint ourselves into a corner when designing things. Being able to query
> the value of parameters is something that I think is actually useful
> for cases where custom GUCs are loaded through the server's
> shared_preload_libraries to do validation checks (one case is a
> logical decoder on the backend, with a streaming receiver as the client
> expecting the logical decoder to do a minimum). This can allow a
> client to do checks using only a replication stream. Another case that
> I have in mind is utilities like pg_rewind: we have been discussing
> removing the need for a superuser when querying the target server.
> Having such a command would allow, for example, pg_rewind to do a
> 'SHOW full_page_writes' without the need for an extra connection.


I see the point. I will change the SHOW_WAL_SEGSZ to a general SHOW command in the next version of the patch.
 
Thank you, 

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Fri, Jan 6, 2017 at 6:32 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> I see the point. I will change the SHOW_WAL_SEGSZ to a general SHOW command
> in the next version of the patch.

Could you split things? There could be one patch to introduce the SHOW
command, and one on top of it for your patch to be able to change the
WAL segment size with initdb.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Jim Nasby
Date:
On 1/4/17 10:03 PM, David Rowley wrote:
>> I recall that pow(x, 2) and x * x usually result in the same assembly
>> code, but pow() can never be more efficient than a simple
>> multiplication. So I'd think that it is wiser to avoid it in this code
>> path. Documentation is missing for the new replication command
>> SHOW_WAL_SEG. Actually, why not just have an equivalent of the SQL
>> command and be able to query parameter values?
> This would probably be nicer written using a bitwise trick to ensure
> that no bits below the highest set bit are set. If it's a power of 2,
> then subtracting 1 sets all of the less significant bits to 1 and clears
> the one bit that was set, so binary ANDing the two should give 0, i.e.
> they have no bits in common.
>
> Something like:
>
> /* ensure segment size is a power of 2 */
> if ((wal_segment_size & (wal_segment_size - 1)) != 0)
> {
>    fprintf(stderr, _("%s: WAL segment size must be a power of 2\n"), progname);
>    exit(1);
> }
>
> There's a similar trick in bitmapset.c for RIGHTMOST_ONE, so it looks
> like we already make assumptions about two's complement arithmetic.

Well, now that there's 3 places that need to do almost the same thing, I 
think it'd be best to just centralize this somewhere. I realize that's 
not going to save any significant amount of code, but it would make it 
crystal clear what's going on (assuming the excellent comment above 
RIGHTMOST_ONE was kept).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] increasing the default WAL segment size

From
Jim Nasby
Date:
On 1/5/17 5:38 AM, Beena Emerson wrote:
> On Tue, Jan 3, 2017 at 2:53 AM, Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>> wrote:
>     General comments:
>     There was some discussion about the impact of this on small
>     installs, particularly around min_wal_size. The concern was that
...
> The patch maintains the current XLOG_SEG_SIZE of 16MB as the default.
> Only the capability to change its value has been moved from configure to
> initdb.

Ah, I missed that. Thanks for clarifying.

>     It's not clear from the thread that there is consensus that this
>     feature is desired. In particular, the performance aspects of
>     changing segment size from a C constant to a variable are in
>     question. Someone with access to large hardware should test that.
>     Andres[1] and Robert[2] did suggest that the option could be changed
>     to a bitshift, which IMHO would also solve some sanity-checking issues.

Are you going to change to a bitshift in the next patch?
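A minimal sketch of what the suggested bitshift representation might look like; the type and function names are hypothetical and the patch may well do it differently. Storing log2 of the size means every representable size is a power of 2 by construction, and the per-LSN arithmetic becomes a shift or a mask.

```c
#include <stdint.h>

/* Hypothetical: keep the WAL segment size as a shift count
 * (log2 of the size in bytes). */
typedef struct WalSegGeom
{
    unsigned int shift;         /* 24 => 16 MB, 26 => 64 MB */
} WalSegGeom;

static inline uint64_t
seg_size_bytes(WalSegGeom g)
{
    return UINT64_C(1) << g.shift;
}

/* Segment number containing a given LSN. */
static inline uint64_t
lsn_to_segno(WalSegGeom g, uint64_t lsn)
{
    return lsn >> g.shift;
}

/* Byte offset of an LSN within its segment. */
static inline uint64_t
lsn_seg_offset(WalSegGeom g, uint64_t lsn)
{
    return lsn & (seg_size_bytes(g) - 1);
}
```

Sanity checking then reduces to checking that the shift lies in an allowed range.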

>     +        * initdb passes the WAL segment size in an environment variable. We don't
>     +        * bother doing any sanity checking, we already check in initdb that the
>     +        * user gives a sane value.
>
>     That doesn't seem like a good idea to me. If anything, the backend
>     should sanity-check and initdb just rely on that. Perhaps this is
>     how other initdb options work, but it still seems bogus. In
>     particular, verifying the size is a power of 2 seems important, as
>     failing that would probably be ReallyBad(tm).
>
>     The patch also blindly trusts the value read from the control file;
>     I'm not sure if that's standard procedure or not, but ISTM it'd be
>     worth sanity-checking that as well.
>
>
> There is a CRC check to detect errors in the file. I think all the
> ControlFile values are used directly and not re-verified.

Sounds good. I do still think the variable from initdb should be 
sanity-checked.
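For the initdb-to-backend hand-off, the backend could validate whatever string it receives. A sketch only: `parse_wal_seg_size` is hypothetical, and the 1 MB to 1 GB bounds are assumptions chosen for illustration, not values from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MIN_SEG_BYTES (1UL << 20)   /* assumed lower bound: 1 MB */
#define MAX_SEG_BYTES (1UL << 30)   /* assumed upper bound: 1 GB */

/* Parse a segment size in bytes given as a decimal string (for example
 * from an environment variable) and reject anything malformed, out of
 * range, or not a power of 2. Returns true and sets *out on success. */
static bool parse_wal_seg_size(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    if (s == NULL || *s == '\0')
        return false;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;           /* overflow or trailing junk such as "MB" */
    if (v < MIN_SEG_BYTES || v > MAX_SEG_BYTES || (v & (v - 1)) != 0)
        return false;
    *out = v;
    return true;
}
```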

>     The patch leaves the base GUC units for min_wal_size and
>     max_wal_size as the # of segments. I'm not sure if that's a great idea.
>
>
> I think we can leave it as is. This is used in
> CalculateCheckpointSegments and in XLOGfileslop to calculate the segment
> numbers.

My concern here is that we just changed from segments to KB for all the 
checkpoint settings, and this is introducing segments back in, but ...

> I agree pretty_print_kb would have been a better name for this function.
> However, I have realised that using the show hook and this function is
> not suitable and have found a better way of handling the removal of
> GUC_UNIT_XSEGS which no longer needs this function : using the
> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
> the assign hook. The next version of patch will use this.

... it sounds like you're going back to exposing KB to users, and that's 
all that really matters.

>     IMHO it'd be better to use the n & (n-1) check detailed at [3].

See my other email about that.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Sat, Jan 7, 2017 at 7:45 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Well, now that there's 3 places that need to do almost the same thing, I
> think it'd be best to just centralize this somewhere. I realize that's not
> going to save any significant amount of code, but it would make it crystal
> clear what's going on (assuming the excellent comment above RIGHTMOST_ONE
> was kept).

Hmm.  This sounds a lot like what fls() and my_log2() also do.  I've
been quietly advocating for fls() because we only provide an
implementation in src/port if the operating system doesn't have it,
and the operating system may have an implementation that optimizes to
a single machine-language instruction (bsrl on x86, I think, see
4f658dc851a73fc309a61be2503c29ed78a1592e).  But the fact that our
src/port implementation uses a loop instead of the RIGHTMOST_ONE()
trick seems non-optimal.
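A sketch of the pieces being compared: a portable loop like the one a src/port fallback would use, a GCC/Clang builtin version, and a my_log2()-style ceiling log2. All names are hypothetical (`fls_loop` avoids clashing with a system `fls()`). One caveat: fls() looks for the highest set bit, whereas RIGHTMOST_ONE isolates the lowest, so a count-leading-zeros builtin is probably the closer replacement for the loop.

```c
#include <limits.h>

/* Portable loop: 1-based index of the most significant set bit, or 0
 * when x == 0 (matches BSD fls() for positive inputs). */
static int fls_loop(unsigned int x)
{
    int bit = 0;

    while (x != 0)
    {
        bit++;
        x >>= 1;
    }
    return bit;
}

#if defined(__GNUC__)
/* GCC/Clang builtin; __builtin_clz(0) is undefined, so guard zero. */
static int fls_builtin(unsigned int x)
{
    return x == 0 ? 0 : (int) (sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}
#endif

/* Smallest i such that 2^i >= x, for x >= 1. */
static int ceil_log2(unsigned int x)
{
    return x <= 1 ? 0 : fls_loop(x - 1);
}
```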

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
David Rowley
Date:
On 10 January 2017 at 07:40, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 7, 2017 at 7:45 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Well, now that there's 3 places that need to do almost the same thing, I
>> think it'd be best to just centralize this somewhere. I realize that's not
>> going to save any significant amount of code, but it would make it crystal
>> clear what's going on (assuming the excellent comment above RIGHTMOST_ONE
>> was kept).
>
> Hmm.  This sounds a lot like what fls() and my_log2() also do.  I've
> been quietly advocating for fls() because we only provide an
> implementation in src/port if the operating system doesn't have it,
> and the operating system may have an implementation that optimizes to
> a single machine-language instruction (bsrl on x86, I think, see
> 4f658dc851a73fc309a61be2503c29ed78a1592e).  But the fact that our
> src/port implementation uses a loop instead of the RIGHTMOST_ONE()
> trick seems non-optimal.

It really does sound like we need a bitutils.c as mentioned in [1].
It would be good to make use of GCC's __builtin_popcount [2] instead
of the number_of_ones[] array in bitmapset.c. It should be a bit
faster and less cache polluting.
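A sketch contrasting a byte-wise lookup table with the GCC builtin. The table is filled at startup only to keep the example short (bitmapset.c's number_of_ones[] is, as far as I know, a static array), and the names are hypothetical. Whether `__builtin_popcount` becomes a single instruction depends on the target: on x86 it needs POPCNT enabled (for example `-mpopcnt`), otherwise the compiler may fall back to a generic routine.

```c
#include <stdint.h>

/* Byte-wise lookup table: ones_table[i] is the number of set bits in i. */
static unsigned char ones_table[256];

static void init_ones_table(void)
{
    /* Bits in i = low bit of i + bits in i >> 1 (already computed). */
    for (int i = 1; i < 256; i++)
        ones_table[i] = (unsigned char) ((i & 1) + ones_table[i >> 1]);
}

static int popcount_table(uint32_t x)
{
    int n = 0;

    while (x != 0)
    {
        n += ones_table[x & 0xFF];
        x >>= 8;
    }
    return n;
}

#if defined(__GNUC__)
static int popcount_builtin(uint32_t x)
{
    return __builtin_popcount(x);
}
#endif
```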

[1] https://www.postgresql.org/message-id/14578.1462595165@sss.pgh.pa.us
[2] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> I agree pretty_print_kb would have been a better for this function.
>> However, I have realised that using the show hook and this function is
>> not suitable and have found a better way of handling the removal of
>> GUC_UNIT_XSEGS which no longer needs this function : using the
>> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
>> the assign hook. The next version of patch will use this.
>
>
> ... it sounds like you're going back to exposing KB to users, and that's all
> that really matters.
>
>>     IMHO it'd be better to use the n & (n-1) check detailed at [3].

That would be better.
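The n & (n-1) test works because subtracting 1 from a power of two flips its single set bit and every bit below it. The AND is therefore zero only when at most one bit was set. Here is a minimal sketch; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * n & (n - 1) clears the lowest set bit of n, so the result is zero
 * exactly when n has at most one bit set.  Zero has no bits set, so it
 * is rejected explicitly.
 */
static bool
is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

The check runs in constant time, with no loop over the bits.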

So I am looking at the proposed patch. Although there have been
reviews, the patch was in "Needs Review" state, and as far as I can
see it still misses a couple of things for frontends. Just by grepping
for XLOG_SEG_SIZE I have spotted the following problems:
- pg_standby uses it to know about the next segment available.
- pg_receivexlog still uses it in segment handling.
It may be a good idea to just remove XLOG_SEG_SIZE and fix the code
paths that fail to compile without it, frontend utilities included,
because many of them rely on the compiled-in value from
xlog_internal.h, whereas with this patch the value is set at initdb
time. This would induce major breakages in many backup tools, pg_rman
coming first to mind... We could replace it with, for example, a macro
that frontends could use to check whether a WAL segment size is in a
valid range when the tool does not have direct access to the Postgres
instance (and hence to the WAL segment size used there), since there
are offline tools as well.
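Here is one possible shape for the frontend-side check Michael describes. It assumes illustrative bounds of 1MB and 1GB (the thread has not settled on a range at this point) and uses hypothetical macro names. It needs no server connection, so an offline tool could apply it to a size obtained some other way.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bounds only; not values settled in this thread. */
#define WAL_SEG_MIN_SIZE ((uint64_t) 1024 * 1024)          /* 1MB */
#define WAL_SEG_MAX_SIZE ((uint64_t) 1024 * 1024 * 1024)   /* 1GB */

/*
 * A WAL segment size is acceptable if it is a power of two within the
 * bounds; the lower bound also rules out 0.  Like most function-like
 * macros, this evaluates its argument more than once.
 */
#define IS_VALID_WAL_SEG_SIZE(size) \
    ((((uint64_t) (size)) & ((uint64_t) (size) - 1)) == 0 && \
     (uint64_t) (size) >= WAL_SEG_MIN_SIZE && \
     (uint64_t) (size) <= WAL_SEG_MAX_SIZE)
```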

-#define XLogSegSize        ((uint32) XLOG_SEG_SIZE)
+
+extern uint32 XLogSegSize;
+#define XLOG_SEG_SIZE XLogSegSize
This bit is really bad for frontends that include xlog_internal.h...

--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -62,7 +62,7 @@ static const char *progname;
 static int secs_per_test = 5;
 static int needs_unlink = 0;
-static char full_buf[XLOG_SEG_SIZE],
+static char full_buf[DEFAULT_XLOG_SEG_SIZE],
This would make sense as a new option of pg_test_fsync.

A performance study would be a good idea as well. Regarding the
generic SHOW command in the replication protocol, I may do it for the
next CF; I have use cases for it in my pocket.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Tue, Jan 17, 2017 at 12:18 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> I agree pretty_print_kb would have been a better for this function.
>>> However, I have realised that using the show hook and this function is
>>> not suitable and have found a better way of handling the removal of
>>> GUC_UNIT_XSEGS which no longer needs this function : using the
>>> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
>>> the assign hook. The next version of patch will use this.
>>
>>
>> ... it sounds like you're going back to exposing KB to users, and that's all
>> that really matters.
>>
>>>     IMHO it'd be better to use the n & (n-1) check detailed at [3].
>
> That would be better.
>
> So I am looking at the proposed patch, though there have been reviews
> the patch was in "Needs Review" state, and as far as I can see it is a
> couple of things for frontends. Just by grepping for XLOG_SEG_SIZE I
> have spotted the following problems:
> - pg_standby uses it to know about the next segment available.

Yes. I am aware of this and had mentioned it in my post.
 
> - pg_receivexlog still uses it in segment handling.
> It may be a good idea to just remove XLOG_SEG_SIZE and fix the code
> paths that fail to compile without it, frontend utilities included
> because a lot of them now rely on the value coded in xlog_internal.h,
> but with this patch the value is set up in the context of initdb. And
> this would induce major breakages in many backup tools, pg_rman coming
> first in mind... We could replace it with for example a macro that
> frontends could use to check if the size of the WAL segment is in a
> valid range if the tool does not have direct access to the Postgres
> instance (aka the size of the WAL segment used there) as there are as
> well offline tools.

I will see what's the best way to do this. 
 

> -#define XLogSegSize        ((uint32) XLOG_SEG_SIZE)
> +
> +extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize
> This bit is really bad for frontend declaring xlog_internal.h...
>
> --- a/src/bin/pg_test_fsync/pg_test_fsync.c
> +++ b/src/bin/pg_test_fsync/pg_test_fsync.c
> @@ -62,7 +62,7 @@ static const char *progname;
>
>  static int secs_per_test = 5;
>  static int needs_unlink = 0;
> -static char full_buf[XLOG_SEG_SIZE],
> +static char full_buf[DEFAULT_XLOG_SEG_SIZE],
> This would make sense as a new option of pg_test_fsync.
>
> A performance study would be a good idea as well. Regarding the
> generic SHOW command in the replication protocol, I may do it for next
> CF, I have use cases for it in my pocket.

 
Thank you for your review. 

I have already made a patch for the generic SHOW replication command (attached) and am working on the new initdb patch based on that.
I have not yet fixed the pg_standby issue. I am still trying to address all the comments and bugs.


--
 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> I have already made patch for the generic SHOW replication command
> (attached) and am working on the new initdb patch based on that.
> I have not yet fixed the pg_standby issue. I am trying to address all the
> comments and bugs still.

Having documentation for this patch in protocol.sgml would be nice.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>> I have already made patch for the generic SHOW replication command
>> (attached) and am working on the new initdb patch based on that.
>> I have not yet fixed the pg_standby issue. I am trying to address all the
>> comments and bugs still.
>
> Having documentation for this patch in protocol.sgml would be nice.

Yes. I will add that. 

--
Thank you, 

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Tue, Jan 17, 2017 at 12:50 PM, Beena Emerson <memissemerson@gmail.com> wrote:


> On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>>> I have already made patch for the generic SHOW replication command
>>> (attached) and am working on the new initdb patch based on that.
>>> I have not yet fixed the pg_standby issue. I am trying to address all the
>>> comments and bugs still.
>>
>> Having documentation for this patch in protocol.sgml would be nice.
>
> Yes. I will add that. 



PFA the patch with the documentation included. 


--
Thank you, 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 7:19 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> PFA the patch with the documentation included.

It is usually better to keep doc lines within 72-80 characters if
possible.

+   /* column 1: Wal segment size */
+   len = strlen(value);
+   pq_sendint(&buf, len, 4);
+   pq_sendbytes(&buf, value, len);
Bip. Error. This is a parameter value, not the WAL segment size.
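For reference, the quoted lines emit one column of a DataRow message: a 4-byte length in network byte order followed by the value's bytes, with no terminator. Below is a minimal standalone sketch of that layout. A hypothetical helper stands in for pq_sendint()/pq_sendbytes(). NULL values, which the protocol sends as length -1 with no bytes, are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Append one text column in DataRow layout at buf[off]: a 4-byte length
 * in network (big-endian) byte order, then the value bytes without a
 * terminating NUL.  Returns the new offset, or 0 if it does not fit
 * (a successful append always returns at least 4).
 */
static size_t
append_text_column(unsigned char *buf, size_t cap, size_t off,
                   const char *value)
{
    size_t      len = strlen(value);

    if (off > cap || len > INT32_MAX || cap - off < 4 + len)
        return 0;
    buf[off++] = (unsigned char) ((len >> 24) & 0xFF);
    buf[off++] = (unsigned char) ((len >> 16) & 0xFF);
    buf[off++] = (unsigned char) ((len >> 8) & 0xFF);
    buf[off++] = (unsigned char) (len & 0xFF);
    memcpy(buf + off, value, len);
    return off + len;
}
```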

Except those minor points this works as expected, and is consistent
with non-replication sessions, so that's nice by itself:
=# create user toto replication login;
CREATE ROLE

$ psql "replication=1" -U toto
=> SHOW foo;
ERROR:  42704: unrecognized configuration parameter "foo"
LOCATION:  GetConfigOptionByName, guc.c:7968
Time: 0.245 ms
=> SHOW log_directory;
ERROR:  42501: must be superuser to examine "log_directory"
LOCATION:  GetConfigOptionByName, guc.c:7974

I think we could at least get a committer to look at that.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Jan 17, 2017 at 8:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I think that we could get a committer look at that at the least.

This is sort of awkward, because it would be nice to reuse the code
for the existing SHOW command rather than reinventing the wheel, but
it's not very easy to do that, primarily because there are a number of
places which rely on being able to do catalog access, which is not
possible with a replication connection in hand.  I got it working
after hacking various things, so I have a complete list of the
problems involved:

1. ShowGUCConfigOption() calls TupleDescInitEntry(), which does a
catcache lookup to get the type's pg_type entry.  This isn't any big
problem; I hacked around it by adding a TupleDescInitBuiltinEntry()
which knows about the types that guc.c (and likely other builtins)
care about.

2. SendRowDescriptionMessage calls getBaseTypeAndTypmod(), which does
a catcache lookup to figure out whether the type is a domain.  I
short-circuited it by having it assume anything with an OID less than
10000 is not a domain.

3. printtup_prepare_info calls getTypeOutputInfo(), which does a
catcache lookup to figure out the type output function's OID and
whether it's a varlena.  I bypassed that with an unspeakable hack.

4. printtup.c's code in general assumes that a DR_printtup always has
a portal.  It doesn't seem to mind if the portal doesn't contain
anything very meaningful, but it has to have one.  This problem has
nothing to do with catalog access, but it's a problem.  I solved it by
(surprise) creating a portal, but I am not sure that's a very good
idea.

Problems 2-4 actually have to do with a DestReceiver of type
DestRemote really, really wanting to have an associated Portal and
database connection, so one approach is to create a stripped-down
DestReceiver that doesn't care about those things and then pass
that to GetPGVariable.  That's not any less code than the way Beena
coded it, of course; it's probably more.  On the other hand, the
stripped-down DestReceiver implementation is more likely to be usable
the next time somebody wants to add a new replication command, whereas
this ad-hoc code to directly construct protocol messages will not be
reusable.
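Here is a self-contained sketch of the idea behind a stripped-down receiver. If every column is sent as text, the sender needs neither catalog lookups (type output functions, domain checks) nor a Portal. The struct and function names are hypothetical and deliberately much simpler than PostgreSQL's DestReceiver, whose rStartup/receiveSlot/rShutdown callbacks work with a TupleDesc and a TupleTableSlot. The single-letter tags only echo the protocol's RowDescription ('T'), DataRow ('D') and CommandComplete ('C') messages; they do not encode them.

```c
#include <assert.h>
#include <string.h>

typedef struct SimpleReceiver SimpleReceiver;

/* Hypothetical, text-only receiver: no type lookups, no Portal. */
struct SimpleReceiver
{
    void        (*startup) (SimpleReceiver *self, int ncols,
                            const char *const *names);
    void        (*receive) (SimpleReceiver *self, const char *const *values);
    void        (*shutdown) (SimpleReceiver *self);
    int         ncols;
    char        out[256];       /* stands in for the client connection */
    size_t      len;
};

/* Append s to the output; in this sketch, text that does not fit is dropped. */
static void
append(SimpleReceiver *self, const char *s)
{
    size_t      n = strlen(s);

    if (self->len + n < sizeof(self->out))
    {
        memcpy(self->out + self->len, s, n + 1);
        self->len += n;
    }
}

static void
text_startup(SimpleReceiver *self, int ncols, const char *const *names)
{
    self->ncols = ncols;
    append(self, "T");              /* "row description": column names */
    for (int i = 0; i < ncols; i++)
    {
        append(self, " ");
        append(self, names[i]);
    }
    append(self, "\n");
}

static void
text_receive(SimpleReceiver *self, const char *const *values)
{
    append(self, "D");              /* "data row": values are already text */
    for (int i = 0; i < self->ncols; i++)
    {
        append(self, " ");
        append(self, values[i]);
    }
    append(self, "\n");
}

static void
text_shutdown(SimpleReceiver *self)
{
    append(self, "C\n");            /* "command complete" */
}

static void
init_text_receiver(SimpleReceiver *self)
{
    memset(self, 0, sizeof(*self));
    self->startup = text_startup;
    self->receive = text_receive;
    self->shutdown = text_shutdown;
}

/* A SHOW-like command written only against the receiver callbacks. */
static void
show_setting(SimpleReceiver *dest, const char *name, const char *value)
{
    const char *names[] = {name};
    const char *values[] = {value};

    dest->startup(dest, 1, names);
    dest->receive(dest, values);
    dest->shutdown(dest);
}
```

Because show_setting() only talks to the callbacks, another command could reuse the same receiver. That is the reusability argument made above, as opposed to building protocol messages directly.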

Opinions?  (Hacked-up patch attached for educational purposes.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Problems 2-4 actually have to do with a DestReceiver of type
> DestRemote really, really wanting to have an associated Portal and
> database connection, so one approach is to create a stripped-down
> DestReceiver that doesn't care about those things and then passing
> that to GetPGVariable.

I tried that and it worked out pretty well, so I'm inclined to go with
this approach.  Proposed patches attached.  0001 adds the new
DestReceiver type, and 0002 is a revised patch to implement the SHOW
command itself.

Thoughts, comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Problems 2-4 actually have to do with a DestReceiver of type
>> DestRemote really, really wanting to have an associated Portal and
>> database connection, so one approach is to create a stripped-down
>> DestReceiver that doesn't care about those things and then passing
>> that to GetPGVariable.
>
> I tried that and it worked out pretty well, so I'm inclined to go with
> this approach.  Proposed patches attached.  0001 adds the new
> DestReceiver type, and 0002 is a revised patch to implement the SHOW
> command itself.
>
> Thoughts, comments?

This looks like a sensible approach to me. DestRemoteSimple could be
useful for background workers that are not connected to a database as
well. Isn't there a problem with PGC_REAL parameters?
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Problems 2-4 actually have to do with a DestReceiver of type
>>> DestRemote really, really wanting to have an associated Portal and
>>> database connection, so one approach is to create a stripped-down
>>> DestReceiver that doesn't care about those things and then passing
>>> that to GetPGVariable.
>>
>> I tried that and it worked out pretty well, so I'm inclined to go with
>> this approach.  Proposed patches attached.  0001 adds the new
>> DestReceiver type, and 0002 is a revised patch to implement the SHOW
>> command itself.
>>
>> Thoughts, comments?
>
> This looks like a sensible approach to me. DestRemoteSimple could be
> useful for background workers that are not connected to a database as
> well. Isn't there a problem with PGC_REAL parameters?

No, because the output of SHOW is always of type text, regardless of
the type of the GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Sat, Jan 21, 2017 at 4:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Problems 2-4 actually have to do with a DestReceiver of type
>>>> DestRemote really, really wanting to have an associated Portal and
>>>> database connection, so one approach is to create a stripped-down
>>>> DestReceiver that doesn't care about those things and then passing
>>>> that to GetPGVariable.
>>>
>>> I tried that and it worked out pretty well, so I'm inclined to go with
>>> this approach.  Proposed patches attached.  0001 adds the new
>>> DestReceiver type, and 0002 is a revised patch to implement the SHOW
>>> command itself.
>>>
>>> Thoughts, comments?
>>
>> This looks like a sensible approach to me. DestRemoteSimple could be
>> useful for background workers that are not connected to a database as
>> well. Isn't there a problem with PGC_REAL parameters?
>
> No, because the output of SHOW is always of type text, regardless of
> the type of the GUC.

Thinking about that overnight, that looks pretty nice. What would be
nicer, though, would be to add INT8OID and BYTEAOID to the list and
convert the other replication commands as well. In the end, I think
we should be able to remove all pq_* routine dependencies in
walsender.c, saving quite a few lines.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:

Hello,

Please find attached an updated WIP patch. I have incorporated almost all comments. This is to be applied over Robert's patches. I will post performance results later on. 

1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size sets WalModMask and WalShiftBit. All the modulo and division operations using XLogSegSize have been replaced with these. However, there are many preprocessor macros in xlog_internal.h which divide by XLogSegSize. I have not changed these because it would mean reassigning WalShiftBit along with XLogSegSize in all the modules which use these macros, which does not seem to be a good idea. This also means the shift operator can be used only in a couple of places.

2. pg_standby: it deals with WAL files, so I have used the file size to set XLogSegSize (similar to pg_xlogdump). Also, the macro MaxSegmentsPerLogFile, which uses XLOG_SEG_SIZE, is now defined in SetWALFileNameForCleanup, where it is used. Since XLOG_SEG_SIZE is no longer preset, the code which emits a message if the file size is greater than XLOG_SEG_SIZE had to be removed.

3. XLOGChooseNumBuffers: This function, called while creating the shared memory segment, requires XLogSegSize, which is set from the ControlFile. Hence, XLOGShmemSize temporarily reads the ControlFile before invoking XLOGChooseNumBuffers. The ControlFile is read again and stored in shared memory later on.

4. IsValidXLogSegSize: This is a macro to verify the XLogSegSize. This is used in initdb, pg_xlogdump, pg_standby. 

5. Macro for power of 2: There were a couple of ideas for centralising it. For now, I have just defined it in xlog_internal.h.

6. Since a CRC is used to verify the ControlFile before all of its contents are read as-is, I do not see the need to re-verify xlog_seg_size.

7. min/max_wal_size still take values in kB and internally store them as a segment count. However, the calculation has moved to their respective assign hooks, since GUC_UNIT_XSEGS had to be removed.

8. Declaring XLogSegSize: There are 2 internal variables for the same parameter. In the original code, XLOG_SEG_SIZE is defined in the auto-generated file src/include/pg_config.h, and xlog_internal.h defines:

#define XLogSegSize     ((uint32) XLOG_SEG_SIZE)

To avoid renaming all parts of code, I made the following change in xlog_internal.h

+ extern uint32 XLogSegSize;

+#define XLOG_SEG_SIZE XLogSegSize

Would it be better to just use one variable, XLogSegSize, everywhere? However, a few external modules could be using XLOG_SEG_SIZE. Thoughts?

9. Documentation will be added in the next version of the patch.

--
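Item 3 is easier to see against PostgreSQL's documented rule for wal_buffers = -1: 1/32 of shared_buffers, at least 64kB, and at most one WAL segment. Below is a minimal sketch with the segment size passed in as a parameter. The function name is hypothetical, and the 8kB XLOG_BLCKSZ is assumed to be the default build setting.

```c
#include <assert.h>

#define XLOG_BLCKSZ 8192        /* assumed: default 8kB WAL page size */

/*
 * wal_buffers = -1 auto-tuning: 1/32 of shared_buffers (counted in
 * buffers), capped at one WAL segment's worth of pages and floored at
 * 8 pages (64kB with 8kB pages).  The segment size is a parameter here
 * because it is no longer a compile-time constant.
 */
static int
choose_num_xlog_buffers(int nbuffers, int wal_segment_bytes)
{
    int         xbuffers = nbuffers / 32;
    int         pages_per_segment = wal_segment_bytes / XLOG_BLCKSZ;

    assert(wal_segment_bytes >= XLOG_BLCKSZ);
    if (xbuffers > pages_per_segment)
        xbuffers = pages_per_segment;
    if (xbuffers < 8)
        xbuffers = 8;
    return xbuffers;
}
```

Because the cap depends on the segment size, the size has to be known before shared memory is sized. That is the ordering problem item 3 works around.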


Beena Emerson

On Sat, Jan 21, 2017 at 5:30 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sat, Jan 21, 2017 at 4:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> Problems 2-4 actually have to do with a DestReceiver of type
>>>>> DestRemote really, really wanting to have an associated Portal and
>>>>> database connection, so one approach is to create a stripped-down
>>>>> DestReceiver that doesn't care about those things and then passing
>>>>> that to GetPGVariable.
>>>>
>>>> I tried that and it worked out pretty well, so I'm inclined to go with
>>>> this approach.  Proposed patches attached.  0001 adds the new
>>>> DestReceiver type, and 0002 is a revised patch to implement the SHOW
>>>> command itself.
>>>>
>>>> Thoughts, comments?
>>>
>>> This looks like a sensible approach to me. DestRemoteSimple could be
>>> useful for background workers that are not connected to a database as
>>> well. Isn't there a problem with PGC_REAL parameters?
>>
>> No, because the output of SHOW is always of type text, regardless of
>> the type of the GUC.
>
> Thinking about that over night, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID in the list, and
> convert as well the other replication commands. At the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies in walsender.c, saving quite a couple of lines.
> --
> Michael



--
Thank you, 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 7:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> No, because the output of SHOW is always of type text, regardless of
>> the type of the GUC.
>
> Thinking about that over night, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID in the list, and
> convert as well the other replication commands. At the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies in walsender.c, saving quite a couple of lines.

Might be worth investigating, but I don't feel any obligation to do
that right now.  Thanks for the review; committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Wed, Jan 25, 2017 at 6:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 20, 2017 at 7:00 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> No, because the output of SHOW is always of type text, regardless of
>>> the type of the GUC.
>>
>> Thinking about that over night, that looks pretty nice. What would be
>> nicer though would be to add INT8OID and BYTEAOID in the list, and
>> convert as well the other replication commands. At the end, I think
>> that we should finish by being able to remove all pq_* routine
>> dependencies in walsender.c, saving quite a couple of lines.
>
> Might be worth investigating, but I don't feel any obligation to do
> that right now.  Thanks for the review; committed.

OK, I have done this refactoring effort as attached because I think
that's really worth it. And here are the diff numbers:
 3 files changed, 113 insertions(+), 162 deletions(-)
That's a bit less than I first expected, because of all the
peculiarities of bytea in its output and the way TIMELINE_HISTORY
takes advantage of the message-level routines. Still, for
IDENTIFY_SYSTEM, START_REPLICATION and CREATE_REPLICATION_SLOT the
gains in readability are there.

What do you think?
-- 
Michael

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Jan 24, 2017 at 10:26 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jan 25, 2017 at 6:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 20, 2017 at 7:00 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>>> No, because the output of SHOW is always of type text, regardless of
>>>> the type of the GUC.
>>>
>>> Thinking about that over night, that looks pretty nice. What would be
>>> nicer though would be to add INT8OID and BYTEAOID in the list, and
>>> convert as well the other replication commands. At the end, I think
>>> that we should finish by being able to remove all pq_* routine
>>> dependencies in walsender.c, saving quite a couple of lines.
>>
>> Might be worth investigating, but I don't feel any obligation to do
>> that right now.  Thanks for the review; committed.
>
> OK, I have done this refactoring effort as attached because I think
> that's really worth it. And here are the diff numbers:
>  3 files changed, 113 insertions(+), 162 deletions(-)
> That's a bit less than what I thought first because of all the
> singularities of bytea in its output and the way TIMELINE_HISTORY
> takes advantage of the message level routines. Still for
> IDENTIFY_SYSTEM, START_REPLICATION and CREATE_REPLICATION_SLOT the
> gains in readability are here.

Seems OK to me, but I think I'd want to hear a few other opinions
before committing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
On 2017-01-26 13:16:13 -0500, Robert Haas wrote:
> > OK, I have done this refactoring effort as attached because I think
> > that's really worth it. And here are the diff numbers:
> >  3 files changed, 113 insertions(+), 162 deletions(-)
> > That's a bit less than what I thought first because of all the
> > singularities of bytea in its output and the way TIMELINE_HISTORY
> > takes advantage of the message level routines. Still for
> > IDENTIFY_SYSTEM, START_REPLICATION and CREATE_REPLICATION_SLOT the
> > gains in readability are here.
> 
> Seems OK to me, but I think I'd want to hear a few other opinions
> before committing it.

Just to be absolutely sure: We're talking about Michael's cleanup patch,
not the thread's original topic?

Andres



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 1:34 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-26 13:16:13 -0500, Robert Haas wrote:
>> > OK, I have done this refactoring effort as attached because I think
>> > that's really worth it. And here are the diff numbers:
>> >  3 files changed, 113 insertions(+), 162 deletions(-)
>> > That's a bit less than what I thought first because of all the
>> > singularities of bytea in its output and the way TIMELINE_HISTORY
>> > takes advantage of the message level routines. Still for
>> > IDENTIFY_SYSTEM, START_REPLICATION and CREATE_REPLICATION_SLOT the
>> > gains in readability are here.
>>
>> Seems OK to me, but I think I'd want to hear a few other opinions
>> before committing it.
>
> Just to be absolutely sure: We're talking about Michael's cleanup patch,
> not the thread's original topic?

Correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2017-01-23 11:35:11 +0530, Beena Emerson wrote:
> Please find attached an updated WIP patch. I have incorporated almost all
> comments. This is to be applied over Robert's patches. I will post
> performance results later on.
>
> 1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size
> sets the WalModMask and WalShiftBit. All the modulo and division operations
> using XLogSegSize has been replaced with these. However, there are many
> preprocessors which divide with XLogSegSize in xlog_internal.h. I have not
> changed these because it would mean I will have to reassign the WalShiftBit
> along with XLogSegSize in all the modules which use these macros. That does
> not seem to be a good idea. Also, this means shift operator can be used
> only in couple of places.

I think it'd be better not to have XLogSegSize anymore. Silently
changing a macro's behaviour from being a compile-time constant to
something runtime-configurable is a bad idea.


> 8. Declaring XLogSegSize: There are 2 internal variables for the same
> parameter. In original code XLOG_SEG_SIZE is defined in the auto-generated
> file src/include/pg_config.h. And xlog_internal.h defines:
>
> #define XLogSegSize     ((uint32) XLOG_SEG_SIZE)
>
> To avoid renaming all parts of code, I made the following change in
> xlog_internal.h
>
> + extern uint32 XLogSegSize;
>
> +#define XLOG_SEG_SIZE XLogSegSize
>
>  would it be better to just use one variable XLogSegSize everywhere. But
> few external modules could be using XLOG_SEG_SIZE. Thoughts?

They'll quite possibly break with configurable size anyway.  So I'd
rather have those broken explicitly.



> +/*
> + * These variables are set in assign_wal_segment_size
> + *
> + * WalModMask: It is an AND mask for XLogSegSize to allow for faster modulo
> + *        operations using it.
> + *
> + * WalShiftBit: It is an shift bit for XLogSegSize to allow for faster
> + *        division operations using it.
> + *
> + * UsableBytesInSegment: It is the number of bytes in a WAL segment usable for
> + *        WAL data.
> + */
> +uint32        WalModMask;
> +static int    UsableBytesInSegment;
> +static int    WalShiftBit;

This could use some editorializing. "Faster modulo operations" doesn't
explain how or why it's actually being used. Same for WalShiftBit.
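The comment could explain something like the following. For a power-of-two segment size S = 2^k, masking with S - 1 keeps the offset within the segment, and shifting right by k gives the segment number. Below is a minimal sketch that derives both once from a validated size. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    uint64_t    seg_size;       /* S, a power of two */
    uint64_t    mod_mask;       /* S - 1: ptr & mod_mask == ptr % S */
    int         shift;          /* log2(S): ptr >> shift == ptr / S */
} SegGeometry;

static SegGeometry
make_seg_geometry(uint64_t seg_size)
{
    SegGeometry g;

    /* Only valid for powers of two; other sizes need real % and /. */
    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);

    g.seg_size = seg_size;
    g.mod_mask = seg_size - 1;
    g.shift = 0;
    while ((seg_size >> g.shift) > 1)
        g.shift++;
    return g;
}
```

When the size is a compile-time constant, the compiler already performs this strength reduction. The mask and shift only pay off once the size is a run-time value.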

>  /*
>   * Private, possibly out-of-date copy of shared LogwrtResult.
> @@ -957,6 +975,7 @@ XLogInsertRecord(XLogRecData *rdata,
>      if (!XLogInsertAllowed())
>          elog(ERROR, "cannot make new WAL entries during recovery");
>
> +
>      /*----------
>       *

Spurious newline change.

>          if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
> -            ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
> +            (ptr & WalModMask) > XLOG_BLCKSZ)
>              initializedUpto = ptr - SizeOfXLogShortPHD;
>          else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
> -                 ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
> +                 (ptr & WalModMask) < XLOG_BLCKSZ)
>              initializedUpto = ptr - SizeOfXLogLongPHD;
>          else
>              initializedUpto = ptr;

How about we introduce an XLogSegmentOffset(XLogRecPtr) function-like
macro in a first patch?  That'll quite noticeably reduce the amount of
change in the commit that actually changes things, and makes it easier
to adjust things later.  I see very little benefit in open-coding
either % XLOG_SEG_SIZE or & WalModMask.
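Below is a minimal sketch of such a function-like macro. Unlike the one-argument form proposed above, this hypothetical version takes the segment size explicitly. That is an assumption: it keeps call sites unchanged whether the size is a compile-time constant or a run-time value. The macro requires a power-of-two size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte position in the WAL stream */

/*
 * Hypothetical: offset of ptr within its WAL segment.  seg_size must be
 * a power of two, so the mask is equivalent to ptr % seg_size.
 */
#define XLOG_SEGMENT_OFFSET(ptr, seg_size) \
    ((XLogRecPtr) (ptr) & ((XLogRecPtr) (seg_size) - 1))

/* The "(pageaddr % XLogSegSize) == 0" test from the quoted hunk. */
static int
is_first_page_of_segment(XLogRecPtr pageaddr, uint64_t seg_size)
{
    return XLOG_SEGMENT_OFFSET(pageaddr, seg_size) == 0;
}
```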


> @@ -1794,6 +1813,7 @@ XLogBytePosToRecPtr(uint64 bytepos)
>      uint32        seg_offset;
>      XLogRecPtr    result;
>
> +
>      fullsegs = bytepos / UsableBytesInSegment;
>      bytesleft = bytepos % UsableBytesInSegment;

spurious change.

> @@ -1878,7 +1898,7 @@ XLogRecPtrToBytePos(XLogRecPtr ptr)
>
>      XLByteToSeg(ptr, fullsegs);
>
> -    fullpages = (ptr % XLOG_SEG_SIZE) / XLOG_BLCKSZ;
> +    fullpages = (ptr & WalModMask) / XLOG_BLCKSZ;
>      offset = ptr % XLOG_BLCKSZ;
>
>      if (fullpages == 0)
> @@ -2043,7 +2063,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>          /*
>           * If first page of an XLOG segment file, make it a long header.
>           */
> -        if ((NewPage->xlp_pageaddr % XLogSegSize) == 0)
> +        if ((NewPage->xlp_pageaddr & WalModMask) == 0)
>          {
>              XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
>
> @@ -2095,6 +2115,7 @@ CalculateCheckpointSegments(void)
>       *      number of segments consumed between checkpoints.
>       *-------
>       */
> +
>      target = (double) max_wal_size / (2.0 + CheckPointCompletionTarget);

spurious change.


>  void
> +assign_wal_segment_size(int newval, void *extra)
> +{
> +    /*
> +     * During system initialization, XLogSegSize is not set so we use
> +     * DEFAULT_XLOG_SEG_SIZE instead.
> +     */
> +    int    WalSegSize = (XLogSegSize == 0) ? DEFAULT_XLOG_SEG_SIZE : XLOG_SEG_SIZE;
> +
> +    wal_segment_size = newval;
> +    UsableBytesInSegment = (wal_segment_size * UsableBytesInPage) -
> +                           (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
> +    WalModMask = WalSegSize - 1;
> +
> +    /* Set the WalShiftBit */
> +    WalShiftBit = 0;
> +    while (WalSegSize > 1)
> +    {
> +        WalSegSize = WalSegSize >> 1;
> +        WalShiftBit++;
> +    }
> +}

Hm. Are GUC hooks a good way to compute the masks?  Interdependent GUCs
are unfortunately not working well, and several GUCs might end up
depending on these.  I think it might be better to assign the variables
somewhere early in StartupXLOG() or such.


> +
> +void
> +assign_min_wal_size(int newval, void *extra)
> +{
> +    /*
> +     * During system initialization, XLogSegSize is not set so we use
> +     * DEFAULT_XLOG_SEG_SIZE instead.
> +     *
> +     * min_wal_size is in kB and XLogSegSize is in bytes and so it is
> +     * converted to kB for the calculation.
> +     */
> +    int    WalSegSize = (XLogSegSize == 0) ? (DEFAULT_XLOG_SEG_SIZE / 1024) :
> +                                          (XLOG_SEG_SIZE / 1024);
> +
> +    min_wal_size = newval / WalSegSize;
> +}
> +
> +void
>  assign_max_wal_size(int newval, void *extra)
>  {
> -    max_wal_size = newval;
> +    /*
> +     * During system initialization, XLogSegSize is not set so we use
> +     * DEFAULT_XLOG_SEG_SIZE instead.
> +     *
> +     * max_wal_size is in kB and XLogSegSize is in bytes and so it is
> +     * converted to bytes for the calculation.
> +     */
> +    int    WalSegSize = (XLogSegSize == 0) ? (DEFAULT_XLOG_SEG_SIZE / 1024) :
> +                                          (XLOG_SEG_SIZE / 1024);
> +
> +    max_wal_size = newval / WalSegSize;
>      CalculateCheckpointSegments();
>  }

I don't think it's a good idea to have GUCs that are initially set to
the wrong value and such.  How about just storing bytes, and converting
into segments upon use?



> @@ -2135,8 +2205,8 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
>       * correspond to. Always recycle enough segments to meet the minimum, and
>       * remove enough segments to stay below the maximum.
>       */
> -    minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + min_wal_size - 1;
> -    maxSegNo = PriorRedoPtr / XLOG_SEG_SIZE + max_wal_size - 1;
> +    minSegNo = (PriorRedoPtr >> WalShiftBit) + min_wal_size - 1;
> +    maxSegNo = (PriorRedoPtr >> WalShiftBit) + max_wal_size - 1;

I think a macro would be good here too (same prerequisite patch as above).
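A rough sketch of the macro approach for the XLOGfileslop() hunk quoted above. It assumes the segment size is passed explicitly and that min_wal_size/max_wal_size stay in kB (as in the patch), converted to segments only at the point of use. `WalSegNoOf`, `wal_kb_to_segs` and `wal_recycle_bounds` are hypothetical names, and the typedefs are stand-ins for the backend's.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in: byte position in the WAL stream */
typedef uint64_t XLogSegNo;    /* stand-in: WAL segment number */

/* Segment containing ptr; works for any segment size. */
#define WalSegNoOf(ptr, segsz)  ((XLogSegNo) ((ptr) / (segsz)))

/* Convert a size in kB to whole segments, rounding down. */
static uint64_t
wal_kb_to_segs(uint64_t size_kb, uint32_t segsz)
{
    return size_kb / (segsz / 1024);
}

/*
 * Segment-number bounds in the style of XLOGfileslop(). Assumes each size
 * is at least one segment, otherwise the "- 1" could underflow.
 */
static void
wal_recycle_bounds(XLogRecPtr prior_redo, uint32_t segsz,
                   uint64_t min_wal_kb, uint64_t max_wal_kb,
                   XLogSegNo *min_seg, XLogSegNo *max_seg)
{
    XLogSegNo redo_seg = WalSegNoOf(prior_redo, segsz);

    *min_seg = redo_seg + wal_kb_to_segs(min_wal_kb, segsz) - 1;
    *max_seg = redo_seg + wal_kb_to_segs(max_wal_kb, segsz) - 1;
}
```

With 16MB segments, a redo pointer in segment 10, min_wal_size of 80MB and max_wal_size of 1GB, this gives bounds 14 and 73.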

> @@ -4677,8 +4749,18 @@ XLOGShmemSize(void)
>       */
>      if (XLOGbuffers == -1)
>      {
> -        char        buf[32];
> -
> +        /*
> +         * The calculation of XLOGbuffers, requires the now run-time parameter
> +         * XLogSegSize from the ControlFile. The value determined here is
> +         * required to create the shared memory segment. Hence, temporarily
> +         * allocating space and reading ControlFile here.
> +         */

I don't like comments containing things like "the now run-time parameter"
much - they are likely going to still be there in 10 years, and will be
hard to understand.


But anyway, how about we simply remove the "max one segment" boundary
instead? I don't think it's actually very meaningful - several people
posted benchmarks with more than one segment being beneficial.
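For context, a standalone sketch of the wal_buffers = -1 auto-tuning that the "max one segment" boundary refers to, assuming the existing heuristic is roughly 1/32 of shared_buffers, at least 8 pages, capped at one segment's worth of WAL pages. The function name, the `cap_at_segment` flag and the 8kB WAL page size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_WAL_BLCKSZ 8192   /* assumed WAL page size (the usual default) */

/*
 * Auto-tuned number of WAL buffers: about 1/32 of shared_buffers (in pages),
 * optionally capped at one segment's worth of pages, and never below 8.
 */
static int
choose_wal_buffers(int nbuffers, int seg_size, bool cap_at_segment)
{
    int xbuffers = nbuffers / 32;

    if (cap_at_segment && xbuffers > seg_size / SKETCH_WAL_BLCKSZ)
        xbuffers = seg_size / SKETCH_WAL_BLCKSZ;
    if (xbuffers < 8)
        xbuffers = 8;
    return xbuffers;
}
```

With 8GB of shared_buffers (1048576 pages) and 16MB segments, the capped variant yields 2048 buffers and the uncapped one 32768. Dropping the cap also means shared memory sizing no longer depends on the segment size, so XLOGShmemSize() would not need to read the control file.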


> diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
> index 31290d3..87efc3c 100644
> --- a/src/bin/pg_basebackup/streamutil.c
> +++ b/src/bin/pg_basebackup/streamutil.c
> @@ -238,6 +238,59 @@ GetConnection(void)
>  }
>
>  /*
> + * Run the SHOW_WAL_SEGMENT_SIZE command to set the XLogSegSize
> + */
> +bool
> +SetXLogSegSize(PGconn *conn)
> +{

I think this is a confusing function name, because it sounds like
you're setting the SegSize remotely or such. I think making it
XLogRecPtr RetrieveXLogSegSize(conn); or such would lead to better code.
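A sketch of the shape suggested here: the helper returns the value instead of setting a global, and the caller decides where to keep it. Only the parsing of a SHOW wal_segment_size result is shown; the libpq query is omitted. The function name and the accepted unit suffixes (kB, MB, GB, or none) are assumptions about how the value is formatted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a SHOW result such as "16MB" to bytes.
 * Returns 0 if the text cannot be parsed or the size is not a power of two.
 */
static uint64_t
parse_wal_segment_size(const char *value)
{
    char               *unit;
    unsigned long long  n = strtoull(value, &unit, 10);
    uint64_t            multiplier;
    uint64_t            bytes;

    if (unit == value)
        return 0;               /* no digits at all */
    if (strcmp(unit, "kB") == 0)
        multiplier = 1024;
    else if (strcmp(unit, "MB") == 0)
        multiplier = 1024 * 1024;
    else if (strcmp(unit, "GB") == 0)
        multiplier = 1024ULL * 1024 * 1024;
    else if (*unit == '\0')
        multiplier = 1;         /* assume a plain byte count */
    else
        return 0;

    bytes = (uint64_t) n * multiplier;
    if (bytes == 0 || (bytes & (bytes - 1)) != 0)
        return 0;
    return bytes;
}
```

A caller could then check for a zero result and call disconnect_and_exit(1), keeping the returned size in its own variable.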

> diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
> index 963802e..4ceebdc 100644
> --- a/src/bin/pg_resetxlog/pg_resetxlog.c
> +++ b/src/bin/pg_resetxlog/pg_resetxlog.c
> @@ -57,6 +57,7 @@
>  #include "storage/large_object.h"
>  #include "pg_getopt.h"
>
> +uint32        XLogSegSize;

This seems like a bad idea - having the same local variable both in
frontend and backend programs seems like a recipe for disaster.
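One way to avoid a global XLogSegSize shared by frontend and backend code is to pass the segment size explicitly to each helper. A standalone sketch of the WAL file name computation in that style. The function name is hypothetical; the %08X%08X%08X timeline/log/segment layout is the existing file-name format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a WAL file name; segsz (bytes, power of two) is a parameter, not a global. */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno, uint32_t segsz)
{
    /* Number of segments per 4GB "xlogid" for this segment size. */
    uint64_t segs_per_id;

    assert(segsz != 0 && (segsz & (segsz - 1)) == 0);
    segs_per_id = UINT64_C(0x100000000) / segsz;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

Because the segment size is an argument, the same code links into pg_resetxlog, pg_basebackup or the backend without any of them defining an XLogSegSize symbol.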


Greetings,

Andres Freund



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2017-01-25 12:26:21 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
> index 083c0303dc..2eb3a420ac 100644
> --- a/src/backend/access/common/tupdesc.c
> +++ b/src/backend/access/common/tupdesc.c
> @@ -629,6 +629,14 @@ TupleDescInitBuiltinEntry(TupleDesc desc,
>              att->attstorage = 'p';
>              att->attcollation = InvalidOid;
>              break;
> +
> +        case INT8OID:
> +            att->attlen = 8;
> +            att->attbyval = true;
> +            att->attalign = 'd';
> +            att->attstorage = 'p';
> +            att->attcollation = InvalidOid;
> +            break;
>      }
>  }

INT8 isn't unconditionally byval, is it?
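A sketch of the likely fix: as for the int8 type itself, the INT8OID entry should take its pass-by-value flag from the build configuration instead of hard-coding true. In the backend that flag is FLOAT8PASSBYVAL. The stand-in macro below (tied to pointer width) and the trimmed attribute struct are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for FLOAT8PASSBYVAL: assume 8-byte values are by-value when Datum is 8 bytes wide. */
#define SKETCH_FLOAT8PASSBYVAL (sizeof(void *) >= 8)

/* Hypothetical, trimmed copy of the attribute fields the patch sets. */
typedef struct SketchAttr
{
    int  attlen;
    bool attbyval;
    char attalign;
    char attstorage;
} SketchAttr;

static void
init_int8_attr(SketchAttr *att)
{
    att->attlen = 8;
    att->attbyval = SKETCH_FLOAT8PASSBYVAL;   /* not unconditionally true */
    att->attalign = 'd';
    att->attstorage = 'p';
}
```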

>      /* slot_name */
> -    len = strlen(NameStr(MyReplicationSlot->data.name));
> -    pq_sendint(&buf, len, 4);    /* col1 len */
> -    pq_sendbytes(&buf, NameStr(MyReplicationSlot->data.name), len);
> +    values[0] = PointerGetDatum(cstring_to_text(NameStr(MyReplicationSlot->data.name)));

That seems a bit long.


I've not done like the most careful review ever, but I'm in favor of the
general change (provided the byval thing is fixed obviously).

Greetings,

Andres Freund



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Fri, Jan 27, 2017 at 4:20 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-25 12:26:21 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
>> index 083c0303dc..2eb3a420ac 100644
>> --- a/src/backend/access/common/tupdesc.c
>> +++ b/src/backend/access/common/tupdesc.c
>> @@ -629,6 +629,14 @@ TupleDescInitBuiltinEntry(TupleDesc desc,
>>                       att->attstorage = 'p';
>>                       att->attcollation = InvalidOid;
>>                       break;
>> +
>> +             case INT8OID:
>> +                     att->attlen = 8;
>> +                     att->attbyval = true;
>> +                     att->attalign = 'd';
>> +                     att->attstorage = 'p';
>> +                     att->attcollation = InvalidOid;
>> +                     break;
>>       }
>>  }
>
> INT8 isn't unconditionally byval, is it?

Doh. Of course.

>>       /* slot_name */
>> -     len = strlen(NameStr(MyReplicationSlot->data.name));
>> -     pq_sendint(&buf, len, 4);       /* col1 len */
>> -     pq_sendbytes(&buf, NameStr(MyReplicationSlot->data.name), len);
>> +     values[0] = PointerGetDatum(cstring_to_text(NameStr(MyReplicationSlot->data.name)));
>
> That seems a bit long.

Sure. What about this:
-   len = strlen(NameStr(MyReplicationSlot->data.name));
-   pq_sendint(&buf, len, 4);   /* col1 len */
-   pq_sendbytes(&buf, NameStr(MyReplicationSlot->data.name), len);
+   slot_name = NameStr(MyReplicationSlot->data.name);
+   values[0] = PointerGetDatum(cstring_to_text(slot_name));

> I've not done like the most careful review ever, but I'm in favor of the
> general change (provided the byval thing is fixed obviously).

Thanks for the review.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello Andres,

Thank you for your review.

On Fri, Jan 27, 2017 at 12:39 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-23 11:35:11 +0530, Beena Emerson wrote:
> Please find attached an updated WIP patch. I have incorporated almost all
> comments. This is to be applied over Robert's patches. I will post
> performance results later on.
>
> 1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size
> sets the WalModMask and WalShiftBit. All the modulo and division operations
> using XLogSegSize has been replaced with these. However, there are many
> preprocessors which divide with XLogSegSize in xlog_internal.h. I have not
> changed these because it would mean I will have to reassign the WalShiftBit
> along with XLogSegSize in all the modules which use these macros. That does
> not seem to be a good idea. Also, this means shift operator can be used
> only in couple of places.

I think it'd be better not to have XLogSegSize anymore. Silently
changing a macros behaviour from being a compile time constant to
something runtime configurable is a bad idea.

I don't think I understood you clearly. Do you mean converting the macros that use XLogSegSize into functions?
 
> 8. Declaring XLogSegSize: There are 2 internal variables for the same
> parameter. In original code XLOG_SEG_SIZE is defined in the auto-generated
> file src/include/pg_config.h. And xlog_internal.h defines:
>
> #define XLogSegSize     ((uint32) XLOG_SEG_SIZE)
>
> To avoid renaming all parts of code, I made the following change in
> xlog_internal.h
>
> + extern uint32 XLogSegSize;
>
> +#define XLOG_SEG_SIZE XLogSegSize
>
>  would it be better to just use one variable XLogSegSize everywhere. But
> few external modules could be using XLOG_SEG_SIZE. Thoughts?

They'll quite possibly break with configurable size anyway.  So I'd
rather have those broken explicitly.

OK. Should I remove XLOG_SEG_SIZE, then?


> +/*
> + * These variables are set in assign_wal_segment_size
> + *
> + * WalModMask: It is an AND mask for XLogSegSize to allow for faster modulo
> + *           operations using it.
> + *
> + * WalShiftBit: It is an shift bit for XLogSegSize to allow for faster
> + *           division operations using it.
> + *
> + * UsableBytesInSegment: It is the number of bytes in a WAL segment usable for
> + *           WAL data.
> + */
> +uint32               WalModMask;
> +static int   UsableBytesInSegment;
> +static int   WalShiftBit;

This could use some editorializing. "Faster modulo operations" isn't an
explanation of how/why it's actually being used. Same for WalShiftBit.

 I will change these comments.
 

>  /*
>   * Private, possibly out-of-date copy of shared LogwrtResult.
> @@ -957,6 +975,7 @@ XLogInsertRecord(XLogRecData *rdata,
>       if (!XLogInsertAllowed())
>               elog(ERROR, "cannot make new WAL entries during recovery");
>
> +
>       /*----------
>        *

Spurious newline change.

>               if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
> -                     ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
> +                     (ptr & WalModMask) > XLOG_BLCKSZ)
>                       initializedUpto = ptr - SizeOfXLogShortPHD;
>               else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
> -                              ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
> +                              (ptr & WalModMask) < XLOG_BLCKSZ)
>                       initializedUpto = ptr - SizeOfXLogLongPHD;
>               else
>                       initializedUpto = ptr;

How about we introduce an XLogSegmentOffset(XLogRecPtr) function-like
macro in a first patch?  That'll reduce the amount of change in the
commit actually changing things quite noticeably, and makes it easier to
adjust things later.  I see very little benefit in in-place usage of
either % XLOG_SEG_SIZE or & WalModMask.
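A standalone sketch of the initializedUpto branch quoted above, rewritten with an XLogSegmentOffset-style macro that takes the segment size as an argument. The header sizes are typical 64-bit values used only for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for the backend type */

#define SKETCH_BLCKSZ     8192  /* assumed WAL page size */
#define SKETCH_SHORT_PHD  24    /* illustrative short page header size */
#define SKETCH_LONG_PHD   40    /* illustrative long page header size */

/* Offset of ptr within its segment; segsz must be a power of two. */
#define XLogSegmentOffset(ptr, segsz)  ((ptr) & ((XLogRecPtr) (segsz) - 1))

/* Back ptr up to the start of its page if it points just past a page header. */
static XLogRecPtr
initialized_upto(XLogRecPtr ptr, uint32_t segsz)
{
    /* Just past a short header on a page other than the segment's first. */
    if (ptr % SKETCH_BLCKSZ == SKETCH_SHORT_PHD &&
        XLogSegmentOffset(ptr, segsz) > SKETCH_BLCKSZ)
        return ptr - SKETCH_SHORT_PHD;
    /* Just past the long header on the segment's first page. */
    else if (ptr % SKETCH_BLCKSZ == SKETCH_LONG_PHD &&
             XLogSegmentOffset(ptr, segsz) < SKETCH_BLCKSZ)
        return ptr - SKETCH_LONG_PHD;
    return ptr;
}
```

Whether the macro expands to % or & then becomes a one-line decision inside the macro rather than something repeated at every call site.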

I will check this.
 


> @@ -1794,6 +1813,7 @@ XLogBytePosToRecPtr(uint64 bytepos)
>       uint32          seg_offset;
>       XLogRecPtr      result;
>
> +
>       fullsegs = bytepos / UsableBytesInSegment;
>       bytesleft = bytepos % UsableBytesInSegment;

spurious change.

> @@ -1878,7 +1898,7 @@ XLogRecPtrToBytePos(XLogRecPtr ptr)
>
>       XLByteToSeg(ptr, fullsegs);
>
> -     fullpages = (ptr % XLOG_SEG_SIZE) / XLOG_BLCKSZ;
> +     fullpages = (ptr & WalModMask) / XLOG_BLCKSZ;
>       offset = ptr % XLOG_BLCKSZ;
>
>       if (fullpages == 0)
> @@ -2043,7 +2063,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>               /*
>                * If first page of an XLOG segment file, make it a long header.
>                */
> -             if ((NewPage->xlp_pageaddr % XLogSegSize) == 0)
> +             if ((NewPage->xlp_pageaddr & WalModMask) == 0)
>               {
>                       XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
>
> @@ -2095,6 +2115,7 @@ CalculateCheckpointSegments(void)
>        *        number of segments consumed between checkpoints.
>        *-------
>        */
> +
>       target = (double) max_wal_size / (2.0 + CheckPointCompletionTarget);

spurious change.


>  void
> +assign_wal_segment_size(int newval, void *extra)
> +{
> +     /*
> +      * During system initialization, XLogSegSize is not set so we use
> +      * DEFAULT_XLOG_SEG_SIZE instead.
> +      */
> +     int     WalSegSize = (XLogSegSize == 0) ? DEFAULT_XLOG_SEG_SIZE : XLOG_SEG_SIZE;
> +
> +     wal_segment_size = newval;
> +     UsableBytesInSegment = (wal_segment_size * UsableBytesInPage) -
> +                                                (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
> +     WalModMask = WalSegSize - 1;
> +
> +     /* Set the WalShiftBit */
> +     WalShiftBit = 0;
> +     while (WalSegSize > 1)
> +     {
> +             WalSegSize = WalSegSize >> 1;
> +             WalShiftBit++;
> +     }
> +}

Hm. Are GUC hooks a good way to compute the masks?  Interdependent GUCs
are unfortunately not working well, and several GUCs might end up
depending on these.  I think it might be better to assign the variables
somewhere early in StartupXLOG() or such.

I am not sure about these interdependent GUCs. I need to study this better and make changes as required.


> +
> +void
> +assign_min_wal_size(int newval, void *extra)
> +{
> +     /*
> +      * During system initialization, XLogSegSize is not set so we use
> +      * DEFAULT_XLOG_SEG_SIZE instead.
> +      *
> +      * min_wal_size is in kB and XLogSegSize is in bytes and so it is
> +      * converted to kB for the calculation.
> +      */
> +     int     WalSegSize = (XLogSegSize == 0) ? (DEFAULT_XLOG_SEG_SIZE / 1024) :
> +                                                                               (XLOG_SEG_SIZE / 1024);
> +
> +     min_wal_size = newval / WalSegSize;
> +}
> +
> +void
>  assign_max_wal_size(int newval, void *extra)
>  {
> -     max_wal_size = newval;
> +     /*
> +      * During system initialization, XLogSegSize is not set so we use
> +      * DEFAULT_XLOG_SEG_SIZE instead.
> +      *
> +      * max_wal_size is in kB and XLogSegSize is in bytes and so it is
> +      * converted to bytes for the calculation.
> +      */
> +     int     WalSegSize = (XLogSegSize == 0) ? (DEFAULT_XLOG_SEG_SIZE / 1024) :
> +                                                                               (XLOG_SEG_SIZE / 1024);
> +
> +     max_wal_size = newval / WalSegSize;
>       CalculateCheckpointSegments();
>  }

I don't think it's a good idea to have GUCs that are initially set to
the wrong value and such.  How about just storing bytes, and converting
into segments upon use?

max_wal_size is used in CalculateCheckpointSegments and XLOGfileslop.
min_wal_size is used in XLOGfileslop only.

XLOGfileslop is called after the server has started up, so XLogSegSize will be set by then, but CalculateCheckpointSegments would be a problem: assign_max_wal_size calls CalculateCheckpointSegments, which needs the value as a segment count, not bytes. If we keep the value in bytes, we will need to move the WalSegSize adjustment into CalculateCheckpointSegments.
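A minimal sketch of converting on use inside the checkpoint calculation, assuming max_wal_size stays in kB and the segment size in bytes is known by the time this runs. The formula mirrors the `target = max_wal_size / (2.0 + CheckPointCompletionTarget)` line quoted earlier, with a floor of one segment; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Checkpoint segment budget; max_wal_size (kB) is converted to segments only here. */
static int
checkpoint_segments(uint64_t max_wal_size_kb, uint32_t seg_size_bytes,
                    double completion_target)
{
    uint64_t max_segs;
    double   target;
    int      segs;

    assert(seg_size_bytes >= 1024);
    max_segs = max_wal_size_kb / (seg_size_bytes / 1024);
    target = (double) max_segs / (2.0 + completion_target);
    segs = (int) target;
    return segs < 1 ? 1 : segs;
}
```

With max_wal_size = 1GB and a completion target of 0.5, this gives 25 segments for 16MB segments and 6 for 64MB segments. Since the GUC holds a size rather than a segment count, its value is correct regardless of when the segment size becomes known.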
 

 
> @@ -2135,8 +2205,8 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
>        * correspond to. Always recycle enough segments to meet the minimum, and
>        * remove enough segments to stay below the maximum.
>        */
> -     minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + min_wal_size - 1;
> -     maxSegNo = PriorRedoPtr / XLOG_SEG_SIZE + max_wal_size - 1;
> +     minSegNo = (PriorRedoPtr >> WalShiftBit) + min_wal_size - 1;
> +     maxSegNo = (PriorRedoPtr >> WalShiftBit) + max_wal_size - 1;

I think a macro would be good here too (same prerequisite patch as above).

> @@ -4677,8 +4749,18 @@ XLOGShmemSize(void)
>        */
>       if (XLOGbuffers == -1)
>       {
> -             char            buf[32];
> -
> +             /*
> +              * The calculation of XLOGbuffers, requires the now run-time parameter
> +              * XLogSegSize from the ControlFile. The value determined here is
> +              * required to create the shared memory segment. Hence, temporarily
> +              * allocating space and reading ControlFile here.
> +              */

I don't like comments containing things like "the now run-time parameter"
much - they are likely going to still be there in 10 years, and will be
hard to understand.

You are right.
 

But anyway, how about we simply remove the "max one segment" boundary
instead? I don't think it's actually very meaningful - several people
posted benchmarks with more than one segment being beneficial.


> diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
> index 31290d3..87efc3c 100644
> --- a/src/bin/pg_basebackup/streamutil.c
> +++ b/src/bin/pg_basebackup/streamutil.c
> @@ -238,6 +238,59 @@ GetConnection(void)
>  }
>
>  /*
> + * Run the SHOW_WAL_SEGMENT_SIZE command to set the XLogSegSize
> + */
> +bool
> +SetXLogSegSize(PGconn *conn)
> +{

I think this is a confusing function name, because it sounds like
you're setting the SegSize remotely or such. I think making it
XLogRecPtr RetrieveXLogSegSize(conn); or such would lead to better code.

I agree. I will make that change.
 

> diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
> index 963802e..4ceebdc 100644
> --- a/src/bin/pg_resetxlog/pg_resetxlog.c
> +++ b/src/bin/pg_resetxlog/pg_resetxlog.c
> @@ -57,6 +57,7 @@
>  #include "storage/large_object.h"
>  #include "pg_getopt.h"
>
> +uint32               XLogSegSize;

This seems like a bad idea - having the same local variable both in
frontend and backend programs seems like a recipe for disaster.

I had to use the same variable name because it is used in the macros defined in xlog_internal.h. Defining this variable in these programs makes the macros that use XLogSegSize usable there; otherwise they fail with "undefined reference to `XLogSegSize'".
 

--
Thank you, 

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 8:53 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> I've not done like the most careful review ever, but I'm in favor of the
>> general change (provided the byval thing is fixed obviously).
>
> Thanks for the review.

Why not use pg_ltoa and pg_lltoa like the output functions for the datatype do?

Might use CStringGetTextDatum(blah) instead of
PointerGetDatum(cstring_to_text(blah)).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Sat, Jan 28, 2017 at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 26, 2017 at 8:53 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> I've not done like the most careful review ever, but I'm in favor of the
>>> general change (provided the byval thing is fixed obviously).
>>
>> Thanks for the review.
>
> Why not use pg_ltoa and pg_lltoa like the output functions for the datatype do?

No particular reason.

> Might use CStringGetTextDatum(blah) instead of
> PointerGetDatum(cstring_to_text(blah)).

Yes, thanks.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Michael Paquier
Date:
On Sat, Jan 28, 2017 at 8:04 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Jan 28, 2017 at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 26, 2017 at 8:53 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>>> I've not done like the most careful review ever, but I'm in favor of the
>>>> general change (provided the byval thing is fixed obviously).
>>>
>>> Thanks for the review.
>>
>> Why not use pg_ltoa and pg_lltoa like the output functions for the datatype do?
>
> No particular reason.
>
>> Might use CStringGetTextDatum(blah) instead of
>> PointerGetDatum(cstring_to_text(blah)).
>
> Yes, thanks.

I am going to create a new thread for this refactoring patch, as
that's different than the goal of this thread.

Now, regarding the main patch. Per the status of the last couple of
days, the patch has received a review but no new versions, so I am
marking it as returned with feedback for now. Feel free to update the
status of the patch to something else if you think that's more
appropriate.
-- 
Michael



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,
PFA the updated patches.

On Fri, Jan 27, 2017 at 2:17 PM, Beena Emerson <memissemerson@gmail.com> wrote:
Hello Andres,

Thank you for your review.

On Fri, Jan 27, 2017 at 12:39 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-23 11:35:11 +0530, Beena Emerson wrote:
> Please find attached an updated WIP patch. I have incorporated almost all
> comments. This is to be applied over Robert's patches. I will post
> performance results later on.
>
> 1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size
> sets the WalModMask and WalShiftBit. All the modulo and division operations
> using XLogSegSize has been replaced with these. However, there are many
> preprocessors which divide with XLogSegSize in xlog_internal.h. I have not
> changed these because it would mean I will have to reassign the WalShiftBit
> along with XLogSegSize in all the modules which use these macros. That does
> not seem to be a good idea. Also, this means shift operator can be used
> only in couple of places.

I think it'd be better not to have XLogSegSize anymore. Silently
changing a macros behaviour from being a compile time constant to
something runtime configurable is a bad idea.

I don't think I understood you clearly. Do you mean converting the macros that use XLogSegSize into functions?

I have moved the ModMask-related changes to a separate patch, 01-add-XLogSegmentOffset-macro.patch. It creates the macro XLogSegmentOffset and replaces all "% XLogSegSize" and "% XLOG_SEG_SIZE" with this macro.
I have not included the shift operator, because the change would only apply to about 4 lines and did not give any performance boost.

Hm. Are GUC hooks a good way to compute the masks?  Interdependent GUCs
are unfortunately not working well, and several GUCs might end up
depending on these.  I think it might be better to assign the variables
somewhere early in StartupXLOG() or such.

I am not sure about these interdependent GUCs. I need to study this better and make changes as required.


The process flow is such that the control file, which sets XLogSegSize, is read after the GUC options are initialized. StartupXLOG is called by StartupProcessMain(), which restores the XLOG and then exits. Hence the values initialised there do not persist for the rest of the server's run, and this causes an error during pg_ctl stop.
The XLogSegSize adjustments in the assign hooks have been removed, and a new macro, ConvertToXSegs, is used to convert min_wal_size and max_wal_size to a segment count when required. The wal_segment_size set from ReadControlFile also affects the checkpoint segment value, and hence assign_wal_segment_size calls CalculateCheckpointSegments.
 
The documentation has been updated.


Performance Tests:

I ran pgbench tests for different WAL segment sizes on a database of scale factor 300 with shared_buffers of 8GB. Each test ran for 10 minutes, and the median of 3 readings was taken. The following table shows the performance of the patch relative to HEAD for different client counts and various wal-segsize values. We could say that there is no performance difference.

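wal-segsize | 16 clients | 32 clients | 64 clients | 128 clients
8MB         |   -1.36    |    0.02    |    0.43    |   -0.24
16MB        |   -0.38    |    0.18    |   -0.09    |    0.4
32MB        |   -0.52    |    0.29    |    0.39    |    0.59
64MB        |   -0.15    |    0.04    |    0.52    |    0.38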



--
Thank you, 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Alvaro Herrera
Date:
Now that we've renamed "xlog" to "wal" in user-facing elements, I think
we should strive to use the name "wal" internally too in new code, not
"xlog" anymore.  This patch introduces several variables, macros,
functions that ought to change names now -- XLogSegmentOffset should be
WALSegmentOffset for example.  (I expect that as we touch code over
time, the use of "xlog" will decrease, though not fully disappear).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Feb 15, 2017 at 8:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Now that we've renamed "xlog" to "wal" in user-facing elements, I think
> we should strive to use the name "wal" internally too in new code, not
> "xlog" anymore.  This patch introduces several variables, macros,
> functions that ought to change names now -- XLogSegmentOffset should be
> WALSegmentOffset for example.  (I expect that as we touch code over
> time, the use of "xlog" will decrease, though not fully disappear).

Ugh.

I think that's going to lead to a complete mess.  We'll end up with
newer and older sections of the code being randomly inconsistent with
each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
On 2017-02-15 22:46:38 -0300, Alvaro Herrera wrote:
> Now that we've renamed "xlog" to "wal" in user-facing elements, I think
> we should strive to use the name "wal" internally too in new code, not
> "xlog" anymore.  This patch introduces several variables, macros,
> functions that ought to change names now -- XLogSegmentOffset should be
> WALSegmentOffset for example.  (I expect that as we touch code over
> time, the use of "xlog" will decrease, though not fully disappear).

I think this will just decrease the consistency in xlog.c (note the
name) et al.



Re: [HACKERS] increasing the default WAL segment size

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-02-15 22:46:38 -0300, Alvaro Herrera wrote:
>> Now that we've renamed "xlog" to "wal" in user-facing elements, I think
>> we should strive to use the name "wal" internally too in new code, not
>> "xlog" anymore.  This patch introduces several variables, macros,
>> functions that ought to change names now -- XLogSegmentOffset should be
>> WALSegmentOffset for example.  (I expect that as we touch code over
>> time, the use of "xlog" will decrease, though not fully disappear).

> I think this will just decrease the consistency in xlog.c (note the
> name) et al.

It's also going to make back-patching bug fixes in the area a real
nightmare.  Let's not change the code more than necessary to implement
the desired user-facing behavior.
        regards, tom lane



Re: [HACKERS] increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>
> Hello,
> PFA the updated patches.
I've started reviewing the patches.
01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my
detailed review after looking into the second patch. However, both
patches need a rebase on top of commit 85c11324cabaddcfaf3347df7
(Rename user-facing tools with "xlog" in the name to say "wal").

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Thu, Feb 16, 2017 at 3:37 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>
> Hello,
> PFA the updated patches.
I've started reviewing the patches.
01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my
detailed review after looking into the second patch. However, both
patches need a rebase on top of commit 85c11324cabaddcfaf3347df7
(Rename user-facing tools with "xlog" in the name to say "wal").


 
PFA the rebased patches.


Thank you, 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


Hello,

The recent commit c29aff959dc64f7321062e7f33d8c6ec23db53d has again changed the code, and the second patch no longer applies cleanly. Please find attached the rebased 02 patch. The 01 patch is unchanged.
 


--
Thank you, 

Beena Emerson

Have a Great Day!
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Fri, Feb 24, 2017 at 12:47 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>
> Hello,
>
> The recent commit  c29aff959dc64f7321062e7f33d8c6ec23db53d has again changed
> the code and the second patch cannot be applied cleanly. Please find
> attached the rebased 02 patch. 01 patch is the same .
>
I've done an initial review of the patch. The objective of the patch
is to make wal-segsize an initdb-time parameter instead of a
compile-time parameter.

The patch introduces the following three techniques to expose
XLogSegSize to different modules:

1. Directly read XLogSegSize from the control file
This is used by default, i.e., in StartupXLOG(), and looks good to me.

2. Run the SHOW wal_segment_size command to fetch and set the XLogSegSize

+   if (!RetrieveXLogSegSize(conn))
+       disconnect_and_exit(1);
+
You need the same logic in pg_receivewal.c as well.

3. Retrieve the XLogSegSize by reading the file size of WAL files
+       if (private.inpath != NULL)
+           sprintf(full_path, "%s/%s", private.inpath, fname);
+       else
+           strcpy(full_path, fname);
+
+       stat(full_path, &fst);
+
+       if (!IsValidXLogSegSize(fst.st_size))
+       {
+           fprintf(stderr,
+                   _("%s: file size %d is invalid \n"),
+                   progname, (int) fst.st_size);
+
+           return EXIT_FAILURE;
+
+       }
+
+       XLogSegSize = (int) fst.st_size;
I see a couple of issues with this approach:

* You should check the return value of stat() before going ahead.
Something like:

    if (stat(filename, &fst) < 0)
        error "file doesn't exist"

* You're considering any WAL file whose size is a power of 2 as valid.
Suppose the correct WAL segment size is 64MB. For some reason, the server
generated an invalid 16MB WAL file (maybe it crashed while creating the
WAL file). Your code seems to treat this as a valid file, which I think
is incorrect. Do you agree with that?

Is it possible to unify these different techniques of reading
XLogSegSize in a generalized function, with proper documentation
describing the scope and limitations of each approach?
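A sketch of the stat() handling described above, in a form that also addresses the power-of-2 concern: the file size is compared against an expected segment size obtained from a trusted source (for example, the control file) rather than accepting any power of 2. The function name and the `expected_size` parameter are hypothetical, and the code assumes a POSIX stat().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 if path exists and is exactly expected_size bytes, -1 otherwise. */
static int
check_wal_file_size(const char *path, off_t expected_size)
{
    struct stat fst;

    /* Check stat()'s result before trusting fst. */
    if (stat(path, &fst) < 0)
    {
        fprintf(stderr, "could not stat file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    /* A truncated or foreign-sized file is rejected, even if its size is a power of 2. */
    if (fst.st_size != expected_size)
    {
        fprintf(stderr, "file \"%s\" has size %lld, expected %lld\n",
                path, (long long) fst.st_size, (long long) expected_size);
        return -1;
    }
    return 0;
}
```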

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Since the following commit prevents the patch from applying cleanly, I have attached a rebased patch 02. Patch 01 does not have any problems.
commit 9e3755ecb2d058f7d123dd35a2e1784006190962
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Feb 25 16:12:24 2017 -0500

    Remove useless duplicate inclusions of system header files.


--
Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Jim Nasby
Date:
On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
> * You're considering any WAL file with a power of 2 as valid. Suppose,
> the correct WAL seg size is 64mb. For some reason, the server
> generated a 16mb invalid WAL file(maybe it crashed while creating the
> WAL file). Your code seems to treat this as a valid file which I think
> is incorrect. Do you agree with that?

Detecting correct WAL size based on the size of a random WAL file seems 
like a really bad idea to me.

I also don't see the reason for #2... or is that how initdb writes out 
the correct control file?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>>
>> * You're considering any WAL file whose size is a power of 2 as valid.
>> Suppose the correct WAL segment size is 64MB. For some reason, the server
>> generated an invalid 16MB WAL file (maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file, which I think
>> is incorrect. Do you agree with that?
>
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.
+1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>> * You're considering any WAL file whose size is a power of 2 as valid.
>> Suppose the correct WAL segment size is 64MB. For some reason, the server
>> generated an invalid 16MB WAL file (maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file, which I think
>> is incorrect. Do you agree with that?
>
> Detecting correct WAL size based on the size of a random WAL file seems like a really bad idea to me.
>
> I also don't see the reason for #2... or is that how initdb writes out the correct control file?

initdb reads the size from the option provided and sets an environment variable. This variable is read in src/backend/access/transam/xlog.c, and the ControlFile is written.
Unlike pg_resetwal and pg_rewind, pg_basebackup cannot access the control file; it only accesses the WAL directory. So we get XLogSegSize from the SHOW command over the replication connection.
As Kuntal pointed out, I might need to set it in pg_receivewal.c as well.

Thank you, 

Beena Emerson

EnterpriseDB: https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Ashutosh Sharma
Date:
Hi,

I took a look at this patch. Overall, the patch looks good to me.
However, there are some review comments that I would like to share:

1. I think the macro 'PATH_MAX' used in pg_waldump.c is specific
to Linux. It needs to be changed to some constant value, or maybe
MAXPGPATH, in order to make it platform independent.

2. As already mentioned by Jim and Kuntal upthread, you are trying to
detect the configured WAL segment size in pg_waldump.c and
pg_standby.c based on the size of a random WAL file, which
doesn't look like a good idea. But then I think the only option we
have is to pass the location of the pg_control file to pg_waldump
along with the start and end WAL segments.

3. When trying to compile '02-initdb-walsegsize-v2.patch' on Windows,
I got this warning message:

Warning    1    warning C4005: 'DEFAULT_XLOG_SEG_SIZE' : macro
redefinition
c:\users\ashu\postgresql\src\include\pg_config_manual.h    20
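A small sketch of the two portability points above, under stated assumptions. Path buffers are sized with MAXPGPATH (1024 by default in pg_config_manual.h) rather than PATH_MAX. A default that another header may already define is wrapped in an #ifndef guard, which is one possible way to silence a C4005 redefinition warning; the patch may have fixed it differently. The fallback definitions and `build_wal_path()` are hypothetical and exist only so the fragment compiles standalone. Inside the tree you would include the real headers instead.

```c
#include <stdio.h>

/* Fallback so this fragment compiles on its own; in the tree this comes
 * from pg_config_manual.h. */
#ifndef MAXPGPATH
#define MAXPGPATH 1024
#endif

/* Guarding the default avoids a "macro redefinition" warning if another
 * header has already defined it (one possible fix, not necessarily the patch's). */
#ifndef DEFAULT_XLOG_SEG_SIZE
#define DEFAULT_XLOG_SEG_SIZE (16 * 1024 * 1024)
#endif

/* Hypothetical helper: join a directory and file name into a MAXPGPATH-sized
 * buffer.  Returns 0 on success, -1 if the result would be truncated. */
static int
build_wal_path(char *buf, const char *dir, const char *fname)
{
    int n = snprintf(buf, MAXPGPATH, "%s/%s", dir, fname);

    return (n < 0 || n >= MAXPGPATH) ? -1 : 0;
}
```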

Apart from these, I don't have any comments as of now. I am still
validating the patch on Windows. If I find any issues, I will update
it.

--
With Regards,
Ashutosh Sharma.
EnterpriseDB: http://www.enterprisedb.com

On Tue, Feb 28, 2017 at 10:36 AM, Beena Emerson <memissemerson@gmail.com> wrote:
>
>
> On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>
>> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>>>
>>> * You're considering any WAL file whose size is a power of 2 as valid.
>>> Suppose the correct WAL segment size is 64MB. For some reason, the server
>>> generated an invalid 16MB WAL file (maybe it crashed while creating the
>>> WAL file). Your code seems to treat this as a valid file, which I think
>>> is incorrect. Do you agree with that?
>>
>>
>> Detecting correct WAL size based on the size of a random WAL file seems
>> like a really bad idea to me.
>>
>>
>> I also don't see the reason for #2... or is that how initdb writes out the
>> correct control file?
>
>
> initdb reads the size from the option provided and sets an
> environment variable. This variable is read in
> src/backend/access/transam/xlog.c, and the ControlFile is written.
> Unlike pg_resetwal and pg_rewind, pg_basebackup cannot access the control
> file; it only accesses the WAL directory. So we get XLogSegSize from
> the SHOW command over the replication connection.
> As Kuntal pointed out, I might need to set it in pg_receivewal.c as well.
>
> Thank you,
>
> Beena Emerson
>
> EnterpriseDB: https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Mon, Feb 27, 2017 at 11:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> * You're considering any WAL file whose size is a power of 2 as valid.
>> Suppose the correct WAL segment size is 64MB. For some reason, the server
>> generated an invalid 16MB WAL file (maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file, which I think
>> is incorrect. Do you agree with that?
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.

It's not the most elegant thing ever, but I'm not sure I really see a
big problem with it.  Today, if the WAL file were the wrong size, we'd
just error out.  With the patch, if the WAL file were the wrong size,
but happened to be a size that we consider legal, pg_waldump would
treat it as a legal file and try to display the WAL records contained
therein.  This doesn't seem like a huge problem from here; what are you
worried about?

I agree that it would be bad if, for example, pg_resetwal saw a broken
WAL file in pg_wal and consequently did the reset incorrectly, because
the whole point of pg_resetwal is to escape situations where the
contents of pg_wal may be bogus.  However, pg_resetwal can read the
value from the control file, so the technique of believing the file
size doesn't need to be used in that case anyway.  The only tools that
need to infer the WAL size from the sizes of the segments actually
present are those that neither have a running cluster (where SHOW can
be used) nor access to the control file.  There aren't many of those,
and pg_waldump, at least, is a debugging tool anyway.  IIUC, the other
case where this comes up is for pg_standby, but if the WAL segments
aren't all the same size that tool is presumably going to croak with
or without these changes, so I'm not really sure there's much of an
issue here.

I might be missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Tue, Mar 7, 2017 at 10:46 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi,
>
> I took a look at this patch. Overall, the patch looks good to me.
> However, there are some review comments that I would like to share:
>
> 1. I think the macro 'PATH_MAX' used in pg_waldump.c is specific
> to Linux. It needs to be changed to some constant value, or maybe
> MAXPGPATH, in order to make it platform independent.
>
> 2. As already mentioned by Jim and Kuntal upthread, you are trying to
> detect the configured WAL segment size in pg_waldump.c and
> pg_standby.c based on the size of a random WAL file, which
> doesn't look like a good idea. But then I think the only option we
> have is to pass the location of the pg_control file to pg_waldump
> along with the start and end WAL segments.
>
> 3. When trying to compile '02-initdb-walsegsize-v2.patch' on Windows,
> I got this warning message:
>
> Warning    1    warning C4005: 'DEFAULT_XLOG_SEG_SIZE' : macro
> redefinition
> c:\users\ashu\postgresql\src\include\pg_config_manual.h    20
>
> Apart from these, I don't have any comments as of now. I am still
> validating the patch on Windows. If I find any issues, I will update
> it.

Thank you for your reviews, Kuntal, Jim, and Ashutosh.

Attached is an updated 02 patch, which:
  1. Calls RetrieveXLogSegSize(conn) in pg_receivewal.c
  2. Removes the warning on Windows
  3. Replaces PATH_MAX in pg_waldump with MAXPGPATH

Regarding the use of the WAL file size as XLogSegSize, I agree with what Robert has said. Generally, the WAL file will be of the expected wal_segment_size; for it to have any other size, especially another valid power-of-2 value, is extremely rare, and I feel it is not a major cause of concern.

Thank you,

Beena Emerson
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] increasing the default WAL segment size

From
tushar
Date:
On 03/10/2017 11:23 AM, Beena Emerson wrote:

> Thank you for your reviews, Kuntal, Jim, and Ashutosh.
>
> Attached is an updated 02 patch, which:
>   1. Calls RetrieveXLogSegSize(conn) in pg_receivewal.c
>   2. Removes the warning on Windows
>   3. Replaces PATH_MAX in pg_waldump with MAXPGPATH
> Regarding the use of the WAL file size as XLogSegSize, I agree with what Robert has said. Generally, the WAL file will be of the expected wal_segment_size; for it to have any other size, especially another valid power-of-2 value, is extremely rare, and I feel it is not a major cause of concern.

We (Prabhat and I) have started basic testing of this feature.
Two quick issues:

1) At initdb time we set "--wal-segsize 4", so every WAL file should be 4MB, but postgresql.conf still says:

#wal_keep_segments = 0          # in logfile segments, 16MB each; 0 disables

So the "16MB" in the comment for the wal_keep_segments parameter looks wrong; we should either remove it or modify it.

2) We get an "Aborted (core dumped)" error when running pg_basebackup (this issue occurs only on Linux32, not on Linux64).
We have double-checked to confirm it.

Steps to reproduce on Linux32
===================
fetch the sources
apply both the patches
 ./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make all install
Performed initdb with switch "--wal-segsize 4"
start the server
run pg_basebackup

[centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
*** glibc detected *** ./pg_basebackup: free(): invalid pointer: 0x08da7f00 ***
======= Backtrace: =========
/lib/libc.so.6[0xae7e31]
/home/centos/pg10_10mar/postgresql/edbpsql/lib/libpq.so.5(PQclear+0x16d)[0x6266f5]
./pg_basebackup[0x8051441]
./pg_basebackup[0x804e7b5]
/lib/libc.so.6(__libc_start_main+0xe6)[0xa8dd26]
./pg_basebackup[0x804a231]
======= Memory map: ========
00153000-0017b000 r-xp 00000000 fc:01 1271       /lib/libk5crypto.so.3.1
0017b000-0017c000 r--p 00028000 fc:01 1271       /lib/libk5crypto.so.3.1
0017c000-0017d000 rw-p 00029000 fc:01 1271       /lib/libk5crypto.so.3.1
0017d000-0017e000 rw-p 00000000 00:00 0
0017e000-00180000 r-xp 00000000 fc:01 1241       /lib/libkeyutils.so.1.3
00180000-00181000 r--p 00001000 fc:01 1241       /lib/libkeyutils.so.1.3
00181000-00182000 rw-p 00002000 fc:01 1241       /lib/libkeyutils.so.1.3
002ad000-002b9000 r-xp 00000000 fc:01 1152       /lib/libnss_files-2.12.so
002b9000-002ba000 r--p 0000b000 fc:01 1152       /lib/libnss_files-2.12.so
002ba000-002bb000 rw-p 0000c000 fc:01 1152       /lib/libnss_files-2.12.so
004ad000-004b0000 r-xp 00000000 fc:01 1267       /lib/libcom_err.so.2.1
004b0000-004b1000 r--p 00002000 fc:01 1267       /lib/libcom_err.so.2.1
004b1000-004b2000 rw-p 00003000 fc:01 1267       /lib/libcom_err.so.2.1
004ec000-005c3000 r-xp 00000000 fc:01 1199       /lib/libkrb5.so.3.3
005c3000-005c9000 r--p 000d6000 fc:01 1199       /lib/libkrb5.so.3.3
005c9000-005ca000 rw-p 000dc000 fc:01 1199       /lib/libkrb5.so.3.3
00617000-00642000 r-xp 00000000 fc:01 2099439    /home/centos/pg10_10mar/postgresql/edbpsql/lib/libpq.so.5.10
00642000-00644000 rw-p 0002a000 fc:01 2099439    /home/centos/pg10_10mar/postgresql/edbpsql/lib/libpq.so.5.10
00792000-0079c000 r-xp 00000000 fc:01 1255       /lib/libkrb5support.so.0.1
0079c000-0079d000 r--p 00009000 fc:01 1255       /lib/libkrb5support.so.0.1
0079d000-0079e000 rw-p 0000a000 fc:01 1255       /lib/libkrb5support.so.0.1
007fd000-0083b000 r-xp 00000000 fc:01 1280       /lib/libgssapi_krb5.so.2.2
0083b000-0083c000 r--p 0003e000 fc:01 1280       /lib/libgssapi_krb5.so.2.2
0083c000-0083d000 rw-p 0003f000 fc:01 1280       /lib/libgssapi_krb5.so.2.2
0083f000-009ed000 r-xp 00000000 fc:01 292057     /usr/lib/libcrypto.so.1.0.1e
009ed000-009fd000 r--p 001ae000 fc:01 292057     /usr/lib/libcrypto.so.1.0.1e
009fd000-00a04000 rw-p 001be000 fc:01 292057     /usr/lib/libcrypto.so.1.0.1e
00a04000-00a07000 rw-p 00000000 00:00 0
00a51000-00a6f000 r-xp 00000000 fc:01 14109      /lib/ld-2.12.so
00a6f000-00a70000 r--p 0001d000 fc:01 14109      /lib/ld-2.12.so
00a70000-00a71000 rw-p 0001e000 fc:01 14109      /lib/ld-2.12.so
00a77000-00c08000 r-xp 00000000 fc:01 14110      /lib/libc-2.12.so
00c08000-00c0a000 r--p 00191000 fc:01 14110      /lib/libc-2.12.so
00c0a000-00c0b000 rw-p 00193000 fc:01 14110      /lib/libc-2.12.so
00c0b000-00c0e000 rw-p 00000000 00:00 0
00c10000-00c22000 r-xp 00000000 fc:01 14355      /lib/libz.so.1.2.3
00c22000-00c23000 r--p 00011000 fc:01 14355      /lib/libz.so.1.2.3
00c23000-00c24000 rw-p 00012000 fc:01 14355      /lib/libz.so.1.2.3
00c52000-00c55000 r-xp 00000000 fc:01 14375      /lib/libdl-2.12.so
00c55000-00c56000 r--p 00002000 fc:01 14375      /lib/libdl-2.12.so
00c56000-00c57000 rw-p 00003000 fc:01 14375      /lib/libdl-2.12.so
00c59000-00c70000 r-xp 00000000 fc:01 14379      /lib/libpthread-2.12.so
00c70000-00c71000 r--p 00016000 fc:01 14379      /lib/libpthread-2.12.so
00c71000-00c72000 rw-p 00017000 fc:01 14379      /lib/libpthread-2.12.so
00c72000-00c74000 rw-p 00000000 00:00 0
00d0a000-00d0b000 r-xp 00000000 00:00 0          [vdso]
00d8f000-00dac000 r-xp 00000000 fc:01 14392      /lib/libselinux.so.1
00dac000-00dad000 r--p 0001d000 fc:01 14392      /lib/libselinux.so.1
00dad000-00dae000 rw-p 0001e000 fc:01 14392      /lib/libselinux.so.1
00db0000-00dc5000 r-xp 00000000 fc:01 1430       /lib/libresolv-2.12.so
00dc5000-00dc6000 ---p 00015000 fc:01 1430       /lib/libresolv-2.12.so
00dc6000-00dc7000 r--p 00015000 fc:01 1430       /lib/libresolv-2.12.so
00dc7000-00dc8000 rw-p 00016000 fc:01 1430       /lib/libresolv-2.12.so
00dc8000-00dca000 rw-p 00000000 00:00 0
00dcc000-00de9000 r-xp 00000000 fc:01 1312       /lib/libgcc_s-4.4.7-20120601.so.1
00de9000-00dea000 rw-p 0001d000 fc:01 1312       /lib/libgcc_s-4.4.7-20120601.so.1
05576000-055d8000 r-xp 00000000 fc:01 275065     /usr/lib/libssl.so.1.0.1e
055d8000-055db000 r--p 00061000 fc:01 275065     /usr/lib/libssl.so.1.0.1e
055db000-055df000 rw-p 00064000 fc:01 275065     /usr/lib/libssl.so.1.0.1e
08048000-0805f000 r-xp 00000000 fc:01 2099490    /home/centos/pg10_10mar/postgresql/edbpsql/bin/pg_basebackup
0805f000-08060000 rw-p 00016000 fc:01 2099490    /home/centos/pg10_10mar/postgresql/edbpsql/bin/pg_basebackup
08060000-08062000 rw-p 00000000 00:00 0
08d9f000-08dc0000 rw-p 00000000 00:00 0          [heap]
b7519000-b7719000 r--p 00000000 fc:01 269751     /usr/lib/locale/locale-archive
b7719000-b771e000 rw-p 00000000 00:00 0
b772a000-b772c000 rw-p 00000000 00:00 0
bfbf6000-bfc0b000 rw-p 00000000 00:00 0          [stack]
Aborted (core dumped)
[centos@tushar-centos bin]$

The same scenario works fine against HEAD (v10) on Linux32 (i.e., with no patch applied):

[centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: transaction log start point: 0/2800024 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: transaction log end point: 0/28000E4
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[centos@tushar-centos bin]$
-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Thank you for testing. I just wanted to confirm a few things, since I do not have a Linux32 setup yet.

On Fri, Mar 10, 2017 at 1:09 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 03/10/2017 11:23 AM, Beena Emerson wrote:
>
>> Thank you for your reviews, Kuntal, Jim, and Ashutosh.
>>
>> Attached is an updated 02 patch, which:
>>   1. Calls RetrieveXLogSegSize(conn) in pg_receivewal.c
>>   2. Removes the warning on Windows
>>   3. Replaces PATH_MAX in pg_waldump with MAXPGPATH
>> Regarding the use of the WAL file size as XLogSegSize, I agree with what Robert has said. Generally, the WAL file will be of the expected wal_segment_size; for it to have any other size, especially another valid power-of-2 value, is extremely rare, and I feel it is not a major cause of concern.
> We (Prabhat and I) have started basic testing of this feature.
> Two quick issues:
>
> 1) At initdb time we set "--wal-segsize 4", so every WAL file should be 4MB, but postgresql.conf still says:
>
> #wal_keep_segments = 0          # in logfile segments, 16MB each; 0 disables
>
> So the "16MB" in the comment for the wal_keep_segments parameter looks wrong; we should either remove it or modify it.
>
> 2) We get an "Aborted (core dumped)" error when running pg_basebackup (this issue occurs only on Linux32, not on Linux64).
> We have double-checked to confirm it.
>
> Steps to reproduce on Linux32
> ===================
> fetch the sources
> apply both the patches
>  ./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make all install
> Performed initdb with switch "--wal-segsize 4"

Does the crash occur only with size 4?

> start the server
> run pg_basebackup
>
> [centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
> *** glibc detected *** ./pg_basebackup: free(): invalid pointer: 0x08da7f00 ***
>
> [centos@tushar-centos bin]$
>
> The same scenario works fine against HEAD (v10) on Linux32 (i.e., with no patch applied):
>
> [centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: transaction log start point: 0/2800024 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: transaction log end point: 0/28000E4
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: base backup completed
> [centos@tushar-centos bin]$

Just to confirm, was this done with the configure flag --with-wal-segsize=4?

Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Prabhat Sahu
Date:
Hi,

>> 2) We get an "Aborted (core dumped)" error when running pg_basebackup (this issue occurs only on Linux32, not on Linux64).
>> We have double-checked to confirm it.
>>
>> Steps to reproduce on Linux32
>> ===================
>> fetch the sources
>> apply both the patches
>>  ./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make all install
>> Performed initdb with switch "--wal-segsize 4"
>
> Does the crash occur only with size 4?


The crash occurs for --wal-segsize values of 1, 2, 4, and 8, with the stack details below:

[bin]$ ./pg_basebackup -v -D /tmp/slave8
*** glibc detected *** ./pg_basebackup: free(): invalid pointer: 0x09fe9f00 ***
======= Backtrace: =========
/lib/libc.so.6[0xbafb81]
/home/edb/PGsrc13march/postgresql/edbpsql/lib/libpq.so.5(PQclear+0x16d)[0x5696f5]
./pg_basebackup[0x8051441]
./pg_basebackup[0x804e7b5]
/lib/libc.so.6(__libc_start_main+0xe6)[0xb55d36]
./pg_basebackup[0x804a231]
======= Memory map: ========
00165000-001c7000 r-xp 00000000 08:03 1333807    /usr/lib/libssl.so.1.0.1e
001c7000-001ca000 r--p 00061000 08:03 1333807    /usr/lib/libssl.so.1.0.1e
001ca000-001ce000 rw-p 00064000 08:03 1333807    /usr/lib/libssl.so.1.0.1e
001ce000-0020c000 r-xp 00000000 08:03 1717206    /lib/libgssapi_krb5.so.2.2
0020c000-0020d000 r--p 0003e000 08:03 1717206    /lib/libgssapi_krb5.so.2.2
0020d000-0020e000 rw-p 0003f000 08:03 1717206    /lib/libgssapi_krb5.so.2.2
0020e000-002e5000 r-xp 00000000 08:03 1717208    /lib/libkrb5.so.3.3
002e5000-002eb000 r--p 000d6000 08:03 1717208    /lib/libkrb5.so.3.3
002eb000-002ec000 rw-p 000dc000 08:03 1717208    /lib/libkrb5.so.3.3
002ec000-00309000 r-xp 00000000 08:03 1706348    /lib/libgcc_s-4.4.7-20120601.so.1
00309000-0030a000 rw-p 0001d000 08:03 1706348    /lib/libgcc_s-4.4.7-20120601.so.1
00362000-00510000 r-xp 00000000 08:03 1333806    /usr/lib/libcrypto.so.1.0.1e
00510000-00520000 r--p 001ae000 08:03 1333806    /usr/lib/libcrypto.so.1.0.1e
00520000-00527000 rw-p 001be000 08:03 1333806    /usr/lib/libcrypto.so.1.0.1e
00527000-0052a000 rw-p 00000000 00:00 0
0055a000-00585000 r-xp 00000000 08:03 419296     /home/edb/PGsrc13march/postgresql/edbpsql/lib/libpq.so.5.10
00585000-00587000 rw-p 0002a000 08:03 419296     /home/edb/PGsrc13march/postgresql/edbpsql/lib/libpq.so.5.10
0086b000-0086e000 r-xp 00000000 08:03 1717205    /lib/libcom_err.so.2.1
0086e000-0086f000 r--p 00002000 08:03 1717205    /lib/libcom_err.so.2.1
0086f000-00870000 rw-p 00003000 08:03 1717205    /lib/libcom_err.so.2.1
008e3000-00900000 r-xp 00000000 08:03 1725674    /lib/libselinux.so.1
00900000-00901000 r--p 0001d000 08:03 1725674    /lib/libselinux.so.1
00901000-00902000 rw-p 0001e000 08:03 1725674    /lib/libselinux.so.1
00a10000-00a11000 r-xp 00000000 00:00 0          [vdso]
00af9000-00b03000 r-xp 00000000 08:03 1717209    /lib/libkrb5support.so.0.1
00b03000-00b04000 r--p 00009000 08:03 1717209    /lib/libkrb5support.so.0.1
00b04000-00b05000 rw-p 0000a000 08:03 1717209    /lib/libkrb5support.so.0.1
00b19000-00b37000 r-xp 00000000 08:03 1704925    /lib/ld-2.12.so
00b37000-00b38000 r--p 0001d000 08:03 1704925    /lib/ld-2.12.so
00b38000-00b39000 rw-p 0001e000 08:03 1704925    /lib/ld-2.12.so
00b3f000-00ccf000 r-xp 00000000 08:03 1704931    /lib/libc-2.12.so
00ccf000-00cd0000 ---p 00190000 08:03 1704931    /lib/libc-2.12.so
00cd0000-00cd2000 r--p 00190000 08:03 1704931    /lib/libc-2.12.so
00cd2000-00cd3000 rw-p 00192000 08:03 1704931    /lib/libc-2.12.so
00cd3000-00cd6000 rw-p 00000000 00:00 0
00cd8000-00cef000 r-xp 00000000 08:03 1704933    /lib/libpthread-2.12.so
00cef000-00cf0000 r--p 00016000 08:03 1704933    /lib/libpthread-2.12.so
00cf0000-00cf1000 rw-p 00017000 08:03 1704933    /lib/libpthread-2.12.so
00cf1000-00cf3000 rw-p 00000000 00:00 0
00cf5000-00cf8000 r-xp 00000000 08:03 1704977    /lib/libdl-2.12.so
00cf8000-00cf9000 r--p 00002000 08:03 1704977    /lib/libdl-2.12.so
00cf9000-00cfa000 rw-p 00003000 08:03 1704977    /lib/libdl-2.12.so
00d33000-00d45000 r-xp 00000000 08:03 1704980    /lib/libz.so.1.2.3
00d45000-00d46000 r--p 00011000 08:03 1704980    /lib/libz.so.1.2.3
00d46000-00d47000 rw-p 00012000 08:03 1704980    /lib/libz.so.1.2.3
00de4000-00e0c000 r-xp 00000000 08:03 1710117    /lib/libk5crypto.so.3.1
00e0c000-00e0d000 r--p 00028000 08:03 1710117    /lib/libk5crypto.so.3.1
00e0d000-00e0e000 rw-p 00029000 08:03 1710117    /lib/libk5crypto.so.3.1
00e0e000-00e0f000 rw-p 00000000 00:00 0
00e18000-00e1a000 r-xp 00000000 08:03 1710112    /lib/libkeyutils.so.1.3
00e1a000-00e1b000 r--p 00001000 08:03 1710112    /lib/libkeyutils.so.1.3
00e1b000-00e1c000 rw-p 00002000 08:03 1710112    /lib/libkeyutils.so.1.3
00f04000-00f10000 r-xp 00000000 08:03 1704932    /lib/libnss_files-2.12.so
00f10000-00f11000 r--p 0000b000 08:03 1704932    /lib/libnss_files-2.12.so
00f11000-00f12000 rw-p 0000c000 08:03 1704932    /lib/libnss_files-2.12.so
07be1000-07bf6000 r-xp 00000000 08:03 1704916    /lib/libresolv-2.12.so
07bf6000-07bf7000 ---p 00015000 08:03 1704916    /lib/libresolv-2.12.so
07bf7000-07bf8000 r--p 00015000 08:03 1704916    /lib/libresolv-2.12.so
07bf8000-07bf9000 rw-p 00016000 08:03 1704916    /lib/libresolv-2.12.so
07bf9000-07bfb000 rw-p 00000000 00:00 0
08048000-0805f000 r-xp 00000000 08:03 539967     /home/edb/PGsrc13march/postgresql/edbpsql/bin/pg_basebackup
0805f000-08060000 rw-p 00016000 08:03 539967     /home/edb/PGsrc13march/postgresql/edbpsql/bin/pg_basebackup
08060000-08062000 rw-p 00000000 00:00 0
09fe1000-0a002000 rw-p 00000000 00:00 0          [heap]
b74ec000-b76ec000 r--p 00000000 08:03 1333666    /usr/lib/locale/locale-archive
b76ec000-b76f1000 rw-p 00000000 00:00 0
b7700000-b7702000 rw-p 00000000 00:00 0
bfcf0000-bfd05000 rw-p 00000000 00:00 0          [stack]
Aborted (core dumped)

For --wal-segsize values of 16, 32, 64, ... (all multiples of 16), we get a "Segmentation fault" message instead:
[bin]$ ./pg_basebackup -v -D /tmp/slave16
Segmentation fault (core dumped)

For all other values of --wal-segsize (3, 5, 7, 9, 10, 11, ..., 15, 17, 18, ...), initdb reports the value as invalid:
[bin]$ ./initdb -D data1 --wal-segsize=17
initdb: Invalid WAL segment size 17

>> start the server
>> run pg_basebackup
>>
>> [centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
>> *** glibc detected *** ./pg_basebackup: free(): invalid pointer: 0x08da7f00 ***
>>
>> [centos@tushar-centos bin]$
>>
>> The same scenario works fine against HEAD (v10) on Linux32 (i.e., with no patch applied):
>>
>> [centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: transaction log start point: 0/2800024 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: transaction log end point: 0/28000E4
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: base backup completed
>> [centos@tushar-centos bin]$
>
> Just to confirm, was this done with the configure flag --with-wal-segsize=4?

We also ran configure with the option --with-wal-segsize=4 and got a warning:
./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/inst --with-openssl CFLAGS="-g -O0" --with-wal-segsize=4

configure: WARNING: unrecognized options: --with-wal-segsize


Thanks & Regards,

Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Mon, Mar 13, 2017 at 1:49 PM, Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
> Hi,
>
>>> 2) We get an "Aborted (core dumped)" error when running pg_basebackup (this issue occurs only on Linux32, not on Linux64).
>>> We have double-checked to confirm it.
>>>
>>> Steps to reproduce on Linux32
>>> ===================
>>> fetch the sources
>>> apply both the patches
>>>  ./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make all install
>>> Performed initdb with switch "--wal-segsize 4"
>>
>> Does the crash occur only with size 4?
>
>
> The crash occurs for --wal-segsize values of 1, 2, 4, and 8, with the stack details below:
>
> For --wal-segsize values of 16, 32, 64, ... (all multiples of 16), we get a "Segmentation fault" message instead:
> [bin]$ ./pg_basebackup -v -D /tmp/slave16
> Segmentation fault (core dumped)
>
> For all other values of --wal-segsize (3, 5, 7, 9, 10, 11, ..., 15, 17, 18, ...), initdb reports the value as invalid:
> [bin]$ ./initdb -D data1 --wal-segsize=17
> initdb: Invalid WAL segment size 17

The permissible values for the WAL segment size are powers of 2 from 1 to 1024, so the invalid-value message is expected behaviour.

Just to summarize: on Linux32, values 1 to 8 crashed, and 16 to 1024 gave a segmentation fault.

>>> start the server
>>> run pg_basebackup
>>>
>>> [centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
>>> *** glibc detected *** ./pg_basebackup: free(): invalid pointer: 0x08da7f00 ***
>>>
>>> [centos@tushar-centos bin]$
>>>
>>> The same scenario works fine against HEAD (v10) on Linux32 (i.e., with no patch applied):
>>>
>>> [centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: transaction log start point: 0/2800024 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: transaction log end point: 0/28000E4
>>> pg_basebackup: waiting for background process to finish streaming ...
>>> pg_basebackup: base backup completed
>>> [centos@tushar-centos bin]$
>>
>> Just to confirm, was this done with the configure flag --with-wal-segsize=4?
>
> We also ran configure with the option --with-wal-segsize=4 and got a warning:
> ./configure --with-zlib   --enable-debug  --enable-cassert  --enable-depend --prefix=$PWD/inst --with-openssl CFLAGS="-g -O0" --with-wal-segsize=4
>
> configure: WARNING: unrecognized options: --with-wal-segsize

That configure option was for HEAD, without the patch applied.

I guess I am missing something regarding 32-bit machines; I am looking into it.


Thank you, 

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Attached is the updated patch. It fixes the issues and also updates a few code comments.

On Fri, Mar 10, 2017 at 1:09 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:

> 1) At initdb time we set "--wal-segsize 4", so every WAL file should be 4MB, but postgresql.conf still says:
>
> #wal_keep_segments = 0          # in logfile segments, 16MB each; 0 disables
>
> So the "16MB" in the comment for the wal_keep_segments parameter looks wrong; we should either remove it or modify it.

Removed.

> 2) We get an "Aborted (core dumped)" error when running pg_basebackup (this issue occurs only on Linux32, not on Linux64).
> We have double-checked to confirm it.

Can you please check with the new patch?

-- 
Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
tushar
Date:
On 03/14/2017 11:14 AM, Beena Emerson wrote:
> Hello,
>
> Attached is the updated patch. It fixes the issues and also updates a few code comments.
>
> Can you please check with the new patch?

Thanks, both issues have been fixed now.
-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> Attached is the updated patch. It fixes the issues and also updates a few code
> comments.

I did an initial read-through of this patch tonight just to get a
feeling for what's going on.  Based on that, here are a few review
comments:

The changes to pg_standby seem to completely break the logic to wait
until the file has attained the correct size.  I don't know how to
salvage that logic off-hand, but just breaking it isn't acceptable.

+         Note that changing this value requires an initdb.

Instead, maybe say something like "Note that this value is fixed for
the lifetime of the database cluster."

-int            max_wal_size = 64;    /* 1 GB */
-int            min_wal_size = 5;    /* 80 MB */
+int            wal_segment_size = 2048;    /* 16 MB */
+int            max_wal_size = 1024 * 1024;    /* 1 GB */
+int            min_wal_size = 80 * 1024;    /* 80 MB */

If wal_segment_size is now measured in multiples of XLOG_BLCKSZ, then
it's not the case that 2048 is always 16MB.  If the other values are
now measured in kB, perhaps rename the variables to add _kb, to avoid
confusion with the way it used to work (and in general).  The problem
with leaving this as-is is that any existing references to
max_wal_size in core or extension code will silently break; you want
it to break in a noticeable way so that it gets fixed.
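To make the unit question concrete: if wal_segment_size counts XLOG_BLCKSZ-sized pages, its byte value depends on the block size chosen at build time. The sketch below uses hypothetical helper names (not from the patch) and sample block sizes of 8 kB and 16 kB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes covered by a segment size given in WAL pages. */
static uint64_t
wal_pages_to_bytes(uint32_t npages, uint32_t xlog_blcksz)
{
    return (uint64_t) npages * xlog_blcksz;
}

/* Hypothetical helper: page count for a segment size given in bytes,
 * i.e. the DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ style of computation. */
static uint32_t
wal_bytes_to_pages(uint64_t nbytes, uint32_t xlog_blcksz)
{
    assert(xlog_blcksz > 0 && nbytes % xlog_blcksz == 0);
    return (uint32_t) (nbytes / xlog_blcksz);
}

/*
 * 2048 pages with an 8 kB XLOG_BLCKSZ  -> 16 MB
 * 2048 pages with a 16 kB XLOG_BLCKSZ  -> 32 MB
 * so a hard-coded 2048 means 16 MB only with the default block size.
 */
```

Deriving the page count from a byte constant, rather than hard-coding it, keeps the comment and the value in agreement for any block size.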

+ * UsableBytesInSegment: It is set in assign_wal_segment_size and stores the
+ *         number of bytes in a WAL segment usable for WAL data.

The comment doesn't need to say where it gets set, and it doesn't need
to repeat the variable name.  Just say "The number of bytes in a..."

+assign_wal_segment_size(int newval, void *extra)

Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
should only be there to expose the value; it shouldn't have
calculation logic associated with it.
    /*
+     * initdb passes the WAL segment size in an environment variable. We don't
+     * bother doing any sanity checking, we already check in initdb that the
+     * user gives a sane value.
+     */
+    XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);

I think we should bother.  I don't like the idea of the postmaster
crashing in flames without so much as a reasonable error message if
this parameter-passing mechanism goes wrong.
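A minimal sketch of the kind of check being asked for, assuming the value arrives in bytes and that a valid size is a power of two between 1 MB and 1 GB (both bounds are assumptions for illustration, not taken from the patch). The patch uses pg_atoi(); the sketch swaps in plain strtol() so it compiles on its own, and parse_seg_size() is a hypothetical name. A caller would pass getenv("XLOG_SEG_SIZE") and exit after the message instead of carrying on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed limits for this sketch only. */
#define SKETCH_MIN_SEG_BYTES (1024L * 1024L)          /* 1 MB */
#define SKETCH_MAX_SEG_BYTES (1024L * 1024L * 1024L)  /* 1 GB */

/*
 * Validate a WAL segment size (in bytes) handed over as a string, e.g.
 * parse_seg_size(getenv("XLOG_SEG_SIZE"), "XLOG_SEG_SIZE", &size).
 * Returns 0 and stores the value on success; on failure prints a
 * readable message and returns -1 instead of continuing with garbage.
 */
static int
parse_seg_size(const char *str, const char *what, long *out)
{
    char *end;
    long val;

    assert(what != NULL && out != NULL);
    if (str == NULL || *str == '\0')
    {
        fprintf(stderr, "%s is not set\n", what);
        return -1;
    }
    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0')
    {
        fprintf(stderr, "invalid value for %s: \"%s\"\n", what, str);
        return -1;
    }
    /* Must be a power of two inside the assumed range. */
    if (val < SKETCH_MIN_SEG_BYTES || val > SKETCH_MAX_SEG_BYTES ||
        (val & (val - 1)) != 0)
    {
        fprintf(stderr, "%s must be a power of two between 1MB and 1GB, got %ld\n",
                what, val);
        return -1;
    }
    *out = val;
    return 0;
}
```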

+        {"wal-segsize", required_argument, NULL, 'Z'},

When adding an option with no documented short form, generally one
picks a number that isn't a character for the value at the end.  See
pg_regress.c or initdb.c for examples.
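For illustration, here is a long-only option registered with getopt_long() under a small integer code instead of a letter, so it cannot collide with a short option. The enum value and the helper name are hypothetical; initdb.c and pg_regress.c pick their own numbers.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Long-only options get a code that is not a printable character, so it
 * can never clash with a short option letter.  The value is illustrative.
 */
enum
{
    OPT_WAL_SEGSIZE = 1
};

static const struct option long_options[] = {
    {"wal-segsize", required_argument, NULL, OPT_WAL_SEGSIZE},
    {NULL, 0, NULL, 0}
};

/* Return the --wal-segsize argument string, or NULL if it was not given. */
static const char *
find_wal_segsize_arg(int argc, char **argv)
{
    const char *result = NULL;
    int c;

    assert(argc >= 1 && argv != NULL);
    /* Empty short-option string: this sketch accepts only long options. */
    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
    {
        if (c == OPT_WAL_SEGSIZE)
            result = optarg;
        /* getopt_long() has already reported unknown options ('?'). */
    }
    return result;
}
```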

+               wal_segment_size = atoi(str_wal_segment_size);

So, you're comfortable interpreting --wal-segsize=1TB or
--wal-segsize=1GB as 1?  Implicitly, 1MB?

+ * ControlFile is not accessible here so use SHOW wal_segment_size command
+ * to set the XLogSegSize

Breaks compatibility with pre-9.6 servers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Thank you for your comments, I will post an updated patch soon.

On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:

+assign_wal_segment_size(int newval, void *extra)

Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
should only be there to expose the value; it shouldn't have
calculation logic associated with it.

The checkpoint segment count and UsableBytesInSegment had to be recomputed depending on the value of wal_segment_size set during initdb. I will figure out another way to assign these values without using this assign_hook.
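One way to drop the assign hook is to compute the derived values once, right after the segment size has been read from pg_control. A sketch under stated assumptions: the header sizes 24 and 40 are what one would expect for MAXALIGNed short and long WAL page headers on a typical 64-bit build (check xlog_internal.h for the real values), and the variable and function names are hypothetical stand-ins for UsableBytesInSegment and related globals.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header sizes for a typical 64-bit build. */
#define SKETCH_SHORT_PHD 24
#define SKETCH_LONG_PHD  40

/* Hypothetical derived globals, set once at startup rather than by a GUC hook. */
static uint32_t usable_bytes_in_page;
static uint64_t usable_bytes_in_segment;

static void
derive_wal_sizes(uint32_t seg_size, uint32_t xlog_blcksz)
{
    assert(xlog_blcksz > 0 && seg_size % xlog_blcksz == 0);
    usable_bytes_in_page = xlog_blcksz - SKETCH_SHORT_PHD;
    /* Every page loses a short header; the first page's long header
     * costs an extra (long - short) bytes. */
    usable_bytes_in_segment =
        (uint64_t) (seg_size / xlog_blcksz) * usable_bytes_in_page
        - (SKETCH_LONG_PHD - SKETCH_SHORT_PHD);
}
```

For 16 MB segments and 8 kB pages this gives 2048 * 8168 - 16 = 16,728,048 usable bytes per segment.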


+               wal_segment_size = atoi(str_wal_segment_size);

So, you're comfortable interpreting --wal-segsize=1TB or
--wal-segsize=1GB as 1?  Implicitly, 1MB?

The option was intended to accept only values in MB, like the original configure option --with-wal-segsize. Unfortunately, unlike the configure option, the patch does not throw an error when units are specified.

Error with config option --with-wal-segsize=1MB
configure: error: Invalid WAL segment size. Allowed values are 1,2,4,8,16,32,64.

Should we imitate this behaviour and just add a check that the value contains only digits, or would it be better to allow units and make the appropriate code changes?

--
Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Mar 17, 2017 at 2:08 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> The option was intended to only accept values in MB as the original  config
> --with-wal-segsize option, unfortunately, the patch does not throw error as
> in the config option when the units are specified.

Yeah, you want to use strtol(), so that you can throw an error if
*endptr isn't '\0'.

> Error with config option --with-wal-segsize=1MB
> configure: error: Invalid WAL segment size. Allowed values are
> 1,2,4,8,16,32,64.
>
> Should we imitate this behaviour and just add a check to see if it only
> contains numbers? or would it be better to allow the use of the units and
> make appropriate code changes?

I think just restricting it to numeric values would be fine.  If
somebody wants to do the work to make it accept a unit suffix, I don't
have a problem with that, but it doesn't seem like a must-have.
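A sketch of the strtol() approach suggested above, restricted to the same values configure accepts (1, 2, 4, 8, 16, 32, 64 MB). The function name is hypothetical; the point is the *endptr check, which rejects inputs such as "1GB" that atoi() would silently read as 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a --wal-segsize argument given in MB.  Anything that is not purely
 * numeric, or not in configure's allowed set, is rejected.
 * Returns the size in MB, or -1 on error.
 */
static long
parse_wal_segsize_mb(const char *arg)
{
    char *endptr;
    long mb;

    assert(arg != NULL);
    errno = 0;
    mb = strtol(arg, &endptr, 10);
    if (errno != 0 || endptr == arg || *endptr != '\0')
    {
        fprintf(stderr, "argument of --wal-segsize must be a number\n");
        return -1;
    }
    /* Same set configure accepts: powers of two from 1 to 64. */
    if (mb < 1 || mb > 64 || (mb & (mb - 1)) != 0)
    {
        fprintf(stderr, "invalid WAL segment size: allowed values are 1,2,4,8,16,32,64\n");
        return -1;
    }
    return mb;
}
```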

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/16/17 21:10, Robert Haas wrote:
> The changes to pg_standby seem to completely break the logic to wait
> until the file has attained the correct size.  I don't know how to
> salvage that logic off-hand, but just breaking it isn't acceptable.

I think we would have to extend restore_command with an additional
placeholder that communicates the segment size, and add a new pg_standby
option to accept that size somehow.  And specifying the size would have
to be mandatory, for complete robustness.  Urgh.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/17/17 16:20, Peter Eisentraut wrote:
> On 3/16/17 21:10, Robert Haas wrote:
>> The changes to pg_standby seem to completely break the logic to wait
>> until the file has attained the correct size.  I don't know how to
>> salvage that logic off-hand, but just breaking it isn't acceptable.
> 
> I think we would have to extend restore_command with an additional
> placeholder that communicates the segment size, and add a new pg_standby
> option to accept that size somehow.  And specifying the size would have
> to be mandatory, for complete robustness.  Urgh.

Another way would be to name the WAL files in a more self-describing
way.  For example, instead of

000000010000000000000001
000000010000000000000002
000000010000000000000003

name them (for 16 MB)

000000010000000001
000000010000000002
000000010000000003

Then, pg_standby and similar tools can compute the expected file size
from the file name length: 16 ^ (24 - fnamelen)

However, that way you can't actually support 64 MB segments.  The next
jump up would have to be 256 MB (unless you want to go to a base other
than 16).
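Peter's size rule can be sketched as follows. It applies only to the hypothetical self-describing naming scheme, not to today's 24-character names, and the accepted length range (16 to 19 characters, i.e. 4 GB down to 1 MB) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical naming scheme: a segment of 16^k bytes drops k hex digits
 * from the usual 24-character name, so size = 16^(24 - namelen).
 * Returns 0 for lengths outside the range this sketch accepts.
 */
static uint64_t
expected_size_from_name(const char *fname)
{
    size_t len;
    size_t k;
    uint64_t size = 1;

    assert(fname != NULL);
    len = strlen(fname);
    if (len < 16 || len > 19)
        return 0;
    for (k = 0; k < 24 - len; k++)
        size *= 16;
    return size;
}
```

An 18-character name maps to 16^6 bytes (16 MB) and a 17-character one to 16^7 (256 MB), which is why 64 MB has no name length of its own.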

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 3/17/17 16:20, Peter Eisentraut wrote:
>> I think we would have to extend restore_command with an additional
>> placeholder that communicates the segment size, and add a new pg_standby
>> option to accept that size somehow.  And specifying the size would have
>> to be mandatory, for complete robustness.  Urgh.

> Another way would be to name the WAL files in a more self-describing
> way.  For example, instead of

Actually, if you're content with having tools obtain this info by
examining the WAL files, we shouldn't need to muck with the WAL naming
convention (which seems like it would be a horrid mess, anyway --- too
much outside code knows that).  Tools could get the segment size out of
XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.
        regards, tom lane



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/17/17 16:56, Tom Lane wrote:
> Tools could get the segment size out of
> XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.

OK, then pg_standby would have to wait until the file is at least
XLOG_BLCKSZ, then look inside and get the expected final size.  A bit
more complicated than now, but seems doable.
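A rough sketch of the check Peter describes: wait until at least one XLOG_BLCKSZ page is present, then read xlp_seg_size from the long page header. The structs are hypothetical mirrors of XLogPageHeaderData and XLogLongPageHeaderData based on a reading of xlog_internal.h; they are valid only if the reader shares the server's architecture and alignment rules, and the XLP_LONG_HEADER value (0x0002) is assumed. The magic number is not checked because it changes between releases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirrors of the WAL page headers (see caveats above). */
typedef struct
{
    uint16_t xlp_magic;
    uint16_t xlp_info;
    uint32_t xlp_tli;
    uint64_t xlp_pageaddr;
    uint32_t xlp_rem_len;
} SketchPageHeader;

typedef struct
{
    SketchPageHeader std;
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} SketchLongPageHeader;

#define SKETCH_XLP_LONG_HEADER 0x0002   /* assumed flag value */

/*
 * Return the segment size recorded in the first page of a WAL file that may
 * still be arriving, or 0 if it cannot be determined yet (fewer than
 * xlog_blcksz bytes present, no long header, or a block-size mismatch).
 */
static uint32_t
wal_seg_size_from_stream(FILE *fp, long xlog_blcksz)
{
    SketchLongPageHeader hdr;
    long len;

    assert(fp != NULL && xlog_blcksz > 0);
    if (fseek(fp, 0L, SEEK_END) != 0 || (len = ftell(fp)) < 0)
        return 0;
    if (len < xlog_blcksz)
        return 0;               /* first page incomplete: retry later */

    memset(&hdr, 0, sizeof(hdr));
    if (fseek(fp, 0L, SEEK_SET) != 0 || fread(&hdr, sizeof(hdr), 1, fp) != 1)
        return 0;
    if ((hdr.std.xlp_info & SKETCH_XLP_LONG_HEADER) == 0)
        return 0;               /* not the first page of a segment */
    if (hdr.xlp_xlog_blcksz != (uint32_t) xlog_blcksz)
        return 0;               /* written by a differently built server */
    return hdr.xlp_seg_size;
}
```

Once a nonzero size is known, the existing "wait until the file reaches its full size" loop can compare against it instead of a compile-time constant.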

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Fri, Mar 17, 2017 at 6:11 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/17/17 16:56, Tom Lane wrote:
>> Tools could get the segment size out of
>> XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.
>
> OK, then pg_standby would have to wait until the file is at least
> XLOG_BLCKSZ, then look inside and get the expected final size.  A bit
> more complicated than now, but seems doable.

Yeah, that doesn't sound too bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:

On 3/17/17 4:56 PM, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 3/17/17 16:20, Peter Eisentraut wrote:
>>> I think we would have to extend restore_command with an additional
>>> placeholder that communicates the segment size, and add a new pg_standby
>>> option to accept that size somehow.  And specifying the size would have
>>> to be mandatory, for complete robustness.  Urgh.
>
>> Another way would be to name the WAL files in a more self-describing
>> way.  For example, instead of
>
> Actually, if you're content with having tools obtain this info by
> examining the WAL files, we shouldn't need to muck with the WAL naming
> convention (which seems like it would be a horrid mess, anyway --- too
> much outside code knows that).  Tools could get the segment size out of
> XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.
>
>             regards, tom lane

+1

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

PFA the updated patch.

On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> Attached is the updated patch. It fixes the issues and also updates few code
> comments.

I did an initial readthrough of this patch tonight just to get a
feeling for what's going on.  Based on that, here are a few review
comments:

The changes to pg_standby seem to completely break the logic to wait
until the file has attained the correct size.  I don't know how to
salvage that logic off-hand, but just breaking it isn't acceptable.

Using XLogLongPageHeader->xlp_seg_size, all the original checks have been retained. This method is also used in pg_waldump.

 

+         Note that changing this value requires an initdb.

Instead, maybe say something like "Note that this value is fixed for
the lifetime of the database cluster."

Corrected.
 

-int            max_wal_size = 64;    /* 1 GB */
-int            min_wal_size = 5;    /* 80 MB */
+int            wal_segment_size = 2048;    /* 16 MB */
+int            max_wal_size = 1024 * 1024;    /* 1 GB */
+int            min_wal_size = 80 * 1024;    /* 80 MB */

If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then
it's not the case that 2048 is always 16MB.  If the other values are
now measured in kB, perhaps rename the variables to add _kb, to avoid
confusion with the way it used to work (and in general).  The problem
with leaving this as-is is that any existing references to
max_wal_size in core or extension code will silently break; you want
it to break in a noticeable way so that it gets fixed.


wal_segment_size is now DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ, and min_wal_size and max_wal_size now have a _kb suffix.
  
+ * UsableBytesInSegment: It is set in assign_wal_segment_size and stores the
+ *         number of bytes in a WAL segment usable for WAL data.

The comment doesn't need to say where it gets set, and it doesn't need
to repeat the variable name.  Just say "The number of bytes in a..."

Done.
 

+assign_wal_segment_size(int newval, void *extra)

Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
should only be there to expose the value; it shouldn't have
calculation logic associated with it.

Removed the assign hook; the calculations it performed are now called from ReadControlFile.
 

     /*
+     * initdb passes the WAL segment size in an environment variable. We don't
+     * bother doing any sanity checking, we already check in initdb that the
+     * user gives a sane value.
+     */
+    XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);

I think we should bother.  I don't like the idea of the postmaster
crashing in flames without so much as a reasonable error message if
this parameter-passing mechanism goes wrong.

I have added a sanity check on XLogSegSize.
 

+        {"wal-segsize", required_argument, NULL, 'Z'},

When adding an option with no documented short form, generally one
picks a number that isn't a character for the value at the end.  See
pg_regress.c or initdb.c for examples.
 
Done. 
 

+               wal_segment_size = atoi(str_wal_segment_size);

So, you're comfortable interpreting --wal-segsize=1TB or
--wal-segsize=1GB as 1?  Implicitly, 1MB?

Imitating the current behaviour of the configure option --with-wal-segsize, I have used strtol() to throw an error if the value is not purely numeric.
 

+ * ControlFile is not accessible here so use SHOW wal_segment_size command
+ * to set the XLogSegSize

Breaks compatibility with pre-9.6 servers.

Added a version check: the SHOW command is run only against v10 and above. Previous versions do not need it.
 

--
Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
Hi Beena,

On 3/20/17 2:07 PM, Beena Emerson wrote:
> Added check for the version, the SHOW command will be run only in v10
> and above. Previous versions do not need this.

I've just had the chance to have a look at this patch.  This is not a 
complete review, just a test of something I've been curious about.

With 16MB WAL segments the filename neatly aligns with the LSN.  For 
example:

WAL FILE 0000000100000001000000FE = LSN 1/FE000000

This no longer holds true with this patch.  I created a cluster with 1GB 
segments and the sequence looked like:

000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000100000000

Whereas I had expected something like:

000000010000000000000040
000000010000000000000080
0000000100000000000000C0
000000010000000100000000

I scanned the thread but couldn't find any mention of this so I'm 
curious to know if it was considered? Was the prior correspondence 
merely serendipitous?

I'm honestly not sure which way I think is better, but I know either way 
it represents a pretty big behavioral change for any tools looking at 
pg_wal or using the various helper functions.

It's probably a good thing to do at the same time as the rename, just 
want to make sure we are all aware of the changes.
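The alignment David describes falls out of how the name is built from the segment number. The sketch below follows, as understood here, the XLByteToSeg and XLogFileName macros: the segment number is split by the number of segments per 4 GB of LSN space, so with 1 GB segments the last field only counts from 0 to 3. The function name is hypothetical, and segment sizes are assumed to divide 4 GB evenly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build the WAL file name containing `lsn` for a given segment size. */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t lsn,
              uint64_t seg_size)
{
    uint64_t segno;
    uint64_t segs_per_xlogid;

    assert(buf != NULL && buflen >= 25);
    assert(seg_size > 0 && UINT64_C(0x100000000) % seg_size == 0);
    segno = lsn / seg_size;
    segs_per_xlogid = UINT64_C(0x100000000) / seg_size;
    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / segs_per_xlogid),
             (unsigned) (segno % segs_per_xlogid));
}
```

With 16 MB segments there are 256 segments per xlogid, so LSN 1/FE000000 lands in 0000000100000001000000FE; with 1 GB segments LSN 0/C0000000 lands in 000000010000000000000003.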

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Mon, Mar 20, 2017 at 7:23 PM, David Steele <david@pgmasters.net> wrote:
> With 16MB WAL segments the filename neatly aligns with the LSN.  For
> example:
>
> WAL FILE 0000000100000001000000FE = LSN 1/FE000000
>
> This no longer holds true with this patch.

It is already possible to change the WAL segment size using the
configure option --with-wal-segsize, and I think the patch should be
consistent with whatever that existing option does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
PFA an updated patch.

This fixes an issue reported by Tushar internally. Since the patch changes the internal representation of min_wal_size and max_wal_size from a segment count to a size in kB, it limited the maximum value of both to 2GB on 32-bit systems.

The minimum number of segments is 2 and the minimum WAL segment size is 1MB, so the lowest possible value of min_wal_size and max_wal_size is 2MB. Hence, I have changed the internal representation to MB instead of kB so that we can increase the range. Also, for any WAL segment size, it retains the default segment counts of 5 and 64 for min_wal_size and max_wal_size respectively. Consequently, the default sizes scale automatically with wal_segment_size. This behaviour is similar to that of the existing code.
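A small sketch of the MB-based bookkeeping described above, with hypothetical names: the defaults are fixed segment counts (5 and 64) multiplied by the segment size, and conversion back to a segment count rounds down. For 16 MB segments this reproduces the familiar 80 MB and 1 GB defaults.

```c
#include <assert.h>

/* Default segment counts, as described in the message above. */
#define SKETCH_DEFAULT_MIN_SEGS 5
#define SKETCH_DEFAULT_MAX_SEGS 64

/* Convert a size in MB to a whole number of segments (rounds down). */
static int
mb_to_segments(int size_mb, int seg_size_mb)
{
    assert(seg_size_mb > 0);
    return size_mb / seg_size_mb;
}

/* Defaults grow with the segment size chosen at initdb time. */
static int
default_min_wal_size_mb(int seg_size_mb)
{
    return SKETCH_DEFAULT_MIN_SEGS * seg_size_mb;
}

static int
default_max_wal_size_mb(int seg_size_mb)
{
    return SKETCH_DEFAULT_MAX_SEGS * seg_size_mb;
}
```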

I have also run pgindent on the files. 


--
Thank you, 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Mar 20, 2017 at 7:23 PM, David Steele <david@pgmasters.net> wrote:
> > With 16MB WAL segments the filename neatly aligns with the LSN.  For
> > example:
> >
> > WAL FILE 0000000100000001000000FE = LSN 1/FE000000
> >
> > This no longer holds true with this patch.
>
> It is already possible to change the WAL segment size using the
> configure option --with-wal-segsize, and I think the patch should be
> consistent with whatever that existing option does.

Considering how little usage that option has likely seen (I can't say
I've ever run into usage of it so far...), I'm not really sure that it
makes sense to treat it as final when we're talking about changing the
default here.

In short, I'm also concerned about this change to make WAL file names no
longer match up with LSNs and also about the odd stepping that you get
as a result of this change when it comes to WAL file names.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 3/21/17 9:04 AM, Stephen Frost wrote:
> Robert,
>
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Mon, Mar 20, 2017 at 7:23 PM, David Steele <david@pgmasters.net> wrote:
>>> With 16MB WAL segments the filename neatly aligns with the LSN.  For
>>> example:
>>>
>>> WAL FILE 0000000100000001000000FE = LSN 1/FE000000
>>>
>>> This no longer holds true with this patch.
>>
>> It is already possible to change the WAL segment size using the
>> configure option --with-wal-segsize, and I think the patch should be
>> consistent with whatever that existing option does.
>
> Considering how little usage that option has likely seen (I can't say
> I've ever run into usage of it so far...), I'm not really sure that it
> makes sense to treat it as final when we're talking about changing the
> default here.

+1.  A seldom-used compile-time option does not necessarily provide a 
good model for a user-facing feature.

> In short, I'm also concerned about this change to make WAL file names no
> longer match up with LSNs and also about the odd stepping that you get
> as a result of this change when it comes to WAL file names.

I can't decide which way I like best.  I like the filenames 
corresponding to LSNs as they do now, but it seems like a straight 
sequence might be easier to understand.  Either way you need to know 
that different segment sizes mean different numbers of segments per 
lsn.xlogid.

Even now the correspondence is a bit tenuous.  I've always thought:

00000001000000010000000F

Should be:

00000001000000010F000000

I'm really excited to (hopefully) have this feature in v10.  I just want 
to be sure we discuss this as it will be a big change for tool authors 
and just about anybody who looks at WAL.

Thanks,
-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Mar 21, 2017 at 9:04 AM, Stephen Frost <sfrost@snowman.net> wrote:
> In short, I'm also concerned about this change to make WAL file names no
> longer match up with LSNs and also about the odd stepping that you get
> as a result of this change when it comes to WAL file names.

OK, that's a bit surprising to me, but what do you want to do about
it?  If you take the approach that Beena did, then you lose the
correspondence with LSNs, which is admittedly not great but there are
already helper functions available to deal with LSN -> filename
mappings and I assume those will continue to work. If you take the
opposite approach, then WAL filenames stop being consecutive, which
seems to me to be far worse in terms of user and tool confusion.  Also
note that, both currently and with the patch, you can also reduce the
WAL segment size.  David's proposed naming scheme doesn't handle that
case, I think, and I think it would be all kinds of a bad idea to use
one file-naming approach for segments < 16MB and a separate approach
for segments > 16MB.  That's not making anything easier for users or
tool authors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 3/21/17 3:22 PM, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 9:04 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> In short, I'm also concerned about this change to make WAL file names no
>> longer match up with LSNs and also about the odd stepping that you get
>> as a result of this change when it comes to WAL file names.
>
> OK, that's a bit surprising to me, but what do you want to do about
> it?  If you take the approach that Beena did, then you lose the
> correspondence with LSNs, which is admittedly not great but there are
> already helper functions available to deal with LSN -> filename
> mappings and I assume those will continue to work. If you take the
> opposite approach, then WAL filenames stop being consecutive, which
> seems to me to be far worse in terms of user and tool confusion.

They are already non-consecutive.  Does 000000010000000200000000 really 
logically follow 0000000100000001000000FF?  Yeah, sort of, if you know 
the rules.

> Also
> note that, both currently and with the patch, you can also reduce the
> WAL segment size.  David's proposed naming scheme doesn't handle that
> case, I think, and I think it would be all kinds of a bad idea to use
> one file-naming approach for segments < 16MB and a separate approach
> for segments > 16MB.  That's not making anything easier for users or
> tool authors.

I believe it does handle that case, actually.  The minimum WAL segment 
size is 1MB so they would increase like:

000000010000000100000000
000000010000000100100000
000000010000000100200000
...
0000000100000001FFF00000

You could always calculate the next WAL file by adding 
(wal_seg_size_in_mb << 20) to the previous WAL file's LSN.  This would 
even work for WAL segments > 4GB.  Overall, I think this would make 
calculating WAL ranges simpler than it is now.

The biggest downside I can see is that this would change the naming 
scheme for the default of 16MB compared to previous versions of 
Postgres.  However, for all other wal-seg-size values changes would need 
to be made anyway.
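David's proposed scheme can be sketched as below: the last two fields are simply the segment's starting LSN, so the next name comes from adding the segment size to the LSN. The helper names are hypothetical. Note that for the default 16 MB size it produces names such as 0000000100000001FF000000 where the current scheme gives 0000000100000001000000FF, which is the compatibility break discussed in the following messages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Name = timeline, then the high and low 32 bits of the segment's start LSN. */
static void
lsn_style_name(char *buf, size_t buflen, uint32_t tli, uint64_t seg_start_lsn)
{
    assert(buf != NULL && buflen >= 25);
    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (seg_start_lsn >> 32),
             (unsigned) (seg_start_lsn & 0xFFFFFFFFu));
}

/* Start LSN of the following segment: add (wal_seg_size_in_mb << 20). */
static uint64_t
next_segment_lsn(uint64_t seg_start_lsn, uint32_t wal_seg_size_in_mb)
{
    return seg_start_lsn + ((uint64_t) wal_seg_size_in_mb << 20);
}
```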

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/21/17 15:22, Robert Haas wrote:
> If you take the approach that Beena did, then you lose the
> correspondence with LSNs, which is admittedly not great but there are
> already helper functions available to deal with LSN -> filename
> mappings and I assume those will continue to work. If you take the
> opposite approach, then WAL filenames stop being consecutive, which
> seems to me to be far worse in terms of user and tool confusion.

Anecdotally, I think having the file numbers consecutive is very
important, for debugging and feel-good factor.

If you want to raise the segment size and preserve the LSN mapping, then
pick 256 MB as your next size.

I do think, however, that this has the potential of creating another
ongoing source of confusion similar to oid vs relfilenode, where the
numbers are often the same, except when they are not.  With hindsight, I
would have made the relfilenodes completely different from the OIDs.  We
chose to keep them (mostly) the same as the OIDs, for compatibility.  We
are seemingly making a similar kind of decision here.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Mar 21, 2017 at 6:02 PM, David Steele <david@pgmasters.net> wrote:
> The biggest downside I can see is that this would change the naming scheme
> for the default of 16MB compared to previous versions of Postgres.  However,
> for all other wal-seg-size values changes would need to be made anyway.

I think changing the naming convention for 16MB WAL segments, which is
still going to be what 99% of people use, is an awfully large
compatibility break for an awfully marginal benefit.  We've already
created quite a few incompatibilities in this release, and I'm not
entirely eager to just keep cranking them out at top speed.  Where
it's necessary to achieve forward progress in some area, sure, but
this feels gratuitous to me.  I agree that we might have picked your
scheme if we were starting from scratch, but I have a hard time
believing it's a good idea to do it now just because of this patch.
Changing the WAL segment size has been supported for a long time, and
I don't see the fact that it will now potentially be
initdb-configurable rather than configure-configurable as a sufficient
justification for whacking around the naming scheme -- even though I
don't love the naming scheme we've got.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Mar 21, 2017 at 6:02 PM, David Steele <david@pgmasters.net> wrote:
> > The biggest downside I can see is that this would change the naming scheme
> > for the default of 16MB compared to previous versions of Postgres.  However,
> > for all other wal-seg-size values changes would need to be made anyway.
>
> I think changing the naming convention for 16MB WAL segments, which is
> still going to be what 99% of people use, is an awfully large
> compatibility break for an awfully marginal benefit.

It seems extremely unlikely to me that we're going to actually see users
deviate from whatever we set the default to and so I'm not sure that
this is a real concern.  We aren't changing what 9.6 and below's naming
scheme is, just what PG10+ do, and PG10+ are going to have a different
default WAL size.

I realize the current patch still has the 16MB default even though a
rather large portion of the early discussion appeared in favor of
changing it to 64MB.  Once we've done that, I don't think it makes one
whit of difference what the naming scheme looks like when you're using
16MB sizes because essentially zero people are going to actually use
such a setting.

> We've already
> created quite a few incompatibilities in this release, and I'm not
> entirely eager to just keep cranking them out at top speed.

That position would seem to imply that you're in favor of keeping the
current default of 16MB, but that doesn't make sense given that you
started this discussion advocating to make it larger.  Changing your
position is certainly fine, but it'd be good to be more clear if that's
what you meant here or if you were just referring to the file naming
scheme but you do still want to increase the default size.

I'll admit that we might have a few more people using non-default sizes
once we make it an initdb-option (though I'm tempted to suggest that one
might be able to count them using their digits ;), but it seems very
unlikely that they would do so to reduce it back down to 16MB, so I'm
really not seeing the naming scheme change as a serious
backwards-incompatibility change.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Mar 21, 2017 at 8:10 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> We've already
>> created quite a few incompatibilities in this release, and I'm not
>> entirely eager to just keep cranking them out at top speed.
>
> That position would seem to imply that you're in favor of keeping the
> current default of 16MB, but that doesn't make sense given that you
> started this discussion advocating to make it larger.  Changing your
> position is certainly fine, but it'd be good to be more clear if that's
> what you meant here or if you were just referring to the file naming
> scheme but you do still want to increase the default size.

To be honest, I'd sort of forgotten about the change which is the
nominal subject of this thread - I was more focused on the patch,
which makes it configurable.  I was definitely initially in favor of
raising the value, but I got cold feet, a bit, when Alvaro pointed out
that going to 64MB would require a substantial increase in
min_wal_size.  I'm not sure people with small installations will
appreciate seeing that value cranked up from 5 segments * 16MB = 80MB
to, say, 3 segments * 64MB = 192MB.  That's an extra 100+ MB of space
that doesn't really do anything for you.  And nobody's done any
benchmarking to see whether having only 3 segments is even a workable,
performant configuration, so maybe we'll end up with 5 * 64MB = 320MB
by default.
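To make the arithmetic above easy to re-check, here is a trivial C sketch. It assumes, as the message does, that the space in question is simply a segment count times the segment size in MB. The helper names `wal_floor_mb` and `print_wal_floor_examples` are hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: MB of WAL kept around when min_wal_size is
 * expressed as a number of segments of a given size (pure arithmetic,
 * not a PostgreSQL API). */
static int wal_floor_mb(int segments, int seg_size_mb)
{
    assert(segments > 0 && seg_size_mb > 0);
    return segments * seg_size_mb;
}

/* Prints the three configurations mentioned in the message above. */
static void print_wal_floor_examples(void)
{
    printf("5 x 16MB = %dMB\n", wal_floor_mb(5, 16)); /* current default */
    printf("3 x 64MB = %dMB\n", wal_floor_mb(3, 64));
    printf("5 x 64MB = %dMB\n", wal_floor_mb(5, 64));
}
```

The gap between the second and first configurations, 192MB - 80MB = 112MB, is the "extra 100+ MB" referred to above.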

I'm a little worried that this whole question of changing the file
naming scheme is a diversion which will result in torpedoing any
chance of getting some kind of improvement here for v11.  I don't
think the patch is all that far from being committable but it's not
going to get there if we start redesigning the world around it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Tue, Mar 21, 2017 at 11:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm a little worried that this whole question of changing the file
> naming scheme is a diversion which will result in torpedoing any
> chance of getting some kind of improvement here for v11.  I don't
> think the patch is all that far from being committable but it's not
> going to get there if we start redesigning the world around it.

Ha.  A little Freudian slip there, since I obviously meant v10.

-- 
Robert Haas



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Mar 22, 2017 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm a little worried that this whole question of changing the file
> naming scheme is a diversion which will result in torpedoing any
> chance of getting some kind of improvement here for v11.  I don't
> think the patch is all that far from being committable but it's not
> going to get there if we start redesigning the world around it.

As stated above, the default 16MB has not changed and so we can take this separately and not as part of this patch. 

PFA an updated patch which fixes a minor bug I found. It only increases the string size in pretty_wal_size function.
The 01-add-XLogSegmentOffset-macro.patch has also been rebased. 

--
Thank you, 

Beena Emerson

Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Mar 21, 2017 at 8:10 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >> We've already
> >> created quite a few incompatibilities in this release, and I'm not
> >> entirely eager to just keep cranking them out at top speed.
> >
> > That position would seem to imply that you're in favor of keeping the
> > current default of 16MB, but that doesn't make sense given that you
> > started this discussion advocating to make it larger.  Changing your
> > position is certainly fine, but it'd be good to be more clear if that's
> > what you meant here or if you were just referring to the file naming
> > scheme but you do still want to increase the default size.
>
> To be honest, I'd sort of forgotten about the change which is the
> nominal subject of this thread - I was more focused on the patch,
> which makes it configurable.  I was definitely initially in favor of
> raising the value, but I got cold feet, a bit, when Alvaro pointed out
> that going to 64MB would require a substantial increase in
> min_wal_size.  I'm not sure people with small installations will
> appreciate seeing that value cranked up from 5 segments * 16MB = 80MB
> to, say, 3 segments * 64MB = 192MB.  That's an extra 100+ MB of space
> that doesn't really do anything for you.  And nobody's done any
> benchmarking to see whether having only 3 segments is even a workable,
> performant configuration, so maybe we'll end up with 5 * 64MB = 320MB
> by default.

The performance concern of having 3 segments is a red herring here if
we're talking about a default install because the default for
max_wal_size is 1G, not 192MB.  I do think increasing the default WAL
size would be valuable to do even if it does mean a default install will
take up a bit more space.

I didn't see much discussion of it, but if this is really a concern then
couldn't we set the default to be 2 segments worth instead of 3 also?
That would mean an increase from 80MB to 128MB in the default install if
you never touch more than 128MB during a checkpoint.

> I'm a little worried that this whole question of changing the file
> naming scheme is a diversion which will result in torpedoing any
> chance of getting some kind of improvement here for v11.  I don't
> think the patch is all that far from being committable but it's not
> going to get there if we start redesigning the world around it.

It's not my intent to 'torpedo' this patch but I'm pretty disappointed
that we're introducing yet another initdb-time option with, as far as I
can tell, no option to change it after the cluster has started (without
some serious hackery), and continuing to have a poor default, which is
what most users will end up with.

I really don't like these kinds of options.  I'd much rather have a
reasonable default that covers most cases and is less likely to be a
problem for most systems than have a minimal setting that's impossible
to change after you've got your data in the system.  As much as I'd like
everyone to talk to me before doing an initdb, that's pretty rare and
instead we end up having to break the bad news that they should have
known better and done the right thing at initdb time and, no, sorry,
there's no answer today but to dump out all of the data and load it into
a new cluster which was set up with the right initdb settings.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/22/17 05:44, Beena Emerson wrote:
> As stated above, the default 16MB has not changed and so we can take
> this separately and not as part of this patch. 

It's good to have that discussion separately, but if we're planning to
do it for PG10 (not saying we should), then we should have that
discussion very soon.  Especially if we would be shipping a default
configuration where the mapping of files to LSNs fails, which will
require giving users some time to adjust.
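To illustrate why the file-to-LSN mapping stops lining up, here is a C sketch of the usual timeline/log/segment file naming. It is modeled on how PostgreSQL's XLogFileName() is generally understood to work (the segment number is split into a log-id part and a per-log-id part, each printed as 8 hex digits after the 8-digit timeline ID); the function name `wal_file_name` is hypothetical and the sketch assumes a nonzero, power-of-two segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modeled on XLogFileName(): builds the 24-hex-digit
 * name of the segment file containing `lsn`. seg_size is assumed to be
 * a nonzero power of two. */
static void wal_file_name(char *out, size_t outlen, uint32_t tli,
                          uint64_t lsn, uint32_t seg_size)
{
    uint64_t segno = lsn / seg_size;
    /* Number of segments covering one 4GB "log id" range. */
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(out, outlen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

With 16MB segments, LSN 0/3C000000 lands in a file ending in ...3C, so the name visually matches the LSN; with 64MB segments the same LSN lands in a file ending in ...0F.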

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Bruce Momjian
Date:
On Tue, Mar 21, 2017 at 11:49:30PM -0400, Robert Haas wrote:
> To be honest, I'd sort of forgotten about the change which is the
> nominal subject of this thread - I was more focused on the patch,
> which makes it configurable.  I was definitely initially in favor of
> raising the value, but I got cold feet, a bit, when Alvaro pointed out
> that going to 64MB would require a substantial increase in
> min_wal_size.  I'm not sure people with small installations will
> appreciate seeing that value cranked up from 5 segments * 16MB = 80MB
> to, say, 3 segments * 64MB = 192MB.  That's an extra 100+ MB of space
> that doesn't really do anything for you.  And nobody's done any
> benchmarking to see whether having only 3 segments is even a workable,
> performant configuration, so maybe we'll end up with 5 * 64MB = 320MB
> by default.

Maybe it's time to have a documentation section listing suggested changes
for small installs so we can have more reasonable defaults.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/22/17 08:46, Stephen Frost wrote:
> It's not my intent to 'torpedo' this patch but I'm pretty disappointed
> that we're introducing yet another initdb-time option with, as far as I
> can tell, no option to change it after the cluster has started (without
> some serious hackery), and continuing to have a poor default, which is
> what most users will end up with.

I understand this concern, but what alternative do you have in mind?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 3/22/17 08:46, Stephen Frost wrote:
> > It's not my intent to 'torpedo' this patch but I'm pretty disappointed
> > that we're introducing yet another initdb-time option with, as far as I
> > can tell, no option to change it after the cluster has started (without
> > some serious hackery), and continuing to have a poor default, which is
> > what most users will end up with.
>
> I understand this concern, but what alternative do you have in mind?

Changing the default to a more reasonable value would at least reduce
the issue.

I think it'd also be nice to have a way to change it post-initdb, but
that's less of an issue if we are at least setting it to a good default
to begin with instead of a minimal one.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Wed, Mar 22, 2017 at 3:14 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> PFA an updated patch which fixes a minor bug I found. It only increases the
> string size in pretty_wal_size function.
> The 01-add-XLogSegmentOffset-macro.patch has also been rebased.

Thanks for the updated versions. Here is a partial review of the patch:

In pg_standby.c and pg_waldump.c,
+ XLogPageHeader hdr = (XLogPageHeader) buf;
+ XLogLongPageHeader NewLongPage = (XLogLongPageHeader) hdr;
+
+ XLogSegSize = NewLongPage->xlp_seg_size;
It waits until the file is at least XLOG_BLCKSZ bytes long, then reads
the expected final size (xlp_seg_size) from the long page header. This
looks really clean compared to the previous approach.
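A minimal C sketch of the approach described above, assuming the first XLOG_BLCKSZ bytes of the segment have already been read into a buffer. The structs are simplified, hypothetical mirrors of XLogPageHeaderData / XLogLongPageHeaderData: xlp_seg_size comes from the quoted patch, while the other field names and the long-header flag value are recalled from PostgreSQL's xlog_internal.h and should be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical mirror of the short page header. */
typedef struct SketchPageHeader
{
    uint16_t xlp_magic;
    uint16_t xlp_info;
    uint32_t xlp_tli;
    uint64_t xlp_pageaddr;
    uint32_t xlp_rem_len;
} SketchPageHeader;

/* Simplified, hypothetical mirror of the long (first-page) header. */
typedef struct SketchLongPageHeader
{
    SketchPageHeader std;
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} SketchLongPageHeader;

#define SKETCH_XLP_LONG_HEADER 0x0002 /* assumed flag value */

/* Returns the segment size recorded on the first page of a segment, or 0
 * if the buffer is too short or the page lacks a long header. */
static uint32_t read_seg_size(const unsigned char *buf, size_t len)
{
    SketchLongPageHeader hdr;

    if (len < sizeof(hdr))
        return 0;
    memcpy(&hdr, buf, sizeof(hdr)); /* copy to avoid alignment issues */
    if ((hdr.std.xlp_info & SKETCH_XLP_LONG_HEADER) == 0)
        return 0;
    return hdr.xlp_seg_size;
}
```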

+ * Verify that the min and max wal_size meet the minimum requirements.
Better to write min_wal_size and max_wal_size.

+ errmsg("Insufficient value for \"min_wal_size\"")));
Maybe "min_wal_size %d is too low"? Also, error messages should start
in lower case. Same for max_wal_size.

+ /* Set the XLogSegSize */
+ XLogSegSize = ControlFile->xlog_seg_size;
+
Wouldn't a call to IsValidXLogSegSize() be good after this?
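Roughly what such a validity check might verify, as a C sketch. The power-of-two requirement and the 1MB to 1GB bounds are assumptions for illustration, not taken from the patch, and `is_valid_seg_size` is a hypothetical stand-in for IsValidXLogSegSize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds, for illustration only. */
#define SKETCH_MIN_SEG_SIZE (1024u * 1024u)         /* 1MB */
#define SKETCH_MAX_SEG_SIZE (1024u * 1024u * 1024u) /* 1GB */

/* Hypothetical check: size must be a power of two within the bounds. */
static bool is_valid_seg_size(uint32_t size)
{
    bool power_of_two = size != 0 && (size & (size - 1)) == 0;

    return power_of_two &&
           size >= SKETCH_MIN_SEG_SIZE &&
           size <= SKETCH_MAX_SEG_SIZE;
}
```

Validating right after reading the value from the control file would turn a corrupt or unexpected size into a clear error rather than odd behavior later on.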

+ /* Update variables using XLogSegSize */
+ check_wal_size();
The function name looks somewhat misleading compared to the comment for
it, doesn't it?

+ * allocating space and reading ControlFile.
s/and/for

+ {"TB", GUC_UNIT_MB, 1024 * 1024},
+ {"GB", GUC_UNIT_MB, 1024},
+ {"MB", GUC_UNIT_MB, 1},
+ {"kB", GUC_UNIT_MB, -1024},

@@ -2235,10 +2231,10 @@ static struct config_int ConfigureNamesInt[] =
  {"min_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
  gettext_noop("Sets the minimum size to shrink the WAL to."), NULL,
- GUC_UNIT_XSEGS
+ GUC_UNIT_MB
  },
- &min_wal_size,
- 5, 2, INT_MAX,
+ &min_wal_size_mb,
+ DEFAULT_MIN_WAL_SEGS * 16, 2, INT_MAX,
  NULL, NULL, NULL },

@@ -2246,10 +2242,10 @@ static struct config_int ConfigureNamesInt[] =
  {"max_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
  gettext_noop("Sets the WAL size that triggers a checkpoint."), NULL,
- GUC_UNIT_XSEGS
+ GUC_UNIT_MB
  },
- &max_wal_size,
- 64, 2, INT_MAX,
+ &max_wal_size_mb,
+ DEFAULT_MAX_WAL_SEGS * 16, 2, INT_MAX,
  NULL, assign_max_wal_size, NULL },
This patch introduces a new GUC unit, GUC_UNIT_MB, for max_wal_size and
min_wal_size. I'm not sure about the upper limit, which is set to
INT_MAX on 32-bit systems as well. Do we need to define something like
MAX_MEGABYTES, similar to MAX_KILOBYTES? It is worth mentioning that
GUC_UNIT_KB can't be used in this case, since MAX_KILOBYTES is
INT_MAX / 1024 (< 2GB) on 32-bit systems, which is not a sufficient
limit for min_wal_size/max_wal_size.
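A C sketch of how a unit table like the one quoted above can be read, where a negative multiplier means "divide by" rather than "multiply by". The struct, table and `to_mb` function are hypothetical illustrations, not the GUC code itself; the 32-bit constant just restates the INT_MAX / 1024 kB limit mentioned in the review, which works out to 2047MB.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The 32-bit MAX_KILOBYTES limit discussed above, expressed in kB. */
#define SKETCH_MAX_KILOBYTES_32BIT (INT_MAX / 1024)

typedef struct
{
    const char *unit;
    int         multiplier; /* > 0: multiply; < 0: divide by -multiplier */
} SketchUnitToMb;

static const SketchUnitToMb sketch_units[] = {
    {"TB", 1024 * 1024},
    {"GB", 1024},
    {"MB", 1},
    {"kB", -1024},
};

/* Converts value (in unit) to whole MB, truncating; returns false for an
 * unknown unit, a negative value, or a result that does not fit in int. */
static bool to_mb(long long value, const char *unit, int *result)
{
    if (value < 0 || value > INT_MAX)
        return false;
    for (size_t i = 0; i < sizeof(sketch_units) / sizeof(sketch_units[0]); i++)
    {
        if (strcmp(unit, sketch_units[i].unit) != 0)
            continue;
        int m = sketch_units[i].multiplier;
        long long mb = (m > 0) ? value * m : value / -m;

        if (mb > INT_MAX)
            return false;
        *result = (int) mb;
        return true;
    }
    return false;
}
```

Storing the setting in MB in an int leaves headroom up to roughly 2 million MB even on 32-bit builds, whereas a kB-based int would cap out just below 2GB.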

While testing with pg_waldump, I got the following error.
bin/pg_waldump -p master/pg_wal/ -s 0/01000000
Floating point exception (core dumped)
Stack:
#0  0x00000000004039d6 in ReadPageInternal ()
#1  0x0000000000404c84 in XLogFindNextRecord ()
#2  0x0000000000401e08 in main ()
I think the problem is in the following code:
/* parse files as start/end boundaries, extract path if not specified */
if (optind < argc)
{
....
+ if (!RetrieveXLogSegSize(full_path))
...
}
RetrieveXLogSegSize() is only called conditionally here, so if the
condition is false (as in the command above, which passes no file
argument), XLogSegSize is never initialized.
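Presumably the segment size is still zero when the LSN given with -s is converted to a segment number, and an integer division by zero is what the shell reports as "Floating point exception". A minimal defensive sketch, using a hypothetical helper `lsn_to_segno`, that fails cleanly instead; how pg_waldump should actually obtain the size in that case is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guard: seg_size == 0 stands for "not determined yet",
 * e.g. when no WAL file name was given on the command line. */
static bool lsn_to_segno(uint64_t lsn, uint32_t seg_size, uint64_t *segno)
{
    if (seg_size == 0)
    {
        fprintf(stderr, "WAL segment size is unknown; cannot locate LSN\n");
        return false;
    }
    *segno = lsn / seg_size;
    return true;
}
```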

I have yet to review the pg_basebackup module, and I'll try to finish that ASAP.

-- 
Thanks & Regards,
Kuntal Ghosh



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 8:46 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> I was definitely initially in favor of
>> raising the value, but I got cold feet, a bit, when Alvaro pointed out
>> that going to 64MB would require a substantial increase in
>> min_wal_size.
>
> The performance concern of having 3 segments is a red herring here if
> we're talking about a default install because the default for
> max_wal_size is 1G, not 192MB.  I do think increasing the default WAL
> size would be valuable to do even if it does mean a default install will
> take up a bit more space.

min_wal_size isn't the same thing as max_wal_size.

> I didn't see much discussion of it, but if this is really a concern then
> couldn't we set the default to be 2 segments worth instead of 3 also?
> That would mean an increase from 80MB to 128MB in the default install if
> you never touch more than 128MB during a checkpoint.

Not sure.  Need testing.

>> I'm a little worried that this whole question of changing the file
>> naming scheme is a diversion which will result in torpedoing any
>> chance of getting some kind of improvement here for v11.  I don't
>> think the patch is all that far from being committable but it's not
>> going to get there if we start redesigning the world around it.
>
> It's not my intent to 'torpedo' this patch but I'm pretty disappointed
> that we're introducing yet another initdb-time option with, as far as I
> can tell, no option to change it after the cluster has started (without
> some serious hackery), and continuing to have a poor default, which is
> what most users will end up with.
>
> I really don't like these kinds of options.  I'd much rather have a
> reasonable default that covers most cases and is less likely to be a
> problem for most systems than have a minimal setting that's impossible
> to change after you've got your data in the system.  As much as I'd like
> everyone to talk to me before doing an initdb, that's pretty rare and
> instead we end up having to break the bad news that they should have
> known better and done the right thing at initdb time and, no, sorry,
> there's no answer today but to dump out all of the data and load it into
> a new cluster which was set up with the right initdb settings.

Well, right now, the alternative is to recompile the server, so I
think being able to make the change at initdb time is pretty [ insert
a word of your choice here ] great by comparison.  Now, I completely
agree that initdb-time configurability is inferior to server-restart
configurability, which is obviously inferior to on-the-fly
reconfigurability, but we are not going to get either of those latter
two things in v10, so I think we should take the one we can get, which
is clearly better than what we've got now.  In the future, if
somebody is willing to put in the time and energy to allow this to be
changed via a pg_resetxlog-like procedure, or even on the fly by some
ALTER SYSTEM command, we can consider those changes then, but
everything this patch does will still be necessary.

On the topic of whether to also change the default, I'm not sure what
is best and will defer to others.  On the topic of whether to whack
around the file naming scheme, -1 from me.  This patch was posted
three months ago and nobody suggested that course of action until this
week.  Even though it is on a related topic, it is a conceptually
separate change that is previously undiscussed and on which we do not
have agreement.  Making those changes just before feature freeze isn't
fair to the patch authors or people who may not have time to pay
attention to this thread right this minute.

-- 
Robert Haas



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On the topic of whether to also change the default, I'm not sure what
> is best and will defer to others.  On the topic of whether to whack
> around the file naming scheme, -1 from me.  This patch was posted
> three months ago and nobody suggested that course of action until this
> week.  Even though it is on a related topic, it is a conceptually
> separate change that is previously undiscussed and on which we do not
> have agreement.  Making those changes just before feature freeze isn't
> fair to the patch authors or people who may not have time to pay
> attention to this thread right this minute.

While I understand that you'd like to separate the concerns between
changing the renaming scheme and changing the default and enabling this
option, I don't agree that they can or should be independently
considered.

This is, in my view, basically the only opportunity we will have to
change the naming scheme, because once we make it an initdb option,
while I don't think very many people will use it, there will be people
who will, and the various tool authors will also have to adjust to
handle those cases.  Chances are good that we will even see people start
to recommend using that initdb option, leading to more people using a
non-default size, at which point we simply are not going to be able to
consider changing the naming scheme.

Therefore, I would much rather we take this opportunity to change the
naming scheme and the default at the same time to be more sane, because
if we have this patch as-is in PG10, we won't be able to do so in the
future without a great deal more pain.

I'm willing to forgo the ability to change the WAL size with just a
server restart for PG10 because that's something which can clearly be
added later without any concerns about backwards-compatibility, but the
same is not true regarding the naming scheme.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 12:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> While I understand that you'd like to separate the concerns between
> changing the renaming scheme and changing the default and enabling this
> option, I don't agree that they can or should be independently
> considered.

Well, I don't understand what you think is going to happen here.  Neither Beena nor any other contributor you don't employ is obliged to write a patch for those changes because you'd like them to get made, and Peter and I have already voted against including them.  If you or David want to write a patch for those changes, post it for discussion, and try to get consensus to commit it, that's of course your right.  But the patch will be more than three weeks after the feature freeze deadline and will have two committer votes against it from the outset.

--
Robert Haas

Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 22, 2017 at 12:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > While I understand that you'd like to separate the concerns between
> > changing the renaming scheme and changing the default and enabling this
> > option, I don't agree that they can or should be independently
> > considered.
>
> Well, I don't understand what you think is going to happen here.  Neither
> Beena nor any other contributor you don't employ is obliged to write a
> patch for those changes because you'd like them to get made, and Peter and
> I have already voted against including them.  If you or David want to write
> a patch for those changes, post it for discussion, and try to get consensus
> to commit it, that's of course your right.  But the patch will be more than
> three weeks after the feature freeze deadline and will have two committer
> votes against it from the outset.

This would clearly be an adjustment to the submitted patch, which
happens regularly during the review and commit process and is part of
the commitfest process, so I don't agree that holding it to new-feature
level is correct when it's a change which is entirely relevant to this
new feature, and one which a committer is asking to be included as part
of the change.  Nor do I feel particularly bad about asking for feature
authors to be prepared to rework parts of their feature based on
feedback during the commitfest process.

I would have liked to have realized this oddity with the WAL naming
scheme for not-16MB-WALs earlier too, but it's unfortunately not within
my abilities to change that.  That does not mean that we shouldn't be
cognizant of the impact that this new feature will have in exposing this
naming scheme, one which there seems to be agreement is bad, to users.

That said, David is taking a look at it to try and be helpful.

Vote-counting seems a bit premature given that there hasn't been any
particularly clear asking for votes.  Additionally, I believe Peter also
seemed concerned that the existing naming scheme which, if used with,
say, 64MB segments, wouldn't match LSNs either, in this post:
9795723f-b4dd-f9e9-62e4-ddaf6cd888f1@2ndquadrant.com

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 12:51 PM, Stephen Frost <sfrost@snowman.net> wrote:
> This would clearly be an adjustment to the submitted patch, which
> happens regularly during the review and commit process and is part of
> the commitfest process, so I don't agree that holding it to new-feature
> level is correct when it's a change which is entirely relevant to this
> new feature, and one which a committer is asking to be included as part
> of the change.  Nor do I feel particularly bad about asking for feature
> authors to be prepared to rework parts of their feature based on
> feedback during the commitfest process.

Obviously, reworking the patch is an expected part of the CommitFest
process.  However, I disagree that what you're asking for can in any
way be fairly characterized that way.  You're not trying to make it do
the thing that it does better or differently.  You're trying to make
it do a second thing.

-- 
Robert Haas



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 22, 2017 at 12:51 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > This would clearly be an adjustment to the submitted patch, which
> > happens regularly during the review and commit process and is part of
> > the commitfest process, so I don't agree that holding it to new-feature
> > level is correct when it's a change which is entirely relevant to this
> > new feature, and one which a committer is asking to be included as part
> > of the change.  Nor do I feel particularly bad about asking for feature
> > authors to be prepared to rework parts of their feature based on
> > feedback during the commitfest process.
>
> Obviously, reworking the patch is an expected part of the CommitFest
> process.  However, I disagree that what you're asking for can in any
> way be fairly characterized that way.  You're not trying to make it do
> the thing that it does better or differently.  You're trying to make
> it do a second thing.

I don't agree with the particularly narrow definition you're using in
this case.  Adding an option to initdb to change how big WAL files are
will also change how they're named (even though this patch doesn't
*specifically* do anything with the naming, because a configure-time
switch already existed), so asking for the WAL file names, which are
already being changed, to be changed in a different way is not really
outside the scope of the patch or a new feature.

To put this in another light, had this issue been brought up post
feature-freeze, your definition would mean that we would only have the
option to either revert the patch entirely or to live with the poor
naming scheme.  For my 2c, in such a case, I would have voted to make
the change even post feature-freeze unless we were very close to
release as it's not really a new 'feature'.

Thankfully, that isn't the case here and we do have time to consider
changing it without having to worry about having a post feature-freeze
discussion about it.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 1:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> To put this in another light, had this issue been brought up post
> feature-freeze, your definition would mean that we would only have the
> option to either revert the patch entirely or to live with the poor
> naming scheme.

Yeah, and I absolutely agree with that.  In fact, I think it's
*already* past the time when we should be considering the changes you
want.

-- 
Robert Haas



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 22, 2017 at 1:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > To put this in another light, had this issue been brought up post
> > feature-freeze, your definition would mean that we would only have the
> > option to either revert the patch entirely or to live with the poor
> > naming scheme.
>
> Yeah, and I absolutely agree with that.  In fact, I think it's
> *already* past the time when we should be considering the changes you
> want.

Then perhaps we do need to be thinking of moving this to PG11 instead of
exposing an option that users will start to use which will result in WAL
naming that'll be confusing and inconsistent.  I certainly don't think
it's a good idea to move forward exposing an option with a naming scheme
that's agreed to be bad.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
"David G. Johnston"
Date:
On Wed, Mar 22, 2017 at 9:51 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Robert,
>
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > On Wed, Mar 22, 2017 at 12:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > > While I understand that you'd like to separate the concerns between
> > > changing the renaming scheme and changing the default and enabling this
> > > option, I don't agree that they can or should be independently
> > > considered.
> >
> > Well, I don't understand what you think is going to happen here.  Neither
> > Beena nor any other contributor you don't employ is obliged to write a
> > patch for those changes because you'd like them to get made, and Peter and
> > I have already voted against including them.  If you or David want to write
> > a patch for those changes, post it for discussion, and try to get consensus
> > to commit it, that's of course your right.  But the patch will be more than
> > three weeks after the feature freeze deadline and will have two committer
> > votes against it from the outset.
>
> This would clearly be an adjustment to the submitted patch, which
> happens regularly during the review and commit process and is part of
> the commitfest process, so I don't agree that holding it to new-feature
> level is correct when it's a change which is entirely relevant to this
> new feature, and one which a committer is asking to be included as part
> of the change.  Nor do I feel particularly bad about asking for feature
> authors to be prepared to rework parts of their feature based on
> feedback during the commitfest process.

Maybe it can fit in as part of the overall patch set, but wouldn't placing it in a series, either:

First: change the naming behavior and use the existing configure-time knob to test it out,

or

Second: commit the existing patch relying on the existing behavior, then implement the renaming changes using the new initdb-time knob to test them out,

make reasoning about and discussing the change considerably easier?

> I would have liked to have realized this oddity with the WAL naming
> scheme for not-16MB-WALs earlier too, but it's unfortunately not within
> my abilities to change that.  That does not mean that we shouldn't be
> cognizant of the impact that this new feature will have in exposing this
> naming scheme, one which there seems to be agreement is bad, to users.
>
> That said, David is taking a look at it to try and be helpful.
>
> Vote-counting seems a bit premature given that there hasn't been any
> particularly clear asking for votes.  Additionally, I believe Peter also
> seemed concerned that the existing naming scheme which, if used with,
> say, 64MB segments, wouldn't match LSNs either, in this post:
> 9795723f-b4dd-f9e9-62e4-ddaf6cd888f1@2ndquadrant.com

While my DBA skills aren't that great, I would think that having a system that relies upon the DBA learning how to mentally map between LSNs and WAL files is a UX failure in the first place.  The hacker-DBA might get a kick out of being able to operate efficiently with that knowledge and level of skill, but the typical DBA would rather have something like "pg_wal --lsn ####" that they can rely upon.  I would think tool builders would likewise rather rely on us to tell them where to look instead of matching up two strings.

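As a sketch of the kind of helper described above, one that tells a tool where to look rather than having it match strings, here is the reverse mapping from a WAL file name to the first LSN it holds. The function name is hypothetical, and it assumes the 24-hex-digit timeline/log/segment naming; note that the answer depends on the segment size, which is exactly the point of contention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: first LSN stored in the WAL file called `name`,
 * given the segment size the cluster was initialized with. */
static bool wal_file_start_lsn(const char *name, uint32_t seg_size,
                               uint64_t *lsn)
{
    unsigned int tli, log_id, seg;

    if (seg_size == 0 || strlen(name) != 24)
        return false;
    if (sscanf(name, "%8X%8X%8X", &tli, &log_id, &seg) != 3)
        return false;
    (void) tli; /* the timeline does not affect the LSN */

    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;
    if (seg >= segs_per_id)
        return false; /* no such file can exist with this segment size */

    *lsn = ((uint64_t) log_id * segs_per_id + seg) * seg_size;
    return true;
}
```

The same name, ...0000003C, starts at LSN 0/3C000000 with 16MB segments but at 0/F0000000 with 64MB segments, so a tool that only sees the file name cannot answer without also knowing the segment size.
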
This kinda reminds me of the discussion regarding our version number change.  We are going to expose broken tools that rely on the decimal version number instead of the official integer one.  Here we expose tools that rely on the equivalence between LSN and WAL filenames when using 16MB WAL files.  What I haven't seen defined here is how those tools should be behaving - i.e., what is our supported API.  If we lack an official way of working with these values then maybe we shouldn't give users an initdb-time way to change the WAL file size.

For the uninformed like myself, a concrete example of an actual program that would be affected would be helpful.

David J.

Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 1:49 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Mar 22, 2017 at 1:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > To put this in another light, had this issue been brought up post
>> > feature-freeze, your definition would mean that we would only have the
>> > option to either revert the patch entirely or to live with the poor
>> > naming scheme.
>>
>> Yeah, and I absolutely agree with that.  In fact, I think it's
>> *already* past the time when we should be considering the changes you
>> want.
>
> Then perhaps we do need to be thinking of moving this to PG11 instead of
> exposing an option that users will start to use which will result in WAL
> naming that'll be confusing and inconsistent.  I certainly don't think
> it's a good idea to move forward exposing an option with a naming scheme
> that's agreed to be bad.

I'm not sure there is any such agreement.  I agree that the naming
scheme for WAL files probably isn't the greatest and that David's
proposal is probably better, but we've had that naming scheme for many
years, and I don't accept that making a previously-configure-time
option initdb-time means that it's suddenly necessary to break
everything for people who continue to use a 16MB WAL size.  I really
think that is very unlikely to be a majority position, no matter how
firmly you and David hold to it.   It is possible that a majority of
people will agree that such a change should be made, but it seems very
remote that a majority of people will agree that it has to (or even
should be) the same commit that improves the configurability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 22, 2017 at 1:49 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Then perhaps we do need to be thinking of moving this to PG11 instead of
> > exposing an option that users will start to use which will result in WAL
> > naming that'll be confusing and inconsistent.  I certainly don't think
> > it's a good idea to move forward exposing an option with a naming scheme
> > that's agreed to be bad.
>
> I'm not sure there is any such agreement.  I agree that the naming
> scheme for WAL files probably isn't the greatest and that David's
> proposal is probably better, but we've had that naming scheme for many
> years, and I don't accept that making a previously-configure-time
> option initdb-time means that it's suddenly necessary to break
> everything for people who continue to use a 16MB WAL size.

Apologies, I completely forgot to bring up how the discussion has
evolved regarding the 16MB case even though we had moved past it in my
head.  Let me try to set that right here.

One of the reasons to go with the LSN is that we would actually be
preserving today's file names when the WAL files are 16MB in size.

David's initial expectation was this for 64MB WAL files:

000000010000000000000040
000000010000000000000080
0000000100000000000000C0
000000010000000100000000

Which both matches the LSN *and* keeps the file names the same when
they're 16MB.  This is what David's looking at writing a patch for and
is what I think we should be considering.  This avoids breaking
compatibility for people who choose to continue using 16MB (assuming
we switch the default to 64MB, which I am still hopeful we will do).
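To make the LSN-aligned idea concrete, here is a minimal C sketch, assuming the third field of the name simply holds bits 24-31 of the segment's starting LSN (that is, the LSN in 16MB units, as with today's 16MB files). lsn_aligned_wal_name is a hypothetical helper, not a PostgreSQL function. Note that the sequence listed above is what 1GB segments produce; with 64MB segments the third field would step by 0x04 rather than 0x40.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical: name a WAL segment after its starting LSN.
 * Field 1: timeline; field 2: high 32 bits of the LSN;
 * field 3: bits 24..31 of the LSN, i.e. the position in 16MB units.
 * Only meaningful for power-of-two segment sizes >= 16MB; smaller
 * segments would collide because the bits below 24 are discarded.
 */
static void
lsn_aligned_wal_name(char *buf, size_t buflen, uint32_t tli,
                     uint64_t lsn, uint64_t seg_size)
{
    /* round the LSN down to the start of its segment */
    uint64_t seg_start = lsn - (lsn % seg_size);

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (seg_start >> 32),
             (unsigned) ((seg_start & 0xFFFFFFFFu) >> 24));
}
```

With 16MB segments the output matches the existing names, which is the compatibility property described above; below 16MB, two adjacent segments would round to the same name.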

David had offered up another idea which would change the WAL naming for
all sizes, but he and I chatted about it and it seemed like it'd make
more sense to maintain the 16MB filenames and then to use the LSN for
other sizes also in the same manner.

Regardless of which approach we end up using, I do think we need a
formal function for converting WAL file names into LSNs and
included documentation that spells out exactly how that's done.  This
is obviously important to backup tools which need to make sure that
there aren't any gaps in the WAL stream and also need to figure out
where the LSN returned by pg_start_backup() is.  We have a function for
the latter already, but no documentation explaining how it works, which
I believe we should have, as tool authors need to implement this in their
own code since they can't always assume access to a PG server is available.
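As a rough illustration of what such documentation would need to pin down, here is a C sketch of the mapping under the existing naming scheme, usable without a running server. It assumes the segment size is a power of two that divides 4GB, and it follows the arithmetic of the backend's XLogFileName() macro (segment number = LSN / segment size, split by the number of segments per 4GB). The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of segments in one 4GB "xlogid"; seg_size must divide 2^32. */
static uint64_t
segs_per_xlogid(uint64_t seg_size)
{
    return UINT64_C(0x100000000) / seg_size;
}

/* LSN -> name of the segment containing it, existing naming scheme. */
static void
wal_name_from_lsn(char *buf, size_t buflen, uint32_t tli,
                  uint64_t lsn, uint64_t seg_size)
{
    uint64_t segno = lsn / seg_size;

    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / segs_per_xlogid(seg_size)),
             (unsigned) (segno % segs_per_xlogid(seg_size)));
}

/* Segment name -> timeline and first LSN in that segment; 0 if malformed. */
static int
wal_start_lsn_from_name(const char *name, uint64_t seg_size,
                        uint32_t *tli, uint64_t *start_lsn)
{
    unsigned t, hi, lo;

    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24 ||
        sscanf(name, "%8X%8X%8X", &t, &hi, &lo) != 3 ||
        lo >= segs_per_xlogid(seg_size))
        return 0;

    *tli = t;
    *start_lsn = ((uint64_t) hi * segs_per_xlogid(seg_size) + lo) * seg_size;
    return 1;
}
```

For the same LSN, 0x1C0000010, 16MB segments give "0000000100000001000000C0" while 1GB segments give "000000010000000100000003", so any external tool must be told the segment size.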

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 3/22/17 3:09 PM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Mar 22, 2017 at 1:49 PM, Stephen Frost <sfrost@snowman.net> wrote:
>>> Then perhaps we do need to be thinking of moving this to PG11 instead of
>>> exposing an option that users will start to use which will result in WAL
>>> naming that'll be confusing and inconsistent.  I certainly don't think
>>> it's a good idea to move forward exposing an option with a naming scheme
>>> that's agreed to be bad.
>>
>
> One of the reasons to go with the LSN is that we would actually be
> maintaining what happens when the WAL files are 16MB in size.
>
> David's initial expectation was this for 64MB WAL files:
>
> 000000010000000000000040
> 000000010000000000000080
> 0000000100000000000000C0
> 000000010000000100000000

This is the 1GB sequence, actually, but the idea would be the same for 64MB
files.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
* David Steele (david@pgmasters.net) wrote:
> On 3/22/17 3:09 PM, Stephen Frost wrote:
> >* Robert Haas (robertmhaas@gmail.com) wrote:
> >>On Wed, Mar 22, 2017 at 1:49 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >>>Then perhaps we do need to be thinking of moving this to PG11 instead of
> >>>exposing an option that users will start to use which will result in WAL
> >>>naming that'll be confusing and inconsistent.  I certainly don't think
> >>>it's a good idea to move forward exposing an option with a naming scheme
> >>>that's agreed to be bad.
> >>
> >
> >One of the reasons to go with the LSN is that we would actually be
> >maintaining what happens when the WAL files are 16MB in size.
> >
> >David's initial expectation was this for 64MB WAL files:
> >
> >000000010000000000000040
> >000000010000000000000080
> >0000000100000000000000C0
> >000000010000000100000000
>
> This is the 1GB sequence, actually, but the idea would be the same for
> 64MB files.

Ah, right, sorry.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/22/17 15:09, Stephen Frost wrote:
> David's initial expectation was this for 64MB WAL files:
> 
> 000000010000000000000040
> 000000010000000000000080
> 0000000100000000000000C0
> 000000010000000100000000
> 
> Which both matches the LSN *and* keeps the file names the same when
> they're 16MB.  This is what David's looking at writing a patch for and
> is what I think we should be considering.  This avoids breaking
> compatibility for people who choose to continue using 16MB (assuming
> we switch the default to 64MB, which I am still hopeful we will do).

The question is, which property is more useful to preserve: matching
LSN, or having a mostly consecutive numbering.

Actually, I would really really like to have both, but if I had to pick
one, I'd lean 55% toward consecutive numbering.

For the issue at hand, I think it's fine to proceed with the naming
schema that the existing compile-time option gives you.

In fact, that would flush out some of the tools that look directly at
the file names and interpret them, thus preserving the option to move to
a more radically different format.

If changing WAL sizes catches on, I do think we should keep thinking
about a new format for a future release, because debugging will
otherwise become a bit wild.  I'm thinking something like
   {integer timeline}_{integer seq number}_{hex lsn}

might address various interests.
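For illustration only, a C sketch of one reading of the {integer timeline}_{integer seq number}_{hex lsn} format. It assumes the sequence number is the segment number (LSN / segment size) and that the hex LSN is the segment's starting LSN written as 16 hex digits, since a '/' cannot appear in a file name. hypothetical_wal_name is not an existing function, and the format was not specified in more detail than above.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical "{timeline}_{seq}_{hex lsn}" file name. Assumes seq is the
 * segment number and the LSN is the segment's first byte, as 16 hex digits.
 */
static int
hypothetical_wal_name(char *buf, size_t buflen, uint32_t tli,
                      uint64_t lsn, uint64_t seg_size)
{
    uint64_t segno = lsn / seg_size;
    uint64_t start = segno * seg_size;

    return snprintf(buf, buflen, "%" PRIu32 "_%" PRIu64 "_%016" PRIX64,
                    tli, segno, start);
}
```

The sequence field stays consecutive regardless of segment size, while the last field keeps the direct LSN match: for LSN 0x1C0000010 this gives "1_7_00000001C0000000" with 1GB segments and "1_448_00000001C0000000" with 16MB segments.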

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/22/17 15:37, Peter Eisentraut wrote:
> If changing WAL sizes catches on, I do think we should keep thinking
> about a new format for a future release,

I think that means that I'm skeptical about changing the default size
right now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 3:24 PM, David Steele <david@pgmasters.net> wrote:
>> One of the reasons to go with the LSN is that we would actually be
>> maintaining what happens when the WAL files are 16MB in size.
>>
>> David's initial expectation was this for 64MB WAL files:
>>
>> 000000010000000000000040
>> 000000010000000000000080
>> 0000000100000000000000C0
>> 000000010000000100000000
>
>
> This is the 1GB sequence, actually, but the idea would be the same for 64MB
> files.

Wait, really?  I thought you abandoned this approach because there's
then no principled way to handle WAL segments of less than the default
size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> The question is, which property is more useful to preserve: matching
> LSN, or having a mostly consecutive numbering.
>
> Actually, I would really really like to have both, but if I had to pick
> one, I'd lean 55% toward consecutive numbering.

> For the issue at hand, I think it's fine to proceed with the naming
> schema that the existing compile-time option gives you.

What I don't particularly like about that is that it's *not* actually
consecutive; with 1GB segments, for example, you end up with this:

000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000100000000

Which is part of what I don't particularly like about this approach.

> In fact, that would flush out some of the tools that look directly at
> the file names and interpret them, thus preserving the option to move to
> a more radically different format.

This doesn't make a lot of sense to me.  If we get people to change to
using larger WAL segments and the tools are modified to understand the
pseudo-consecutive format, and then you want to change it on them again
in another release or two?  I'm generally a fan of not feeling too bad
breaking backwards compatibility, but it seems pretty rough even to me
to do so immediately.

This is exactly why I think it'd be better to work out a good naming
scheme now that actually makes sense and that we'll be able to stick
with for a while, instead of rushing to get this ability in now,
having people actually start to use it, and then trying to change it.

> If changing WAL sizes catches on, I do think we should keep thinking
> about a new format for a future release, because debugging will
> otherwise become a bit wild.  I'm thinking something like
>
>     {integer timeline}_{integer seq number}_{hex lsn}
>
> might address various interests.

Right, I'd rather not have debugging WAL files become a bit wild.

If we can't work out a sensible approach to naming that we expect to
last us for at least a couple of releases for different sizes of WAL
files, then I don't think we should rush to encourage users to use
different sizes of WAL files.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 3/22/17 3:39 PM, Peter Eisentraut wrote:
> On 3/22/17 15:37, Peter Eisentraut wrote:
>> If changing WAL sizes catches on, I do think we should keep thinking
>> about a new format for a future release,
>
> I think that means that I'm skeptical about changing the default size
> right now.

I think if we don't change the default size it's very unlikely I would 
use alternate WAL segment sizes or recommend that anyone else does, at 
least in v10.

I simply don't think it would get the level of testing required to be 
production worthy and I doubt that most tool writers would be quick to 
add support for a feature that very few people (if any) use.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
Hi Robert,

On 3/22/17 3:45 PM, Robert Haas wrote:
> On Wed, Mar 22, 2017 at 3:24 PM, David Steele <david@pgmasters.net> wrote:
>>> One of the reasons to go with the LSN is that we would actually be
>>> maintaining what happens when the WAL files are 16MB in size.
>>>
>>> David's initial expectation was this for 64MB WAL files:
>>>
>>> 000000010000000000000040
>>> 000000010000000000000080
>>> 0000000100000000000000C0
>>> 000000010000000100000000
>>
>>
>> This is the 1GB sequence, actually, but the idea would be the same for 64MB
>> files.
>
> Wait, really?  I thought you abandoned this approach because there's
> then no principled way to handle WAL segments of less than the default
> size.

I did say that, but I thought I had hit on a compromise.

But, as I originally pointed out, the hex characters in the filename are 
not aligned correctly for > 8 bits (< 16MB segments) and using different 
alignments just made it less consistent.

It would be OK if we were willing to drop the 1, 2, 4 and 8MB segment sizes 
because then the alignment would make sense and not change the current 
16MB sequence.

Even then, there are some interesting side effects.  For 1GB segments 
the "0000000100000001000000C0" segment would include LSNs 1/C0000000 
through 1/FFFFFFFF.  This is correct but is not an obvious filename to 
LSN mapping, at least for LSNs that appear later in the segment.
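The side effect can be checked mechanically. This C sketch, assuming the LSN-aligned naming where the third field is bits 24-31 of the starting LSN, computes the range of LSNs a name covers; lsn_aligned_range is hypothetical. It rejects segment sizes below 16MB because, as noted above, the field is too narrow to tell such segments apart.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: LSN range covered by a segment name under the LSN-aligned
 * proposal. Returns 0 if the name is malformed, the segment size is below
 * 16MB (names would collide), or the name is not on a segment boundary.
 */
static int
lsn_aligned_range(const char *name, uint64_t seg_size,
                  uint64_t *first, uint64_t *last)
{
    unsigned tli, hi, lo;
    uint64_t start;

    if (seg_size < (UINT64_C(16) << 20) || strlen(name) != 24 ||
        strspn(name, "0123456789ABCDEF") != 24 ||
        sscanf(name, "%8X%8X%8X", &tli, &hi, &lo) != 3 || lo > 0xFF)
        return 0;
    (void) tli;                 /* timeline does not affect the range */

    start = ((uint64_t) hi << 32) | ((uint64_t) lo << 24);
    if (start % seg_size != 0)
        return 0;

    *first = start;
    *last = start + seg_size - 1;
    return 1;
}
```

For "0000000100000001000000C0" with 1GB segments it reports 1/C0000000 through 1/FFFFFFFF, matching the example above, while "000000010000000100000010" is rejected for 1GB segments because it does not fall on a segment boundary.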

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
David,

* David Steele (david@pgmasters.net) wrote:
> On 3/22/17 3:45 PM, Robert Haas wrote:
> >On Wed, Mar 22, 2017 at 3:24 PM, David Steele <david@pgmasters.net> wrote:
> >>>One of the reasons to go with the LSN is that we would actually be
> >>>maintaining what happens when the WAL files are 16MB in size.
> >>>
> >>>David's initial expectation was this for 64MB WAL files:
> >>>
> >>>000000010000000000000040
> >>>000000010000000000000080
> >>>0000000100000000000000C0
> >>>000000010000000100000000
> >>
> >>
> >>This is the 1GB sequence, actually, but the idea would be the same for 64MB
> >>files.
> >
> >Wait, really?  I thought you abandoned this approach because there's
> >then no principled way to handle WAL segments of less than the default
> >size.
>
> I did say that, but I thought I had hit on a compromise.

Strikes me as one, at least.

> But, as I originally pointed out the hex characters in the filename
> are not aligned correctly for > 8 bits (< 16MB segments) and using
> different alignments just made it less consistent.
>
> It would be OK if we were willing to drop the 1,2,4,8 segment sizes
> because then the alignment would make sense and not change the
> current 16MB sequence.

For my 2c, at least, it seems extremely unlikely that people are using
smaller-than-16MB segments.  Also, we don't have to actually drop
support for those sizes, just handle the numbering differently, if we
feel like they're useful enough to keep.  In particular, I was thinking we
could make the filename one digit longer, or shift the numbers up one
position, but my general feeling is that it wouldn't ever be an
exercised use-case and therefore we should just drop support for them.

Perhaps I'm being overly paranoid, but I share David's concern about
non-standard/non-default WAL sizes being a serious risk due to lack of
exposure for those code paths, which is another reason that we should
change the default if we feel it's valuable to have a large WAL segment,
not just create this option which users can set at initdb time but which
we very rarely actually test to ensure it's working.

With any of these we need to have some buildfarm systems which are at
*least* running our regression tests against the different options, if
we would consider telling users to use them.

> Even then, there are some interesting side effects.  For 1GB
> segments the "0000000100000001000000C0" segment would include LSNs
> 1/C0000000 through 1/FFFFFFFF.  This is correct but is not an
> obvious filename to LSN mapping, at least for LSNs that appear later
> in the segment.

That doesn't seem unreasonable to me.  If we're going to use the
starting LSN of the segment then it's going to skip when you start
varying the size of the segment.  Even keeping the current scheme we end
up with skipping; it's just different skipping, which goes
"1, 2, 3, skip to the next!", where how high the count goes depends on
the size.  With this approach, we're consistently skipping the same
amount, which is always exactly the segment size divided by 16MB.

I also like Peter's suggestion of using separators between the
components of the WAL filename, but that would change the naming for
everyone, which is a concern I can understand us wishing to avoid.

From a user-experience point of view, keeping the mapping from the WAL
filename to the starting LSN is quite nice, even if this change might
complicate the backend code a bit.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/22/17 17:33, David Steele wrote:
> I think if we don't change the default size it's very unlikely I would 
> use alternate WAL segment sizes or recommend that anyone else does, at 
> least in v10.
> 
> I simply don't think it would get the level of testing required to be 
> production worthy

I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.

More generally, the methodology that we should not add an option unless
we also change the default because the option would otherwise not get
enough testing is a bit dubious.

> and I doubt that most tool writers would be quick to 
> add support for a feature that very few people (if any) use.

I'm not one of those tool writers, although I have written my share of
DBA scripts over the years, but I wonder why those tools would really
care.  They are handed files with predetermined names to archive, and
for restore files with predetermined names are requested back from them.
What else do they need?  If something is missing that requires them to
parse file names, then maybe that should be added.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 3/22/17 17:33, David Steele wrote:
> > and I doubt that most tool writers would be quick to
> > add support for a feature that very few people (if any) use.
>
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.

PG backup technology has come a fair ways from that simple
characterization of it. :)

The backup tools need to also get the LSN from the pg_stop_backup and
verify that they have the WAL file associated with that LSN.  They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named to make sure there aren't any missing.
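A minimal C sketch of that completeness check under the existing naming scheme, assuming a single timeline between the two LSNs and an in-memory list of archived names; wal_range_complete and existing_wal_name are hypothetical. It steps one segment, not one byte, at a time, which is possible only because the name can be computed from a segment number.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Existing naming scheme: timeline, segno / segs-per-4GB, segno % segs-per-4GB. */
static void
existing_wal_name(char *buf, size_t buflen, uint32_t tli,
                  uint64_t segno, uint64_t seg_size)
{
    uint64_t per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / per_id), (unsigned) (segno % per_id));
}

/*
 * Check every segment from the one holding start_lsn through the one
 * holding end_lsn against the archive listing. Returns 1 if all are
 * present, else 0 with the first missing name copied into "missing".
 */
static int
wal_range_complete(uint32_t tli, uint64_t start_lsn, uint64_t end_lsn,
                   uint64_t seg_size, const char *const *archived, size_t n,
                   char *missing, size_t missing_len)
{
    for (uint64_t segno = start_lsn / seg_size;
         segno <= end_lsn / seg_size; segno++)
    {
        char name[25];
        size_t i;

        existing_wal_name(name, sizeof name, tli, segno, seg_size);
        for (i = 0; i < n; i++)
            if (strcmp(archived[i], name) == 0)
                break;
        if (i == n)
        {
            snprintf(missing, missing_len, "%s", name);
            return 0;
        }
    }
    return 1;
}
```

Timeline switches and the exact treatment of a stop LSN that lands on a segment boundary are ignored here; a real tool has to handle both.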

David will probably point out other reasons that the backup tools need
to understand the file naming, but those are ones I know of off-hand.

Thanks!

Stephen

Re: [HACKERS] increasing the default WAL segment size

From
Jeff Janes
Date:
On Thu, Mar 23, 2017 at 1:45 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 3/22/17 17:33, David Steele wrote:
>
> > and I doubt that most tool writers would be quick to
> > add support for a feature that very few people (if any) use.
>
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.


I have a pg_restore which predicts the file 5 files ahead of the one it was asked for, and initiates a pre-fetch and decompression of it. Then it delivers the file it was asked for, either by pulling it out of the pre-staging area set up by the N-5th invocation, or by going directly to the archive to get it.  This speeds up play-back dramatically when the files are stored compressed and non-local.
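A sketch of the prediction step, under the existing naming scheme and assuming the segment size is known; wal_name_ahead is a hypothetical helper. Requests that are not segment files, such as timeline history files, fail the parse and would simply not be prefetched.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: name of the segment "ahead" segments after "requested",
 * same timeline, existing naming scheme. Returns 0 for non-segment names.
 */
static int
wal_name_ahead(const char *requested, uint64_t seg_size, unsigned ahead,
               char *out, size_t outlen)
{
    unsigned tli, hi, lo;
    uint64_t per_id = UINT64_C(0x100000000) / seg_size;
    uint64_t segno;

    if (strlen(requested) != 24 ||
        strspn(requested, "0123456789ABCDEF") != 24 ||
        sscanf(requested, "%8X%8X%8X", &tli, &hi, &lo) != 3 ||
        lo >= per_id)
        return 0;

    segno = (uint64_t) hi * per_id + lo + ahead;
    snprintf(out, outlen, "%08X%08X%08X", tli,
             (unsigned) (segno / per_id), (unsigned) (segno % per_id));
    return 1;
}
```

The same requested name yields different prefetch targets for 16MB and 1GB segments, which is why a prefetcher written for a fixed size breaks once the size can vary.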

That is why I need to know how the files are numbered.  I don't think that that makes much of a difference, though.  Any change is going to break that, no matter which change.  Then I'll fix it.  

If we are going to break it, I'd prefer to just do away with the 'segment' thing altogether.  You have timelines, and you have files.  That's it.

Cheers,

Jeff

Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/23/17 16:58, Stephen Frost wrote:
> The backup tools need to also get the LSN from the pg_stop_backup and
> verify that they have the WAL file associated with that LSN.

There is a function for that.

> They also
> need to make sure that they have all of the WAL files between the
> starting LSN and the ending LSN.  Doing that requires understanding how
> the files are named to make sure there aren't any missing.

There is not a function for that, but there could be one.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/23/17 21:47, Jeff Janes wrote:
> I have a pg_restore which predicts the file 5 files ahead of the one it
> was asked for, and initiates a pre-fetch and decompression of it. Then
> it delivers the file it was asked for, either by pulling it out of the
> pre-staging area set up by the N-5th invocation, or by going directly to
> the archive to get it.  This speeds up play-back dramatically when the
> files are stored compressed and non-local.

Yeah, some better support for prefetching would be necessary to avoid
having to have any knowledge of the file naming.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> There is a function for that.
[...]
> There is not a function for that, but there could be one.

I'm not sure you've really considered what you're suggesting here.

We need to make sure we have every file between two LSNs.  Yes, we
could step the LSN forward one byte at a time, calling the appropriate
function for every single byte, to make sure that we have that file, but
that really isn't a reasonable approach.  Nor would stepping 1MB at a time
be reasonable, on the assumption that WAL files can't be smaller than 1MB.

Beyond that, this also bakes in an assumption that we would then require
access to a database (of a current enough version to have the functions
needed too!) to connect to and run these functions, which is a poor
design.  If the user is using a remote system to gather the WAL on, that
system may not have easy access to PG.  Further, backup tools will want
to do things like off-line verification that the backup is complete,
perhaps in another environment entirely which doesn't have PG, or maybe
where what they're trying to do is make sure that a given backup is good
before starting a restore to bring PG back up.

Also, given that one of the things we're talking about here is
specifically that we want to be able to change the WAL size for
different databases, you would have to make sure that the database
you're running these functions on uses the same WAL file size as the one
which is being backed up.

No, I don't agree that we can claim the LSN -> WAL filename mapping is
an internal PG detail that we can whack around because there are
functions to calculate the answer.  External utilities need to be able
to perform that translation and we need to document for them how to do
so correctly.

Thanks!

Stephen

Re: increasing the default WAL segment size

From
David Steele
Date:
On 3/24/17 12:27 AM, Peter Eisentraut wrote:
> On 3/23/17 16:58, Stephen Frost wrote:
>> The backup tools need to also get the LSN from the pg_stop_backup and
>> verify that they have the WAL file associated with that LSN.
>
> There is a function for that.
>
>> They also
>> need to make sure that they have all of the WAL files between the
>> starting LSN and the ending LSN.  Doing that requires understanding how
>> the files are named to make sure there aren't any missing.
>
> There is not a function for that, but there could be one.

A function would be nice, but tools often cannot depend on the database 
being operational so it's still necessary to re-implement them.  Having 
a sane sequence in the WAL makes that easier.

-- 
-David
david@pgmasters.net



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Jeff,

* Jeff Janes (jeff.janes@gmail.com) wrote:
> On Thu, Mar 23, 2017 at 1:45 PM, Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> wrote:
> > On 3/22/17 17:33, David Steele wrote:
> > > and I doubt that most tool writers would be quick to
> > > add support for a feature that very few people (if any) use.
> >
> > I'm not one of those tool writers, although I have written my share of
> > DBA scripts over the years, but I wonder why those tools would really
> > care.  They are handed files with predetermined names to archive, and
> > for restore files with predetermined names are requested back from them.
> >  What else do they need?  If something is missing that requires them to
> > parse file names, then maybe that should be added.
>
> I have a pg_restore which predicts the file 5 files ahead of the one it was
> asked for, and initiates a pre-fetch and decompression of it. Then it
> delivers the file it was asked for, either by pulling it out of the
> pre-staging area set up by the N-5th invocation, or by going directly to
> the archive to get it.  This speeds up play-back dramatically when the
> files are stored compressed and non-local.

Ah, yes, that is on our road-map for pgBackRest to do also, along with
parallel WAL fetch which also needs to figure out the WAL names before
being asked for them.

We do already have parallel push, which also needs to figure out what
the upcoming file names are going to be so we can find them and push
them when they're indicated as ready in archive_status.  Perhaps we
could just push whatever is ready and remember everything that was
pushed for when PG asks, but that is really not ideal.

> That is why I need to know how the files are numbered.  I don't think that
> that makes much of a difference, though.  Any change is going to break
> that, no matter which change.  Then I'll fix it.

Right, but the discussion here is actually about the idea that we're
going to encourage people to use the initdb-time option to change the
WAL size, meaning you'll need to deal with different WAL sizes and
different naming due to that, and then we're going to turn around in the
very next release and break the naming, meaning you'll have to adjust
your tools first for the different possible WAL sizes in PG10 and then
adjust again for the different naming in PG11.

I'm trying to suggest that, if we're going to do this, perhaps we
should try to make both changes in one release instead of across
two.

> If we are going to break it, I'd prefer to just do away with the 'segment'
> thing altogether.  You have timelines, and you have files.  That's it.

I'm not sure I follow this proposal.  We have to know which WAL file has
which LSN in it, how do you do that with just 'timelines and files'?

Thanks!

Stephen

Re: increasing the default WAL segment size

From
David Steele
Date:
On 3/23/17 4:45 PM, Peter Eisentraut wrote:
> On 3/22/17 17:33, David Steele wrote:
>> I think if we don't change the default size it's very unlikely I would
>> use alternate WAL segment sizes or recommend that anyone else does, at
>> least in v10.
>>
>> I simply don't think it would get the level of testing required to be
>> production worthy
>
> I think we could tweak the test harnesses to run all the tests with
> different segment sizes.  That would get pretty good coverage.

I would want to see 1MB, 16MB, and 64MB at a minimum.  More might be nice but that 
gets a bit ridiculous at some point.  I would be fine with different 
critters having different defaults.  I don't think that each critter 
needs to test each value.

> More generally, the methodology that we should not add an option unless
> we also change the default because the option would otherwise not get
> enough testing is a bit dubious.

Generally, I would agree, but I think this is a special case.  This 
option has been around for a long time and we are just now exposing it 
in a way that's useful to end users.  It's easy to see how various 
assumptions may have arisen around the default and led to code that is 
not quite right when using different values.  Even if that's not true in 
the backend code, it might affect bin, and certainly affects third party 
tools.

-- 
-David
david@pgmasters.net



Re: increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Mar 22, 2017 at 9:41 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 3:14 PM, Beena Emerson <memissemerson@gmail.com> wrote:
>> PFA an updated patch which fixes a minor bug I found. It only increases the
>> string size in pretty_wal_size function.
>> The 01-add-XLogSegmentOffset-macro.patch has also been rebased.
> Thanks for the updated versions. Here is a partial review of the patch:
>
> In pg_standby.c and pg_waldump.c,
> + XLogPageHeader hdr = (XLogPageHeader) buf;
> + XLogLongPageHeader NewLongPage = (XLogLongPageHeader) hdr;
> +
> + XLogSegSize = NewLongPage->xlp_seg_size;
> It waits until the file is at least XLOG_BLCKSZ, then gets the
> expected final size from XLogPageHeader. This looks really clean
> compared to the previous approach.

Thank you for testing. PFA the rebased patch incorporating your comments.
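For readers following the pg_standby/pg_waldump change quoted above, here is a self-contained C sketch of reading the segment size from a WAL file's first page. The mirror structs and MIRROR_XLP_LONG_HEADER are assumptions meant to match the layout of XLogLongPageHeaderData and the value of XLP_LONG_HEADER; a real tool should include the server headers rather than copying them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of the backend's XLogPageHeaderData. */
typedef struct
{
    uint16_t xlp_magic;
    uint16_t xlp_info;
    uint32_t xlp_tli;
    uint64_t xlp_pageaddr;
    uint32_t xlp_rem_len;
} PageHeaderMirror;

/* Assumed mirror of XLogLongPageHeaderData (first page of a segment). */
typedef struct
{
    PageHeaderMirror std;
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} LongPageHeaderMirror;

#define MIRROR_XLP_LONG_HEADER 0x0002   /* assumed value of XLP_LONG_HEADER */

/* Segment size recorded in the first page, or 0 if it cannot be read. */
static uint32_t
seg_size_from_first_page(const unsigned char *buf, size_t len)
{
    LongPageHeaderMirror hdr;

    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, buf, sizeof hdr);      /* buf may not be suitably aligned */
    if ((hdr.std.xlp_info & MIRROR_XLP_LONG_HEADER) == 0)
        return 0;
    return hdr.xlp_seg_size;
}
```

memcpy is used instead of a pointer cast because an arbitrary read buffer need not be aligned for the struct.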
 

> + * Verify that the min and max wal_size meet the minimum requirements.
> Better to write min_wal_size and max_wal_size.

Updated wherever applicable.
 

> + errmsg("Insufficient value for \"min_wal_size\"")));
> Maybe "min_wal_size %d is too low"? Use lower case for error
> messages. Same for max_wal_size.

Updated.
 

> + /* Set the XLogSegSize */
> + XLogSegSize = ControlFile->xlog_seg_size;
> +
> A call to IsValidXLogSegSize() will be good after this, no?

Is it necessary? We already have the CRC check for ControlFiles. Is that not enough?
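For comparison, a check of the kind IsValidXLogSegSize() is meant to perform might look like this C sketch, which assumes the accepted sizes are the powers of two from 1MB to 1GB. sketch_is_valid_seg_size and its bounds are illustrative, not the patch's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_SEG_SIZE (UINT32_C(1) << 20)   /* assumed lower bound: 1MB */
#define SKETCH_MAX_SEG_SIZE (UINT32_C(1) << 30)   /* assumed upper bound: 1GB */

static bool
sketch_is_valid_seg_size(uint32_t size)
{
    /* a power of two has exactly one bit set */
    return size != 0 && (size & (size - 1)) == 0 &&
           size >= SKETCH_MIN_SEG_SIZE && size <= SKETCH_MAX_SEG_SIZE;
}
```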
 

> + /* Update variables using XLogSegSize */
> + check_wal_size();
> The method name looks somewhat misleading compared to the comment for
> it, doesn't it?

The method name is correct since it only checks if the value provided is sufficient (at least 2 segments). I have updated the comment to say "Check and update variables dependent on XLogSegSize".
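The rule described here (each of min_wal_size and max_wal_size must cover at least two segments) can be sketched in C as follows, using the lower-case message wording suggested in the review. sketch_check_wal_size is a hypothetical stand-in, not the patch's check_wal_size(), and it takes both settings in MB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical: both settings (in MB) must be at least two segments.
 * On failure, writes an error message into err and returns false.
 */
static bool
sketch_check_wal_size(int min_wal_size_mb, int max_wal_size_mb,
                      uint32_t seg_size_bytes, char *err, size_t errlen)
{
    int64_t two_segs_mb = 2 * (int64_t) (seg_size_bytes / (1024 * 1024));

    if (min_wal_size_mb < two_segs_mb)
    {
        snprintf(err, errlen, "min_wal_size %d is too low", min_wal_size_mb);
        return false;
    }
    if (max_wal_size_mb < two_segs_mb)
    {
        snprintf(err, errlen, "max_wal_size %d is too low", max_wal_size_mb);
        return false;
    }
    return true;
}
```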
  
> This patch introduces a new guc_unit having values in MB for
> max_wal_size and min_wal_size. I'm not sure about the upper limit
> which is set to INT_MAX for 32-bit systems as well. Is it needed to
> define something like MAX_MEGABYTES similar to MAX_KILOBYTES?
> It is worth mentioning that GUC_UNIT_KB can't be used in this case
> since MAX_KILOBYTES is INT_MAX / 1024 (< 2GB) on 32-bit systems. That's
> not a sufficient value for min_wal_size/max_wal_size.

You are right. I can use the same value as the upper limit for GUC_UNIT_MB; we could probably change the name of MAX_KILOBYTES to something more general for both GUC_UNIT_MB and GUC_UNIT_KB. The max size on 32-bit systems would be 2TB.
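A quick check of the arithmetic behind the "< 2GB" and "2TB" figures, assuming MAX_KILOBYTES is INT_MAX / 1024 on 32-bit builds as stated above (SKETCH_MAX_KILOBYTES_32BIT is a local name for that value, not a PostgreSQL macro):

```c
#include <limits.h>
#include <stdint.h>

/* Value of MAX_KILOBYTES on 32-bit builds, per the discussion above. */
#define SKETCH_MAX_KILOBYTES_32BIT (INT_MAX / 1024)

/* Largest setting in bytes when the number is read as kB (GUC_UNIT_KB). */
static uint64_t
limit_bytes_as_kb(void)
{
    return (uint64_t) SKETCH_MAX_KILOBYTES_32BIT * 1024;
}

/* Largest setting in bytes when the same number is read as MB (GUC_UNIT_MB). */
static uint64_t
limit_bytes_as_mb(void)
{
    return (uint64_t) SKETCH_MAX_KILOBYTES_32BIT * 1024 * 1024;
}
```

Read as kilobytes, 2097151 is just under 2GB; read as megabytes, the same value is just under 2TB.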
 
> While testing with pg_waldump, I got the following error.
> bin/pg_waldump -p master/pg_wal/ -s 0/01000000
> Floating point exception (core dumped)
> Stack:
> #0  0x00000000004039d6 in ReadPageInternal ()
> #1  0x0000000000404c84 in XLogFindNextRecord ()
> #2  0x0000000000401e08 in main ()
> I think that the problem is in the following code:
> /* parse files as start/end boundaries, extract path if not specified */
> if (optind < argc)
> {
> ....
> + if (!RetrieveXLogSegSize(full_path))
> ...
> }
> In this case, RetrieveXLogSegSize is conditionally called. So, if the
> condition is false, XLogSegSize will not be initialized.
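The reported "Floating point exception" is consistent with an integer division by an uninitialized (zero) segment size, which on x86 Linux is delivered as SIGFPE. A minimal sketch of a defensive pattern, assuming the size stays 0 until one of the retrieval paths sets it; `lsn_to_segno` is a hypothetical helper, not pg_waldump code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Map an LSN to a segment number.  A seg_size of 0 means "not yet
 * determined", so report an error instead of dividing by zero.
 */
static bool
lsn_to_segno(uint64_t lsn, uint32_t seg_size, uint64_t *segno)
{
    if (seg_size == 0)
    {
        fprintf(stderr, "WAL segment size has not been determined\n");
        return false;
    }
    *segno = lsn / seg_size;
    return true;
}
```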


pg_waldump code has been updated. 
 



--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: increasing the default WAL segment size

From
Robert Haas
Date:
On Wed, Mar 22, 2017 at 6:05 PM, David Steele <david@pgmasters.net> wrote:
>> Wait, really?  I thought you abandoned this approach because there's
>> then no principled way to handle WAL segments of less than the default
>> size.
>
> I did say that, but I thought I had hit on a compromise.
>
> But, as I originally pointed out the hex characters in the filename are not
> aligned correctly for > 8 bits (< 16MB segments) and using different
> alignments just made it less consistent.

I don't think I understand what the compromise is.  Are you saying we
should have one rule for segments < 16MB and another rule for
segments > 16MB?  I think using two different rules for file naming depending
on the segment size will be a negative for both tool authors and
ordinary users.

> It would be OK if we were willing to drop the 1,2,4,8 segment sizes because
> then the alignment would make sense and not change the current 16MB
> sequence.

Well, that is true.  But the thing I'm trying to do here is to keep
this patch down to what actually needs to be changed in order to
accomplish the original purpose, not squeeze more and more changes
into it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: increasing the default WAL segment size

From
David Steele
Date:
Hi Robert,

On 3/24/17 3:00 PM, Robert Haas wrote:
> On Wed, Mar 22, 2017 at 6:05 PM, David Steele <david@pgmasters.net> wrote:
>>> Wait, really?  I thought you abandoned this approach because there's
>>> then no principled way to handle WAL segments of less than the default
>>> size.
>>
>> I did say that, but I thought I had hit on a compromise.
>>
>> But, as I originally pointed out the hex characters in the filename are not
>> aligned correctly for > 8 bits (< 16MB segments) and using different
>> alignments just made it less consistent.
>
> I don't think I understand what the compromise is.  Are you saying we
> should have one rule for segments < 16MB and another rule for
> segments > 16MB?  I think using two different rules for file naming depending
> on the segment size will be a negative for both tool authors and
> ordinary users.

Sorry for the confusion, I meant to say that if we want to keep LSNs in 
the filenames and not change alignment for the current default, then we 
would need to drop support for segment sizes < 16MB (more or less what I 
said below).  Bad editing on my part.

>> It would be OK if we were willing to drop the 1,2,4,8 segment sizes because
>> then the alignment would make sense and not change the current 16MB
>> sequence.
>
> Well, that is true.  But the thing I'm trying to do here is to keep
> this patch down to what actually needs to be changed in order to
> accomplish the original purpose, not squeeze more and more changes
> into it.

Attached is a patch to be applied on top of Beena's v8 patch that 
preserves LSNs in the file naming for all segment sizes.  It's not quite 
complete because it doesn't modify the lower size limit everywhere, but 
I think it's enough so you can see what I'm getting at.  This passes 
check-world and I've poked at it in other segment sizes as well manually.

Behavior for the current default of 16MB is unchanged, and all other 
sizes go through a logical progression.

1GB:
000000010000000000000040
000000010000000000000080
0000000100000000000000C0
000000010000000100000000

256MB:

000000010000000000000010
000000010000000000000020
000000010000000000000030
...
0000000100000000000000E0
0000000100000000000000F0
000000010000000100000000

64MB:

000000010000000100000004
000000010000000100000008
00000001000000010000000C
...
0000000100000001000000F8
0000000100000001000000FC
000000010000000200000000

I believe that maintaining an easy correspondence between LSN and 
filename is important.  The cluster will not always be up to help with 
these calculations and tools that do the job may not be present or may 
have issues.
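To make the comparison concrete, the sketch below computes the name of the segment containing a given LSN under both rules. `name_current` follows the existing layout (timeline, then the segment number split into a high and a low part by the number of segments per 4GB); `name_lsn_preserving` is an interpretation of the scheme in the listings above, where the last field holds the top byte of the low 32 bits of the segment's starting LSN. The second rule assumes power-of-two segment sizes of at least 16MB. Both function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_NAME_LEN 25      /* 24 hex digits + NUL */

/* Existing layout: segment number split by segments per 4GB "log". */
static void
name_current(char *out, uint32_t tli, uint64_t lsn, uint32_t seg_size)
{
    uint64_t    segno = lsn / seg_size;
    uint64_t    per_xlogid = UINT64_C(0x100000000) / seg_size;

    snprintf(out, SKETCH_NAME_LEN, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, (uint32_t) (segno / per_xlogid),
             (uint32_t) (segno % per_xlogid));
}

/* LSN-preserving layout (assumes seg_size >= 16MB, power of two). */
static void
name_lsn_preserving(char *out, uint32_t tli, uint64_t lsn, uint32_t seg_size)
{
    uint64_t    start = lsn - (lsn % seg_size);  /* segment's first byte */

    snprintf(out, SKETCH_NAME_LEN, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, (uint32_t) (start >> 32),
             (uint32_t) ((start & 0xFFFFFFFF) >> 24));
}
```

At 16MB both functions produce identical names, which is the property that leaves the current default unchanged.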

I'm happy to merge this with Beena's patch (and tidy my patch up) if 
this looks like an improvement to everyone.

-- 
-David
david@pgmasters.net


Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/24/17 19:13, David Steele wrote:
> Behavior for the current default of 16MB is unchanged, and all other 
> sizes go through a logical progression.

Just at a glance, without analyzing the math behind it, this scheme
seems super confusing.

> 
> 1GB:
> 000000010000000000000040
> 000000010000000000000080
> 0000000100000000000000C0
> 000000010000000100000000
> 
> 256MB:
> 
> 000000010000000000000010
> 000000010000000000000020
> 000000010000000000000030
> ...
> 0000000100000000000000E0
> 0000000100000000000000F0
> 000000010000000100000000
> 
> 64MB:
> 
> 000000010000000100000004
> 000000010000000100000008
> 00000001000000010000000C
> ...
> 0000000100000001000000F8
> 0000000100000001000000FC
> 000000010000000200000000


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 3/24/17 08:18, Stephen Frost wrote:
> Peter,
> 
> * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
>> There is a function for that.
> [...]
>> There is not a function for that, but there could be one.
> 
> I'm not sure you've really considered what you're suggesting here.

Create a set-returning function that returns all the to-be-expected file
names between two LSNs.
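Peter's suggestion amounts to walking the segment numbers from the segment containing the start LSN to the one containing the end LSN. A standalone sketch of that loop (plain C rather than an actual SQL-callable set-returning function, and using the existing naming layout); `wal_names_between` and its callback type are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*name_cb) (const char *name, void *arg);

/*
 * Emit the name of every segment overlapping [start_lsn, end_lsn] using
 * the existing timeline/high/low layout.  Returns the number of names.
 */
static uint64_t
wal_names_between(uint32_t tli, uint64_t start_lsn, uint64_t end_lsn,
                  uint32_t seg_size, name_cb cb, void *arg)
{
    uint64_t    per_xlogid = UINT64_C(0x100000000) / seg_size;
    uint64_t    count = 0;
    char        name[25];

    for (uint64_t segno = start_lsn / seg_size;
         segno <= end_lsn / seg_size; segno++)
    {
        snprintf(name, sizeof(name), "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                 tli, (uint32_t) (segno / per_xlogid),
                 (uint32_t) (segno % per_xlogid));
        cb(name, arg);
        count++;
    }
    return count;
}
```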

> Beyond that, this also bakes in an assumption that we would then require
> access to a database

That is a good point, but then any change to the naming whatsoever will
create trouble.  Then we might as well choose which specific trouble.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 3/24/17 19:13, David Steele wrote:
> > Behavior for the current default of 16MB is unchanged, and all other
> > sizes go through a logical progression.
>
> Just at a glance, without analyzing the math behind it, this scheme
> seems super confusing.

Compared to:

1GB:
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000100000000

...?

Having the naming no longer match the LSN and also, seemingly, jump
randomly, strikes me as very confusing.  At least with the LSN-based
approach, we aren't jumping randomly but exactly in-line with what the
starting LSN of the file is, and always by the same amount (in hex).
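Stephen's point is that, with the LSN-based scheme, a file name maps back to its starting LSN without knowing the segment size. A minimal sketch of that inverse mapping, assuming the layout in David's listings (timeline, high 32 bits of the LSN, then the top byte of the low 32 bits in the last field); `start_lsn_from_name` is a hypothetical helper. Under the existing layout, the same inversion would also need the segment size.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Recover the timeline and starting LSN of a segment from its
 * 24-hex-digit name under the LSN-preserving layout.
 */
static bool
start_lsn_from_name(const char *name, uint32_t *tli, uint64_t *lsn)
{
    unsigned int t, hi, lo;

    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return false;
    /* The last field only ever holds one byte of the LSN. */
    if (sscanf(name, "%8x%8x%8x", &t, &hi, &lo) != 3 || lo > 0xFF)
        return false;
    *tli = t;
    *lsn = ((uint64_t) hi << 32) | ((uint64_t) lo << 24);
    return true;
}
```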

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 3/24/17 08:18, Stephen Frost wrote:
> > Beyond that, this also bakes in an assumption that we would then require
> > access to a database
>
> That is a good point, but then any change to the naming whatsoever will
> create trouble.  Then we might as well choose which specific trouble.

Right, and I'd rather we work that out before we start encouraging users
to change their WAL segment size, which is what this patch will do.

Personally, I'm alright with the patch David has produced, which is
pretty small, maintains the same names when 16MB segments are used, and
is pretty straight-forward to reason about.  I do think it'll need added
documentation to clarify how WAL segment names are calculated and
perhaps another function which returns the size of WAL segments on a
given cluster (I don't think we have that..?), and, ideally, added
regression tests or buildfarm animals which try different sizes.

On the other hand, I don't have any particular issue with the naming
scheme you proposed up-thread, which uses proper separators between the
components of a WAL filename, but that would change what happens with
16MB WAL segments today.

I'm still of the opinion that we should be changing the default to 64MB
for WAL segments.

Thanks!

Stephen

Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
At this point, I suggest splitting this patch up into several
potentially less controversial pieces.

One big piece is that we currently don't support segment sizes larger
than 64 MB, for various internal arithmetic reasons.  Your patch appears
to address that.  So I suggest isolating that.  Assuming it works
correctly, I think there would be no great concern about it.

The next piece would be making the various tools aware of varying
segment sizes without having to rely on a built-in value.

The third piece would then be the rest that allows you to set the size
at initdb.

If we take these in order, we would make it easier to test various sizes
and see if there are any more unforeseen issues when changing sizes.  It
would also make it easier to do performance testing so we can address
the original question of what the default size should be.

One concern I have is that your patch does not contain any tests.  There
should probably be lots of tests.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 25 March 2017 at 17:02, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> At this point, I suggest splitting this patch up into several
> potentially less controversial pieces.
>
> One big piece is that we currently don't support segment sizes larger
> than 64 MB, for various internal arithmetic reasons.  Your patch appears
> to address that.  So I suggest isolating that.  Assuming it works
> correctly, I think there would be no great concern about it.

+1

> The next piece would be making the various tools aware of varying
> segment sizes without having to rely on a built-in value.

Hmm

> The third piece would then be the rest that allows you to set the size
> at initdb.
>
> If we take these in order, we would make it easier to test various sizes
> and see if there are any more unforeseen issues when changing sizes.  It
> would also make it easier to do performance testing so we can address
> the original question of what the default size should be.
>
> One concern I have is that your patch does not contain any tests.  There
> should probably be lots of tests.

This is looking like a reject in its current form.

Changing WAL filename to a new form seems best plan, but we don't have
time to do that and get it right, especially with no tests.

My summary of useful requirements would be
* Files smaller than 16MB and larger than 16MB are desirable
* LSN <-> filename mapping must be clear
* New filename format best for debugging and clarity

My proposal from here is that we allow only one new size in this
release, to minimize the splash zone. Keep the filename format as it
is now, using David's suggestion. Suggest adding 1GB as the only
additional option, which continues the idea of having 1GB as the max
filesize.

New filename format can come in PG11 allowing much wider range of WAL
filesizes, bigger and smaller.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> At this point, I suggest splitting this patch up into several
> potentially less controversial pieces.
>
> One big piece is that we currently don't support segment sizes larger
> than 64 MB, for various internal arithmetic reasons.  Your patch appears
> to address that.  So I suggest isolating that.  Assuming it works
> correctly, I think there would be no great concern about it.
>
> The next piece would be making the various tools aware of varying
> segment sizes without having to rely on a built-in value.
>
> The third piece would then be the rest that allows you to set the size
> at initdb.
>
> If we take these in order, we would make it easier to test various sizes
> and see if there are any more unforeseen issues when changing sizes.  It
> would also make it easier to do performance testing so we can address
> the original question of what the default size should be.

PFA the patches divided into 3 parts:

02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes the internal representation of max_wal_size and min_wal_size to mb. 

03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using inbuilt value.

04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE instead of XLOG_SEG_SIZE
 
> One concern I have is that your patch does not contain any tests.  There
> should probably be lots of tests.

05-initdb_tests.patch adds tap tests to initialize cluster with different wal_segment_size and then check the config values. What other tests do you have in mind? Checking the various tools? 

--
Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 MB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb.
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
Thanks for splitting the patches.
01-add-XLogSegmentOffset-macro.patch is same as before and it looks good.

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.
looks good.

> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
Several methods are declared and defined in different tools to fetch
the size of wal-seg-size.
In pg_standby.c,
RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */

In pg_basebackup/streamutil.c,
RetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
SHOW wal_segment_size */

In pg_waldump.c,
ReadXLogFromDir(char *archive_loc)
RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
location to set XLogSegsize from the first WAL file */

IMHO, it's better to define a single method in xlog.c and based on the
different strategy, it can retrieve the XLogSegsize on behalf of
different modules. I've suggested the same in my first set review and
I'll still vote for it. For example, in xlog.c, you can define
something like the following:
bool RetrieveXLogSegSize(RetrieveStrategy rs, void* ptr)

Now based on the RetrieveStrategy(say Conn, File, Dir), you can cast
the void pointer to the appropriate type. So, when a new tool needs to
retrieve XLogSegSize, it can just call this function instead of
defining a new RetrieveXLogSegSize method.
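Kuntal's proposal could look roughly like the following standalone sketch. Everything here is hypothetical: the enum members, the `PageBytes` wrapper, and the assumed header offset are illustrations, not code from the patch. To stay self-contained, the sketch substitutes two strategies that work on data the caller already holds (the text returned by `SHOW wal_segment_size`, and the first bytes of a WAL file) for the Conn, File and Dir strategies named above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strategies; the void * argument is interpreted per strategy. */
typedef enum RetrieveStrategy
{
    RS_SHOW_RESULT,     /* ptr: const char *, text of SHOW wal_segment_size */
    RS_PAGE_BYTES       /* ptr: const PageBytes *, start of a WAL file */
} RetrieveStrategy;

typedef struct PageBytes
{
    const unsigned char *data;
    size_t      len;
} PageBytes;

/* Assumed offset of xlp_seg_size in the long page header (8-byte alignment). */
#define SKETCH_SEG_SIZE_OFFSET 32

static uint32_t XLogSegSize = 0;

/* Parse values such as "16MB" or "1GB". */
static bool
parse_show_value(const char *s, uint32_t *out)
{
    char       *end;
    unsigned long n = strtoul(s, &end, 10);
    uint64_t    mult;

    if (end == s || n > UINT32_MAX)
        return false;
    if (strcmp(end, "kB") == 0)
        mult = UINT64_C(1024);
    else if (strcmp(end, "MB") == 0)
        mult = UINT64_C(1024) * 1024;
    else if (strcmp(end, "GB") == 0)
        mult = UINT64_C(1024) * 1024 * 1024;
    else
        return false;
    if ((uint64_t) n * mult > UINT32_MAX)
        return false;
    *out = (uint32_t) ((uint64_t) n * mult);
    return true;
}

/* Read the segment size stored at the assumed header offset. */
static bool
read_page_value(const PageBytes *pb, uint32_t *out)
{
    if (pb->len < SKETCH_SEG_SIZE_OFFSET + sizeof(uint32_t))
        return false;
    memcpy(out, pb->data + SKETCH_SEG_SIZE_OFFSET, sizeof(uint32_t));
    return true;
}

/* Single entry point: dispatch on the strategy, set XLogSegSize on success. */
static bool
RetrieveXLogSegSize(RetrieveStrategy rs, void *ptr)
{
    uint32_t    size = 0;
    bool        ok = false;

    switch (rs)
    {
        case RS_SHOW_RESULT:
            ok = parse_show_value((const char *) ptr, &size);
            break;
        case RS_PAGE_BYTES:
            ok = read_page_value((const PageBytes *) ptr, &size);
            break;
    }
    if (ok)
        XLogSegSize = size;
    return ok;
}
```

One consequence of this design is that the shared function must live somewhere every frontend tool can link against, whereas per-tool functions can rely on whatever each tool already has at hand.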

It's just a suggestion from my side. Is there anything I'm missing
which can cause the aforesaid approach not to be working?
Apart from that, I've nothing to add here.

> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE
looks good.

>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
Nothing from me to add here.

I've nothing to add here for the attached set of patches. On top of
these, David's patch can be used for preserving LSNs in the file
naming for all segment sizes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

Thanks for testing my patch.

On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2007@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemerson@gmail.com> wrote:
>> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>
>>> At this point, I suggest splitting this patch up into several
>>> potentially less controversial pieces.
>>>
>>> One big piece is that we currently don't support segment sizes larger
>>> than 64 MB, for various internal arithmetic reasons.  Your patch appears
>>> to address that.  So I suggest isolating that.  Assuming it works
>>> correctly, I think there would be no great concern about it.
>>>
>>> The next piece would be making the various tools aware of varying
>>> segment sizes without having to rely on a built-in value.
>>>
>>> The third piece would then be the rest that allows you to set the size
>>> at initdb.
>>>
>>> If we take these in order, we would make it easier to test various sizes
>>> and see if there are any more unforeseen issues when changing sizes.  It
>>> would also make it easier to do performance testing so we can address
>>> the original question of what the default size should be.
>>
>>
>> PFA the patches divided into 3 parts:
> Thanks for splitting the patches.
> 01-add-XLogSegmentOffset-macro.patch is same as before and it looks good.
>
>> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
>> the internal representation of max_wal_size and min_wal_size to mb.
> looks good.
>
>> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
>> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
>> inbuilt value.
> Several methods are declared and defined in different tools to fetch
> the size of wal-seg-size.
> In pg_standby.c,
> RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */
>
> In pg_basebackup/streamutil.c,
> RetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
> SHOW wal_segment_size */
>
> In pg_waldump.c,
> ReadXLogFromDir(char *archive_loc)
> RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
> location to set XLogSegsize from the first WAL file */
>
> IMHO, it's better to define a single method in xlog.c and based on the
> different strategy, it can retrieve the XLogSegsize on behalf of
> different modules. I've suggested the same in my first set review and
> I'll still vote for it. For example, in xlog.c, you can define
> something like the following:
> bool RetrieveXLogSegSize(RetrieveStrategy rs, void* ptr)
>
> Now based on the RetrieveStrategy(say Conn, File, Dir), you can cast
> the void pointer to the appropriate type. So, when a new tool needs to
> retrieve XLogSegSize, it can just call this function instead of
> defining a new RetrieveXLogSegSize method.
>
> It's just a suggestion from my side. Is there anything I'm missing
> which can cause the aforesaid approach not to be working?
> Apart from that, I've nothing to add here.



I do not think a generalised function is a good idea. Besides, I feel the respective approaches are best kept in the modules that use them, also because the internal code is not easily accessible from the utilities.


>> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
>> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
>> instead of XLOG_SEG_SIZE
> looks good.
>
>>>
>>> One concern I have is that your patch does not contain any tests.  There
>>> should probably be lots of tests.
>>
>>
>> 05-initdb_tests.patch adds tap tests to initialize cluster with different
>> wal_segment_size and then check the config values. What other tests do you
>> have in mind? Checking the various tools?
> Nothing from me to add here.
>
> I've nothing to add here for the attached set of patches. On top of
> these, David's patch can be used for preserving LSNs in the file
> naming for all segment sizes.

Re: increasing the default WAL segment size

From
Kuntal Ghosh
Date:
On Fri, Mar 31, 2017 at 10:40 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2007@gmail.com> wrote:

>> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
>> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
>> inbuilt value.
> Several methods are declared and defined in different tools to fetch
> the size of wal-seg-size.
> In pg_standby.c,
> RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */
>
> In pg_basebackup/streamutil.c,
> RetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
> SHOW wal_segment_size */
>
> In pg_waldump.c,
> ReadXLogFromDir(char *archive_loc)
> RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
> location to set XLogSegsize from the first WAL file */
>
> IMHO, it's better to define a single method in xlog.c and based on the
> different strategy, it can retrieve the XLogSegsize on behalf of
> different modules. I've suggested the same in my first set review and
> I'll still vote for it. For example, in xlog.c, you can define
> something like the following:
> bool RetrieveXLogSegSize(RetrieveStrategy rs, void* ptr)
>
> Now based on the RetrieveStrategy(say Conn, File, Dir), you can cast
> the void pointer to the appropriate type. So, when a new tool needs to
> retrieve XLogSegSize, it can just call this function instead of
> defining a new RetrieveXLogSegSize method.
>
> It's just a suggestion from my side. Is there anything I'm missing
> which can cause the aforesaid approach not to be working?
> Apart from that, I've nothing to add here.
>
>
>
> I do not think a generalised function is a good idea. Besides, I feel the
> respective approaches are best kept in the modules that use them, also
> because the internal code is not easily accessible from the utilities.
>
Ahh, I wonder what the reason can be. Anyway, I'll leave that decision
for the committer. I'm moving the status to Ready for committer.

I've only tested the patch on my 64-bit Linux system. It needs some
testing in other environments.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Fri, Mar 31, 2017 at 11:20 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 10:40 AM, Beena Emerson <memissemerson@gmail.com> wrote:
>> On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2007@gmail.com> wrote:
>
>> I do not think a generalised function is a good idea. Besides, I feel the
>> respective approaches are best kept in the modules that use them, also
>> because the internal code is not easily accessible from the utilities.
>>
> Ahh, I wonder what the reason can be. Anyway, I'll leave that decision
> for the committer. I'm moving the status to Ready for committer.


Thank you.


--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 27 March 2017 at 15:36, Beena Emerson <memissemerson@gmail.com> wrote:

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.

Committed first part to allow internal representation change (only).

No commitment yet to increasing wal-segsize in the way this patch has it.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Amit Kapila
Date:
On Wed, Apr 5, 2017 at 3:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 March 2017 at 15:36, Beena Emerson <memissemerson@gmail.com> wrote:
>
>> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
>> the internal representation of max_wal_size and min_wal_size to mb.
>
> Committed first part to allow internal representation change (only).
>
> No commitment yet to increasing wal-segsize in the way this patch has it.
>

What part of the patch don't you like, and do you have any suggestions to
improve it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 4/4/17 22:47, Amit Kapila wrote:
>> Committed first part to allow internal representation change (only).
>>
>> No commitment yet to increasing wal-segsize in the way this patch has it.
>>
> 
> What part of the patch don't you like, and do you have any suggestions to
> improve it?

I think there are still some questions and disagreements about how it
should behave.

I suggest the next step is to dial up the allowed segment size in
configure and run some tests about what a reasonable maximum value could
be.  I did a little bit of that, but somewhere around 256 MB, things got
really slow.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Apr 5, 2017 at 9:29 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 4/4/17 22:47, Amit Kapila wrote:
>>> Committed first part to allow internal representation change (only).
>>>
>>> No commitment yet to increasing wal-segsize in the way this patch has it.
>>>
>>
>> What part of the patch don't you like, and do you have any suggestions to
>> improve it?
>
> I think there are still some questions and disagreements about how it
> should behave.

Do you mean the disruption of the WAL filename to LSN mapping for higher values? Is there anything else I have missed?
 

> I suggest the next step is to dial up the allowed segment size in
> configure and run some tests about what a reasonable maximum value could
> be.  I did a little bit of that, but somewhere around 256 MB, things got
> really slow.

Would it be better to just increase the limit to 128MB for now, and
change the WAL file name format and expand the range in the next release?

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 4 April 2017 at 22:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 3:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 27 March 2017 at 15:36, Beena Emerson <memissemerson@gmail.com> wrote:
>>
>>> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
>>> the internal representation of max_wal_size and min_wal_size to mb.
>>
>> Committed first part to allow internal representation change (only).
>>
>> No commitment yet to increasing wal-segsize in the way this patch has it.
>>
>
> What part of the patch don't you like, and do you have any suggestions to
> improve it?

The only part of the patch uncommitted was related to choice of WAL
file size in the config file.

I've already made suggestions on that upthread.

I'm now looking at patch 03-modify-tools.patch

* Peter's "lack of tests" comment still applies
* I think we should remove pg_standby in this release, so we don't
have to care about it
* If we change pg_resetwal then it should allow changing XLogSegSize also
* "coulnot access the archive location"

03 looks mostly OK
04 is much more of a mess
* Lots of comments and notes pre-judge what the limits and
configurability are, so it's hard to commit the patches without
committing to the basic assumptions. Please look at removing all
assumptions about what the values/options are, so we can change them
later

05 adds various tests but I don't think adds enough value to commit

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 5 April 2017 at 06:04, Beena Emerson <memissemerson@gmail.com> wrote:

>> >> No commitment yet to increasing wal-segsize in the way this patch has
>> >> it.
>> >>
>> >
>> > What part of the patch don't you like, and do you have any suggestions to
>> > improve it?
>>
>> I think there are still some questions and disagreements about how it
>> should behave.
>
>
> Do you mean the disruption of the WAL filename to LSN mapping for higher
> values? Is there anything else I have missed?

I see various issues raised but not properly addressed:

1. we would need to drop support for segment sizes < 16MB unless we
adopt a new incompatible filename format.
I think at 16MB the naming should be the same as now and that
WALfilename -> LSN is very important.
For this release, I think we should just allow >= 16MB and avoid the
issue, given the lack of time.

2. It's not clear to me the advantage of being able to pick varying
filesizes. I see great disadvantage in having too many options, which
greatly increases the chance of incompatibility, annoyance and
breakage. I favour a small number of values that have been shown by
testing to be sweet spots in performance and usability. (1GB has been
suggested)

3. New file allocation has been a problem raised with this patch for
some months now.

Lack of clarity and/or movement on these issues is very, very close to
getting the patch rejected now. Let's not approach this with the
viewpoint that I or others want it to be rejected; let's work forward
and get some solid changes that will improve the world without causing
problems.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 4/5/17 06:04, Beena Emerson wrote:
>     I suggest the next step is to dial up the allowed segment size in
>     configure and run some tests about what a reasonable maximum value could
>     be.  I did a little bit of that, but somewhere around 256 MB, things got
>     really slow.
> 
> 
> Would it be better to just increase the limit to 128MB for now, and
> change the WAL file name format and expand the range in the next release?

I don't think me saying it felt a bit slow around 256 MB is a proper
technical analysis that should lead to the conclusion that that upper
limit should be 128 MB. ;-)

This tells me that there is a lot to explore and test here before we
should let it loose on users.

I think the best we should do now is spend a bit of time exploring
whether/how larger values of segment size behave, and bump the hardcoded
configure limit if we get positive results.  Everything else should
probably be postponed.

(Roughly speaking, to get started, this would mean compiling with
--with-wal-segsize 16, 32, 64, 128, 256, run make check-world both
sequentially and in parallel, and take note of a) passing, b) run time,
c) disk space.)
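Peter's recipe can be scripted. The following is a hypothetical Python helper that only generates shell commands and runs nothing. Giving each size its own VPATH build directory is an assumption made to keep the builds apart. --with-wal-segsize is the configure option named above, and the job count for the parallel run is an arbitrary choice.

```python
# Hypothetical helper: it only prints shell commands, it does not run them.
# Each size gets its own VPATH build directory so builds don't collide.

SIZES_MB = [16, 32, 64, 128, 256]


def build_script(size_mb, jobs=1):
    """Shell snippet that configures, builds and runs check-world for one size."""
    parallel = f" -j{jobs}" if jobs > 1 else ""
    build_dir = f"build-{size_mb}"
    return (
        f"mkdir -p {build_dir} && cd {build_dir} && "
        f"../configure --with-wal-segsize={size_mb} && "
        f"make{parallel} && make{parallel} check-world"
    )


for size in SIZES_MB:
    print(build_script(size))            # sequential run
    print(build_script(size, jobs=8))    # parallel run; 8 jobs is arbitrary
```

Timing each snippet (for example with time) would cover the run-time column. Disk space would need a separate measurement.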

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: increasing the default WAL segment size

From
Simon Riggs
Date:
On 5 April 2017 at 08:36, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 4/5/17 06:04, Beena Emerson wrote:
>>     I suggest the next step is to dial up the allowed segment size in
>>     configure and run some tests about what a reasonable maximum value could
>>     be.  I did a little bit of that, but somewhere around 256 MB, things got
>>     really slow.
>>
>>
>> Would it be better to just increase the limit to 128MB for now?
>> Next, we can change the WAL file name format and expand the range?
>
> I don't think me saying it felt a bit slow around 256 MB is a proper
> technical analysis that should lead to the conclusion that that upper
> limit should be 128 MB. ;-)
>
> This tells me that there is a lot to explore and test here before we
> should let it loose on users.

Agreed

> I think the best we should do now is spend a bit of time exploring
> whether/how larger values of segment size behave, and bump the hardcoded
> configure limit if we get positive results.  Everything else should
> probably be postponed.
>
> (Roughly speaking, to get started, this would mean compiling with
> --with-wal-segsize 16, 32, 64, 128, 256, run make check-world both
> sequentially and in parallel, and take note of a) passing, b) run time,
> c) disk space.)

I've committed the rest of Beena's patch to allow this testing to
occur up to 1024MB.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Apr 5, 2017 at 4:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 5 April 2017 at 06:04, Beena Emerson <memissemerson@gmail.com> wrote:
>
> I see various issues raised but not properly addressed
>
> 1. we would need to drop support for segment sizes < 16MB unless we
> adopt a new incompatible filename format.
> I think at 16MB the naming should be the same as now and that
> WALfilename -> LSN is very important.
> For this release, I think we should just allow >= 16MB and avoid the
> issue through lack of time.
>
> 2. It's not clear to me the advantage of being able to pick varying
> filesizes. I see great disadvantage in having too many options, which
> greatly increases the chance of incompatibility, annoyance and
> breakage. I favour a small number of values that have been shown by
> testing to be sweet spots in performance and usability. (1GB has been
> suggested)

Do the options 16, 64 and 1024 seem good?
We can remove sizes below 16 as most have agreed, and as per the discussion, 64MB and 1GB seem favoured. We could probably allow 32MB since it was an already allowed size?


> 3. New file allocation has been a problem raised with this patch for
> some months now.

This did not seem to be an open issue; at least, there were not many disagreements. The higher the server load, the more WAL is generated. For the same load, file allocation happens less frequently with higher wal-segsize values. Overall, it is a matter of filling up either many files or a few larger ones.
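The last point can be put in numbers. For a fixed WAL volume, the number of segment files created (and later archived) falls in inverse proportion to the segment size. The following back-of-envelope Python sketch assumes an illustrative 2 TB of WAL per day, in line with the "multiple terabytes per day" case from the start of the thread:

```python
MB = 1024 ** 2
TB = 1024 ** 4
SECONDS_PER_DAY = 86400


def segments_per_day(wal_bytes_per_day, seg_size_bytes):
    """Number of segment files needed for a day's WAL (ceiling division)."""
    return -(-wal_bytes_per_day // seg_size_bytes)


for size_mb in (16, 64, 256, 1024):
    n = segments_per_day(2 * TB, size_mb * MB)
    print(f"{size_mb:5d}MB: {n:7d} segments/day, {n / SECONDS_PER_DAY:.3f} per second")
```

At 16MB this is 131072 files a day, about 1.5 per second. At 1GB it drops to 2048 a day.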

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Apr 5, 2017 at 6:06 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> I don't think me saying it felt a bit slow around 256 MB is a proper
> technical analysis that should lead to the conclusion that that upper
> limit should be 128 MB. ;-)

I ran a couple of tests for 16MB and 1GB and found a performance dip of less than 4%. I am currently running tests to check the consistency of the results and to cover various sizes. I will update soon.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Peter Eisentraut
Date:
On 4/6/17 07:13, Beena Emerson wrote:
> Do the options 16, 64 and 1024 seem good?
> We can remove sizes below 16 as most have agreed, and as per the
> discussion, 64MB and 1GB seem favoured. We could probably allow 32MB
> since it was an already allowed size?

I don't see the need to remove any options right now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 4/5/17 7:29 AM, Simon Riggs wrote:
> On 5 April 2017 at 06:04, Beena Emerson <memissemerson@gmail.com> wrote:
>>
>> The WALfilename - LSN mapping disruption for higher values, you mean? Is
>> there anything else I have missed?
> 
> I see various issues raised but not properly addressed
> 
> 1. we would need to drop support for segment sizes < 16MB unless we
> adopt a new incompatible filename format.
> I think at 16MB the naming should be the same as now and that
> WALfilename -> LSN is very important.
> For this release, I think we should just allow >= 16MB and avoid the
> issue through lack of time.

+1.

> 2. It's not clear to me the advantage of being able to pick varying
> filesizes. I see great disadvantage in having too many options, which
> greatly increases the chance of incompatibility, annoyance and
> breakage. I favour a small number of values that have been shown by
> testing to be sweet spots in performance and usability. (1GB has been
> suggested)

I'm in favor of 16,64,256,1024.

> 3. New file allocation has been a problem raised with this patch for
> some months now.

I've been playing around with this and I don't think short tests show
larger sizes off to advantage.  Larger segments will definitely perform
more poorly until Postgres starts recycling WAL.  Once that happens I
think performance differences should be negligible, though of course
this needs to be verified with longer-running tests.

If archive_timeout is set then there will also be additional time taken
to zero out potentially larger unused parts of the segment.  I don't see
this as an issue, however, because if archive_timeout is being triggered
then the system is very likely under lower than usual load.
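The archive_timeout cost can be bounded with simple arithmetic. The PostgreSQL documentation notes that a segment switched out early by archive_timeout is still archived at full segment length. It also notes that the switch only happens when there has been some database activity. The Python sketch below assumes a lightly loaded server that writes a little WAL in every interval, so every timeout forces a switch:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def forced_switch_bytes_per_day(archive_timeout_s, seg_size_bytes):
    """Worst case when every archive_timeout interval ends in a forced switch:
    each switched-out segment is archived at its full size, however little
    WAL it actually holds."""
    switches_per_day = 86400 // archive_timeout_s
    return switches_per_day * seg_size_bytes


for size_mb in (16, 1024):
    total = forced_switch_bytes_per_day(60, size_mb * MB)
    print(f"{size_mb:5d}MB segments, archive_timeout=60s: {total / GB:.1f} GB/day")
```

With a 60 s timeout this gives 22.5 GB/day at 16MB but 1440 GB/day at 1GB, so archive volume on a quiet system grows with the segment size as well as zero-fill time.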

> Lack of clarity and/or movement on these issues is very, very close to
> getting the patch rejected now. Let's not approach this with the
> viewpoint that I or others want it to be rejected, lets work forwards
> and get some solid changes that will improve the world without causing
> problems.

I would definitely like to see this go in, though I agree with Peter
that we need a lot more testing.  I'd like to see some build farm
animals testing with different sizes at the very least, even if there's
no time to add new tests.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Tomas Vondra
Date:
On 04/06/2017 08:33 PM, David Steele wrote:
> On 4/5/17 7:29 AM, Simon Riggs wrote:
>> On 5 April 2017 at 06:04, Beena Emerson <memissemerson@gmail.com> wrote:
>>>
>>> The WALfilename - LSN mapping disruption for higher values, you mean? Is
>>> there anything else I have missed?
>>
>> I see various issues raised but not properly addressed
>>
>> 1. we would need to drop support for segment sizes < 16MB unless we
>> adopt a new incompatible filename format.
>> I think at 16MB the naming should be the same as now and that
>> WALfilename -> LSN is very important.
>> For this release, I think we should just allow >= 16MB and avoid the
>> issue through lack of time.
>
> +1.
>
>> 2. It's not clear to me the advantage of being able to pick varying
>> filesizes. I see great disadvantage in having too many options, which
>> greatly increases the chance of incompatibility, annoyance and
>> breakage. I favour a small number of values that have been shown by
>> testing to be sweet spots in performance and usability. (1GB has been
>> suggested)
>
> I'm in favor of 16,64,256,1024.
>

I don't see a particular reason for this, TBH. The sweet spots will
likely depend on hardware / OS configuration etc., assuming there
actually are sweet spots - no one has demonstrated that yet.

Also, I don't see how supporting additional WAL sizes increases the chance
of incompatibility. We already allow that, so either the tools (e.g.
backup solutions) assume WAL segments are always 16MB (in which case they are
essentially broken) or support valid file sizes (in which case they
should have no issues with the new ones).

If we're going to do this, I'm in favor of deciding some reasonable 
upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values 
up to that limit.
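Tomas's proposal amounts to a simple validity rule: a power of two between a lower bound and the chosen upper limit. Below is a minimal Python sketch. The 1GB upper bound is the one floated in the thread. The 1MB lower bound is an assumption, since the thread had not settled whether sizes below 16MB stay.

```python
MB = 1024 * 1024
MIN_SEG = 1 * MB      # assumed lower bound
MAX_SEG = 1024 * MB   # the 1GB upper limit floated in the thread


def is_valid_seg_size(size_bytes):
    """True for powers of two between MIN_SEG and MAX_SEG inclusive."""
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and MIN_SEG <= size_bytes <= MAX_SEG


valid_mb = [s // MB for s in (1 << k for k in range(20, 31)) if is_valid_seg_size(s)]
print(valid_mb)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Allowing every power of two keeps the check to one expression. A curated list such as 16, 64, 256, 1024 would instead be a membership test against a fixed set.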

>> 3. New file allocation has been a problem raised with this patch for
>> some months now.
>
> I've been playing around with this and I don't think short tests show
> larger sizes off to advantage.  Larger segments will definitely perform
> more poorly until Postgres starts recycling WAL.  Once that happens I
> think performance differences should be negligible, though of course
> this needs to be verified with longer-running tests.
>

I'm willing to do some extensive performance testing on the patch. I 
don't see how that could happen in the next few days (before the feature 
freeze), particularly considering we're interested in long tests.

The question however is whether we need to do this testing when we don't 
actually change the default (at least the patch submitted on 3/27 does 
seem to keep the 16MB). I assume people specifying a custom value when 
calling initdb are expected to know what they are doing (and I don't see 
how we can prevent distros from choosing a bad value in their packages - 
they could already do that with configure-time option).

> If archive_timeout is set then there will also be additional time taken
> to zero out potentially larger unused parts of the segment.  I don't see
> this as an issue, however, because if archive_timeout is being triggered
> then the system is very likely under lower than usual load.
>
>> Lack of clarity and/or movement on these issues is very, very close to
>> getting the patch rejected now. Let's not approach this with the
>> viewpoint that I or others want it to be rejected, lets work forwards
>> and get some solid changes that will improve the world without causing
>> problems.
>
> I would definitely like to see this go in, though I agree with Peter
> that we need a lot more testing.  I'd like to see some build farm
> animals testing with different sizes at the very least, even if there's
> no time to add new tests.
>

Do we actually have any infrastructure for that? Or do you plan to add 
some new animals with different WAL segment sizes?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 4/6/17 5:05 PM, Tomas Vondra wrote:
> On 04/06/2017 08:33 PM, David Steele wrote:
>> On 4/5/17 7:29 AM, Simon Riggs wrote:
>>
>>> 2. It's not clear to me the advantage of being able to pick varying
>>> filesizes. I see great disadvantage in having too many options, which
>>> greatly increases the chance of incompatibility, annoyance and
>>> breakage. I favour a small number of values that have been shown by
>>> testing to be sweet spots in performance and usability. (1GB has been
>>> suggested)
>>
>> I'm in favor of 16,64,256,1024.
>>
> 
> I don't see a particular reason for this, TBH. The sweet spots will
> likely depend on hardware / OS configuration etc., assuming there
> actually are sweet spots - no one has demonstrated that yet.

Fair enough, but my feeling is that this patch has never been about
server performance, per se.  Rather, it is about archive management and
trying to stem the tide of WAL as servers get bigger and busier.
Generally, archive commands have to make a remote connection to offload
WAL and that has a cost per segment.
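The per-segment cost can be made concrete. The following Python sketch assumes that a serial archive command pays a fixed cost per call and ignores transfer time. The cost used is the ~1.57 s ssh round trip Robert timed at the start of the thread, and the WAL volume is an illustrative 2 TB per day.

```python
MB = 1024 ** 2
TB = 1024 ** 4
SECONDS_PER_DAY = 86400


def archiver_busy_fraction(wal_bytes_per_day, seg_size_bytes, cost_per_segment_s):
    """Fraction of the day a serial archive_command spends on fixed per-call cost.
    A value above 1.0 means it cannot keep up at all."""
    segments = -(-wal_bytes_per_day // seg_size_bytes)  # ceiling division
    return segments * cost_per_segment_s / SECONDS_PER_DAY


for size_mb in (16, 64, 1024):
    frac = archiver_busy_fraction(2 * TB, size_mb * MB, 1.57)
    print(f"{size_mb:5d}MB: {frac:.2f}")
```

Under these assumptions a 16MB archiver would need about 2.4 days of connection overhead per day of WAL. At 64MB that falls to about 0.6, and at 1GB to under 0.05.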

> Also, I don't see how supporting additional WAL sizes increases chance
> of incompatibility. We already allow that, so either the tools (e.g.
> backup solutions) assume WAL segments are always 16MB (in which case they are
> essentially broken) or support valid file sizes (in which case they
> should have no issues with the new ones).

I don't see how a compile-time option counts as "supporting that" in
practice.  How many people in the field are running custom builds of
Postgres?  And of those, how many have changed the WAL segment size?
I've never encountered a non-standard segment size or talked to anyone
who has.  I'm not saying it has *never* happened but I would venture to
say it's rare.

> If we're going to do this, I'm in favor of deciding some reasonable
> upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values
> up to that limit.

I'm OK with that.  I'm also OK with providing a few reasonable choices.
I guess that means I'll just go with the majority opinion.

>>> 3. New file allocation has been a problem raised with this patch for
>>> some months now.
>>
>> I've been playing around with this and I don't think short tests show
>> larger sizes off to advantage.  Larger segments will definitely perform
>> more poorly until Postgres starts recycling WAL.  Once that happens I
>> think performance differences should be negligible, though of course
>> this needs to be verified with longer-running tests.
>>
> I'm willing to do some extensive performance testing on the patch. I
> don't see how that could happen in the next few days (before the feature
> freeze), particularly considering we're interested in long tests.

Cool.  I've been thinking about how to do some meaningful tests for this
(mostly pgbench related).  I'd like to hear what you are thinking.

> The question however is whether we need to do this testing when we don't
> actually change the default (at least the patch submitted on 3/27 does
> seem to keep the 16MB). I assume people specifying a custom value when
> calling initdb are expected to know what they are doing (and I don't see
> how we can prevent distros from choosing a bad value in their packages -
> they could already do that with configure-time option).

Just because we don't change the default doesn't mean that others won't.
I still think testing for sizes other than 16MB is severely lacking and
I don't believe caveat emptor is the way to go.

> Do we actually have any infrastructure for that? Or do you plan to add
> some new animals with different WAL segment sizes?

I don't have plans to add animals.  I think we'd need a way to tell
'make check' to use a different segment size for tests and then
hopefully reconfigure some of the existing animals.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Tomas Vondra
Date:
On 04/06/2017 11:45 PM, David Steele wrote:
> On 4/6/17 5:05 PM, Tomas Vondra wrote:
>> On 04/06/2017 08:33 PM, David Steele wrote:
>>> On 4/5/17 7:29 AM, Simon Riggs wrote:
>>>
>>>> 2. It's not clear to me the advantage of being able to pick varying
>>>> filesizes. I see great disadvantage in having too many options, which
>>>> greatly increases the chance of incompatibility, annoyance and
>>>> breakage. I favour a small number of values that have been shown by
>>>> testing to be sweet spots in performance and usability. (1GB has been
>>>> suggested)
>>>
>>> I'm in favor of 16,64,256,1024.
>>>
>>
>> I don't see a particular reason for this, TBH. The sweet spots will
>> likely depend on hardware / OS configuration etc., assuming there
>> actually are sweet spots - no one has demonstrated that yet.
>
> Fair enough, but my feeling is that this patch has never been about
> server performance, per se.  Rather, it is about archive management and
> trying to stem the tide of WAL as servers get bigger and busier.
> Generally, archive commands have to make a remote connection to offload
> WAL and that has a cost per segment.
>

Perhaps, although Robert also mentioned that the fsync at the end of 
each WAL segment is noticeable. But the thread is a bit difficult to 
follow, different people have different ideas about the motivation of 
the patch, etc.

>> Also, I don't see how supporting additional WAL sizes increases chance
>> of incompatibility. We already allow that, so either the tools (e.g.
>> backup solutions) assume WAL segments are always 16MB (in which case they are
>> essentially broken) or support valid file sizes (in which case they
>> should have no issues with the new ones).
>
> I don't see how a compile-time option counts as "supporting that" in
> practice.  How many people in the field are running custom builds of
> Postgres?  And of those, how many have changed the WAL segment size?
> I've never encountered a non-standard segment size or talked to anyone
> who has.  I'm not saying it has *never* happened but I would venture to
> say it's rare.
>

I agree it's rare, but I don't think that means we can just consider the 
option as 'unsupported'. We're even mentioning it in the docs as a valid 
way to customize granularity of the WAL archival.

I certainly know people who run custom builds, and some of them run with 
custom WAL segment size. Some of them are our customers, some are not. 
And yes, some of them actually patched the code to allow 256MB WAL segments.

>> If we're going to do this, I'm in favor of deciding some reasonable
>> upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values
>> up to that limit.
>
> I'm OK with that.  I'm also OK with providing a few reasonable choices.
> I guess that means I'll just go with the majority opinion.
>
>>>> 3. New file allocation has been a problem raised with this patch for
>>>> some months now.
>>>
>>> I've been playing around with this and I don't think short tests show
>>> larger sizes off to advantage.  Larger segments will definitely perform
>>> more poorly until Postgres starts recycling WAL.  Once that happens I
>>> think performance differences should be negligible, though of course
>>> this needs to be verified with longer-running tests.
>>>
>> I'm willing to do some extensive performance testing on the patch. I
>> don't see how that could happen in the next few days (before the feature
>> freeze), particularly considering we're interested in long tests.
>
> Cool.  I've been thinking about how to do some meaningful tests for this
> (mostly pgbench related).  I'd like to hear what you are thinking.
>

My plan was to do some pgbench tests with different workloads, scales 
(in shared buffers, in RAM, exceeds RAM), and different storage 
configurations (SSD vs. HDD, WAL/datadir on the same/different 
device/fs, possibly also ext4/xfs).
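The test plan above is a cross product of independent dimensions, which is why it grows quickly. Below is a small Python sketch using itertools.product. The workload labels are hypothetical placeholders, and the segment sizes are the ones David proposed. Everything else is taken from the list above.

```python
from itertools import product

# All labels are illustrative placeholders for the dimensions listed above.
seg_sizes_mb = [16, 64, 256, 1024]          # e.g. the sizes David proposed
workloads = ["read-write", "read-only"]     # hypothetical pgbench workloads
scales = ["fits in shared_buffers", "fits in RAM", "exceeds RAM"]
storage = ["SSD", "HDD"]
wal_placement = ["same device as datadir", "separate device"]
filesystems = ["ext4", "xfs"]

matrix = list(product(seg_sizes_mb, workloads, scales, storage,
                      wal_placement, filesystems))
print(len(matrix), "configurations")
for config in matrix[:3]:
    print(config)
```

Even this modest grid gives 192 configurations. Each one would need a long run to get past the period before WAL recycling starts.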

>> The question however is whether we need to do this testing when we don't
>> actually change the default (at least the patch submitted on 3/27 does
>> seem to keep the 16MB). I assume people specifying a custom value when
>> calling initdb are expected to know what they are doing (and I don't see
>> how we can prevent distros from choosing a bad value in their packages -
>> they could already do that with configure-time option).
>
> Just because we don't change the default doesn't mean that others won't.
>  I still think testing for sizes other than 16MB is severely lacking and
> I don't believe caveat emptor is the way to go.
>

Aren't you mixing regression and performance testing? I agree we need to 
be sure all segment sizes are handled correctly, no argument here.

>> Do we actually have any infrastructure for that? Or do you plan to add
>> some new animals with different WAL segment sizes?
>
> I don't have plans to add animals.  I think we'd need a way to tell
> 'make check' to use a different segment size for tests and then
> hopefully reconfigure some of the existing animals.
>

OK. My point was that we don't have that capability now, and the latest 
patch is not adding it either.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 4/6/17 6:52 PM, Tomas Vondra wrote:
> On 04/06/2017 11:45 PM, David Steele wrote:
>>
>> How many people in the field are running custom builds of
>> Postgres?  And of those, how many have changed the WAL segment size?
>> I've never encountered a non-standard segment size or talked to anyone
>> who has.  I'm not saying it has *never* happened but I would venture to
>> say it's rare.
> 
> I agree it's rare, but I don't think that means we can just consider the
> option as 'unsupported'. We're even mentioning it in the docs as a valid
> way to customize granularity of the WAL archival.
> 
> I certainly know people who run custom builds, and some of them run with
> custom WAL segment size. Some of them are our customers, some are not.
> And yes, some of them actually patched the code to allow 256MB WAL
> segments.

I feel a little better knowing that *somebody* is doing it in the field.

>> Just because we don't change the default doesn't mean that others won't.
>>  I still think testing for sizes other than 16MB is severely lacking and
>> I don't believe caveat emptor is the way to go.
> 
> Aren't you mixing regression and performance testing? I agree we need to
> be sure all segment sizes are handled correctly, no argument here.

Not intentionally.  Our standard test suite is only regression as far as
I can see.  All the performance testing I've seen has been done ad hoc.

>> I don't have plans to add animals.  I think we'd need a way to tell
>> 'make check' to use a different segment size for tests and then
>> hopefully reconfigure some of the existing animals.
> 
> OK. My point was that we don't have that capability now, and the latest
> patch is not adding it either.

Agreed.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Apr 5, 2017 at 6:06 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

(Roughly speaking, to get started, this would mean compiling with
--with-wal-segsize 16, 32, 64, 128, 256, run make check-world both
sequentially and in parallel, and take note of a) passing, b) run time,
c) disk space.)


The attached patch updates a pg_upgrade test that fails for higher segment sizes; the failing check is the output of SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'.

The following are the results of the installcheck-world execution.

wal_size (MB)   time         cluster_size   pg_wal   files
16              5m32.761s    539M           417M     26
32              5m32.618s    545M           417M     13
64              5m39.685s    571M           449M     7
128             5m52.961s    641M           513M     4
256             6m13.402s    635M           512M     2
512             7m3.252s     1.2G           1G       2
1024            9m0.205s     1.2G           1G       1


wal_size (MB)   time         cluster_size   pg_wal   files
16              5m31.137s    542M           417M     26
32              5m29.532s    539M           417M     13
64              5m36.189s    571M           449M     7
128             5m50.027s    635M           513M     4
256             6m13.603s    635M           512M     2
512             7m4.154s     1.2G           1G       2
1024            8m55.357s    1.2G           1G       1

All tests passed except connect/test5 in src/interfaces/ecpg.

We can see that smaller segments take less time for the same amount of WAL (compare 128 vs. 256 and 512 vs. 1024). However, these tests are not large enough to draw conclusions.
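The tables can be sanity-checked: the files column multiplied by the segment size should roughly equal the pg_wal column. The small Python sketch below works from the first table's numbers, copied by hand. It also expresses each run time relative to the 16MB run.

```python
# (segment size in MB, run time in seconds, files in pg_wal) from the first table
runs = [
    (16, 5 * 60 + 32.761, 26),
    (32, 5 * 60 + 32.618, 13),
    (64, 5 * 60 + 39.685, 7),
    (128, 5 * 60 + 52.961, 4),
    (256, 6 * 60 + 13.402, 2),
    (512, 7 * 60 + 3.252, 2),
    (1024, 9 * 60 + 0.205, 1),
]
base = runs[0][1]  # 16MB run time in seconds

for size_mb, secs, files in runs:
    print(f"{size_mb:5d}MB: {files * size_mb:5d}MB of segments, "
          f"{(secs / base - 1) * 100:+.1f}% run time vs 16MB")
```

The products come out at 416, 416, 448, 512, 512, 1024 and 1024 MB, within about 1MB of the reported pg_wal sizes. The 1024MB run takes roughly 62% longer than the 16MB one.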


Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:


On Fri, Apr 7, 2017 at 2:35 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 04/06/2017 08:33 PM, David Steele wrote:
>
>> I'm in favor of 16,64,256,1024.
>
> I don't see a particular reason for this, TBH. The sweet spots will likely depend on hardware / OS configuration etc., assuming there actually are sweet spots - no one has demonstrated that yet.
>
> Also, I don't see how supporting additional WAL sizes increases the chance of incompatibility. We already allow that, so either the tools (e.g. backup solutions) assume WAL segments are always 16MB (in which case they are essentially broken) or support valid file sizes (in which case they should have no issues with the new ones).
>
> If we're going to do this, I'm in favor of deciding some reasonable upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values up to that limit.

I think the majority consensus is to use all valid values. Since 1GB is what we have finalized as the upper limit, let's continue with that for now. 


--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
I ran tests and following are the details:

Machine details:
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A

size \ clients   16            32            64            128
16MB             18895.63486   28799.48759   37855.39521   27968.88309
32MB             18313.1461    29201.44954   40733.80051   32458.74147
64MB             18055.73141   30875.28687   42713.54447   38009.60542
128MB            18234.31424   33208.65419   48604.5593    45498.27689
256MB            19524.36498   35740.19032   54686.16898   54060.11168
512MB            20351.90719   37426.72174   55045.60719   56194.99349
1024MB           19667.67062   35696.19194   53666.60373   54353.0614

I did not get any degradation; in fact, higher values showed a performance improvement at higher client counts.
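To see the size of the effect per column, each figure can be expressed as a percent change against the 16MB row. The following Python sketch uses the results above copied by hand, with the figures taken exactly as reported (the post does not state their unit):

```python
clients = [16, 32, 64, 128]
results = {  # segment size in MB -> figures for 16, 32, 64, 128 clients
    16:   [18895.63486, 28799.48759, 37855.39521, 27968.88309],
    32:   [18313.1461, 29201.44954, 40733.80051, 32458.74147],
    64:   [18055.73141, 30875.28687, 42713.54447, 38009.60542],
    128:  [18234.31424, 33208.65419, 48604.5593, 45498.27689],
    256:  [19524.36498, 35740.19032, 54686.16898, 54060.11168],
    512:  [20351.90719, 37426.72174, 55045.60719, 56194.99349],
    1024: [19667.67062, 35696.19194, 53666.60373, 54353.0614],
}


def pct_change(size_mb):
    """Percent change versus 16MB segments, one value per client count."""
    return [(v / b - 1) * 100 for v, b in zip(results[size_mb], results[16])]


print("clients:  " + "  ".join(f"{c:>7d}" for c in clients))
for size_mb in results:
    print(f"{size_mb:5d}MB: " + "  ".join(f"{p:+6.1f}%" for p in pct_change(size_mb)))
```

On these numbers, 32MB, 64MB and 128MB are about 3-5% below 16MB at 16 clients. At 128 clients, 512MB is roughly double the 16MB figure.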

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

From
David Steele
Date:
On 4/7/17 2:59 AM, Beena Emerson wrote:
> I ran tests and following are the details:
> 
> Machine details:
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                192
> On-line CPU(s) list:   0-191
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             24
> NUMA node(s):          4
> Model:                 IBM,8286-42A
> 
> size \ clients   16            32            64            128
> 16MB             18895.63486   28799.48759   37855.39521   27968.88309
> 32MB             18313.1461    29201.44954   40733.80051   32458.74147
> 64MB             18055.73141   30875.28687   42713.54447   38009.60542
> 128MB            18234.31424   33208.65419   48604.5593    45498.27689
> 256MB            19524.36498   35740.19032   54686.16898   54060.11168
> 512MB            20351.90719   37426.72174   55045.60719   56194.99349
> 1024MB           19667.67062   35696.19194   53666.60373   54353.0614
> 
> I did not get any degradation; in fact, higher values showed a performance
> improvement at higher client counts.

This submission has been moved to CF 2017-07.

-- 
-David
david@pgmasters.net



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,



On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemerson@gmail.com> wrote:
> Hello,
>
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 MB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
>
> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to MB.

Already committed.

>
> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
>

The updated 03-modify-tools_v2.patch has the following changes:
 - Rebased over current HEAD
 - Improved comments
 - Adding error messages where applicable.
 - Replace XLOG_SEG_SIZE in the tools and xlog_internal.h to
XLogSegSize. XLOG_SEG_SIZE is the wal_segment_size the server was
compiled with and XLogSegSize is the wal_segment_size of the target
server on which the tool is run. When the binaries used and the target
server are compiled with different wal_segment_size, the calculations
would be affected and the tool would crash. To avoid it, all the
calculations used by tool should use XLogSegSize.
 - pg_waldump: fuzzy_open_file is split into two functions -
open_file_in_directory and identify_target_directory so that code can
be reused when determining the XLogSegSize from the WAL file header.
 - IsValidXLogSegSize macro is moved from 04 to here so that we can
use it for validating the size in all the tools.
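Determining XLogSegSize from the WAL file header means reading the segment size stored in the long page header at the start of every segment. The Python sketch below shows the idea. The byte layout is an assumption based on XLogLongPageHeaderData in xlog_internal.h as built on a little-endian machine with 8-byte alignment for 64-bit fields, and the XLP_LONG_HEADER flag value is likewise assumed. Check both against the server version you target. This is not pg_waldump's code.

```python
import struct

# Assumed on-disk layout of XLogLongPageHeaderData (first page of a segment)
# on a little-endian build with 8-byte alignment for 64-bit fields:
#   xlp_magic u16, xlp_info u16, xlp_tli u32, xlp_pageaddr u64, xlp_rem_len u32,
#   4 padding bytes, xlp_sysid u64, xlp_seg_size u32, xlp_xlog_blcksz u32
LONG_HEADER = struct.Struct("<HHIQI4xQII")
XLP_LONG_HEADER = 0x0002  # xlp_info flag marking a long header (assumed value)


def read_seg_size(first_page):
    """Return xlp_seg_size from the first page of a WAL segment file."""
    (_magic, info, _tli, _pageaddr, _rem_len,
     _sysid, seg_size, _blcksz) = LONG_HEADER.unpack_from(first_page)
    if not info & XLP_LONG_HEADER:
        raise ValueError("first page does not carry a long header")
    if seg_size == 0 or seg_size & (seg_size - 1):
        raise ValueError(f"implausible WAL segment size: {seg_size}")
    return seg_size


def read_seg_size_from_file(path):
    """Read just enough of a segment file to decode its long page header."""
    with open(path, "rb") as f:
        return read_seg_size(f.read(LONG_HEADER.size))
```

A tool using this value, rather than a compiled-in constant, computes segment numbers and file names for the target server's actual segment size.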


> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE

The 04-initdb-walsegsize_v2.patch has the following improvements:
- Rebased over new 03 patch
- Pass the wal-segsize initdb option as a command-line option rather
than in an environment variable.
- Since the new function check_wal_size had only two checks and was
used once, moved the code to ReadControlFile where it is used and
removed this function.
- Improve comments and add validations where required.
- Use DEFAULT_XLOG_SEG_SIZE to set the min_wal_size and
max_wal_size, instead of the value 16.
- Use XLogSegMaxSize and XLogSegMinSize to calculate the range of the GUC
wal_segment_size instead of 16 - INT_MAX.
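Once min_wal_size and max_wal_size are stored in MB, the server still has to turn them into a number of segments for the chosen segment size. The following Python sketch does that with integer division. The "at least two segments" rule in check_wal_sizes is an assumption, modeled on the kind of check later PostgreSQL releases apply, and not a statement of what this patch does. The 80MB / 1GB figures are the PostgreSQL 10 defaults.

```python
MB = 1024 * 1024


def mb_to_segments(value_mb, seg_size_bytes):
    """Whole segments that fit in a setting given in MB (integer division)."""
    return value_mb // (seg_size_bytes // MB)


def check_wal_sizes(min_wal_size_mb, max_wal_size_mb, seg_size_bytes):
    """Reject settings that amount to fewer than two segments (assumed rule)."""
    for name, value_mb in (("min_wal_size", min_wal_size_mb),
                           ("max_wal_size", max_wal_size_mb)):
        if mb_to_segments(value_mb, seg_size_bytes) < 2:
            raise ValueError(f'"{name}" must be at least twice the WAL segment size')


# 80MB / 1GB are the PostgreSQL 10 defaults for min_wal_size / max_wal_size.
for size_mb in (16, 64, 1024):
    segs = (mb_to_segments(80, size_mb * MB), mb_to_segments(1024, size_mb * MB))
    print(f"{size_mb:5d}MB segments: min_wal_size=80MB -> {segs[0]} segs, "
          f"max_wal_size=1GB -> {segs[1]} segs")
```

With 1GB segments the 80MB default rounds down to zero segments. This illustrates why fixed defaults expressed as "16" or as a raw MB figure need adjusting once the segment size varies.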


>
>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
>
>




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

From
tushar
Date:
On 07/06/2017 12:04 PM, Beena Emerson wrote:
> The 04-initdb-walsegsize_v2.patch has the following improvements:
> - Rebased over new 03 patch
> - Pass the wal-segsize initdb option as a command-line option rather
> than in an environment variable.
> - Since the new function check_wal_size had only two checks and was
> used once, moved the code to ReadControlFile where it is used and
> removed this function.
> - Improve comments and add validations where required.
> - Use DEFAULT_XLOG_SEG_SIZE to set the min_wal_size and
> max_wal_size, instead of the value 16.
> - Use XLogSegMaxSize and XLogSegMinSize to calculate the range of the GUC
> wal_segment_size instead of 16 - INT_MAX.
Thanks Beena. I tested the following scenarios and all are
working as expected:
.) Different WAL segment sizes, i.e. 1, 2, 8, 16, 32, 64, 512, 1024, at the time of
initdb
.) SR setup in place.
.) Combinations of max/min_wal_size in postgresql.conf with different
wal_segment_size.
.) Shut down the server forcefully (kill -9) / promote slave, to make
sure recovery happened successfully.
.) With different utilities like pg_resetwal/pg_upgrade/pg_waldump.
.) Running pgbench for substantial workloads (~1 hour).
.) WAL segment size is not default (changed at the time of ./configure) +
different wal_segment_size (at the time of initdb).

Things look fine.

-- 
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
On Thu, Jul 6, 2017 at 3:21 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 07/06/2017 12:04 PM, Beena Emerson wrote:
>>
>> The 04-initdb-walsegsize_v2.patch has the following improvements:
>> - Rebased over new 03 patch
>> - Pass the wal-segsize initdb option as a command-line option rather
>> than in an environment variable.
>> - Since the new function check_wal_size had only two checks and was
>> used once, moved the code to ReadControlFile where it is used and
>> removed this function.
>> - Improve comments and add validations where required.
>> - Use DEFAULT_XLOG_SEG_SIZE to set the min_wal_size and
>> max_wal_size, instead of the value 16.
>> - Use XLogSegMaxSize and XLogSegMinSize to calculate the range of the GUC
>> wal_segment_size instead of 16 - INT_MAX.
>
> Thanks Beena. I tested the following scenarios and all are working
> as expected:
> .) Different WAL segment sizes, i.e. 1, 2, 8, 16, 32, 64, 512, 1024, at the time of
> initdb
> .) SR setup in place.
> .) Combinations of max/min_wal_size in postgresql.conf with different
> wal_segment_size.
> .) Shut down the server forcefully (kill -9) / promote slave, to make sure
> recovery happened successfully.
> .) With different utilities like pg_resetwal/pg_upgrade/pg_waldump.
> .) Running pgbench for substantial workloads (~1 hour).
> .) WAL segment size is not default (changed at the time of ./configure) +
> different wal_segment_size (at the time of initdb).
>
> Things look fine.
>

Thank you Tushar.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

Personally I find the split between 03 and 04 and their naming a bit
confusing. I'd rather just merge them.  These patches need a rebase,
they don't apply anymore.


On 2017-07-06 12:04:12 +0530, Beena Emerson wrote:
> @@ -4813,6 +4836,18 @@ XLOGShmemSize(void)
>      {
>          char        buf[32];
>  
> +        /*
> +         * The calculation of XLOGbuffers requires the run-time parameter
> +         * XLogSegSize which is set from the control file. This value is
> +         * required to create the shared memory segment. Hence, temporarily
> +         * allocate space for reading the control file.
> +         */

This makes me uncomfortable.  Having to read the control file multiple
times seems wrong.  We're effectively treating the control file as part
of the configuration now, and that means we should move its parsing to
an earlier part of startup.


> +        if (!IsBootstrapProcessingMode())
> +        {
> +            ControlFile = palloc(sizeof(ControlFileData));
> +            ReadControlFile();
> +            pfree(ControlFile);

General plea: Please reset to NULL in cases like this, otherwise the
pointer will [temporarily] point into a freed memory location, which
makes debugging harder.
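A minimal sketch of the pattern being asked for, in plain C with malloc()/free() standing in for palloc()/pfree(). The struct, the global, and the helper name are hypothetical illustrations, not code from the patch; the value passed in stands in for what ReadControlFile() would load from disk.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-in for ControlFileData. */
typedef struct
{
    unsigned int xlog_seg_size;
} ControlFileSketch;

static ControlFileSketch *ControlFile = NULL;

/* Read the segment size through a temporary allocation. */
static unsigned int
read_seg_size_temporarily(unsigned int on_disk_value)
{
    unsigned int seg_size;

    ControlFile = malloc(sizeof(ControlFileSketch));
    if (ControlFile == NULL)
        return 0;
    ControlFile->xlog_seg_size = on_disk_value;  /* stands in for ReadControlFile() */
    seg_size = ControlFile->xlog_seg_size;

    free(ControlFile);
    ControlFile = NULL;     /* never leave the global pointing at freed memory */
    return seg_size;
}
```

With the pointer reset, any later accidental dereference fails immediately on NULL instead of silently reading freed memory.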



> @@ -8146,6 +8181,9 @@ InitXLOGAccess(void)
>      ThisTimeLineID = XLogCtl->ThisTimeLineID;
>      Assert(ThisTimeLineID != 0 || IsBootstrapProcessingMode());
>  
> +    /* set XLogSegSize */
> +    XLogSegSize = ControlFile->xlog_seg_size;
> +

Hm, why do we have two variables keeping track of the segment size?
wal_segment_size and XLogSegSize? That's bound to lead to confusion.


>      /* Use GetRedoRecPtr to copy the RedoRecPtr safely */
>      (void) GetRedoRecPtr();
>      /* Also update our copy of doPageWrites. */
> diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
> index b3f0b3c..d2c524b 100644
> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -19,6 +19,7 @@
>  
>  #include "access/htup_details.h"
>  #include "access/xact.h"
> +#include "access/xlog_internal.h"
>  #include "bootstrap/bootstrap.h"
>  #include "catalog/index.h"
>  #include "catalog/pg_collation.h"
> @@ -47,6 +48,7 @@
>  #include "utils/tqual.h"
>  
>  uint32        bootstrap_data_checksum_version = 0;    /* No checksum */
> +uint32        XLogSegSize;

So we define the same variable declared in a header in multiple files
(once for each application)?  I'm pretty strongly against that. Why's
that needed/a good idea?  Neither is it clear to me why the definition
was moved from xlog.c to bootstrap.c? That doesn't sound like a natural
place.


>  /*
> + * Calculate the default wal_size in proper unit.
> + */
> +static char *
> +pretty_wal_size(int segment_count)
> +{
> +    double        val = wal_segment_size / (1024 * 1024) * segment_count;
> +    double        temp_val;
> +    char       *result = malloc(10);
> +
> +    /*
> +     * Wal segment size ranges from 1MB to 1GB and the default
> +     * segment_count is 5 for min_wal_size and 64 for max_wal_size, so the
> +     * final values can range from 5MB to 64GB.
> +     */

Referencing the defaults here seems unnecessary. And nearly a guarantee
that the values in the comment will go out of date soon-ish.


> +    /* set default max_wal_size and min_wal_size */
> +    snprintf(repltok, sizeof(repltok), "min_wal_size = %s",
> +             pretty_wal_size(DEFAULT_MIN_WAL_SEGS));
> +    conflines = replace_token(conflines, "#min_wal_size = 80MB", repltok);
> +
> +    snprintf(repltok, sizeof(repltok), "max_wal_size = %s",
> +             pretty_wal_size(DEFAULT_MAX_WAL_SEGS));
> +    conflines = replace_token(conflines, "#max_wal_size = 1GB", repltok);
> +

Hm. So postgresql.conf.sample values are now going to contain misleading
information for clusters with non-default segment sizes.

Have we discussed instead defaulting min_wal_size/max_wal_size to a
constant amount of megabytes and rounding up when it doesn't work for
a particular segment size?
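A rough sketch of that alternative: keep the defaults as fixed megabyte amounts (80MB and 1GB, the values in the postgresql.conf.sample lines quoted above) and round up to a whole number of segments. The helper name is hypothetical and the segment size is assumed to be a power of two between 1MB and 1GB.

```c
/*
 * Hypothetical helper: round a default given in megabytes up to the
 * next whole number of WAL segments, returning the result in megabytes.
 */
static int
round_up_to_segments_mb(int default_mb, int seg_size_mb)
{
    int         nsegs = (default_mb + seg_size_mb - 1) / seg_size_mb;

    return nsegs * seg_size_mb;
}
```

For example, an 80MB default stays 80MB with 16MB segments but becomes 128MB with 64MB segments, so the sample file's text stays accurate for the common case.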


> diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
> index 9c0039c..c805f12 100644
> --- a/src/include/access/xlog_internal.h
> +++ b/src/include/access/xlog_internal.h
> @@ -91,6 +91,11 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
>   */
>  
>  extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize

I don't think this is a good idea; we should rather rip the band-aid
off and remove this macro. If people are assuming it's a macro they'll
just run into more confusing errors/problems.


> diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
> index f3b3529..f31c30e 100644
> --- a/src/include/pg_config_manual.h
> +++ b/src/include/pg_config_manual.h
> @@ -14,6 +14,12 @@
>   */
>  
>  /*
> + * This is default value for WAL_segment_size to be used at intidb when run
> + * without --walsegsize option.
> + */

WAL_segment_size is a bit weirdly cased...


> diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
> index d7fa2a8..279728d 100644
> --- a/contrib/pg_standby/pg_standby.c
> +++ b/contrib/pg_standby/pg_standby.c
> @@ -33,9 +33,12 @@
>  #include "pg_getopt.h"
>  
>  #include "access/xlog_internal.h"
> +#include "access/xlogreader.h"
>  
>  const char *progname;
>  
> +uint32        XLogSegSize;
> +
>  /* Options and defaults */
>  int            sleeptime = 5;        /* amount of time to sleep between file checks */
>  int            waittime = -1;        /* how long we have been waiting, -1 no wait
> @@ -100,6 +103,72 @@ int            nextWALFileType;
>  
>  struct stat stat_buf;
>  
> +static bool SetWALFileNameForCleanup(void);
> +
> +/* Set XLogSegSize from the WAL file specified by WALFilePath */

Hm. Why don't we instead accept the segment size as a parameter expanded
in restore_command? Then this magic isn't necessary. This won't be the
only command needing it.


> +static bool
> +RetrieveXLogSegSize()

Please add void as argument.


> -#define MaxSegmentsPerLogFile ( 0xFFFFFFFF / XLOG_SEG_SIZE )
> -
>  static void
>  CustomizableCleanupPriorWALFiles(void)
>  {
> @@ -315,6 +384,7 @@ SetWALFileNameForCleanup(void)
>      uint32        log_diff = 0,
>                  seg_diff = 0;
>      bool        cleanup = false;
> +    int            MaxSegmentsPerLogFile = (0xFFFFFFFF / XLogSegSize);

Inconsistent variable naming here.

>  /*
> + * From version 10, explicitly set XLogSegSize using SHOW wal_segment_size
> + * since ControlFile is not accessible here.
> + */
> +bool
> +RetrieveXLogSegSize(PGconn *conn)
> +{

> +    /* wal_segment_size ranges from 1MB to 1GB */
> +    tmp_result = pg_strdup(PQgetvalue(res, 0, 0));

Why strdup if we just do a sscanf?
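PQgetvalue() returns a pointer into the PGresult that stays valid until PQclear(), so the value can be parsed in place. A minimal sketch, assuming SHOW wal_segment_size returns text such as "16MB" or "1GB"; the helper name is hypothetical and only the MB and GB units are handled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: convert the text of SHOW wal_segment_size into
 * bytes. Returns -1 if the string cannot be parsed.
 */
static long long
parse_wal_segment_size(const char *value)
{
    int         num;
    char        unit[3] = "";

    /* sscanf reads straight from the result string; no copy is needed */
    if (sscanf(value, "%d%2s", &num, unit) != 2 || num <= 0)
        return -1;

    if (strcmp(unit, "MB") == 0)
        return (long long) num * 1024 * 1024;
    if (strcmp(unit, "GB") == 0)
        return (long long) num * 1024 * 1024 * 1024;
    return -1;
}
```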

> +/*
> + * Try to find fname in the given directory. Returns true if it is found,
> + * false otherwise. If fname is NULL, search the complete directory for any
> + * file with a valid WAL file name.
> + */
> +static bool
> +search_directory(char *directory, char *fname)

This doesn't mention an important fact, namely that this routine tries
to figure out XLogSegSize from the file...

This is kind of an ugly approach, but I don't see anything really simpler.
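The first page of each WAL segment file carries a long page header that records the segment size, which is what makes this approach possible. A sketch of extracting it from a buffer holding that page, assuming a field layout that mirrors XLogLongPageHeaderData in PostgreSQL 10 (check access/xlog_internal.h before relying on it). The struct and function names are hypothetical, and the real code would also validate the magic number and header flags.

```c
#include <stdint.h>
#include <string.h>

/* Assumed mirror of XLogPageHeaderData. */
typedef struct
{
    uint16_t    xlp_magic;
    uint16_t    xlp_info;
    uint32_t    xlp_tli;
    uint64_t    xlp_pageaddr;
    uint32_t    xlp_rem_len;
} PageHeaderSketch;

/* Assumed mirror of XLogLongPageHeaderData. */
typedef struct
{
    PageHeaderSketch std;
    uint64_t    xlp_sysid;
    uint32_t    xlp_seg_size;
    uint32_t    xlp_xlog_blcksz;
} LongPageHeaderSketch;

/*
 * Return the segment size recorded in the first page, or 0 if the buffer
 * is too short or the size is not a power of two within 1MB..1GB.
 */
static uint32_t
seg_size_from_first_page(const unsigned char *buf, size_t len)
{
    LongPageHeaderSketch hdr;
    uint32_t    size;

    if (len < sizeof(hdr))
        return 0;
    memcpy(&hdr, buf, sizeof(hdr));    /* avoid unaligned access */
    size = hdr.xlp_seg_size;
    if (size < (1u << 20) || size > (1u << 30) || (size & (size - 1)) != 0)
        return 0;
    return size;
}
```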

> +/* check that the given size is a valid XLogSegSize */
> +#define IsPowerOf2(x) (((x) & ((x)-1)) == 0)

Not that it really matters here, but this isn't correct for 0 I
believe.

> +#define IsValidXLogSegSize(size) \
> +     (IsPowerOf2(size) && \
> +     (size >= XLogSegMinSize && size <= XLogSegMaxSize))
> +

Please wrap references to size in parens.

Should we consider making this an inline function instead? There's
some multiple evaluation hazard here...
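One way to address both points is to use inline functions: the argument is evaluated exactly once, and zero is rejected explicitly, since (0 & (0 - 1)) == 0 would otherwise pass. This is a sketch; the names and the 1MB to 1GB bounds (taken from the patch comment) are assumptions standing in for XLogSegMinSize and XLogSegMaxSize.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAL_SEG_MIN_SIZE_SKETCH (1024 * 1024)          /* assumed 1MB */
#define WAL_SEG_MAX_SIZE_SKETCH (1024 * 1024 * 1024)   /* assumed 1GB */

/* Zero is not a power of two, so reject it before the bit trick. */
static inline bool
is_power_of_2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Function form: no multiple-evaluation hazard, no parenthesization issues. */
static inline bool
is_valid_wal_seg_size(uint64_t size)
{
    return is_power_of_2(size) &&
        size >= WAL_SEG_MIN_SIZE_SKETCH &&
        size <= WAL_SEG_MAX_SIZE_SKETCH;
}
```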


> +#define XLogSegmentsPerXLogId    (UINT64CONST(0x100000000) / XLogSegSize)
>  
>  #define XLogSegNoOffsetToRecPtr(segno, offset, dest) \
> -        (dest) = (segno) * XLOG_SEG_SIZE + (offset)
> +        (dest) = (segno) * XLogSegSize + (offset)

I don't think it's a good idea to implicitly reference a global variable
in such a macro. IOW, I think this needs to grow another parameter, and
callers should get adjusted.  I know this'll affect a number of macros,
but it still seems like the right thing to do.  I'd welcome other
opinions on this.
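A sketch of what the macros could look like once the segment size is passed explicitly instead of read from a global. The extra parameter's name and position are assumptions; XLogSegmentOffset is included as a hypothetical companion macro to show the same idea applied to extracting an offset.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Segment size in bytes is an explicit argument, not a global. */
#define XLogSegmentsPerXLogId(wal_segsz_bytes) \
    (UINT64_C(0x100000000) / (wal_segsz_bytes))

#define XLogSegNoOffsetToRecPtr(segno, offset, wal_segsz_bytes, dest) \
    ((dest) = (XLogRecPtr) (segno) * (wal_segsz_bytes) + (offset))

/* Works because the segment size is a power of two. */
#define XLogSegmentOffset(recptr, wal_segsz_bytes) \
    ((recptr) & ((wal_segsz_bytes) - 1))
```

Every caller then has to say which segment size it means, which matters for frontend tools that learn it at run time.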

Greetings,

Andres Freund



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
PFA the updated patches.

On Wed, Aug 16, 2017 at 1:55 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> Personally I find the split between 03 and 04 and their naming a bit
> confusing. I'd rather just merge them.  These patches need a rebase,
> they don't apply anymore.

01 is rebased. 04 and 03 are now merged into
02-initdb-configurable-walsegsize.patch. It also fixes an issue on
Windows: the XLogSegSize is now passed through BackendParameters so
the values are available during process forking.

>
>
> On 2017-07-06 12:04:12 +0530, Beena Emerson wrote:
>> @@ -4813,6 +4836,18 @@ XLOGShmemSize(void)
>>       {
>>               char            buf[32];
>>
>> +             /*
>> +              * The calculation of XLOGbuffers requires the run-time parameter
>> +              * XLogSegSize which is set from the control file. This value is
>> +              * required to create the shared memory segment. Hence, temporarily
>> +              * allocate space for reading the control file.
>> +              */
>
> This makes me uncomfortable.  Having to read the control file multiple
> times seems wrong.  We're effectively treating the control file as part
> of the configuration now, and that means we should move its parsing to
> an earlier part of startup.

Yes, this may seem ugly. ControlFile was originally read into the
shared memory segment, but we now need the XLogSegSize from the
ControlFile to initialise the shared memory segment. I could not
figure out any other way to achieve this.

>
>
>> +             if (!IsBootstrapProcessingMode())
>> +             {
>> +                     ControlFile = palloc(sizeof(ControlFileData));
>> +                     ReadControlFile();
>> +                     pfree(ControlFile);
>
> General plea: Please reset to NULL in cases like this, otherwise the
> pointer will [temporarily] point into a freed memory location, which
> makes debugging harder.

done.

>
>
>
>> @@ -8146,6 +8181,9 @@ InitXLOGAccess(void)
>>       ThisTimeLineID = XLogCtl->ThisTimeLineID;
>>       Assert(ThisTimeLineID != 0 || IsBootstrapProcessingMode());
>>
>> +     /* set XLogSegSize */
>> +     XLogSegSize = ControlFile->xlog_seg_size;
>> +
>
> Hm, why do we have two variables keeping track of the segment size?
> wal_segment_size and XLogSegSize? That's bound to lead to confusion.
>

wal_segment_size is the GUC which stores the number of WAL pages per
segment (XLogSegSize / XLOG_BLCKSZ).

>
>>       /* Use GetRedoRecPtr to copy the RedoRecPtr safely */
>>       (void) GetRedoRecPtr();
>>       /* Also update our copy of doPageWrites. */
>> diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
>> index b3f0b3c..d2c524b 100644
>> --- a/src/backend/bootstrap/bootstrap.c
>> +++ b/src/backend/bootstrap/bootstrap.c
>> @@ -19,6 +19,7 @@
>>
>>  #include "access/htup_details.h"
>>  #include "access/xact.h"
>> +#include "access/xlog_internal.h"
>>  #include "bootstrap/bootstrap.h"
>>  #include "catalog/index.h"
>>  #include "catalog/pg_collation.h"
>> @@ -47,6 +48,7 @@
>>  #include "utils/tqual.h"
>>
>>  uint32               bootstrap_data_checksum_version = 0;    /* No checksum */
>> +uint32               XLogSegSize;
>
> So we define the same variable declared in a header in multiple files
> (once for each application)?  I'm pretty strongly against that. Why's
> that needed/a good idea?  Neither is it clear to me why the definition
> was moved from xlog.c to bootstrap.c? That doesn't sound like a natural
> place.

I have moved it back to xlog.c.

>
>
>>  /*
>> + * Calculate the default wal_size in proper unit.
>> + */
>> +static char *
>> +pretty_wal_size(int segment_count)
>> +{
>> +     double          val = wal_segment_size / (1024 * 1024) * segment_count;
>> +     double          temp_val;
>> +     char       *result = malloc(10);
>> +
>> +     /*
>> +      * Wal segment size ranges from 1MB to 1GB and the default
>> +      * segment_count is 5 for min_wal_size and 64 for max_wal_size, so the
>> +      * final values can range from 5MB to 64GB.
>> +      */
>
> Referencing the defaults here seems unnecessary. And nearly a guarantee
> that the values in the comment will go out of date soon-ish.

Removed the comment.

>
>
>> +     /* set default max_wal_size and min_wal_size */
>> +     snprintf(repltok, sizeof(repltok), "min_wal_size = %s",
>> +                      pretty_wal_size(DEFAULT_MIN_WAL_SEGS));
>> +     conflines = replace_token(conflines, "#min_wal_size = 80MB", repltok);
>> +
>> +     snprintf(repltok, sizeof(repltok), "max_wal_size = %s",
>> +                      pretty_wal_size(DEFAULT_MAX_WAL_SEGS));
>> +     conflines = replace_token(conflines, "#max_wal_size = 1GB", repltok);
>> +
>
> Hm. So postgresql.conf.sample values are now going to contain misleading
> information for clusters with non-default segment sizes.
>
> Have we discussed instead defaulting min_wal_size/max_wal_size to a
> constant amount of megabytes and rounding up when it doesn't work for
> a particular segment size?

This was not discussed.

In the original code, the min_wal_size and max_wal_size defaults are computed
in guc.c for any wal_segment_size set at configure time.

    {
        {"min_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
            gettext_noop("Sets the minimum size to shrink the WAL to."),
            NULL,
            GUC_UNIT_MB
        },
        &min_wal_size_mb,
        5 * (XLOG_SEG_SIZE / (1024 * 1024)), 2, MAX_KILOBYTES,
        NULL, NULL, NULL
    },

    {
        {"max_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
            gettext_noop("Sets the WAL size that triggers a checkpoint."),
            NULL,
            GUC_UNIT_MB
        },
        &max_wal_size_mb,
        64 * (XLOG_SEG_SIZE / (1024 * 1024)), 2, MAX_KILOBYTES,
        NULL, assign_max_wal_size, NULL
    },

Hence I have retained the same calculation for min_wal_size and
max_wal_size. If we get consensus for fixing a default and updating
when required, then I will change the code accordingly.
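The guc.c defaults above amount to 5 segments for min_wal_size and 64 segments for max_wal_size. A sketch of rendering those defaults as postgresql.conf text for a run-time segment size, writing into a caller-supplied buffer rather than a malloc'd one. The function name is hypothetical, and choosing GB only when the value divides evenly is an assumption about the desired output format.

```c
#include <stdio.h>
#include <string.h>

/* Format segment_count * seg_size_mb as "<n>GB" or "<n>MB". */
static void
format_wal_size(int segment_count, int seg_size_mb, char *buf, size_t buflen)
{
    long long   mb = (long long) segment_count * seg_size_mb;

    if (mb % 1024 == 0)
        snprintf(buf, buflen, "%lldGB", mb / 1024);
    else
        snprintf(buf, buflen, "%lldMB", mb);
}
```

With 16MB segments this reproduces the existing sample values, "80MB" and "1GB".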


>
>
>> diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
>> index 9c0039c..c805f12 100644
>> --- a/src/include/access/xlog_internal.h
>> +++ b/src/include/access/xlog_internal.h
>> @@ -91,6 +91,11 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
>>   */
>>
>>  extern uint32 XLogSegSize;
>> +#define XLOG_SEG_SIZE XLogSegSize
>
> I don't think this is a good idea; we should rather rip the band-aid
> off and remove this macro. If people are assuming it's a macro they'll
> just run into more confusing errors/problems.
>

Okay. done.

>
>> diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
>> index f3b3529..f31c30e 100644
>> --- a/src/include/pg_config_manual.h
>> +++ b/src/include/pg_config_manual.h
>> @@ -14,6 +14,12 @@
>>   */
>>
>>  /*
>> + * This is default value for WAL_segment_size to be used at intidb when run
>> + * without --walsegsize option.
>> + */
>
> WAL_segment_size is a bit weirdly cased...

corrected.

>
>
>> diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
>> index d7fa2a8..279728d 100644
>> --- a/contrib/pg_standby/pg_standby.c
>> +++ b/contrib/pg_standby/pg_standby.c
>> @@ -33,9 +33,12 @@
>>  #include "pg_getopt.h"
>>
>>  #include "access/xlog_internal.h"
>> +#include "access/xlogreader.h"
>>
>>  const char *progname;
>>
>> +uint32               XLogSegSize;
>> +
>>  /* Options and defaults */
>>  int                  sleeptime = 5;          /* amount of time to sleep between file checks */
>>  int                  waittime = -1;          /* how long we have been waiting, -1 no wait
>> @@ -100,6 +103,72 @@ int                      nextWALFileType;
>>
>>  struct stat stat_buf;
>>
>> +static bool SetWALFileNameForCleanup(void);
>> +
>> +/* Set XLogSegSize from the WAL file specified by WALFilePath */
>
> Hm. Why don't we instead accept the segment size as a parameter expanded
> in restore_command? Then this magic isn't necessary. This won't be the
> only command needing it.
>
>
>> +static bool
>> +RetrieveXLogSegSize()
>
> Please add void as argument.

done.


>> -#define MaxSegmentsPerLogFile ( 0xFFFFFFFF / XLOG_SEG_SIZE )
>> -
>>  static void
>>  CustomizableCleanupPriorWALFiles(void)
>>  {
>> @@ -315,6 +384,7 @@ SetWALFileNameForCleanup(void)
>>       uint32          log_diff = 0,
>>                               seg_diff = 0;
>>       bool            cleanup = false;
>> +     int                     MaxSegmentsPerLogFile = (0xFFFFFFFF / XLogSegSize);
>
> Inconsistent variable naming here.

XLOG_SEG_SIZE is now removed so we can only use the variable XLogSegSize.

>
>>  /*
>> + * From version 10, explicitly set XLogSegSize using SHOW wal_segment_size
>> + * since ControlFile is not accessible here.
>> + */
>> +bool
>> +RetrieveXLogSegSize(PGconn *conn)
>> +{
>
>> +     /* wal_segment_size ranges from 1MB to 1GB */
>> +     tmp_result = pg_strdup(PQgetvalue(res, 0, 0));
>
> Why strdup if we just do a sscanf?

Fixed.

>
>> +/*
>> + * Try to find fname in the given directory. Returns true if it is found,
>> + * false otherwise. If fname is NULL, search the complete directory for any
>> + * file with a valid WAL file name.
>> + */
>> +static bool
>> +search_directory(char *directory, char *fname)
>
> This doesn't mention an important fact, namely that this routine tries
> to figure out XLogSegSize from the file...

Added to the comment.

>
> This is kind of an ugly approach, but I don't see anything really simpler.
>
>> +/* check that the given size is a valid XLogSegSize */
>> +#define IsPowerOf2(x) (((x) & ((x)-1)) == 0)
>
> Not that it really matters here, but this isn't correct for 0 I
> believe.
>
>> +#define IsValidXLogSegSize(size) \
>> +      (IsPowerOf2(size) && \
>> +      (size >= XLogSegMinSize && size <= XLogSegMaxSize))
>> +
>
> Please wrap references to size in parens.

Added the parens.

>
> Should we consider making this an inline function instead? There's
> some multiple evaluation hazard here...
>
>
>> +#define XLogSegmentsPerXLogId        (UINT64CONST(0x100000000) / XLogSegSize)
>>
>>  #define XLogSegNoOffsetToRecPtr(segno, offset, dest) \
>> -             (dest) = (segno) * XLOG_SEG_SIZE + (offset)
>> +             (dest) = (segno) * XLogSegSize + (offset)
>
> I don't think it's a good idea to implicitly reference a global variable
> in such a macro. IOW, I think this needs to grow another parameter, and
> callers should get adjusted.  I know this'll affect a number of macros,
> but it still seems like the right thing to do.  I'd welcome other
> opinions on this.

I have added this change in a separate patch
(03-modify-xlog-macros.patch) since it touches a lot of files.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2017-08-23 12:13:15 +0530, Beena Emerson wrote:
> >> +             /*
> >> +              * The calculation of XLOGbuffers requires the run-time parameter
> >> +              * XLogSegSize which is set from the control file. This value is
> >> +              * required to create the shared memory segment. Hence, temporarily
> >> +              * allocate space for reading the control file.
> >> +              */
> >
> > This makes me uncomfortable.  Having to read the control file multiple
> > times seems wrong.  We're effectively treating the control file as part
> > of the configuration now, and that means we should move its parsing to
> > an earlier part of startup.
> 
> Yes, this may seem ugly. ControlFile was originally read into the
> shared memory segment, but we now need the XLogSegSize from the
> ControlFile to initialise the shared memory segment. I could not
> figure out any other way to achieve this.

I think reading it once into local memory inside the startup process and
then copying it into shared memory from there should work?
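A sketch of that two-step pattern: read the control file once into process-local memory early in startup, so the segment size is known before shared memory is sized, then copy it into shared memory and repoint, keeping the previous pointer in a local variable rather than copying through a temporary buffer. All names here are hypothetical, and the disk read is simulated by plain assignments.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for ControlFileData. */
typedef struct
{
    uint32_t    xlog_seg_size;
    uint32_t    xlog_blcksz;
} ControlFileSketch;

static ControlFileSketch LocalControlFile;
static ControlFileSketch *ControlFile = NULL;

/* Step 1: before shared memory exists, load into local memory. */
static void
read_control_file_early(uint32_t seg_size, uint32_t blcksz)
{
    LocalControlFile.xlog_seg_size = seg_size;  /* stands in for read() + CRC check */
    LocalControlFile.xlog_blcksz = blcksz;
    ControlFile = &LocalControlFile;
}

/* Step 2: once the shared segment exists, copy and repoint. */
static void
move_control_file_to_shmem(ControlFileSketch *shared_slot)
{
    ControlFileSketch *local = ControlFile;     /* remember the previous pointer */

    memcpy(shared_slot, local, sizeof(ControlFileSketch));
    ControlFile = shared_slot;
}
```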


> >> @@ -8146,6 +8181,9 @@ InitXLOGAccess(void)
> >>       ThisTimeLineID = XLogCtl->ThisTimeLineID;
> >>       Assert(ThisTimeLineID != 0 || IsBootstrapProcessingMode());
> >>
> >> +     /* set XLogSegSize */
> >> +     XLogSegSize = ControlFile->xlog_seg_size;
> >> +
> >
> > Hm, why do we have two variables keeping track of the segment size?
> > wal_segment_size and XLogSegSize? That's bound to lead to confusion.
> >
> 
> wal_segment_size is the GUC which stores the number of WAL pages per
> segment (XLogSegSize / XLOG_BLCKSZ).

wal_segment_size and XLogSegSize are the same name, spelt differently, so
if that's where we want to go, we should name them differently. But
perhaps more fundamentally, I don't see why we need both: What stops us
from just defining the GUC in bytes?
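The page-count form and the byte form carry the same information, which is why a single byte-valued GUC would suffice. A small sketch of the arithmetic, assuming the default 8kB WAL page size (XLOG_BLCKSZ is 8kB unless changed at configure time); the names are hypothetical.

```c
#include <stdint.h>

#define XLOG_BLCKSZ_SKETCH 8192     /* assumed default WAL page size */

/* Page-count representation: segment size divided by the WAL page size. */
static uint32_t
seg_size_to_pages(uint64_t seg_size_bytes)
{
    return (uint32_t) (seg_size_bytes / XLOG_BLCKSZ_SKETCH);
}

/* The byte representation is always recoverable from the page count. */
static uint64_t
pages_to_seg_size(uint32_t pages)
{
    return (uint64_t) pages * XLOG_BLCKSZ_SKETCH;
}
```

For a 16MB segment that is 16 * 1024 * 1024 / 8192 = 2048 pages.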

Regards,

Andres



Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
On Wed, Aug 30, 2017 at 4:43 AM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2017-08-23 12:13:15 +0530, Beena Emerson wrote:
>> >> +             /*
>> >> +              * The calculation of XLOGbuffers requires the run-time parameter
>> >> +              * XLogSegSize which is set from the control file. This value is
>> >> +              * required to create the shared memory segment. Hence, temporarily
>> >> +              * allocate space for reading the control file.
>> >> +              */
>> >
>> > This makes me uncomfortable.  Having to read the control file multiple
>> > times seems wrong.  We're effectively treating the control file as part
>> > of the configuration now, and that means we should move its parsing to
>> > an earlier part of startup.
>>
>> Yes, this may seem ugly. ControlFile was originally read into the
>> shared memory segment, but we now need the XLogSegSize from the
>> ControlFile to initialise the shared memory segment. I could not
>> figure out any other way to achieve this.
>
> I think reading it once into local memory inside the startup process and
> then copying it into shared memory from there should work?
>

Done.

>
>> >> @@ -8146,6 +8181,9 @@ InitXLOGAccess(void)
>> >>       ThisTimeLineID = XLogCtl->ThisTimeLineID;
>> >>       Assert(ThisTimeLineID != 0 || IsBootstrapProcessingMode());
>> >>
>> >> +     /* set XLogSegSize */
>> >> +     XLogSegSize = ControlFile->xlog_seg_size;
>> >> +
>> >
>> > Hm, why do we have two variables keeping track of the segment size?
>> > wal_segment_size and XLogSegSize? That's bound to lead to confusion.
>> >
>>
>> wal_segment_size is the GUC which stores the number of WAL pages per
>> segment (XLogSegSize / XLOG_BLCKSZ).
>
> wal_segment_size and XLogSegSize are the same name, spelt differently, so
> if that's where we want to go, we should name them differently. But
> perhaps more fundamentally, I don't see why we need both: What stops us
> from just defining the GUC in bytes?

I made a few changes for this:
- Make XLogSegSize int instead of uint32
- Add a GUC_UNIT_BYT for the unit conversion so that SHOW
wal_segment_size displays user-friendly values.
- track_activity_query_size unit is set to GUC_UNIT_BYT. This was
initially null because we did not have a unit for bytes. This may not
be necessary, as it changes the output of the SHOW command.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

I was looking to commit this, but the changes I made ended up being
pretty large. Here's what I changed in the attached:
- split GUC_UNIT_BYTE into a separate commit, squashed rest
- renamed GUC_UNIT_BYT to GUC_UNIT_BYTE, don't see why we'd have such a
  weird abbreviation?
- bumped control file version, otherwise things wouldn't work correctly
- wal_segment_size text still said "Shows the number of pages per write
  ahead log segment."
- I still feel strongly that exporting XLogSegSize, which previously was
  a macro and is now an integer variable, is a bad idea. Hence I've renamed
  it to wal_segment_size.
- There still were comments referencing XLOG_SEG_SIZE
- IsPowerOf2 regarded 0 as a valid power of two
- ConvertToXSegs() depended on a variable not passed as arg, bad idea.
- As previously mentioned, I don't think it's ok to rely on vars like
  XLogSegSize to be defined both in backend and frontend code.
- I don't think XLogReader can rely on XLogSegSize, needs to be
  parametrized.
- pg_rewind exported another copy of extern int XLogSegSize
- streamutil.h had an extern uint32 WalSegsz; but used
  RetrieveXlogSegSize, that seems needlessly different
- moved wal_segment_size (aka XLogSegSize) to xlog.h
- pg_standby included xlogreader, not sure why?
- MaxSegmentsPerLogFile still had a conflicting naming scheme
- you'd included "sys/stat.h", that's not really appropriate for system
  headers, should be <sys/stat.h> (and then grouped w/ rest)
- pg_controldata's warning about an invalid segsize missed newlines
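For the ConvertToXSegs() point above, a sketch of a form that takes the segment size as an argument. It assumes the input is in megabytes, as min_wal_size_mb and max_wal_size_mb are in the guc.c definitions. The integer division truncates, so a size that is not a whole multiple of the segment size rounds down.

```c
/*
 * Sketch: convert a size in megabytes into a number of WAL segments,
 * with the segment size in bytes passed explicitly instead of read
 * from a global variable.
 */
#define ConvertToXSegs(x, segsize) ((x) / ((segsize) / (1024 * 1024)))
```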

Unresolved:
- this needs some new performance tests, the number of added instructions
  isn't trivial. Don't think there's anything, but ...
- read through it again, check long lines
- pg_standby's RetrieveWALSegSize() does too much for its name. It
  seems quite weird that a function named that way has the section below
  "/* check if clean up is necessary */"
- the way you redid the ReadControlFile() invocation doesn't quite seem
  right. Consider what happens if XLOGbuffers isn't -1 - then we
  wouldn't read the control file, but you unconditionally copy it in
  XLOGShmemInit(). I think we instead should introduce something like
  XLOGPreShmemInit() that reads the control file unless in bootstrap
  mode. Then get rid of the second ReadControlFile() already present.
- In pg_resetwal.c:ReadControlFile() we ignore the file contents if
  there's an invalid segment size, but accept the contents as guessed if
  there's a crc failure - that seems a bit weird?
- verify EXEC_BACKEND does the right thing
- not this commit/patch, but XLogReadDetermineTimeline() could really
  use some simplifying of repetitive expressions
- XLOGShmemInit shouldn't memcpy to temp_cfile and such, why not just
  save previous pointer in a local variable?
- could you fill in the Reviewed-By: line in the commit message?

Running out of concentration / time now.

- Andres


Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
Hello,

On Wed, Sep 6, 2017 at 7:37 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> I was looking to commit this, but the changes I made ended up being
> pretty large. Here's what I changed in the attached:
> - split GUC_UNIT_BYTE into a separate commit, squashed rest
> - renamed GUC_UNIT_BYT to GUC_UNIT_BYTE, don't see why we'd have such a
>   weird abbreviation?
> - bumped control file version, otherwise things wouldn't work correctly
> - wal_segment_size text still said "Shows the number of pages per write
>   ahead log segment."
> - I still feel strongly that exporting XLogSegSize, which previously was
>   a macro and is now an integer variable, is a bad idea. Hence I've renamed
>   it to wal_segment_size.
> - There still were comments referencing XLOG_SEG_SIZE
> - IsPowerOf2 regarded 0 as a valid power of two
> - ConvertToXSegs() depended on a variable not passed as arg, bad idea.
> - As previously mentioned, I don't think it's ok to rely on vars like
>   XLogSegSize to be defined both in backend and frontend code.
> - I don't think XLogReader can rely on XLogSegSize, needs to be
>   parametrized.
> - pg_rewind exported another copy of extern int XLogSegSize
> - streamutil.h had an extern uint32 WalSegsz; but used
>   RetrieveXlogSegSize, that seems needlessly different
> - moved wal_segment_size (aka XLogSegSize) to xlog.h
> - pg_standby included xlogreader, not sure why?
> - MaxSegmentsPerLogFile still had a conflicting naming scheme
> - you'd included "sys/stat.h", that's not really appropriate for system
>   headers, should be <sys/stat.h> (and then grouped w/ rest)
> - pg_controldata's warning about an invalid segsize missed newlines
>
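Several of the items above (IsPowerOf2 accepting 0, ConvertToXSegs() and XLogReader relying on a global rather than an argument) come down to two rules: validate the segment size properly, and pass it explicitly to anything that depends on it. Below is a minimal standalone C sketch of both rules. The helper names are hypothetical, and the 1MB..1GB power-of-two range is an assumption for illustration; none of this is code taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_SEG_MIN_SIZE (1024 * 1024)			/* assumed lower bound: 1MB */
#define WAL_SEG_MAX_SIZE (1024 * 1024 * 1024)	/* assumed upper bound: 1GB */

/* Zero is not a power of two, so it has to be rejected explicitly. */
static bool
is_power_of_2(uint64_t x)
{
	return x > 0 && (x & (x - 1)) == 0;
}

static bool
is_valid_wal_seg_size(uint64_t size)
{
	return is_power_of_2(size) &&
		size >= WAL_SEG_MIN_SIZE && size <= WAL_SEG_MAX_SIZE;
}

/* Segments per 4GB "xlogid"; depends on the segment size, so take it as an argument. */
static uint64_t
segments_per_xlog_id(uint64_t wal_seg_size)
{
	assert(is_valid_wal_seg_size(wal_seg_size));
	return UINT64_C(0x100000000) / wal_seg_size;
}

/* Build the 24-hex-digit WAL file name for (timeline, segment number). */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno,
			  uint64_t wal_seg_size)
{
	uint64_t	per_id = segments_per_xlog_id(wal_seg_size);

	assert(len >= 25);			/* 24 hex digits plus the terminator */
	snprintf(buf, len, "%08X%08X%08X", (unsigned int) tli,
			 (unsigned int) (segno / per_id), (unsigned int) (segno % per_id));
	assert(strlen(buf) == 24);
}
```

Passing wal_seg_size as a parameter, rather than reading a global, lets the same helpers work in frontend tools such as pg_waldump, which have to determine the segment size at run time.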

Thank you.

> Unresolved:
> - this needs some new performance tests, the number of added instructions
>   isn't trivial. Don't think there's anything, but ...

I will post the results soon.

> - read through it again, check long lines
I have broken the long lines where necessary and applied pgindent as well.

> - pg_standby's RetrieveWALSegSize() does too much for its name. It
>   seems quite weird that a function named that way has the section below
>   "/* check if clean up is necessary */"

We set two cleanup-related variables once WalSegSize is set, namely
need_cleanup and exclusiveCleanupFileName. Does the name
SetWALSegSizeAndCleanupValues look good?

> - the way you redid the ReadControlFile() invocation doesn't quite seem
>   right. Consider what happens if XLOGbuffers isn't -1 - then we
>   wouldn't read the control file, but you unconditionally copy it in
>   XLOGShmemInit(). I think we instead should introduce something like
>   XLOGPreShmemInit() that reads the control file unless in bootstrap
>   mode. Then get rid of the second ReadControlFile() already present.

I did not think it was necessary to create a new function; I have
simply added the check and the function call within XLOGShmemInit().

> - In pg_resetwal.c:ReadControlFile() we ignore the file contents if
>   there's an invalid segment size, but accept the contents as guessed if
>   there's a crc failure - that seems a bit weird?

I have changed the behaviour to treat it as guessed and also modified
the error message.

> - verify EXEC_BACKEND does the right thing
> - not this commit/patch, but XLogReadDetermineTimeline() could really
>   use some simplifying of repetitive expressions

I will check this.

> - XLOGShmemInit shouldn't memcpy to temp_cfile and such, why not just
>   save previous pointer in a local variable?
done.

> - could you fill in the Reviewed-By: line in the commit message?
I have added the names in alphabetical order.

Kindly check the attached v2 patch.



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
On Wed, Sep 6, 2017 at 8:24 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> Hello,
>
> On Wed, Sep 6, 2017 at 7:37 AM, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>
>> Unresolved:
>> - this needs some new performance tests, the number of added instructions
>>   isn't trivial. Don't think there's anything, but ...
>
> I will post the results soon.

Performance tests:

The following results are the median of 3 runs with 32 and 56
clients/threads on a pgbench database of scale factor 300, each run
lasting 900s (15 min), for various WAL segment sizes, with
shared_buffers set to 8GB.

The following table shows the % difference in performance of the patched
code (segment size set by initdb's --wal-segsize) relative to the original
code (segment size set by configure's --with-wal-segsize):

size        |  32 clients (%)   |  56 clients (%)
------------+-------------------+----------------
 4MB        |       1.11        |       -0.18
 8MB        |       0           |       -1.56
 16MB       |       0.79        |       0.23
 64MB       |       0.89        |       0.28
 1024MB     |       -1.29       |       -0.09


Median values:

size    | 32_original | 32_patched  | 56_original | 56_patched
--------+-------------+-------------+-------------+------------
 4MB    | 83999.06142 | 84933.78919 | 95667.13483 | 95492.21335
 8MB    | 84949.08195 | 84947.35953 | 96584.13828 | 95081.37257
 16MB   | 84155.40321 | 84820.98328 | 95697.53134 | 95914.98814
 64MB   | 84496.2927  | 85245.70758 | 96307.95222 | 96581.1183
 1024MB | 76230.39323 | 75247.03348 | 92495.18142 | 92410.59222

All differences are within about ±1.6%, so we can conclude that there
is no significant performance difference.
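The percentages in the first table are the relative change of the patched median over the original median, 100 * (patched - original) / original. Below is a small standalone C sketch that recomputes them from the median values in this message. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Median results for one WAL segment size, copied from the median table. */
struct median_row
{
	const char *size;
	double		orig32, patched32;
	double		orig56, patched56;
};

static const struct median_row medians[] = {
	{"4MB", 83999.06142, 84933.78919, 95667.13483, 95492.21335},
	{"8MB", 84949.08195, 84947.35953, 96584.13828, 95081.37257},
	{"16MB", 84155.40321, 84820.98328, 95697.53134, 95914.98814},
	{"64MB", 84496.2927, 85245.70758, 96307.95222, 96581.1183},
	{"1024MB", 76230.39323, 75247.03348, 92495.18142, 92410.59222},
};

/* Relative change of patched over original, in percent. */
static double
pct_diff(double original, double patched)
{
	assert(original > 0.0);
	return (patched - original) / original * 100.0;
}

/* Print the % difference for 32 and 56 clients, one row per segment size. */
static void
print_pct_table(void)
{
	size_t		n = sizeof(medians) / sizeof(medians[0]);

	printf("size    | 32 clients | 56 clients\n");
	for (size_t i = 0; i < n; i++)
		printf("%-7s | %10.2f | %10.2f\n", medians[i].size,
			   pct_diff(medians[i].orig32, medians[i].patched32),
			   pct_diff(medians[i].orig56, medians[i].patched56));
}
```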

Previous performance results:
https://www.postgresql.org/message-id/CAOG9ApESjqYm2VQWxNrZAKySzVo-vDw2JWhDqYQStzD%2BgwRUiA%40mail.gmail.com

>
>> - read through it again, check long lines
> I have broken the long lines where necessary and applied pgindent as well.
>
>> - pg_standby's RetrieveWALSegSize() does too much for its name. It
>>   seems quite weird that a function named that way has the section below
>>   "/* check if clean up is necessary */"
>
> We set two cleanup-related variables once WalSegSize is set, namely
> need_cleanup and exclusiveCleanupFileName. Does the name
> SetWALSegSizeAndCleanupValues look good?
>
>> - the way you redid the ReadControlFile() invocation doesn't quite seem
>>   right. Consider what happens if XLOGbuffers isn't -1 - then we
>>   wouldn't read the control file, but you unconditionally copy it in
>>   XLOGShmemInit(). I think we instead should introduce something like
>>   XLOGPreShmemInit() that reads the control file unless in bootstrap
>>   mode. Then get rid of the second ReadControlFile() already present.
>
> I did not think it was necessary to create a new function; I have
> simply added the check and the function call within XLOGShmemInit().
>
>> - In pg_resetwal.c:ReadControlFile() we ignore the file contents if
>>   there's an invalid segment size, but accept the contents as guessed if
>>   there's a crc failure - that seems a bit weird?
>
> I have changed the behaviour to treat it as guessed and also modified
> the error message.
>
>> - verify EXEC_BACKEND does the right thing

Ashutosh Sharma has verified this and confirms that there are no issues.

>> - not this commit/patch, but XLogReadDetermineTimeline() could really
>>   use some simplifying of repetitive expressions
>
> I will check this.
>
>> - XLOGShmemInit shouldn't memcpy to temp_cfile and such, why not just
>>   save previous pointer in a local variable?
> done.
>
>> - could you fill in the Reviewed-By: line in the commit message?
> I have added the names in alphabetical order.
>
> Kindly check the attached v2 patch.

PFA the rebased patch.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2017-09-06 20:24:16 +0530, Beena Emerson wrote:
> > - pg_standby's RetrieveWALSegSize() does too much for its name. It
> >   seems quite weird that a function named that way has the section below
> >   "/* check if clean up is necessary */"
> 
> We set two cleanup-related variables once WalSegSize is set, namely
> need_cleanup and exclusiveCleanupFileName. Does the name
> SetWALSegSizeAndCleanupValues look good?

It's better, but see below.


> > - the way you redid the ReadControlFile() invocation doesn't quite seem
> >   right. Consider what happens if XLOGbuffers isn't -1 - then we
> >   wouldn't read the control file, but you unconditionally copy it in
> >   XLOGShmemInit(). I think we instead should introduce something like
> >   XLOGPreShmemInit() that reads the control file unless in bootstrap
> >   mode. Then get rid of the second ReadControlFile() already present.
> 
> I did not think it was necessary to create a new function; I have
> simply added the check and the function call within XLOGShmemInit().

Which is wrong. XLogShmemSize() already needs to know the actual size;
otherwise we allocate the wrong amount of shared memory. You may sometimes
succeed nevertheless, because we leave some slop of unused shared memory
space, but that's not something to rely on.  See the refactoring I did in 0001.
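One concrete reason the segment size must be known before shared memory is sized: with the default wal_buffers = -1, the number of WAL buffers is auto-tuned to 1/32 of shared_buffers, but no less than 64kB and no more than one WAL segment. Below is a standalone C sketch of that documented rule. The function names and the 8kB XLOG_BLCKSZ are assumptions for illustration, not the backend's code.

```c
#include <assert.h>

#define XLOG_BLCKSZ 8192		/* assumed WAL page size (the usual default) */

/*
 * Hypothetical stand-in for the wal_buffers = -1 auto-tuning rule:
 * 1/32 of shared_buffers, capped at one WAL segment, at least 8 pages (64kB).
 */
static int
choose_num_wal_buffers(int nbuffers, int wal_segment_size)
{
	int			xbuffers = nbuffers / 32;
	int			pages_per_segment;

	assert(wal_segment_size >= XLOG_BLCKSZ);
	pages_per_segment = wal_segment_size / XLOG_BLCKSZ;

	if (xbuffers > pages_per_segment)
		xbuffers = pages_per_segment;	/* cap: one segment's worth of pages */
	if (xbuffers < 8)
		xbuffers = 8;			/* floor: 8 pages = 64kB */
	return xbuffers;
}

/* Bytes of shared memory needed for the WAL buffers alone (simplified). */
static long long
wal_buffers_shmem_bytes(int nbuffers, int wal_segment_size)
{
	return (long long) choose_num_wal_buffers(nbuffers, wal_segment_size) *
		XLOG_BLCKSZ;
}
```

With 8GB of shared_buffers (1048576 pages of 8kB), this gives 2048 buffers (16MB) with 16MB segments but 32768 buffers (256MB) with 1GB segments. Sizing shared memory before the control file has been read would therefore request the wrong amount.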

Changes:
- refactored the way the control file was handled, moved it to a separate
  phase.  I wrote this last and it's late, so I'm not yet fully confident
  in it, but it survives plain and EXEC_BACKEND builds.  This also gets
  rid of ferrying wal_segment_size through the EXEC_BACKEND variable
  stuff, which didn't really do much, given how many other parts weren't
  carried over.
- renamed the copies of wal_segment_size in all binaries other than
  postgres to WalSegSz; diverging seems pointless, and WalSegsz seems
  inconsistent.
- changed malloc in pg_waldump's search_directory() to a stack
  allocation. Less because of efficiency, more because there wasn't any
  error handling.
- removed redundant char * -> XLogPageHeader -> XLogLongPageHeader casting.
- replaced new malloc with pg_malloc in initdb (failure handling!)
- replaced the floating point logic in pretty_wal_size with an, imo much
  simpler, (sz % 1024) == 0 check
- it's inconsistent that the new code for pg_standby was added to the
  top of the file, where all the customizable stuff resides.
- other small changes
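The pretty_wal_size() change replaces floating-point arithmetic with an exact integer test: a total that is a whole number of gigabytes is printed in GB, anything else in MB. The function itself isn't shown in the thread, so the following is a hypothetical standalone version. Its signature, the caller-supplied buffer, and the assumption that the size is counted in megabytes are illustrative choices, not the patch's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical pretty_wal_size(): format segment_count * wal_segment_size_mb
 * as "NGB" when it is a whole number of gigabytes, else as "NMB".
 * The integer test (sz % 1024) == 0 avoids any floating-point rounding.
 */
static void
pretty_wal_size(char *buf, size_t buflen, int segment_count,
				int wal_segment_size_mb)
{
	int			sz = wal_segment_size_mb * segment_count;	/* in MB */

	assert(segment_count > 0 && wal_segment_size_mb > 0);
	if ((sz % 1024) == 0)
		snprintf(buf, buflen, "%dGB", sz / 1024);
	else
		snprintf(buf, buflen, "%dMB", sz);
}
```

For example, 64 segments of 16MB format as "1GB", while 5 segments of 16MB format as "80MB".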

Issues:

- I think the pg_standby stuff isn't correct. And it's hard to
  understand. Consider the case where the first file restored is *not* a
  timeline history file, but *also* not a complete file. We'll start to
  spew "not enough data in file" errors and such, which we previously
  didn't.  My preferred solution would be to remove pg_standby ([1]),
  but that's probably not quick enough.  Unless we can quickly agree on
  that, I think we need to refactor this a bit, I've done so in the
  attached, but it's untested. Could you please verify it works and if
  not fix it up?

What do you think?

Regards,

Andres

[1] http://archives.postgresql.org/message-id/20170913064824.rqflkadxwpboabgw%40alap3.anarazel.de


Re: [HACKERS] increasing the default WAL segment size

From
Beena Emerson
Date:
On Wed, Sep 13, 2017 at 2:58 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-09-06 20:24:16 +0530, Beena Emerson wrote:
>> > - pg_standby's RetrieveWALSegSize() does too much for its name. It
>> >   seems quite weird that a function named that way has the section below
>> >   "/* check if clean up is necessary */"
>>
>> We set two cleanup-related variables once WalSegSize is set, namely
>> need_cleanup and exclusiveCleanupFileName. Does the name
>> SetWALSegSizeAndCleanupValues look good?
>
> It's better, but see below.
>
>
>> > - the way you redid the ReadControlFile() invocation doesn't quite seem
>> >   right. Consider what happens if XLOGbuffers isn't -1 - then we
>> >   wouldn't read the control file, but you unconditionally copy it in
>> >   XLOGShmemInit(). I think we instead should introduce something like
>> >   XLOGPreShmemInit() that reads the control file unless in bootstrap
>> >   mode. Then get rid of the second ReadControlFile() already present.
>>
>> I did not think it was necessary to create a new function; I have
>> simply added the check and the function call within XLOGShmemInit().
>
> Which is wrong. XLogShmemSize() already needs to know the actual size;
> otherwise we allocate the wrong amount of shared memory. You may sometimes
> succeed nevertheless, because we leave some slop of unused shared memory
> space, but that's not something to rely on.  See the refactoring I did in 0001.
>
> Changes:
> - refactored the way the control file was handled, moved it to a separate
>   phase.  I wrote this last and it's late, so I'm not yet fully confident
>   in it, but it survives plain and EXEC_BACKEND builds.  This also gets
>   rid of ferrying wal_segment_size through the EXEC_BACKEND variable
>   stuff, which didn't really do much, given how many other parts weren't
>   carried over.
> - renamed the copies of wal_segment_size in all binaries other than
>   postgres to WalSegSz; diverging seems pointless, and WalSegsz seems
>   inconsistent.
> - changed malloc in pg_waldump's search_directory() to a stack
>   allocation. Less because of efficiency, more because there wasn't any
>   error handling.
> - removed redundant char * -> XLogPageHeader -> XLogLongPageHeader casting.
> - replaced new malloc with pg_malloc in initdb (failure handling!)
> - replaced the floating point logic in pretty_wal_size with an, imo much
>   simpler, (sz % 1024) == 0 check
> - it's inconsistent that the new code for pg_standby was added to the
>   top of the file, where all the customizable stuff resides.
> - other small changes
>
> Issues:
>
> - I think the pg_standby stuff isn't correct. And it's hard to
>   understand. Consider the case where the first file restored is *not* a
>   timeline history file, but *also* not a complete file. We'll start to
>   spew "not enough data in file" errors and such, which we previously
>   didn't.  My preferred solution would be to remove pg_standby ([1]),
>   but that's probably not quick enough.  Unless we can quickly agree on
>   that, I think we need to refactor this a bit, I've done so in the
>   attached, but it's untested. Could you please verify it works and if
>   not fix it up?
>
> What do you think?

The change looks good and is working as expected.
PFA the updated patch after running pgindent.


Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] increasing the default WAL segment size

From
Andres Freund
Date:
Hi,

On 2017-09-14 11:31:33 +0530, Beena Emerson wrote:
> The change looks good and is working as expected.
> PFA the updated patch after running pgindent.

I've pushed this version. Yay!  Thanks for the work Beena, everyone!

The only change I made is to run the pg_upgrade tests with a 1 MB
segment size, as discussed in [1].  We'll probably want to refine that,
but let's discuss that in the other thread.

Regards,

Andres

[1] http://archives.postgresql.org/message-id/20170919175457.liz3oreqiambuhca%40alap3.anarazel.de

