Thread: [HACKERS] Replication/backup defaults
Cycling back to this topic again, but this time at the beginning of a CF.
Here's an actual patch to change:
wal_level=replica
max_wal_senders=10
max_replication_slots=20
Based on feedback from last year (https://www.postgresql.org/message-id/CABUevEwfV7zDutescm2PHGvsJdYA0RWHFMTRGhwrJPGgSbzZDQ%40mail.gmail.com):
There were requests for benchmarks of performance difference. Tomas has promised to run a couple of benchmarks on his standard benchmarking setups to give numbers on that. Thanks Tomas, please pipe in with your results when you have them!
Security considerations about pg_hba.conf -- I avoided those by not actually changing pg_hba.conf. Since pg_hba.conf can be changed on a reload instead of a restart it's a lot easier to deal with. I still think changing it to allow "postgres" the same type of connections as it does for regular users would not be a security problem, but again thanks to it only needing a reload it's not as big an issue.
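For reference, since the patch leaves pg_hba.conf alone, anyone who wants to take a streaming base backup still has to add a replication entry themselves, along these lines (illustrative only; the method and user depend on the installation):

```
# pg_hba.conf (example, not part of the patch):
# TYPE   DATABASE      USER        ADDRESS         METHOD
local    replication   postgres                    peer
host     replication   postgres    127.0.0.1/32    md5
```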
There was the idea to have multiple sets of defaults to choose from at initdb time. I don't see a problem having that, but it's been another year and nobody built it. I don't think not having that is an excuse for the current defaults. And implementing something like that is in no way hindered by
changing the current defaults.
We were too close to beta1 -- this is why I'm sending it earlier this time :) (Even though I intended to do it already back in September, better now than even later)
Finally, there's the argument that we're already shaking up a number of other things with version 10, so this is a good time to do this one as well.
On 31 December 2016 at 15:00, Magnus Hagander <magnus@hagander.net> wrote:
> Cycling back to this topic again, but this time at the beginning of a CF.
>
> Here's an actual patch to change:
>
> max_wal_senders=10
> max_replication_slots=20

+1

If that doesn't fly, it seems easy enough to introduce a
"min_reserve_limit" GUC that defaults to 10 that gives a lower bound
on the amount of memory we reserve for many of those shmem allocs;
that can be set to 0 for people that want the old behaviour. Then we
can allow changes up to the memory limit without a restart.

> wal_level=replica

This is more problematic because it changes behaviours.

A more useful approach would be to bring all the things needed to
enable replication into one ALTER SYSTEM command, so people have just
one thing they need to execute and it will work out the details and
locations for you.

That way we can maintain the defaults yet make it easier to enable in
a useful way.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 31 December 2016 at 15:00, Magnus Hagander <magnus@hagander.net> wrote:
> Cycling back to this topic again, but this time at the beginning of a CF.
>
> Here's an actual patch to change:
>
>
> max_wal_senders=10
> max_replication_slots=20
+1
If that doesn't fly, it seems easy enough to introduce a
"min_reserve_limit" GUC that defaults to 10 that gives a lower bound
on the amount of memory we reserve for many of those shmem allocs;
that can be set to 0 for people that want the old behaviour. Then we
can allow changes up to the memory limit without a restart.
> wal_level=replica
This is more problematic because it changes behaviours.
You can't actually change the other two without changing wal_level.
A more useful approach would be to bring all the things needed to
enable replication into one ALTER SYSTEM command, so people have just
one thing they need to execute and it will work out the details and
locations for you.
That way we can maintain the defaults yet make it easier to enable in
a useful way.
Sure, that would be great - the core being the ability to change these things without a restart. But I would argue for not letting perfection get in the way of progress, and do this anyway. I doubt there is any way the bigger change is going to get done for 10 at this point, so we should give people the ability to do backups off a default installation already.
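To make the argument concrete: on a pre-patch default installation, getting to the point where pg_basebackup can run means roughly the following (an illustrative sketch; the first two settings only take effect after a full server restart):

```sql
-- What a default install currently requires before base backups work:
ALTER SYSTEM SET wal_level = replica;    -- default is 'minimal' pre-patch
ALTER SYSTEM SET max_wal_senders = 10;   -- default is 0 pre-patch
-- ...then restart the server, and add a 'replication' entry to pg_hba.conf
```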
On 2 January 2017 at 09:21, Magnus Hagander <magnus@hagander.net> wrote:
>> > wal_level=replica
>>
>> This is more problematic because it changes behaviours.
>
> You can't actually change the other two without changing wal_level.

You could, but we currently disallow it.

> Sure, that would be great - the core being the ability to change these
> things without a restart. But I would argue for not letting perfection get
> in the way of progress, and do this anyway. I doubt there is any way the
> bigger change is going to get done for 10 at this point, so we should give
> people the ability to do backups off a default installation already.

We could fairly easily change wal_level without restart; it's been
discussed for many years.

The problem from my perspective is that you're immediately turning off
the performance benefits for initial bulk loads.

Arguing how that isn't a problem looks at least as time consuming as
fixing the problem.
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 2, 2017 at 10:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 2 January 2017 at 09:21, Magnus Hagander <magnus@hagander.net> wrote:
>
>
> On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 31 December 2016 at 15:00, Magnus Hagander <magnus@hagander.net> wrote:
>> > Cycling back to this topic again, but this time at the beginning of a
>> > CF.
>> >
>> > Here's an actual patch to change:
>> >
>> >
>> > max_wal_senders=10
>> > max_replication_slots=20
>>
>> +1
>>
>> If that doesn't fly, it seems easy enough to introduce a
>> "min_reserve_limit" GUC that defaults to 10 that gives a lower bound
>> on the amount of memory we reserve for many of those shmem allocs;
>> that can be set to 0 for people that want the old behaviour. Then we
>> can allow changes up to the memory limit without a restart.
>>
>> > wal_level=replica
>>
>> This is more problematic because it changes behaviours.
>
>
> You can't actually change the other two without changing wal_level.
You could, but we currently disallow it.
I always assumed we disallowed it because we would have to write actual code to make it safe.
>> A more useful approach would be to bring all the things needed to
>> enable replication into one ALTER SYSTEM command, so people have just
>> one thing they need to execute and it will work out the details and
>> locations for you.
>> That way we can maintain the defaults yet make it easier to enable in
>> a useful way.
>
>
> Sure, that would be great - the core being the ability to change these
> things without a restart. But I would argue for not letting perfection get
> in the way of progress, and do this anyway. I doubt there is any way the
> bigger change is going to get done for 10 at this point, so we should give
> people the ability to do backups off a default installation already.
We could fairly easily change wal_level without restart; it's been
discussed for many years.
The problem from my perspective is that you're immediately turning off
the performance benefits for initial bulk loads.
We've had this discussion many times over. Please see for example the thread I referenced.
The conclusion has been that our defaults should really allow people to take backups of their systems, and they currently don't.
Making things run faster is tuning, and people should expect to do that if they need things to run faster. But being able to make a backup is pretty fundamental.
Arguing how that isn't a problem looks at least as time consuming as
fixing the problem.
Please do submit a patch for it. I don't know exactly what's involved in that part; I just know that people have been complaining about this at least since 9.0 was released, so our track record of actually fixing it isn't very good.
I'm not arguing that it's not a problem, btw. I'm arguing that until we can solve the problem, we're much better off letting people do backups and set up things like replication than optimizing for a usecase that many never hit.
On 2 January 2017 at 09:39, Magnus Hagander <magnus@hagander.net> wrote:
> Please do submit a patch for it.

The way this is supposed to go is someone submits a patch and they
receive feedback, then act on that feedback. If I was able to get away
with deflecting all review comments with a simple "you fix it if you
don't like it" there would be considerably more patches with my name
on them accepted, but probably no further forward in real terms
because of the loose ends it creates.

In this case, simply changing the default will remove a whole class of
performance optimization that we have educated people to expect. I'm
sorry to point this out but removing that will cause many real changes
for people's systems that we will not be thanked for, even though I
understand your reasoning and wish the same goals to be achieved.

I'm willing to assist in a project to allow changing wal_level online
in this release. Please let's follow that path.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 2, 2017 at 10:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 2 January 2017 at 09:39, Magnus Hagander <magnus@hagander.net> wrote:
> Please do submit a patch for it.
The way this is supposed to go is someone submits a patch and they
receive feedback, then act on that feedback. If I was able to get away
with deflecting all review comments with a simple "you fix it if you
don't like it" there would be considerably more patches with my name
on it accepted, but probably no further forward in real terms because
of the loose ends it creates.
Fair enough.
It's just that people keep saying that this is easy, and have said so for a long time, but nobody has written a patch for it.
In this case, simply changing the default will remove a whole class of
performance optimization that we have educated people to expect. I'm
sorry to point this out but removing that will cause many real changes
for people's systems that we will not be thanked for, even though I
understand your reasoning and wish the same goals to be achieved.
My claim here is that a lot *fewer* people have come to expect this performance optimization, than would (quite reasonably) expect that backups should work on a system without taking it down for restart to reconfigure it to support that.
I run into that all the time. I hear complaints about that all the time. I have not heard a single user complain about performance loss after enabling backups.
And how many people that rely on this optimization don't do any *other* optimization on their system *anyway*, that would cause them to require a restart anyway? It's not like we're taking away their ability to enable the optimization, it's just not on by default.
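The restart requirement being argued about is visible directly from pg_settings: all three GUCs have context = 'postmaster', i.e. they can only change at server start. A quick way to check:

```sql
SELECT name, setting, context
  FROM pg_settings
 WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots');
-- context = 'postmaster' means the value can only change with a restart
```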
I'm willing to assist in a project to allow changing wal_level online
in this release. Please let's follow that path.
Sure thing, I will be happy to help test and review such a patch.
I will still argue that the *default* should be wal_level=replica though. Because once we have such a patch, it's trivial to re-enable this performance optimization (at the cost of backups and replication).
//Magnus
On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
> My claim here is that a lot *fewer* people have come to expect this
> performance optimization, than would (quite reasonably) expect that backups
> should work on a system without taking it down for restart to reconfigure
> it to support that.

+1

As evidenced by the fact that a large fraction of those optimizations
are actually currently entirely broken, without anybody noticing for
years:
http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

Greetings,

Andres Freund
On 2 January 2017 at 10:13, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
>> My claim here is that a lot *fewer* people have come to expect this
>> performance optimization, than would (quite reasonably) expect that backups
>> should work on a system without taking it down for restart to reconfigure
>> it to support that.
>
> +1
>
> As evidenced by the fact that a large fraction of those optimizations
> are actually currently entirely broken. Without anybody noticing for
> years:
> http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

No, the optimization works, but there is a bug in it that makes it
unsafe, which is not the same thing as entirely broken. That clearly
needs to be fixed, but it does not prevent the performance benefit, so
that argument is invalid.

We must listen to feedback, not just try to blast through it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-01-02 10:31:28 +0000, Simon Riggs wrote:
> We must listen to feedback, not just try to blast through it.

Not agreeing with your priorities isn't "blasting through feedback".
On 2 January 2017 at 09:48, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm willing to assist in a project to allow changing wal_level online
> in this release. Please let's follow that path.

wal_level looks like one of the easier ones to change without a server
restart. There are actions to take in either direction, up or down.
My initial thoughts on the pseudocode would be...

    reset wal_level so all new transactions see that value

    /* actions after setting the new value */
    if (old_wal_level < new_wal_level)      /* going up */
    {
        get list of running transactions (perhaps only those using no-WAL-opt)
    }
    else                                    /* coming down */
    {
        if (old_wal_level == logical)
            disconnect logical replication and disallow logical slots
        if (new_wal_level == minimal)
            disconnect streaming replication and disallow physical slots
    }

    wait for a checkpoint (fast checkpoint if no other transactions active)

    if (list)
        wait for list of running xacts to complete

    wait for a checkpoint (fast checkpoint if no other transactions active)

    XLogReportParameters()

So it looks easier to go up than down, which is good since that is the
important direction.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2 January 2017 at 09:39, Magnus Hagander <magnus@hagander.net> wrote:
> The conclusion has been that our defaults should really allow people to take
> backups of their systems, and they currently don't.
>
> Making things run faster is tuning, and people should expect to do that if
> they need things to run faster. But being able to make a backup is pretty
> fundamental.

In the hope of making things better in 10.0, I remove my objection. If
people want to use wal_level = minimal they can restart their server
and they can find that out in the release notes.

Should we set wal_level = replica or wal_level = logical as the
default for 10.0?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> In the hope of making things better in 10.0, I remove my objection. If
> people want to use wal_level = minimal they can restart their server
> and they can find that out in the release notes.
>
> Should we set wal_level = replica or wal_level = logical as the
> default for 10.0?

replica sounds like a better default to me as most users use at least
archiving. Logical decoding is still fresh though, and its use is not
that wide. Has there been any study on its performance impact compared
to replica, by the way?

--
Michael
Hi,

On 12/31/2016 04:00 PM, Magnus Hagander wrote:
> Cycling back to this topic again, but this time at the beginning of a CF.
>
> Here's an actual patch to change:
>
> wal_level=replica
> max_wal_senders=10
> max_replication_slots=20
>
> Based on feedback from last year
> (https://www.postgresql.org/message-id/CABUevEwfV7zDutescm2PHGvsJdYA0RWHFMTRGhwrJPGgSbzZDQ%40mail.gmail.com):
>
> There were requests for benchmarks of performance difference. Tomas
> has promised to run a couple of benchmarks on his standard
> benchmarking setups to give numbers on that. Thanks Tomas, please
> pipe in with your results when you have them!

As promised, I'm running some benchmarks, and I have some early
results to report. And perhaps we can discuss whether we need to test
some additional workloads.

I'm 100% on board with the idea that we should switch to a wal_level
that allows taking backups or setting up a streaming replica, as long
as it does not cause a severe performance regression in common
workloads. It'd be trivial to construct workloads demonstrating the
optimizations in wal_level=minimal (e.g. initial loads doing CREATE
TABLE + COPY + CREATE INDEX in a single transaction), but that would
be mostly irrelevant I guess.

Instead, I've decided to run a regular pgbench TPC-B-like workload on
a bunch of different scales, and measure throughput + some xlog stats
with each of the three wal_level options.

Note: I tweaked the code a bit to allow archiving with "minimal" WAL
level, to allow computing WAL stats on the archived segments (instead
of keeping all segments in the data directory).

As usual, I'm running it on two machines - a small old one (i5-2500k
box with 4 cores and 8GB of RAM) and a new one (2x e5-2620v4 with
16/32 cores, 64GB of RAM). Both machines have SSD-based storage. The
clusters on both machines were reasonably tuned, see 'settings.log'
for each run. The tests are fairly long, covering multiple checkpoints
etc. In other words, the results should be fairly stable.

The scripts/results/stats/configs are available here:

* https://bitbucket.org/tvondra/wal-levels-e2620-v4/src
* https://bitbucket.org/tvondra/wal-levels-i5/src

So far I only have results for the smallest data sets (50 on i5 and
100 on e5), which easily fit into shared_buffers in both cases, and
the numbers look like this:

                 minimal    replica    standby
    ------------------------------------------------
    i5-2500k        5884       5896       5873
    e5-2620v4      23968      24393      24259

So the performance penalty of the replica/standby WAL levels on this
workload is pretty much non-existent - on the larger machine those
levels are actually a tad faster than 'minimal', but the difference is
within 2% (so it might easily be noise).

I'll push results for larger ones once those tests complete (possibly
tomorrow).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/03/2017 01:34 PM, Michael Paquier wrote:
> On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> In the hope of making things better in 10.0, I remove my objection.
>> If people want to use wal_level = minimal they can restart their
>> server and they can find that out in the release notes.
>>
>> Should we set wal_level = replica or wal_level = logical as the
>> default for 10.0?
>
> replica sounds like a better default to me as most users use at
> least archiving. Logical decoding is still fresh though, and its use
> is not that wide. Have there been any study on its performance
> impact compared to replica by the way?

I've just posted results for some benchmarks I'm running. Those are
some simple pgbench tests, nothing special, and according to those
results the performance impact (logical vs. replica) is negligible. I
can run additional tests with other workloads, of course.

While we can probably construct workloads where the difference is
measurable, I'm not sure performance impact is the main concern here.
As you point out, we have had 'replica' since 9.0 effectively, while
logical is much newer, so perhaps there are some hidden bugs? It'd be
embarrassing to pick 'logical' and hurt everyone, even if they don't
get any benefit from wal_level=logical.

So +1 to 'replica' and allowing switching to 'logical' without
restart. That should not be extremely difficult, as the main culprit
seems to be max_wal_senders/max_replication_slots requiring shared
memory. But with 'replica' we already have those enabled/allocated,
unlike when switching from 'minimal' to 'replica'.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/31/16 10:00 AM, Magnus Hagander wrote:
> max_wal_senders=10
> max_replication_slots=20

How about we default max_replication_slots to -1, which means to use
the same value as max_wal_senders?

I think this would address the needs of 99% of users. If we do like
you suggest, there are going to be very many users who forget to
adjust these two values together, and very many who will do it but
will be puzzled and annoyed by it. Since we're now pushing the use of
replication slots even more (your pg_basebackup change, upcoming
logical replication), I think this could be a major source of
misconfigurations.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 4, 2017 at 3:43 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/31/16 10:00 AM, Magnus Hagander wrote:
> max_wal_senders=10
> max_replication_slots=20
How about we default max_replication_slots to -1, which means to use the
same value as max_wal_senders?
I think this would address the needs of 99% of users. If we do like you
suggest, there are going to be very many users who forget to adjust
these two values together, and very many who will do it but will be
puzzled and annoyed by it. Since we're now pushing the use of
replication slots even more (your pg_basebackup change, upcoming logical
replication), I think this could be a major source of misconfigurations.
But you don't necessarily want to adjust them together, do you? They are both capped by max_connections, but I don't think they have any other direct relation between each other?
On 1/4/17 9:46 AM, Magnus Hagander wrote:
>> How about we default max_replication_slots to -1, which means to use the
>> same value as max_wal_senders?
>
> But you don't necessarily want to adjust them together, do you? They are
> both capped by max_connections, but I don't think they have any other
> direct relation between each other?

I think the most usual case is that you use approximately one
replication slot per wal sender slot. So it would be a good default to
make them equal.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 January 2017 at 12:34, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> In the hope of making things better in 10.0, I remove my objection. If
>> people want to use wal_level = minimal they can restart their server
>> and they can find that out in the release notes.
>>
>> Should we set wal_level = replica or wal_level = logical as the
>> default for 10.0?
>
> replica sounds like a better default to me as most users use at least
> archiving. Logical decoding is still fresh though, and its use is not
> that wide. Have there been any study on its performance impact
> compared to replica by the way?

Magnus' arguments should also be applied to wal_level = logical, since
users will be surprised if they cannot use the logical replication
features we are adding as a main feature of 10.0. Why go through the
same pain again?

And if preventing their use is acceptable for the user, we should
treat it as a performance feature to reduce the wal_level.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/03/2017 11:56 PM, Tomas Vondra wrote:
> Hi,
> ...
> I'll push results for larger ones once those tests complete (possibly
> tomorrow).

I just pushed additional results (from the additional scales) to the
git repositories. On the larger (16/32-core) machine with 2x e5-2620,
the results look like this:

    scale     minimal    replica    logical
    -------------------------------------------
      100       23968      24393      24393
     1000       23412      23656      23794
    10000        5283       5320       5197

and on the smaller one (i5-2500k with 4 cores) I got this:

    scale     minimal    replica    logical
    -------------------------------------------
       50        5884       5896       5873
      400        5324       5342       5478
     1000        5341       5439       5425

The scales were chosen so that the smallest one fits into shared
buffers, the medium exceeds shared buffers but still fits into RAM,
and the largest scale exceeds RAM.

The results seem to confirm that for this workload (regular pgbench),
there's very little difference between the different WAL levels.
Actually, 'replica' seems a tad faster than 'minimal', but the
difference may easily be due to noise.

I've also looked at the amount of WAL actually produced, by doing
pgbench runs throttled to the same throughput, and counting the number
of archived WAL segments & running pg_xlogdump. Interestingly enough,
those two metrics differ quite a bit - for example for scale 1000 (on
the 32-core machine), the 2h runs produced these numbers of WAL
segments:

    minimal: 5515 (88.2GB)
    replica: 5587 (89.4GB)
    logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the
WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only
~1.5% between minimal and logical. The values are also much lower than
the raw WAL size, so I assume pg_xlogdump ignores some extra overhead
present in the segments. Moreover, the sequential nature of WAL writes
means even the +10% is not a big deal (unless it results in saturating
the bandwidth, but running at >90% is a bad idea anyway).

My conclusion from these results is that using 'wal_level=replica' by
default seems fine. Perhaps even wal_level=logical would be OK, but
that's probably too big a step for 10.0.

Any ideas how to construct a plausible workload where the differences
are significantly larger? Running the tests on non-SSD storage might
also be useful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 01/03/2017 11:56 PM, Tomas Vondra wrote:
> Hi,
> ...
> I'll push results for larger ones once those tests complete (possibly
> tomorrow).
I just pushed additional results (from the additional scales) to the git repositories. On the larger (16/32-cores) machine with 2x e5-2620, the results look like this
scale     minimal    replica    logical
-------------------------------------------
  100       23968      24393      24393
 1000       23412      23656      23794
10000        5283       5320       5197
and on the smaller one (i5-2500k with 4 cores) I got this:
scale     minimal    replica    logical
-------------------------------------------
   50        5884       5896       5873
  400        5324       5342       5478
 1000        5341       5439       5425
The scales were chosen so that the smallest one fits into shared buffers, the medium exceeds shared buffers but still fits into RAM, and the largest scale exceeds RAM.
The results seem to confirm that for this workload (regular pgbench), there's very little difference between the different WAL levels. Actually, the 'replica' seems a tad faster than 'minimal', but the difference may be easily due to noise.
I've also looked at the amount of WAL actually produced, by doing pgbench runs throttled to the same throughput, and counting the number of archived WAL segments & running pg_xlogdump. Interestingly enough, those two metrics differ quite a bit - for example for scale 1000 (on the 32-core machine), the 2h runs produced these number of WAL segments:
minimal: 5515 (88.2GB)
replica: 5587 (89.4GB)
logical: 6058 (96.9GB)
so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5% between minimal and logical. The values are also much lower than raw WAL size, so I assume it's because pg_xlogdump ignores some extra overhead, present in the segments. Moreover, the sequential nature of WAL writes means even the +10% is not a big deal (unless it results in saturating the bandwidth, but running on >90% is a bad idea anyway).
If you are using log archiving, it also means your log archive grows by ~10% (well, ~8% if the 9.8% is measured relative to minimal rather than to replica).
My conclusion from these results is that using 'wal_level=replica' by default seems fine. Perhaps even wal_level=logical would be OK, but that's probably a too big step for 10.0.
I think it sounds like 'replica' is the safe default.
If we can make it possible to go replica<->logical without a restart, that makes it easy enough to increase it if necessary, and the default still applies to most people (most people take backups, most people probably don't do logical replication).
Any ideas how to construct a plausible workload where the differences are significantly larger? Running the tests on non-SSD storage might also be useful.
It's easy enough to construct a benchmark specifically to show the difference, but I'm not sure it would reflect any actual "normal workload". Typically the optimization applies to things like bulk loading, which is typically not done alone and does not lend itself to that type of benchmarking very easily.
On 1/4/17 2:44 PM, Peter Eisentraut wrote: > On 1/4/17 9:46 AM, Magnus Hagander wrote: >> How about we default max_replication_slots to -1, which means to use the >> same value as max_wal_senders? > >> But you don't necessarily want to adjust them together, do you? They are >> both capped by max_connections, but I don't think they have any other >> direct relation between each other? > > I think the most usual case is that you use approximately one > replication slot per wal sender slot. So it would be a good default to > make them equal. Well, let's worry about that later. I think everyone is now in agreement with your original change proposal. My suggestion would be to make the defaults of max_wal_senders and max_replication_slots the same, so we don't open an opportunity for ongoing discussions about why they are different and how different they should be. Ideally, we would make max_replication_slots go away at some point similar to my suggestion above. I also suggest making the defaults for both 20 instead of 10. That leaves enough room that almost nobody ever has to change them, whereas 10 can be a bit tight for some not-outrageous installations (8 standbys plus backup?). Your patch looks OK, documentation changes look good. I wouldn't be surprised if we found another place or two that will need updating, but that is not a big deal. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/05/2017 02:23 PM, Magnus Hagander wrote: > > > On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > On 01/03/2017 11:56 PM, Tomas Vondra wrote: > > Hi, > > ... > > I'll push results for larger ones once those tests complete > (possibly > tomorrow). > > > I just pushed additional results (from the additional scales) to the > git repositories. On the larger (16/32-cores) machine with 2x > e5-2620, the results look like this > > scale minimal replica logical > ----------------------------------------------------- > 100 23968 24393 24393 > 1000 23412 23656 23794 > 10000 5283 5320 5197 > > and on the smaller one (i5-2500k with 4 cores) I got this: > > scale minimal replica logical > ----------------------------------------------------- > 50 5884 5896 5873 > 400 5324 5342 5478 > 1000 5341 5439 5425 > > The scales were chosen so that the smallest one fits into shared > buffers, the medium exceeds shared buffers but still fits into RAM, > and the largest scale exceeds RAM. > > The results seem to confirm that for this workload (regular > pgbench), there's very little difference between the different WAL > levels. Actually, the 'replica' seems a tad faster than 'minimal', > but the difference may be easily due to noise. > > I've also looked at the amount of WAL actually produced, by doing > pgbench runs throttled to the same throughput, and counting the > number of archived WAL segments & running pg_xlogdump. Interestingly > enough, those two metrics differ quite a bit - for example for scale > 1000 (on the 32-core machine), the 2h runs produced these number of > WAL segments: > > minimal: 5515 (88.2GB) > replica: 5587 (89.4GB) > logical: 6058 (96.9GB) > > so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, > the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of > only ~1.5% between minimal and logical. 
The values are also much > lower than raw WAL size, so I assume it's because pg_xlogdump > ignores some extra overhead, present in the segments. Moreover, the > sequential nature of WAL writes means even the +10% is not a big > deal (unless it results in saturating the bandwidth, but running on > >90% is a bad idea anyway). > > > If you are using log archiving, it also means your log archive grows by > 10% (well, 8% assuming it was 9.8% on top of 0, not on top of replica). > ... and that the standby has to chew through the additional 10% of WAL. We already have standbys that occasionally struggle to keep up with the master, and adding more load won't make them happy (even if just 10%). > > > My conclusion from these results is that using 'wal_level=replica' > by default seems fine. Perhaps even wal_level=logical would be OK, > but that's probably a too big step for 10.0. > > > > I think it sounds like 'replica' is the safe default. > > If we can make it possible to go replica<->logical without a restart, > that makes it easy enough to increase it if necessary, and the default > still applies to most people (most people take backups, most people > probably don't do logical replication). > My thoughts, exactly. > > > Any ideas how to construct a plausible workload where the > differences are significantly larger? Running the tests on non-SSD > storage might also be useful. > > > It's easy enough to construct a benchmark specifically to show the > difference, but of any actual "normal workload" for it. Typically the > optimization applies to things like bulk loading, which typically never > done alone and does not lend itself to that type of benchmarking very > easily. > Not sure if I understand correctly what you're saying. You're saying that although it'd be easy to construct a benchmark showing significant performance impact, it won't represent a common workload. Correct? 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas, * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote: > On 01/05/2017 02:23 PM, Magnus Hagander wrote: > >It's easy enough to construct a benchmark specifically to show the > >difference, but of any actual "normal workload" for it. Typically the > >optimization applies to things like bulk loading, which typically never > >done alone and does not lend itself to that type of benchmarking very > >easily. > > Not sure if I understand correctly what you're saying. You're saying > that although it'd be easy to construct a benchmark showing > significant performance impact, it won't represent a common > workload. Correct? I think he's saying that it's not very easy to construct a good example of typical bulk-loading workloads using just pgbench. Bulk loading certainly happens with PG and I don't think we'll make very many friends if we break optimizations when wal_level is set to minimal like those you get using: BEGIN; CREATE TABLE x (c1 int); COPY x FROM STDIN; COMMIT; or: BEGIN; TRUNCATE x; COPY x FROM STDIN; COMMIT; Changing the wal_level from 'minimal' to 'replica' or 'logical' with such a benchmark is going to make the WAL go from next-to-nothing to size-of-database. One doesn't typically *just* do bulk loads, however, often it's a bulk load into a table and then the contents of that table are merged with another table or perhaps joined to it to produce some report or something along those lines. In many of those cases, our more-recently added capability to have UNLOGGED tables will work, but not all (in particular, it can be very handy to load everything in using the above technique and then switch the wal_level to replica, which avoids having to have the bulk of the data sent through WAL, something you can't avoid if you want to turn an unlogged table into a logged one). Thanks! Stephen
On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > I also suggest making the defaults for both 20 instead of 10. That > leaves enough room that almost nobody ever has to change them, whereas > 10 can be a bit tight for some not-outrageous installations (8 standbys > plus backup?). I'm afraid we need to add initdb integration / testing for those. I mean we have initdb test down to 10 connections to deal with limited resources... Andres
On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> I also suggest making the defaults for both 20 instead of 10. That
> leaves enough room that almost nobody ever has to change them, whereas
> 10 can be a bit tight for some not-outrageous installations (8 standbys
> plus backup?).
I'm afraid we need to add initdb integration / testing for those. I mean
we have initdb test down to 10 connections to deal with limited
resources...
If we make both 10 by default we should be OK though, no?
On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote: > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund <andres@anarazel.de> wrote: > > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > > > I also suggest making the defaults for both 20 instead of 10. That > > > leaves enough room that almost nobody ever has to change them, whereas > > > 10 can be a bit tight for some not-outrageous installations (8 standbys > > > plus backup?). > > > > I'm afraid we need to add initdb integration / testing for those. I mean > > we have initdb test down to 10 connections to deal with limited > > resources... > > > > If we make both 10 by default we should be OK though, no? I'm a bit doubtful about that. On the other hand, we've increased max_parallel_workers without anybody complaining. Andres
On 2017-01-05 09:12:49 -0800, Andres Freund wrote: > On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote: > > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund <andres@anarazel.de> wrote: > > > > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > > > > I also suggest making the defaults for both 20 instead of 10. That > > > > leaves enough room that almost nobody ever has to change them, whereas > > > > 10 can be a bit tight for some not-outrageous installations (8 standbys > > > > plus backup?). > > > > > > I'm afraid we need to add initdb integration / testing for those. I mean > > > we have initdb test down to 10 connections to deal with limited > > > resources... > > > > > > > If we make both 10 by default we should be OK though, no? > > I'm a bit doubtful about that. On the other hand, we've increased > max_parallel_workers without anybody complaining. Err, I mean max_worker_processes. Andres
On 01/05/2017 05:37 PM, Stephen Frost wrote: > Tomas, > > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote: >> On 01/05/2017 02:23 PM, Magnus Hagander wrote: >>> It's easy enough to construct a benchmark specifically to show the >>> difference, but of any actual "normal workload" for it. Typically the >>> optimization applies to things like bulk loading, which typically never >>> done alone and does not lend itself to that type of benchmarking very >>> easily. >> >> Not sure if I understand correctly what you're saying. You're saying >> that although it'd be easy to construct a benchmark showing >> significant performance impact, it won't represent a common >> workload. Correct? > > I think he's saying that it's not very easy to construct a good example > of typical bulk-loading workloads using just pgbench. Bulk loading > certainly happens with PG and I don't think we'll make very many friends > if we break optimizations when wal_level is set to minimal like those > you get using: > > BEGIN; > CREATE TABLE x (c1 int); > COPY x FROM STDIN; > COMMIT; > > or: > > BEGIN; > TRUNCATE x; > COPY x FROM STDIN; > COMMIT; > > Changing the wal_level from 'minimal' to 'replica' or 'logical' with > such a benchmark is going to make the WAL go from next-to-nothing to > size-of-database. Sure, I do know how to construct such workloads - and it's trivial even with pgbench custom scripts. The question is whether such workloads are common or not. Most importantly, no one is proposing to break the optimizations, but changing the defaults - users relying on the optimizations are free to switch back to wal_level=minimal if needed. > > One doesn't typically *just* do bulk loads, however, > often it's a bulk load into a table and then the contents of that table > are merged with another table or perhaps joined to it to produce some > report or something along those lines. 
In many of those cases, our > more-recently added capability to have UNLOGGED tables will work, but > not all (in particular, it can be very handy to load everything in using > the above technique and then switch the wal_level to replica, which > avoids having to have the bulk of the data sent through WAL, something > you can't avoid if you want to turn an unlogged table into a logged > one). > Ultimately, the question is whether the number of people running into "Hey, I can't take pg_basebackup or setup a standby with the default config!" is higher or lower than number of people running into "Hey, CREATE TABLE + COPY is slower now!" I haven't seen many systems relying on such load optimizations, for a number of reasons: 1) The important/critical systems usually have replicas, so are inherently incompatible with wal_level=minimal. 2) The batch jobs usually don't truncate the main table, but load the increment into a temporary/unlogged table first, then merge it into the main one. That is not to say there are no other cases benefiting from those optimizations, but we're talking about the default value - we're not removing the wal_level=minimal. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 02, 2017 at 10:21:41AM +0100, Magnus Hagander wrote: > On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On 31 December 2016 at 15:00, Magnus Hagander <magnus@hagander.net> wrote: > > > max_wal_senders=10 > > > max_replication_slots=20 [...] > > > wal_level=replica > > > > This is more problematic because it changes behaviours. > > You can't actually change the other two without changing wal_level. That actually goes both ways: I recently saw a server not start cause we were experimenting with temporarily setting wal_level to minimal for initial bulk loading, but did not reduce max_wal_senders back to zero. So it failed at startup with 'FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical"'. I don't want to hijack this thread, but I wonder whether the name "*max*_wal_senders" really conveys that dependence on wal_level (there's no comment to that end in the postgresql.conf sample) and/or whether maybe the admin should just be notified that WAL streaming is turned off cause wal_level < 'replica'? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
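For reference, the failure mode Michael describes comes from a combination like the following (an illustrative postgresql.conf fragment, not his actual config):

```
# postgresql.conf -- this combination refuses to start:
#   FATAL:  WAL streaming (max_wal_senders > 0) requires
#           wal_level "replica" or "logical"
wal_level = minimal
max_wal_senders = 10        # must be 0 while wal_level = minimal
```

This is why the defaults for the three parameters have to move together: a nonzero max_wal_senders default is invalid unless wal_level also defaults to at least 'replica'.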
On 1/5/17 4:56 PM, Michael Banck wrote: >> You can't actually change the other two without changing wal_level. > That actually goes both ways: I recently saw a server not start cause we > were experimenting with temporarily setting wal_level to minimal for > initial bulk loading, but did not reduce max_wal_senders back to zero. > So it failed at startup with 'FATAL: WAL streaming (max_wal_senders > > 0) requires wal_level "replica" or "logical"'. I think that was the point: You can't change the default of max_wal_senders without also changing the default of wal_level. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/5/17 12:01 PM, Andres Freund wrote: > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: >> I also suggest making the defaults for both 20 instead of 10. That >> leaves enough room that almost nobody ever has to change them, whereas >> 10 can be a bit tight for some not-outrageous installations (8 standbys >> plus backup?). > > I'm afraid we need to add initdb integration / testing for those. I mean > we have initdb test down to 10 connections to deal with limited > resources... Those initdb defaults were last touched in 2005, before the use of System V shared memory was reduced to a minimum. It might be worth revisiting that. The only way to end up with a low number of connection slots would seem to be a very low semaphore configuration. In the build farm, I have found 6 critters that do not end up with the 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, opossum. I wonder what limitations initdb is bumping against. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 7, 2017 at 1:27 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 1/5/17 12:01 PM, Andres Freund wrote:
> On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
>> I also suggest making the defaults for both 20 instead of 10. That
>> leaves enough room that almost nobody ever has to change them, whereas
>> 10 can be a bit tight for some not-outrageous installations (8 standbys
>> plus backup?).
>
> I'm afraid we need to add initdb integration / testing for those. I mean
> we have initdb test down to 10 connections to deal with limited
> resources...
Those initdb defaults were last touched in 2005, before the use of
System V shared memory was reduced to a minimum. It might be worth
revisiting that. The only way to end up with a low number of connection
slots would seem to be a very low semaphore configuration.
In the build farm, I have found 6 critters that do not end up with the
100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
opossum. I wonder what limitations initdb is bumping against.
Since you looked at the data -- they did not end up with 100, but what's the lowest they did end up with? Did they go all the way down to 10?
On 1/7/17 6:23 AM, Magnus Hagander wrote: > In the build farm, I have found 6 critters that do not end up with the > 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, > opossum. I wonder what limitations initdb is bumping against. > > > Since you lookeda t the data -- they did not end up with 100, but what's > the lowest they did end up with? Did they go all the way down to 10? They all ended up on 30 or 40. The only documented way this could happen is if the semaphore configuration does not allow enough room, but this would have to be a very particular setting on all these quite different boxes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/5/17 2:50 PM, Tomas Vondra wrote: > Ultimately, the question is whether the number of people running into > "Hey, I can't take pg_basebackup or setup a standby with the default > config!" is higher or lower than number of people running into "Hey, > CREATE TABLE + COPY is slower now!" I'm betting it's way higher. Loads of folks use Postgres and never do any kind of ETL. > That is not to say there are no other cases benefiting from those > optimizations, but we're talking about the default value - we're not > removing the wal_level=minimal. This would be a non-issue if we provided example configs for a few different workloads. Obviously those would never be optimal either, but they *would* show users what settings they should immediately look at changing in their environment. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Sun, Jan 8, 2017 at 2:19 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/5/17 2:50 PM, Jim Nasby wrote:
On 1/5/17 2:50 PM, Tomas Vondra wrote:
Ultimately, the question is whether the number of people running into
"Hey, I can't take pg_basebackup or setup a standby with the default
config!" is higher or lower than number of people running into "Hey,
CREATE TABLE + COPY is slower now!"
I'm betting it's way higher. Loads of folks use Postgres and never do any kind of ETL.
I'm willing to say "the majority".
That is not to say there are no other cases benefiting from those
optimizations, but we're talking about the default value - we're not
removing the wal_level=minimal.
This would be a non-issue if we provided example configs for a few different workloads. Obviously those would never be optimal either, but they *would* show users what settings they should immediately look at changing in their environment.
It might also be worthwhile to provide a section in the docs just saying "these are the parameters you probably want to look at for workload <x>" rather than an actual example configuration. Including a short sentence or two about why.
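As a concrete illustration of what such a docs section might recommend for a "replication/backup" workload, a sketch could look like this (values are the defaults proposed in this thread, not an official recommendation):

```
# postgresql.conf -- replication/backup-oriented settings (sketch)
wal_level = replica            # enough WAL for pg_basebackup and standbys
max_wal_senders = 10           # one slot per standby or backup connection
max_replication_slots = 10     # keep equal to max_wal_senders by default
```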
On Sat, Jan 7, 2017 at 7:57 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 1/7/17 6:23 AM, Magnus Hagander wrote:
> In the build farm, I have found 6 critters that do not end up with the
> 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
> opossum. I wonder what limitations initdb is bumping against.
>
>
> Since you lookeda t the data -- they did not end up with 100, but what's
> the lowest they did end up with? Did they go all the way down to 10?
They all ended up on 30 or 40.
The only documented way this could happen is if the semaphore
configuration does not allow enough room, but this would have to be a
very particular setting on all these quite different boxes.
So based on that, I suggest we go ahead and make the change to make both the values 10 by default. And that we do that now, because that lets us get it out through more testing on different platforms, so that we catch issues earlier on if they do arise.
It would be interesting to find out why it's limited as well, of course, but I don't think we need to wait for that.
On 1/9/17 7:44 AM, Magnus Hagander wrote: > So based on that, I suggest we go ahead and make the change to make both > the values 10 by default. And that we do that now, because that lets us > get it out through more testing on different platforms, so that we catch > issues earlier on if they do arise. Sounds good. > It would be interesting to find out why it's limited as well, of course, > but I don't think we need to wait for that. After some testing and searching for documentation, it seems that at least the BSD platforms have a very low default semmns setting (apparently 60, which leads to max_connections=30). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
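That low SEMMNS default is consistent with the observed 30/40 results. Per the formula in the "Managing Kernel Resources" chapter, PostgreSQL needs roughly one semaphore per allowed backend, allocated in sets of 17 (16 usable plus one for bookkeeping). Inverting that gives a rough estimate of how many connections fit (a back-of-the-envelope sketch; the `extra` value assumes 9.6-era defaults):

```shell
#!/bin/sh
# Rough estimate of how many connections fit in a given SEMMNS.
# PostgreSQL allocates semaphores in sets of 17 (16 usable + 1 bookkeeping),
# and needs slots for background processes too.
semmns=60          # the low BSD default mentioned above
extra=16           # autovacuum_max_workers (3) + max_worker_processes (8) + 5

sets=$((semmns / 17))                  # whole semaphore sets that fit
conns=$((sets * 16 - extra))           # connection slots left for clients
echo "SEMMNS=$semmns allows roughly $conns connections"
```

With SEMMNS=60 this prints roughly 32 connections, so initdb probing downward (100, 50, 40, 30, ...) would settle on max_connections=30 -- matching what those critters report.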
On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/9/17 7:44 AM, Magnus Hagander wrote: >> So based on that, I suggest we go ahead and make the change to make both >> the values 10 by default. And that we do that now, because that lets us >> get it out through more testing on different platforms, so that we catch >> issues earlier on if they do arise. > > Sounds good. I may be outvoted, but I'm still not in favor of changing the default wal_level. That caters only to people who lack sufficient foresight to know that they need a replica before the system becomes so critical that they can't bounce it to update the configuration. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/9/17 7:44 AM, Magnus Hagander wrote:
>> So based on that, I suggest we go ahead and make the change to make both
>> the values 10 by default. And that we do that now, because that lets us
>> get it out through more testing on different platforms, so that we catch
>> issues earlier on if they do arise.
>
> Sounds good.
I may be outvoted, but I'm still not in favor of changing the default
wal_level. That caters only to people who lack sufficient foresight
to know that they need a replica before the system becomes so critical
that they can't bounce it to update the configuration.
True. But the current level only caters to those people who run large ETL jobs without doing any tuning on their system (at least none that would require a restart), or another one of the fairly specific workloads.
And as I keep re-iterating, it's not just about replicas, it's also about the ability to make proper backups. Which is a pretty fundamental feature.
I do think you are outvoted, yes :) At least that's the result of my tallying up the people who have spoken out on the thread.
Greetings, * Magnus Hagander (magnus@hagander.net) wrote: > On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > > On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut > > <peter.eisentraut@2ndquadrant.com> wrote: > > > On 1/9/17 7:44 AM, Magnus Hagander wrote: > > >> So based on that, I suggest we go ahead and make the change to make both > > >> the values 10 by default. And that we do that now, because that lets us > > >> get it out through more testing on different platforms, so that we catch > > >> issues earlier on if they do arise. > > > > > > Sounds good. > > > > I may be outvoted, but I'm still not in favor of changing the default > > wal_level. That caters only to people who lack sufficient foresight > > to know that they need a replica before the system becomes so critical > > that they can't bounce it to update the configuration. > > True. But the current level only caters to those people who run large ETL > jobs without doing any tuning on their system (at least none that would > require a restart), or another one of the fairly specific workloads. > > And as I keep re-iterating, it's not just about replicas, it's also about > the ability to make proper backups. Which is a pretty fundamental feature. > > I do think you are outvoted, yes :) At least that's the result of my > tallying up the people who have spoken out on the thread. I tend to agree with Magnus on this, being able to perform an online backup is pretty darn important. Thanks! Stephen
On 1/10/17 3:06 PM, Stephen Frost wrote: > * Magnus Hagander (magnus@hagander.net) wrote: >> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I may be outvoted, but I'm still not in favor of changing the default >>> wal_level. That caters only to people who lack sufficient foresight >>> to know that they need a replica before the system becomes so critical >>> that they can't bounce it to update the configuration. >> >> True. But the current level only caters to those people who run large ETL >> jobs without doing any tuning on their system (at least none that would >> require a restart), or another one of the fairly specific workloads. >> >> And as I keep re-iterating, it's not just about replicas, it's also about >> the ability to make proper backups. Which is a pretty fundamental feature. >> >> I do think you are outvoted, yes :) At least that's the result of my >> tallying up the people who have spoken out on the thread. > > I tend to agree with Magnus on this, being able to perform an online > backup is pretty darn important. Agreed and +1. -- -David david@pgmasters.net
On Wed, Jan 11, 2017 at 10:06 AM, David Steele <david@pgmasters.net> wrote: > On 1/10/17 3:06 PM, Stephen Frost wrote: >> * Magnus Hagander (magnus@hagander.net) wrote: >>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>>> I may be outvoted, but I'm still not in favor of changing the default >>>> wal_level. That caters only to people who lack sufficient foresight >>>> to know that they need a replica before the system becomes so critical >>>> that they can't bounce it to update the configuration. >>> >>> True. But the current level only caters to those people who run large ETL >>> jobs without doing any tuning on their system (at least none that would >>> require a restart), or another one of the fairly specific workloads. >>> >>> And as I keep re-iterating, it's not just about replicas, it's also about >>> the ability to make proper backups. Which is a pretty fundamental feature. >>> >>> I do think you are outvoted, yes :) At least that's the result of my >>> tallying up the people who have spoken out on the thread. >> >> I tend to agree with Magnus on this, being able to perform an online >> backup is pretty darn important. > > Agreed and +1. +1'ing. -- Michael
On Wed, Jan 11, 2017 at 2:09 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Jan 11, 2017 at 10:06 AM, David Steele <david@pgmasters.net> wrote:
> On 1/10/17 3:06 PM, Stephen Frost wrote:
>> * Magnus Hagander (magnus@hagander.net) wrote:
>>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>>>> I may be outvoted, but I'm still not in favor of changing the default
>>>> wal_level. That caters only to people who lack sufficient foresight
>>>> to know that they need a replica before the system becomes so critical
>>>> that they can't bounce it to update the configuration.
>>>
>>> True. But the current level only caters to those people who run large ETL
>>> jobs without doing any tuning on their system (at least none that would
>>> require a restart), or another one of the fairly specific workloads.
>>>
>>> And as I keep re-iterating, it's not just about replicas, it's also about
>>> the ability to make proper backups. Which is a pretty fundamental feature.
>>>
>>> I do think you are outvoted, yes :) At least that's the result of my
>>> tallying up the people who have spoken out on the thread.
>>
>> I tend to agree with Magnus on this, being able to perform an online
>> backup is pretty darn important.
>
> Agreed and +1.
+1'ing.
I've pushed this.