Re: Getting rid of wal_level=archive and default to hot_standby + wal_senders - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Getting rid of wal_level=archive and default to hot_standby + wal_senders |
Date | |
Msg-id | CA+TgmobtyR1F83MJV_cCM1ZVoL0iv6Mhe7Skf51QFFo3h83FWQ@mail.gmail.com Whole thread Raw |
In response to | Re: Getting rid of wal_level=archive and default to hot_standby + wal_senders (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On Wed, Feb 4, 2015 at 8:44 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> I think my vote is to maintain the status quo. What you're basically >> proposing to do is ship the system half-configured for replication, >> and I don't see the point of that. > > Not only replication, but also hot backup. > > I think we should actually should ship it fully configured for that in > the long term. This is the biggest step towards that. > > At the moment it's really hard to get there for a beginner. Usually it > goes like > 1) Try to create a base backup. Fails because of max_wal_senders. > 2) Try to adjust max_wal_senders, fails because of wal_level. Set to > archive. > 3) New base backup is created. > 4) Try to start the new base backup with hot_standby enabled, fails > because of wal_level. > 5) Enable wal_level=hot_standby, restart master > 6) Restart standby. Still fails because it's trying to start from the > checkpoint with wal_level still archive. > 7) Give up here. If not earlier. In my opinion, the solution to this problem is to make this stuff simpler to configure. You might be right that the wal_level=minimal configuration is mostly useful for the initial load, but the initial load is the step a lot of people do first. I don't think we should just dismiss that as unimportant. >> Could we, maybe, even make it a derived value rather than one that is >> explicitly configured? Like, if you set max_wal_senders>0, you automatically get >> wal_level=hot_standby? > > Our experience with derived gucs isn't that great. Remember the whole > effective_cache_size mess? Maybe we just need to find a better way to > implement that though, instead of avoiding it from here on. The only thing I remember about effective_cache_size is that Bruce had a theory that multiplying a constant by the size of shared_buffers would give you an estimate of the total memory of the system, but since few people run with shared_buffers>8GB and many people have RAM>32GB, that was a lame way to estimate it. I think the auto-tuning of wal_buffers has been pretty successful. Anyway, I'm not talking about deriving the GUC, I'm talking about deriving the WAL level which is currently controlled solely by the GUC. We do something like this for full-page writes. Even if you in general have full_page_writes=off, trying to take a hot backup forces it on. This is smart. I think we could do something similar for replication & hot backup. Suppose we remove the wal_level GUC altogether, but there's a control file property that indicates whether replication (broadly construed to include hot backup and PITR) is enabled. Actually, more specifically, we store an LSN. If it's 0, replication features are disabled; if it's the location of the previous checkpoint, we're in the process of enabling replication features; if it precedes the location of the previous checkpoint, replication features are enabled. Then, we add a command like this: ALTER SYSTEM REPLICATION ENABLE; When you do that, it sets the LSN in the control file to the location of the most recent checkpoint, and then triggers a checkpoint. When the checkpoint is complete, it returns. You can shut it off again by saying: ALTER SYSTEM REPLICATION DISABLE; ...which just zeroes out the LSN in the control file. A copy of the control-file LSN is stored in shared memory; when it's non-zero, backends behave like wal_level=hot_standby; when zero, they behave like wal_level=minimal. Figuring out how to make sure they notice the change in a timely fashion is, uh, left as an exercise for the student. Hopefully that's a solveable problem. max_wal_senders can be set to a non-zero value even when replication is disabled, but connections are refused with a suitable error message until you enable it. We ship with a default value of, say, max_wal_senders=3. Then your above outline gets simplified to this: 1. Try to create a base backup. Fails with "ERROR: replication must be enabled to create a base backup; HINT: Use ALTER SYSTEM REPLICATION ENABLE to enable this feature". 2. Run the command from the HINT. 3. Try again; it works. >> If you register a logical replication slot, you automatically get >> wal_level=logical? > > That actually shouldn't be very hard, if the level is hot_standby > beforehand. At least not on the primary, on the standby it obviously > can't work (not that we support decoding there yet). If hot_standby_feedback=on, it would be reasonable for the standby to let the master know that it now needs wal_level=logical; or there could be ALTER SYSTEM REPLICATION ENABLE LOGICAL or whatever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: