Thread: Re: History of WAL_LEVEL (archive vs hot_standby)
shamccoy wrote > Hello. I've been doing some benchmarks on performance / size differences > between actions when wal_level is set to either archive or hot_standby. > I'm not seeing a ton of difference. I've read some posts about > discussions as to whether this parameter should be simplified and remove > or merge these 2 values. > > I'd like to understand the historic reason between have the extra > "hot_standby" value. Was it to introduce replication and not disturb the > already working "archive" value? If I'm new to Postgres, is there any > strategic reason to use "archive" at this point if replication is > something I'll be using in the future? I'm not seeing any downside to > "hot_standby" unless I'm missing something fundamental. > > Thanks, > Shawn There is a semantic difference in that "archive" is limited to "wal shipping" to a dead-drop area for future PITR. "hot_standby" implies that the wal is being sent to another running system that is immediately reading in the information to maintain an exact live copy of the master. As I think both can be used for PITR I don't believe there is much downside, technically or with resources, to using hot_standby instead of archive; but I do not imagine it having any practical benefit either. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/History-of-WAL-LEVEL-archive-vs-hot-standby-tp5797717p5797720.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 03/27/2014 03:06 PM, David Johnston wrote: > As I think both can be used for PITR I don't believe there is much downside, > technically or with resources, to using hot_standby instead of archive; but > I do not imagine it having any practical benefit either. Actually, "hot_standby" does have to write some extra records to the WAL which "archive" does not. I don't know that anyone has checked the actual volume difference between the two, though, which would probably also vary by workload. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Thu, Mar 27, 2014 at 03:06:02PM -0700, David Johnston wrote: > shamccoy wrote > > Hello. I've been doing some benchmarks on performance / size differences > > between actions when wal_level is set to either archive or hot_standby. > > I'm not seeing a ton of difference. I've read some posts about > > discussions as to whether this parameter should be simplified and remove > > or merge these 2 values. > > > > I'd like to understand the historic reason between have the extra > > "hot_standby" value. Was it to introduce replication and not disturb the > > already working "archive" value? I think so. > > If I'm new to Postgres, is there any > > strategic reason to use "archive" at this point if replication is > > something I'll be using in the future? I'm not seeing any downside to > > "hot_standby" unless I'm missing something fundamental. Probably not. You might manage to construct a benchmark in which the extra master-side work is measurable, but it wouldn't be easy. > There is a semantic difference in that "archive" is limited to "wal > shipping" to a dead-drop area for future PITR. "hot_standby" implies that > the wal is being sent to another running system that is immediately reading > in the information to maintain an exact live copy of the master. > > As I think both can be used for PITR I don't believe there is much downside, > technically or with resources, to using hot_standby instead of archive; but > I do not imagine it having any practical benefit either. wal_level=archive vs. hot_standby is orthogonal to the mechanism for transmitting WAL and time of applying WAL. Rather, it dictates whether a PostgreSQL cluster replaying that WAL can accept read-only connections during recovery. You can send wal_level=archive log data over streaming replication, even synchronous streaming replication. However, any standby will be unable to accept connections until failover ends recovery. On the flip side, if you use wal_level=hot_standby, a backup undergoing PITR can accept read-only connections while applying years-old WAL from an archive. That is useful for verifying, before you end recovery, that you have replayed far enough. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Noah Misch-2 wrote > On Thu, Mar 27, 2014 at 03:06:02PM -0700, David Johnston wrote: >> shamccoy wrote >> > Hello. I've been doing some benchmarks on performance / size >> differences >> > between actions when wal_level is set to either archive or hot_standby. >> > I'm not seeing a ton of difference. I've read some posts about >> > discussions as to whether this parameter should be simplified and >> remove >> > or merge these 2 values. >> > >> > I'd like to understand the historic reason between have the extra >> > "hot_standby" value. Was it to introduce replication and not disturb >> the >> > already working "archive" value? > > I think so. > >> > If I'm new to Postgres, is there any >> > strategic reason to use "archive" at this point if replication is >> > something I'll be using in the future? I'm not seeing any downside to >> > "hot_standby" unless I'm missing something fundamental. > > Probably not. You might manage to construct a benchmark in which the > extra > master-side work is measurable, but it wouldn't be easy. > >> There is a semantic difference in that "archive" is limited to "wal >> shipping" to a dead-drop area for future PITR. "hot_standby" implies >> that >> the wal is being sent to another running system that is immediately >> reading >> in the information to maintain an exact live copy of the master. >> >> As I think both can be used for PITR I don't believe there is much >> downside, >> technically or with resources, to using hot_standby instead of archive; >> but >> I do not imagine it having any practical benefit either. > > wal_level=archive vs. hot_standby is orthogonal to the mechanism for > transmitting WAL and time of applying WAL. Rather, it dictates whether a > PostgreSQL cluster replaying that WAL can accept read-only connections > during > recovery. You can send wal_level=archive log data over streaming > replication, > even synchronous streaming replication. However, any standby will be > unable > to accept connections until failover ends recovery. On the flip side, if > you > use wal_level=hot_standby, a backup undergoing PITR can accept read-only > connections while applying years-old WAL from an archive. That is useful > for > verifying, before you end recovery, that you have replayed far enough. Went looking for this in the docs... http://www.postgresql.org/docs/9.3/interactive/runtime-config-wal.html#GUC-WAL-LEVEL So I guess, no-restore/offline/online would be good names (and maybe wal_restore_mode instead of wal_level) if we started from scratch. Note that no-restore does not preclude same-system recovery. Just something to help me remember... David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/History-of-WAL-LEVEL-archive-vs-hot-standby-tp5797717p5797733.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
David Johnston wrote > > Noah Misch-2 wrote >> On Thu, Mar 27, 2014 at 03:06:02PM -0700, David Johnston wrote: >>> shamccoy wrote >>> > Hello. I've been doing some benchmarks on performance / size >>> differences >>> > between actions when wal_level is set to either archive or >>> hot_standby. >>> > I'm not seeing a ton of difference. I've read some posts about >>> > discussions as to whether this parameter should be simplified and >>> remove >>> > or merge these 2 values. >>> > >>> > I'd like to understand the historic reason between have the extra >>> > "hot_standby" value. Was it to introduce replication and not disturb >>> the >>> > already working "archive" value? >> >> I think so. >> >>> > If I'm new to Postgres, is there any >>> > strategic reason to use "archive" at this point if replication is >>> > something I'll be using in the future? I'm not seeing any downside to >>> > "hot_standby" unless I'm missing something fundamental. >> >> Probably not. You might manage to construct a benchmark in which the >> extra >> master-side work is measurable, but it wouldn't be easy. >> >>> There is a semantic difference in that "archive" is limited to "wal >>> shipping" to a dead-drop area for future PITR. "hot_standby" implies >>> that >>> the wal is being sent to another running system that is immediately >>> reading >>> in the information to maintain an exact live copy of the master. >>> >>> As I think both can be used for PITR I don't believe there is much >>> downside, >>> technically or with resources, to using hot_standby instead of archive; >>> but >>> I do not imagine it having any practical benefit either. >> >> wal_level=archive vs. hot_standby is orthogonal to the mechanism for >> transmitting WAL and time of applying WAL. Rather, it dictates whether a >> PostgreSQL cluster replaying that WAL can accept read-only connections >> during >> recovery. You can send wal_level=archive log data over streaming >> replication, >> even synchronous streaming replication. However, any standby will be >> unable >> to accept connections until failover ends recovery. On the flip side, if >> you >> use wal_level=hot_standby, a backup undergoing PITR can accept read-only >> connections while applying years-old WAL from an archive. That is useful >> for >> verifying, before you end recovery, that you have replayed far enough. > Went looking for this in the docs... > > http://www.postgresql.org/docs/9.3/interactive/runtime-config-wal.html#GUC-WAL-LEVEL > > So I guess, no-restore/offline/online would be good names (and maybe > wal_restore_mode instead of wal_level) if we started from scratch. Note > that no-restore does not preclude same-system recovery. > > Just something to help me remember... > > David J. Slightly tangential but are the locking operations associated with the recent bugfix generated in both (all?) modes or only hot_standby? I thought it strange that transient locking operations were output with WAL though I get it if they are there to support read-only queries. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/History-of-WAL-LEVEL-archive-vs-hot-standby-tp5797717p5797735.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Hi, > > Went looking for this in the docs... > > > > http://www.postgresql.org/docs/9.3/interactive/runtime-config-wal.html#GUC-WAL-LEVEL > > > > So I guess, no-restore/offline/online would be good names (and maybe > > wal_restore_mode instead of wal_level) if we started from scratch. Note > > that no-restore does not preclude same-system recovery. > > > > Just something to help me remember... > > > > David J. > > Slightly tangential but are the locking operations associated with the > recent bugfix generated in both (all?) modes or only hot_standby? I thought > it strange that transient locking operations were output with WAL though I > get it if they are there to support read-only queries. Putting aside the naming:), I have caught by the discussion about the differences of wal records to be emitted among the wal levels. I grep'ed 'wal_level' for whole backend but all it showed was for checking of some options in postgresql.conf against other options in postgresql.conf and that in control file. None of them seems to care it for the purpose of controlling how/what wal records to emit or record construction, except for WAL_LEVEL_LOGICAL. As far as I could see, I doubt that there is any difference in emitted wal records amoung wal levels, (except for logical changeset). I came to want to try to run streaming replication with wal_level = minimal but no time for now:( regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Mar 28, 2014 at 12:16 PM, David Johnston <polobo@yahoo.com> wrote: > > Slightly tangential but are the locking operations associated with the > recent bugfix generated in both (all?) modes or only hot_standby? I thought > it strange that transient locking operations were output with WAL though I > get it if they are there to support read-only queries. > IIUC, XLogStandbyInfoActive() is used at places where it is thought that the WAL record being written at that point would be required on a standby for correct hot standby operation (comments at these call sites are helpful). /* Do we need to WAL-log information required only for Hot Standby and logical replication? */ #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY) -- Amit
On Thu, Mar 27, 2014 at 08:16:13PM -0700, David Johnston wrote: > Slightly tangential but are the locking operations associated with the > recent bugfix generated in both (all?) modes or only hot_standby? All modes. > I thought > it strange that transient locking operations were output with WAL though I > get it if they are there to support read-only queries. It is unintuitive. This comment in heap_lock_tuple() attempts to explain: /* * XLOG stuff. You might think that we don't need an XLOG record because * there is no state change worth restoring aftera crash. You would be * wrong however: we have just written either a TransactionId or a * MultiXactId that may neverhave been seen on disk before, and we need * to make sure that there are XLOG entries covering those ID numbers. * Elsethe same IDs might be re-used after a crash, which would be * disastrous if this page made it to disk before the crash. Essentially * we have to enforce the WAL log-before-data rule even in this case. * (Also, in a PITR log-shipping or2PC environment, we have to have XLOG * entries for everything anyway.) */ Another reason not mentioned is torn pages. Locking a tuple updates t_xmax, t_infomask2 and t_infomask. It's possible for t_xmax to fall on one side of a page tear and the infomasks to fall on the other side. Writing t_xmax without writing the corresponding infomasks could cause the tuple to be considered deleted, not merely locked, after a crash.