Thread: Parameter name standby_mode
We want to teach people that Hot Standby and Streaming Replication are two different features. However, Streaming Replication calls its main parameter "standby_mode" which reminds more of Hot Standby than of Streaming Replication. People could also run a warm standby without streaming replication, which would result in a standby that has standby_mode = 'off'. I found the parameter name confusing and I'd vote for changing its name. Joachim
Joachim Wieland wrote: > We want to teach people that Hot Standby and Streaming Replication are > two different features. I'm not sure about that, actually. Now that they're both in the tree, they work nicely together and many users will think of them as one. > However, Streaming Replication calls its main > parameter "standby_mode" which reminds more of Hot Standby than of > Streaming Replication. > > People could also run a warm standby without streaming replication, > which would result in a standby that has standby_mode = 'off'. If they want to implement the warm standby using the (new) built-in logic to keep retrying restore_command, they would set standby_mode='on'. standby_mode='on' doesn't imply streaming replication. If you want to use pg_standby or similar tools, then you would indeed set standby_mode='off', but I think that makes sense because you're implementing the standby functionality outside the server in that case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Feb 10, 2010 at 12:16 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > If they want to implement the warm standby using the (new) built-in > logic to keep retrying restore_command, they would set > standby_mode='on'. standby_mode='on' doesn't imply streaming replication. > > If you want to use pg_standby or similar tools, then you would indeed > set standby_mode='off', but I think that makes sense because you're > implementing the standby functionality outside the server in that case. Okay, got it now with your explanations. For some reason it didn't work before with standby_mode = 'on' (it does now) and the warning "FATAL: sorry, too many standbys already" gave me a first suspicion that SR is the only use case for this. Then I checked the docs and there it said "If this parameter is on, the streaming replication is enabled". I understand now what it does and that it is a prerequisite but that there is also a non-SR use case... So the name is okay for me :-) Thanks again, Joachim
On Wed, 2010-02-10 at 13:16 +0200, Heikki Linnakangas wrote: > If they want to implement the warm standby using the (new) built-in > logic to keep retrying restore_command, they would set > standby_mode='on'. standby_mode='on' doesn't imply streaming replication. The docs say "If this parameter is on, the streaming replication is enabled". So who is wrong? ISTM that Joachim's viewpoint is right and that most people will be confused about this. I think we need something named more intuitively. Something that better describes what action (i.e. a verb) will occur when this is set. Suggestions: streaming_replication = on We may need to split out various complexities into multiple parameters, or have valued parameters, e.g. standby_mode = REPLICA. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > The docs say "If this parameter is on, the streaming replication is > enabled". So who is wrong? The docs. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > If they want to implement the warm standby using the (new) built-in > logic to keep retrying restore_command, they would set > standby_mode='on'. standby_mode='on' doesn't imply streaming replication. But if we fail in restoring the archived WAL file, "standby_mode = on" *always* tries to start streaming replication. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> If they want to implement the warm standby using the (new) built-in >> logic to keep retrying restore_command, they would set >> standby_mode='on'. standby_mode='on' doesn't imply streaming replication. > > But if we fail in restoring the archived WAL file, "standby_mode = on" > *always* tries to start streaming replication. Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I think that's the way it should work, ie. if primary_conninfo is not set, don't launch walreceiver but just keep trying to restore from the archive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> If they want to implement the warm standby using the (new) built-in >>> logic to keep retrying restore_command, they would set >>> standby_mode='on'. standby_mode='on' doesn't imply streaming replication. >> >> But if we fail in restoring the archived WAL file, "standby_mode = on" >> *always* tries to start streaming replication. > > Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I > think that's the way it should work, ie. if primary_conninfo is not set, > don't launch walreceiver but just keep trying to restore from the archive. Yeah, even if primary_conninfo is not given, the standby tries to invoke walreceiver by using the another connection settings (environment variables or defaults). This is intentional behavior, and would make the setup of SR easier. So I'd like to leave it be. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Yeah, even if primary_conninfo is not given, the standby tries to invoke > walreceiver by using the another connection settings (environment variables > or defaults). This is intentional behavior, and would make the setup of SR > easier. So I'd like to leave it be. On the other hand, if it has to use defaults for the target host/port, chances are high that either it connects to the wrong host/port or that SR is just not wanted :-) Whoever sets up SR will also take the effort to configure primary_conninfo and will have a different primary than the default - which I think is just the standby itself, no? Joachim
Fujii Masao wrote: > On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Fujii Masao wrote: >>> But if we fail in restoring the archived WAL file, "standby_mode = on" >>> *always* tries to start streaming replication. >> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I >> think that's the way it should work, ie. if primary_conninfo is not set, >> don't launch walreceiver but just keep trying to restore from the archive. > > Yeah, even if primary_conninfo is not given, the standby tries to invoke > walreceiver by using the another connection settings (environment variables > or defaults). This is intentional behavior, and would make the setup of SR > easier. So I'd like to leave it be. You could do primary_conninfo='' for that. Maybe we should have two options, "streaming_mode='on'" and "primary_conninfo='...'". -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Feb 12, 2010 at 4:04 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> Fujii Masao wrote: >>>> But if we fail in restoring the archived WAL file, "standby_mode = on" >>>> *always* tries to start streaming replication. >>> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I >>> think that's the way it should work, ie. if primary_conninfo is not set, >>> don't launch walreceiver but just keep trying to restore from the archive. >> >> Yeah, even if primary_conninfo is not given, the standby tries to invoke >> walreceiver by using the another connection settings (environment variables >> or defaults). This is intentional behavior, and would make the setup of SR >> easier. So I'd like to leave it be. > > You could do primary_conninfo='' for that. > > Maybe we should have two options, "streaming_mode='on'" and > "primary_conninfo='...'". It looks better for me to extend the "standby_mode": For example, standby_mode = 'streaming' or 'archive'. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Fri, Feb 12, 2010 at 4:04 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Fujii Masao wrote: >>> On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> Fujii Masao wrote: >>>>> But if we fail in restoring the archived WAL file, "standby_mode = on" >>>>> *always* tries to start streaming replication. >>>> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I >>>> think that's the way it should work, ie. if primary_conninfo is not set, >>>> don't launch walreceiver but just keep trying to restore from the archive. >>> Yeah, even if primary_conninfo is not given, the standby tries to invoke >>> walreceiver by using the another connection settings (environment variables >>> or defaults). This is intentional behavior, and would make the setup of SR >>> easier. So I'd like to leave it be. >> You could do primary_conninfo='' for that. >> >> Maybe we should have two options, "streaming_mode='on'" and >> "primary_conninfo='...'". > > It looks better for me to extend the "standby_mode": > For example, standby_mode = 'streaming' or 'archive'.

There's yet another mode that would be useful with hot standby: start up the standby, but don't poll the archive and don't try to connect to the master. Kind of 'paused' mode. Simon had functions to do that and more in the original hot standby patch.

I've been thinking that this would work with just the three options we have now:

standby_mode (true/false) controls whether the server keeps retrying until trigger file is found (if trigger_file is set), rather than finish recovery.
primary_conninfo (string) specifies a connection string to use to connect to the master. If not given, don't try to connect.
restore_command (string) specifies a command to use to restore a file from archive. If not given, don't try to restore files from archive.

I think this is pretty coherent and easy to explain, and makes all the combinations restoring files from archive/streaming possible. But if someone comes up with an even better scheme, I'm all ears. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
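A minimal sketch of the three-option scheme described in the message above, written in Python purely for illustration. The names `RecoveryPlan` and `plan_recovery` are hypothetical and do not exist in PostgreSQL. `None` stands for "not given in recovery.conf", and an empty string counts as given, following the earlier remark about primary_conninfo=''. Each setting is mirrored exactly as the message words it; any interaction between the settings is deliberately left unmodeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryPlan:
    """Hypothetical summary of what the standby would do."""
    keep_retrying: bool   # standby_mode: keep retrying instead of finishing recovery
    try_streaming: bool   # primary_conninfo given: connect to the master
    try_archive: bool     # restore_command given: restore files from the archive


def plan_recovery(standby_mode: bool,
                  primary_conninfo: Optional[str],
                  restore_command: Optional[str]) -> RecoveryPlan:
    # None means the option is absent; '' is treated as present.
    return RecoveryPlan(
        keep_retrying=standby_mode,
        try_streaming=primary_conninfo is not None,
        try_archive=restore_command is not None,
    )
```

For example, a warm standby that only polls the archive would correspond to `plan_recovery(True, None, "cp /archive/%f %p")`, where the restore command shown is just a placeholder.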
Joachim Wieland wrote: > On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Yeah, even if primary_conninfo is not given, the standby tries to invoke >> walreceiver by using the another connection settings (environment variables >> or defaults). This is intentional behavior, and would make the setup of SR >> easier. So I'd like to leave it be. > > On the other hand, if it has to use defaults for the target host/port, > chances are high that either it connects to the wrong host/port or > that SR is just not wanted :-) Agreed. I've changed it now so that if primary_conninfo is not set, it doesn't try to establish a streaming connection. If you want to get the connection information from environment variables, you can use primary_conninfo=''. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Feb 12, 2010 at 4:59 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Agreed. I've changed it now so that if primary_conninfo is not set, it > doesn't try to establish a streaming connection. If you want to get the > connection information from environment variables, you can use > primary_conninfo=''. OK, you win. I would live with primary_conninfo=''. And you need to change the document, recovery.conf.sample and so on. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Feb 12, 2010 at 8:59 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Agreed. I've changed it now so that if primary_conninfo is not set, it > doesn't try to establish a streaming connection. If you want to get the > connection information from environment variables, you can use > primary_conninfo=''. Why not just remove the default: If no primary_conninfo variable is set explicitly in the configuration file, check the environment variables. If the environment variable is not set, don't try to establish a connection. ? Joachim
Joachim Wieland wrote: > On Fri, Feb 12, 2010 at 8:59 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Agreed. I've changed it now so that if primary_conninfo is not set, it >> doesn't try to establish a streaming connection. If you want to get the >> connection information from environment variables, you can use >> primary_conninfo=''. > > Why not just remove the default: > > If no primary_conninfo variable is set explicitly in the configuration > file, check the environment variables. If the environment variable is > not set, don't try to establish a connection. The environment variables in question are the libpq environment variables like PGHOST, PGPORT. The server shouldn't need to know about them. Besides, there'd still be the corner case that you really want to use the built-in defaults, ie. connect to a server running in the same host at the default port, so you'd not set any environment variables either. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > There's yet another mode that would be useful with hot standby: start up > the standby, but don't poll the archive and don't try to connect to the > master. Kind of 'paused' mode. Simon had functions to do that and more > in the original hot standby patch. And having the pause/resume functions would lower the need for perfect conflict resolution too. When you want to run this huge reporting query set and not get interrupted, pause the standby. Afterward, resume it. Of course, while paused, it's not a good HA standby anymore, but you just did pause it, so you're not surprised, right? > I've been thinking that this would work with just the three options we > have now: I like that, because it exposes exactly the code logic, and it is not complex enough to merit being hidden from the users. Also, you depend on understanding how the server really works to setup a trustworthy HA solution, so exposing the very used concepts is a win. > primary_conninfo (string) specifies a connection string to use to > connect to the master. If not given, don't try to connect. Would it be possible to expose that at the SQL level, so that you can easily check in scripts what master you're a slave of? Think nagios cascading alerts or topology graphs, etc. Regards, -- dim
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Joachim Wieland wrote: >> If no primary_conninfo variable is set explicitly in the configuration >> file, check the environment variables. If the environment variable is >> not set, don't try to establish a connection. > The environment variables in question are the libpq environment > variables like PGHOST, PGPORT. The server shouldn't need to know about > them. Even more to the point is that some of them, like PGPORT, are highly likely to be set in a server's environment to point to the server itself. It would be extremely dangerous to automatically try to start replication just because we find those set. In fact, I would argue that we should fix things so that any such variables inherited from the server environment are intentionally *NOT* used for making SR connections. regards, tom lane
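The idea of not letting inherited variables influence the SR connection could be sketched as follows. This is an illustration in Python, not how the server implements anything: `scrub_libpq_env` and `LIBPQ_ENV_VARS` are hypothetical names, and the tuple lists only a few of the variables libpq consults, not all of them.

```python
from typing import Dict, Mapping

# A few of the environment variables libpq reads; intentionally not exhaustive.
LIBPQ_ENV_VARS = (
    "PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE",
    "PGUSER", "PGPASSWORD", "PGSERVICE",
)


def scrub_libpq_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env without the listed libpq variables."""
    return {name: value for name, value in env.items()
            if name not in LIBPQ_ENV_VARS}
```

With such a filter in place, a PGPORT that points at the server itself would no longer be picked up when the connection is made; only what primary_conninfo says would count.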
On Fri, Feb 12, 2010 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Even more to the point is that some of them, like PGPORT, are highly > likely to be set in a server's environment to point to the server > itself. It would be extremely dangerous to automatically try to start > replication just because we find those set. In fact, I would argue that > we should fix things so that any such variables inherited from the > server environment are intentionally *NOT* used for making SR > connections. There are many environment variables which libpq automatically uses. Which variables should not be used for SR connection? All? If both primary_conninfo and environment variables are not given, the default value (e.g., port = 5432) is automatically used for SR connection. Is this OK? or NG as well as the environment variables? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Feb 12, 2010 at 4:59 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Joachim Wieland wrote: >> On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> Yeah, even if primary_conninfo is not given, the standby tries to invoke >>> walreceiver by using the another connection settings (environment variables >>> or defaults). This is intentional behavior, and would make the setup of SR >>> easier. So I'd like to leave it be. >> >> On the other hand, if it has to use defaults for the target host/port, >> chances are high that either it connects to the wrong host/port or >> that SR is just not wanted :-) > > Agreed. I've changed it now so that if primary_conninfo is not set, it > doesn't try to establish a streaming connection. If you want to get the > connection information from environment variables, you can use > primary_conninfo=''. If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck. How about forbidding (i.e., causing a FATAL message) this wrong setting? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > If standby_mode is enabled, and neither primary_conninfo nor restore_command > are set, the standby would get stuck. How about forbidding (i.e., causing a > FATAL message) this wrong setting? Here is the patch which forbids that wrong setting of recovery.conf. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
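The check being proposed can be sketched like this. It is an assumption-laden illustration in Python, not the attached patch: `check_recovery_conf` and `RecoveryConfigError` are made-up names, `None` stands for an option that is absent from recovery.conf, and the exception stands in for the FATAL message.

```python
from typing import Optional


class RecoveryConfigError(Exception):
    """Stands in for a FATAL error at startup (hypothetical)."""


def check_recovery_conf(standby_mode: bool,
                        primary_conninfo: Optional[str],
                        restore_command: Optional[str]) -> None:
    # Reject the combination that leaves the standby with no configured
    # place to fetch WAL from.
    if standby_mode and primary_conninfo is None and restore_command is None:
        raise RecoveryConfigError(
            "standby_mode is on, but neither primary_conninfo "
            "nor restore_command is set"
        )
```

Any other combination passes silently, so an archive-only standby or a streaming-only standby is unaffected.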
On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> If standby_mode is enabled, and neither primary_conninfo nor restore_command >> are set, the standby would get stuck. How about forbidding (i.e., causing a >> FATAL message) this wrong setting? > > Here is the patch which forbids that wrong setting of recovery.conf. I think that this patch should be applied. Otherwise, if you wrongly set neither primary_conninfo nor restore_command in recovery.conf, the standby server would do nothing and get stuck because it doesn't know where to retrieve the WAL files from. Banning the incorrect setting makes sense to me. Does anyone commit the patch? Does anyone have a say? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 30, 2010 at 12:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> If standby_mode is enabled, and neither primary_conninfo nor restore_command >>> are set, the standby would get stuck. How about forbidding (i.e., causing a >>> FATAL message) this wrong setting? >> >> Here is the patch which forbids that wrong setting of recovery.conf. > > I think that this patch should be applied. Otherwise, if you wrongly > set neither primary_conninfo nor restore_command in recovery.conf, > the standby server would do nothing and get stuck because it doesn't > know where to retrieve the WAL files from. Banning the incorrect > setting makes sense to me. > > Does anyone commit the patch? Does anyone have a say? I just tested this and it seems to just sit there doing this over and over again: LOG: record with zero length at 0/3006B28 I'm not sure that we should forbid this configuration, but the current behavior doesn't seem right either. ISTM that, in the absence of a way to get any more WAL, it would be reasonable for the standby server to just start up and sit there in recovery mode but without actually advancing recovery, but the repeated log messages are pretty annoying. If we're connected in streaming mode and there is no activity on the primary, we don't emit logs of this type, so it doesn't seem like we should do that if there is no primary either. A related question is... do we ever reload recovery.conf? I tried adding the setting to recovery.conf and doing pg_ctl reload, and it says that it's "reloading configuration files", but doesn't pick up the new setting. :-( ...Robert
On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I just tested this and it seems to just sit there doing this over and > over again: > > LOG: record with zero length at 0/3006B28 > > I'm not sure that we should forbid this configuration, but the current > behavior doesn't seem right either. ISTM that, in the absence of a > way to get any more WAL, it would be reasonable for the standby server > to just start up and sit there in recovery mode but without actually > advancing recovery, but the repeated log messages are pretty annoying. I'm concerned about that the configuration might prevent the standby from accepting connection from a client because it cannot get the WAL for making the database consistent. So that configuration seems to be reasonable only when starting the standby from the already-consistent database or with enough WAL files in pg_xlog. But it seems to me that the standby often starts from the inconsistent database without enough WAL in pg_xlog. > A related question is... do we ever reload recovery.conf? I tried > adding the setting to recovery.conf and doing pg_ctl reload, and it > says that it's "reloading configuration files", but doesn't pick up > the new setting. :-( recovery.conf cannot be reloaded while the server is running. This restriction should be removed in the future release, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 31, 2010 at 1:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I just tested this and it seems to just sit there doing this over and >> over again: >> >> LOG: record with zero length at 0/3006B28 >> >> I'm not sure that we should forbid this configuration, but the current >> behavior doesn't seem right either. ISTM that, in the absence of a >> way to get any more WAL, it would be reasonable for the standby server >> to just start up and sit there in recovery mode but without actually >> advancing recovery, but the repeated log messages are pretty annoying. > > I'm concerned about that the configuration might prevent the standby > from accepting connection from a client because it cannot get the WAL > for making the database consistent. So that configuration seems to be > reasonable only when starting the standby from the already-consistent > database or with enough WAL files in pg_xlog. But it seems to me that > the standby often starts from the inconsistent database without enough > WAL in pg_xlog. Agreed. I think if the server starts up in standby mode and it is an inconsistent state with no source of WAL, then the startup process should exit with a suitable error message, which AIUI will result in the whole server shutting down. However if there is no source of WAL but the server is in a consistent state, then I think we should allow it to start up as a read-only standby. Now, an interesting question is - if the server is in this state, and somebody manually drops more WAL into pg_xlog, what happens? And what happens in the similar case where primary_conninfo is set but we can't connect to the master at the moment, and someone drops a pile of WAL on us? >> A related question is... do we ever reload recovery.conf? 
I tried >> adding the setting to recovery.conf and doing pg_ctl reload, and it >> says that it's "reloading configuration files", but doesn't pick up >> the new setting. :-( > > recovery.conf cannot be reloaded while the server is running. This > restriction should be removed in the future release, I think. Yes. If we don't already have a TODO for that, we should definitely add one. I found myself annoyed by this several times last night. I kept having to restart the master, too, first to fix archive_mode and then to fix max_wal_senders. It's far too late to start tinkering with this stuff now but I am pretty confident there will be a huge sigh of collective relief out there if we can relax some of these restrictions for 9.1. Nobody likes having to shut down the server, even if it's just for a few seconds. ...Robert
Robert Haas wrote: > Agreed. I think if the server starts up in standby mode and it is an > inconsistent state with no source of WAL, then the startup process > should exit with a suitable error message, which AIUI will result in > the whole server shutting down. However if there is no source of WAL > but the server is in a consistent state, then I think we should allow > it to start up as a read-only standby. > > Now, an interesting question is - if the server is in this state, and > somebody manually drops more WAL into pg_xlog, what happens? And what > happens in the similar case where primary_conninfo is set but we can't > connect to the master at the moment, and someone drops a pile of WAL > on us? With the recent changes to the retry logic (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), they will be replayed. Even if neither primary_conninfo or restore_command is given, the server will still keep polling pg_xlog, and if you copy a WAL file to standby's pg_xlog directory, it will be replayed and recovery will make progress. I wouldn't recommend setting up a standby server like that, but it's not totally unreasonable. So the standby always has a potential source of WAL, pg_xlog. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 31, 2010 at 4:54 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> Agreed. I think if the server starts up in standby mode and it is an >> inconsistent state with no source of WAL, then the startup process >> should exit with a suitable error message, which AIUI will result in >> the whole server shutting down. However if there is no source of WAL >> but the server is in a consistent state, then I think we should allow >> it to start up as a read-only standby. >> >> Now, an interesting question is - if the server is in this state, and >> somebody manually drops more WAL into pg_xlog, what happens? And what >> happens in the similar case where primary_conninfo is set but we can't >> connect to the master at the moment, and someone drops a pile of WAL >> on us? > > With the recent changes to the retry logic > (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), > they will be replayed. Even if neither primary_conninfo or > restore_command is given, the server will still keep polling pg_xlog, > and if you copy a WAL file to standby's pg_xlog directory, it will be > replayed and recovery will make progress. > > I wouldn't recommend setting up a standby server like that, but it's not > totally unreasonable. So the standby always has a potential source of > WAL, pg_xlog. OK. Is it reasonable to think that we can find a way to make it not print the duplicate messages over and over again? LOG: record with zero length at 0/3006B28 Maybe only print that if the location has advanced since the last such message? Should we make it shut down if it can't immediately read enough WAL to get to a consistent state, or just figure it's the user's job to fix it? ...Robert
Robert Haas wrote: > Is it reasonable to think that we can find a way to make it not print > the duplicate messages over and over again? > > LOG: record with zero length at 0/3006B28 > > Maybe only print that if the location has advanced since the last such message? Yeah, seems reasonable. > Should we make it shut down if it can't immediately read enough WAL to > get to a consistent state, or just figure it's the user's job to fix > it? I'd say no. In testing, I have done this many times:

pg_start_backup()
copy data directory to server
create recovery.conf
Start standby server.
pg_stop_backup()

The standby doesn't reach consistency before it sees the end-of-backup record written by pg_stop_backup(), but it does replay up to the last WAL segment, and connect to the master. Not sure if that's useful in real life, but there could be situations where restore_command isn't totally reliable, for example, and it's good to keep trying. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
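The "only print if the location has advanced" idea could look roughly like the sketch below. It is written in Python for illustration and is not server code: `LocationLogFilter` and `parse_location` are hypothetical names, and the sketch assumes the location is printed as two hexadecimal numbers separated by a slash, as in the log line quoted above.

```python
from typing import Optional, Tuple


def parse_location(text: str) -> Tuple[int, int]:
    """Parse a location such as '0/3006B28' into a comparable tuple."""
    high, low = text.split("/")
    return (int(high, 16), int(low, 16))


class LocationLogFilter:
    """Allow a message only when its WAL location is past the last one logged."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[int, int]] = None

    def should_log(self, location: str) -> bool:
        current = parse_location(location)
        if self._last is not None and current <= self._last:
            return False  # same (or older) location: suppress the repeat
        self._last = current
        return True
```

A caller would wrap the existing log call in `if log_filter.should_log(location): ...`, so the first report at a given location still appears and only the repeats are dropped.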
On Wed, Mar 31, 2010 at 5:23 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> Is it reasonable to think that we can find a way to make it not print >> the duplicate messages over and over again? >> >> LOG: record with zero length at 0/3006B28 >> >> Maybe only print that if the location has advanced since the last such message? > > Yeah, seems reasonable. > >> Should we make it shut down if it can't immediately read enough WAL to >> get to a consistent state, or just figure it's the user's job to fix >> it? > > I'd say no. In testing, I have done this many times: > > pg_start_backup() > copy data directory to server > create recovery.conf > Start standby server. > pg_stop_backup() > > The standby doesn't reach consistency before it sees the end-of-backup > record written by pg_stop_backup(), but it does replay up to the last > WAL segment, and connect to the master. > > Not sure if that's useful in real life, but there could be situations > where restore_command isn't totally reliable, for example, and it's good > to keep trying. I was only thinking of doing it in the case where there's no primary_conninfo or restore_command. ...Robert
On Thu, Apr 1, 2010 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I wouldn't recommend setting up a standby server like that, but it's not >> totally unreasonable. So the standby always has a potential source of >> WAL, pg_xlog. > > OK. OK, too. I turn down the patch. > Is it reasonable to think that we can find a way to make it not print > the duplicate messages over and over again? > > LOG: record with zero length at 0/3006B28 > > Maybe only print that if the location has advanced since the last such message? Agreed. But what log message is repeated depends on the situation. So message without any location might be output. BTW, In my testing, the following message was repeated. LOG: invalid magic number 0000 in log file 0, segment 14, offset 9617408 > Should we make it shut down if it can't immediately read enough WAL to > get to a consistent state, or just figure it's the user's job to fix > it? I think that it's difficult for the user to fix it. So I agree to shut down the server in that case, i.e., throw a FATAL when an invalid WAL record is found and recovery hasn't reached the safe starting point even if neither primary_conninfo nor restore_command is given. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 31, 2010 at 9:01 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Agreed. But what log message is repeated depends on the situation. > So message without any location might be output. BTW, In my testing, > the following message was repeated. > > LOG: invalid magic number 0000 in log file 0, segment 14, offset 9617408 Yeah, that's a pain in the neck. We need to think about a way to avoid any of these messages repeating. Not sure how, off the top of my head. >> Should we make it shut down if it can't immediately read enough WAL to >> get to a consistent state, or just figure it's the user's job to fix >> it? > > I think that it's difficult for the user to fix it. So I agree to shut > down the server in that case, i.e., throw a FATAL when an invalid WAL > record is found and recovery hasn't reached the safe starting point > even if neither primary_conninfo nor restore_command is given. I think that's reasonable. It's not like this should cause any problem for the user: they can add the missing WAL while the server is down just as well as they could if it were up, and Hot Standby isn't going to come up anyway. But I could possibly be persuaded to change my mind on this one, if someone feels strongly otherwise. ...Robert
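One possible reading of the shutdown rule discussed above, sketched in Python. This is only an interpretation under stated assumptions: `should_shut_down` is a hypothetical name, `None` stands for an option missing from recovery.conf, and the rule is restricted to the case where neither primary_conninfo nor restore_command is given, as Robert described earlier in the thread.

```python
from typing import Optional


def should_shut_down(invalid_record_found: bool,
                     reached_consistency: bool,
                     primary_conninfo: Optional[str],
                     restore_command: Optional[str]) -> bool:
    """Decide whether the startup process should give up with a FATAL."""
    # No configured way to obtain more WAL (pg_xlog polling is not counted here).
    no_configured_source = primary_conninfo is None and restore_command is None
    return invalid_record_found and not reached_consistency and no_configured_source
```

Under this reading, a standby that has already reached a consistent state keeps running, and so does one that still has an archive or a master to retry against.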
On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote: > Robert Haas wrote: > > Agreed. I think if the server starts up in standby mode and it is an > > inconsistent state with no source of WAL, then the startup process > > should exit with a suitable error message, which AIUI will result in > > the whole server shutting down. However if there is no source of WAL > > but the server is in a consistent state, then I think we should allow > > it to start up as a read-only standby. > > > > Now, an interesting question is - if the server is in this state, and > > somebody manually drops more WAL into pg_xlog, what happens? And what > > happens in the similar case where primary_conninfo is set but we can't > > connect to the master at the moment, and someone drops a pile of WAL > > on us? > > With the recent changes to the retry logic > (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), > they will be replayed. Even if neither primary_conninfo or > restore_command is given, the server will still keep polling pg_xlog, > and if you copy a WAL file to standby's pg_xlog directory, it will be > replayed and recovery will make progress. > > I wouldn't recommend setting up a standby server like that, but it's not > totally unreasonable. So the standby always has a potential source of > WAL, pg_xlog. I have inadvertently made it impossible to specify standby_mode && (!primary_conninfo && !restore_command) I did that because Robert had separately to this thread reported a hang, caused by this specification. I have verified this. pg_xlog is a *potential* source of WAL, but if the files requested are not present then the server just sits and waits with *no* messages. That is unacceptable, IMHO. What should we do now? -- Simon Riggs www.2ndQuadrant.com
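The behaviour proposed in the quoted text can be written down as a small decision function. This is only a sketch of the proposal under discussion, not of what the server does; the function name, the `Action` values, and the `consistent` flag are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    NORMAL_RECOVERY = "not in standby mode"
    REPLAY_FROM_SOURCE = "standby with a configured WAL source"
    READ_ONLY_POLLING = "consistent, no source: read-only, keep polling pg_xlog"
    FATAL_SHUTDOWN = "inconsistent, no source: exit with an error"


def startup_action(standby_mode, primary_conninfo, restore_command, consistent):
    """Pick a startup behaviour from the recovery settings.

    `consistent` stands for "recovery has reached a consistent state".
    """
    if not standby_mode:
        return Action.NORMAL_RECOVERY
    if primary_conninfo or restore_command:
        return Action.REPLAY_FROM_SOURCE
    # This is the disputed combination:
    # standby_mode && (!primary_conninfo && !restore_command)
    if consistent:
        return Action.READ_ONLY_POLLING
    return Action.FATAL_SHUTDOWN


if __name__ == "__main__":
    print(startup_action(True, None, None, consistent=False).value)
    print(startup_action(True, None, None, consistent=True).value)
```

Only the last two branches are in dispute: whether the combination should be allowed at all, and what the server should report while it waits.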
On Mon, Feb 15, 2010 at 3:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Feb 12, 2010 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Even more to the point is that some of them, like PGPORT, are highly >> likely to be set in a server's environment to point to the server >> itself. It would be extremely dangerous to automatically try to start >> replication just because we find those set. In fact, I would argue that >> we should fix things so that any such variables inherited from the >> server environment are intentionally *NOT* used for making SR >> connections. This complaint of Tom's is listed as a TODO item. How should we treat this? I'm leaning toward postponing the item to v9.1 or later. Currently a server in recovery doesn't accept replication connections, so I think it's not so dangerous for walreceiver to use environment variables that might point to the server itself: that connection is always refused. Shall we revisit this issue when we allow the standby server to accept replication connections from another standby? And I think that we should prevent the standby from accepting a connection from its own walreceiver, rather than prevent the standby from using the environment variables. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
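For comparison, Tom's suggestion of not using inherited variables for SR connections could look roughly like the following. This is a Python sketch of the idea only; walreceiver is C and nothing here reflects its code. The function name is hypothetical, and the variable list is a small, non-exhaustive subset of the libpq environment variables.

```python
import os

# A few libpq environment variables that select the connection target or
# credentials. Deliberately not exhaustive; extend as needed.
LIBPQ_CONNECTION_VARS = frozenset({
    "PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE",
    "PGUSER", "PGPASSWORD", "PGSERVICE",
})


def scrub_connection_env(environ):
    """Return a copy of `environ` without the libpq connection variables."""
    return {
        name: value
        for name, value in environ.items()
        if name not in LIBPQ_CONNECTION_VARS
    }


if __name__ == "__main__":
    inherited = dict(os.environ, PGPORT="5432", PGHOST="localhost")
    cleaned = scrub_connection_env(inherited)
    print("PGPORT" in cleaned, "PGHOST" in cleaned)
```

The alternative suggested in the message above, refusing a connection that comes from the standby's own walreceiver, would leave the environment alone and move the check to the accepting side.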
On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote: >> Robert Haas wrote: >> > Agreed. I think if the server starts up in standby mode and it is an >> > inconsistent state with no source of WAL, then the startup process >> > should exit with a suitable error message, which AIUI will result in >> > the whole server shutting down. However if there is no source of WAL >> > but the server is in a consistent state, then I think we should allow >> > it to start up as a read-only standby. >> > >> > Now, an interesting question is - if the server is in this state, and >> > somebody manually drops more WAL into pg_xlog, what happens? And what >> > happens in the similar case where primary_conninfo is set but we can't >> > connect to the master at the moment, and someone drops a pile of WAL >> > on us? >> >> With the recent changes to the retry logic >> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), >> they will be replayed. Even if neither primary_conninfo or >> restore_command is given, the server will still keep polling pg_xlog, >> and if you copy a WAL file to standby's pg_xlog directory, it will be >> replayed and recovery will make progress. >> >> I wouldn't recommend setting up a standby server like that, but it's not >> totally unreasonable. So the standby always has a potential source of >> WAL, pg_xlog. > > I have inadvertently made it impossible to specify > standby_mode && (!primary_conninfo && !restore_command) > > I did that because Robert had separately to this thread reported a hang, > caused by this specification. I have verified this. I don't remember reporting this (or maybe you meant the other Robert); but there are so many threads on this topic that it's hard to keep track of them all. Can you refresh my memory? 
> pg_xlog is a *potential* source of WAL, but if the files requested are > not present then the server just sits and waits with *no* messages. That > is unacceptable, IMHO. > > What should we do now? Well, actually, what it does for me is sits there and prints the last xlog location over and over again every 2s. I'd actually like to get to "sits and waits with no messages", but it's not clear how to do that. ...Robert
On Mon, 2010-04-05 at 07:11 -0400, Robert Haas wrote: > On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote: > >> Robert Haas wrote: > >> > Agreed. I think if the server starts up in standby mode and it is an > >> > inconsistent state with no source of WAL, then the startup process > >> > should exit with a suitable error message, which AIUI will result in > >> > the whole server shutting down. However if there is no source of WAL > >> > but the server is in a consistent state, then I think we should allow > >> > it to start up as a read-only standby. > >> > > >> > Now, an interesting question is - if the server is in this state, and > >> > somebody manually drops more WAL into pg_xlog, what happens? And what > >> > happens in the similar case where primary_conninfo is set but we can't > >> > connect to the master at the moment, and someone drops a pile of WAL > >> > on us? > >> > >> With the recent changes to the retry logic > >> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), > >> they will be replayed. Even if neither primary_conninfo or > >> restore_command is given, the server will still keep polling pg_xlog, > >> and if you copy a WAL file to standby's pg_xlog directory, it will be > >> replayed and recovery will make progress. > >> > >> I wouldn't recommend setting up a standby server like that, but it's not > >> totally unreasonable. So the standby always has a potential source of > >> WAL, pg_xlog. > > > > I have inadvertently made it impossible to specify > > standby_mode && (!primary_conninfo && !restore_command) > > > > I did that because Robert had separately to this thread reported a hang, > > caused by this specification. I have verified this. > > I don't remember reporting this (or maybe you meant the other Robert); > but there are so many threads on this topic that it's hard to keep > track of them all. Can you refresh my memory? 
> > > pg_xlog is a *potential* source of WAL, but if the files requested are > > not present then the server just sits and waits with *no* messages. That > > is unacceptable, IMHO. > > > > What should we do now? > > Well, actually, what it does for me is sits there and prints the last > xlog location over and over again every 2s. I'd actually like to get > to "sits and waits with no messages", but it's not clear how to do > that. That's exactly the opposite of your report. Thread you started, on hackers, in last week or so. It's not clear to me *why* you would want it to sit there doing nothing, and even if that has a purpose, saying nothing at all is not useful. (Note that it cannot enter Hot Standby mode even in that state). -- Simon Riggs www.2ndQuadrant.com
On Mon, Apr 5, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2010-04-05 at 07:11 -0400, Robert Haas wrote: >> On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote: >> >> Robert Haas wrote: >> >> > Agreed. I think if the server starts up in standby mode and it is an >> >> > inconsistent state with no source of WAL, then the startup process >> >> > should exit with a suitable error message, which AIUI will result in >> >> > the whole server shutting down. However if there is no source of WAL >> >> > but the server is in a consistent state, then I think we should allow >> >> > it to start up as a read-only standby. >> >> > >> >> > Now, an interesting question is - if the server is in this state, and >> >> > somebody manually drops more WAL into pg_xlog, what happens? And what >> >> > happens in the similar case where primary_conninfo is set but we can't >> >> > connect to the master at the moment, and someone drops a pile of WAL >> >> > on us? >> >> >> >> With the recent changes to the retry logic >> >> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php), >> >> they will be replayed. Even if neither primary_conninfo or >> >> restore_command is given, the server will still keep polling pg_xlog, >> >> and if you copy a WAL file to standby's pg_xlog directory, it will be >> >> replayed and recovery will make progress. >> >> >> >> I wouldn't recommend setting up a standby server like that, but it's not >> >> totally unreasonable. So the standby always has a potential source of >> >> WAL, pg_xlog. >> > >> > I have inadvertently made it impossible to specify >> > standby_mode && (!primary_conninfo && !restore_command) >> > >> > I did that because Robert had separately to this thread reported a hang, >> > caused by this specification. I have verified this. 
>> >> I don't remember reporting this (or maybe you meant the other Robert); >> but there are so many threads on this topic that it's hard to keep >> track of them all. Can you refresh my memory? >> >> > pg_xlog is a *potential* source of WAL, but if the files requested are >> > not present then the server just sits and waits with *no* messages. That >> > is unacceptable, IMHO. >> > >> > What should we do now? >> >> Well, actually, what it does for me is sits there and prints the last >> xlog location over and over again every 2s. I'd actually like to get >> to "sits and waits with no messages", but it's not clear how to do >> that. > > That's exactly the opposite of your report. Thread you started, on > hackers, in last week or so. Which thread? What was the subject line? The only thing I remember saying about this was: http://archives.postgresql.org/pgsql-hackers/2010-03/msg01247.php > It's not clear to me *why* you would want it to sit there doing nothing, > and even if that has a purpose, saying nothing at all is not useful. > (Note that it cannot enter Hot Standby mode even in that state). Actually it can, if the database state is consistent. Anyway, this was already discussed upthread... feel free to put in your $0.02. ...Robert
On Mon, 2010-04-05 at 18:03 +0900, Fujii Masao wrote: > I'm leaning toward postponing the item to v9.1 or later. If you want to defer anything, then I'd like to get a summary of what you are thinking of deferring and why that is acceptable. Right now there are lots of unfinished items and no movement on them. Yes, I'm unhappy about that. My feeling is that we should only default on "port" as part of the primary_conninfo. All other settings are required or we should reject streaming. Replication connections should be explicit. -- Simon Riggs www.2ndQuadrant.com
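A rough sketch of the "explicit settings, default only on port" rule follows, in Python for illustration. The message does not say which settings count as required, so `REQUIRED_KEYS` below is purely an assumption, as is the function name. The parser is simplified and does not handle quoted values; 5432 is the standard PostgreSQL port.

```python
DEFAULT_PORT = "5432"
REQUIRED_KEYS = ("host", "user")  # assumption: the thread does not list these


def check_primary_conninfo(conninfo):
    """Parse 'key=value key=value ...' and insist on explicit settings.

    Only `port` may be omitted; anything in REQUIRED_KEYS must be given,
    otherwise streaming is rejected with ValueError.
    """
    settings = {}
    for token in conninfo.split():
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed conninfo token: {token!r}")
        settings[key] = value

    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("refusing to stream, missing: " + ", ".join(missing))

    settings.setdefault("port", DEFAULT_PORT)
    return settings


if __name__ == "__main__":
    print(check_primary_conninfo("host=primary user=replicator"))
    try:
        check_primary_conninfo("port=5433")
    except ValueError as err:
        print(err)
```

With a rule like this, a setting left in the environment could not start replication by accident, since a missing required key is an error rather than something to fill in from elsewhere.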