Thread: Parameter name standby_mode

Parameter name standby_mode

From
Joachim Wieland
Date:
We want to teach people that Hot Standby and Streaming Replication are
two different features. However, Streaming Replication calls its main
parameter "standby_mode", which is more reminiscent of Hot Standby than of
Streaming Replication.

People could also run a warm standby without streaming replication,
which would result in a standby that has standby_mode = 'off'.

I found the parameter name confusing and I'd vote for changing its name.


Joachim


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Joachim Wieland wrote:
> We want to teach people that Hot Standby and Streaming Replication are
> two different features.

I'm not sure about that, actually. Now that they're both in the tree,
they work nicely together and many users will think of them as one.

> However, Streaming Replication calls its main
> parameter "standby_mode" which reminds more of Hot Standby than of
> Streaming Replication.
> 
> People could also run a warm standby without streaming replication,
> which would result in a standby that has standby_mode = 'off'.

If they want to implement the warm standby using the (new) built-in
logic to keep retrying restore_command, they would set
standby_mode='on'. standby_mode='on' doesn't imply streaming replication.

If you want to use pg_standby or similar tools, then you would indeed
set standby_mode='off', but I think that makes sense because you're
implementing the standby functionality outside the server in that case.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Joachim Wieland
Date:
On Wed, Feb 10, 2010 at 12:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> If they want to implement the warm standby using the (new) built-in
> logic to keep retrying restore_command, they would set
> standby_mode='on'. standby_mode='on' doesn't imply streaming replication.
>
> If you want to use pg_standby or similar tools, then you would indeed
> set standby_mode='off', but I think that makes sense because you're
> implementing the standby functionality outside the server in that case.

Okay, got it now with your explanations.

For some reason it didn't work before with standby_mode = 'on' (it
does now) and the warning "FATAL:  sorry, too many standbys already"
gave me a first suspicion that SR is the only use case for this. Then
I checked the docs and there it said "If this parameter is on, the
streaming replication is enabled". I understand now what it does and
that it is a prerequisite but that there is also a non-SR use case...
So the name is okay for me :-)


Thanks again,
Joachim


Re: Parameter name standby_mode

From
Simon Riggs
Date:
On Wed, 2010-02-10 at 13:16 +0200, Heikki Linnakangas wrote:

> If they want to implement the warm standby using the (new) built-in
> logic to keep retrying restore_command, they would set
> standby_mode='on'. standby_mode='on' doesn't imply streaming replication.

The docs say "If this parameter is on, the streaming replication is
enabled". So who is wrong?

ISTM that Joachim's viewpoint is right and that most people will be
confused about this.

I think we need something named more intuitively. Something that better
describes what action (i.e. a verb) will occur when this is set.

Suggestions: streaming_replication = on
We may need to split out various complexities into multiple parameters,
or have valued parameters, e.g. standby_mode = REPLICA.

-- Simon Riggs           www.2ndQuadrant.com



Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> The docs say "If this parameter is on, the streaming replication is
> enabled". So who is wrong?

The docs.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> If they want to implement the warm standby using the (new) built-in
> logic to keep retrying restore_command, they would set
> standby_mode='on'. standby_mode='on' doesn't imply streaming replication.

But if we fail in restoring the archived WAL file, "standby_mode = on"
*always* tries to start streaming replication.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> If they want to implement the warm standby using the (new) built-in
>> logic to keep retrying restore_command, they would set
>> standby_mode='on'. standby_mode='on' doesn't imply streaming replication.
> 
> But if we fail in restoring the archived WAL file, "standby_mode = on"
> *always* tries to start streaming replication.

Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I
think that's the way it should work, ie. if primary_conninfo is not set,
don't launch walreceiver but just keep trying to restore from the archive.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> On Wed, Feb 10, 2010 at 8:16 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> If they want to implement the warm standby using the (new) built-in
>>> logic to keep retrying restore_command, they would set
>>> standby_mode='on'. standby_mode='on' doesn't imply streaming replication.
>>
>> But if we fail in restoring the archived WAL file, "standby_mode = on"
>> *always* tries to start streaming replication.
>
> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I
> think that's the way it should work, ie. if primary_conninfo is not set,
> don't launch walreceiver but just keep trying to restore from the archive.

Yeah, even if primary_conninfo is not given, the standby tries to invoke
walreceiver by using other connection settings (environment variables
or defaults). This is intentional behavior, and would make the setup of SR
easier. So I'd like to leave it be.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Joachim Wieland
Date:
On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Yeah, even if primary_conninfo is not given, the standby tries to invoke
> walreceiver by using the another connection settings (environment variables
> or defaults). This is intentional behavior, and would make the setup of SR
> easier. So I'd like to leave it be.

On the other hand, if it has to use defaults for the target host/port,
chances are high that either it connects to the wrong host/port or
that SR is just not wanted :-)

Whoever sets up SR will also take the effort to configure
primary_conninfo and will have a different primary than the default -
which I think is just the standby itself, no?


Joachim


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> But if we fail in restoring the archived WAL file, "standby_mode = on"
>>> *always* tries to start streaming replication.
>> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I
>> think that's the way it should work, ie. if primary_conninfo is not set,
>> don't launch walreceiver but just keep trying to restore from the archive.
> 
> Yeah, even if primary_conninfo is not given, the standby tries to invoke
> walreceiver by using the another connection settings (environment variables
> or defaults). This is intentional behavior, and would make the setup of SR
> easier. So I'd like to leave it be.

You could do primary_conninfo='' for that.

Maybe we should have two options, "streaming_mode='on'" and
"primary_conninfo='...'".

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Fri, Feb 12, 2010 at 4:04 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Fujii Masao wrote:
>>>> But if we fail in restoring the archived WAL file, "standby_mode = on"
>>>> *always* tries to start streaming replication.
>>> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I
>>> think that's the way it should work, ie. if primary_conninfo is not set,
>>> don't launch walreceiver but just keep trying to restore from the archive.
>>
>> Yeah, even if primary_conninfo is not given, the standby tries to invoke
>> walreceiver by using the another connection settings (environment variables
>> or defaults). This is intentional behavior, and would make the setup of SR
>> easier. So I'd like to leave it be.
>
> You could do primary_conninfo='' for that.
>
> Maybe we should have two options, "streaming_mode='on'" and
> "primary_conninfo='...'".

It looks better to me to extend the "standby_mode":
For example, standby_mode = 'streaming' or 'archive'.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Fri, Feb 12, 2010 at 4:04 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 12, 2010 at 3:19 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> Fujii Masao wrote:
>>>>> But if we fail in restoring the archived WAL file, "standby_mode = on"
>>>>> *always* tries to start streaming replication.
>>>> Hmm, somehow I thought it doesn't if you don't set primary_conninfo. I
>>>> think that's the way it should work, ie. if primary_conninfo is not set,
>>>> don't launch walreceiver but just keep trying to restore from the archive.
>>> Yeah, even if primary_conninfo is not given, the standby tries to invoke
>>> walreceiver by using the another connection settings (environment variables
>>> or defaults). This is intentional behavior, and would make the setup of SR
>>> easier. So I'd like to leave it be.
>> You could do primary_conninfo='' for that.
>>
>> Maybe we should have two options, "streaming_mode='on'" and
>> "primary_conninfo='...'".
> 
> It looks better for me to extend the "standby_mode":
> For example, standby_mode = 'streaming' or 'archive'.

There's yet another mode that would be useful with hot standby: start up
the standby, but don't poll the archive and don't try to connect to the
master. Kind of 'paused' mode. Simon had functions to do that and more
in the original hot standby patch.

I've been thinking that this would work with just the three options we
have now:

standby_mode (true/false) controls whether the server keeps retrying
until trigger file is found (if trigger_file is set), rather than finish
recovery.

primary_conninfo (string) specifies a connection string to use to
connect to the master. If not given, don't try to connect.

restore_command (string) specifies a command to use to restore a file
from archive. If not given, don't try to restore files from archive.

I think this is pretty coherent and easy to explain, and makes all the
combinations restoring files from archive/streaming possible. But if
someone comes up with an even better scheme, I'm all ears.
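
To make the combinations concrete, here is a minimal Python sketch of the
three-option scheme described above. It is only an illustration of the
proposal, not PostgreSQL code: RecoveryConfig, wal_sources and keeps_retrying
are hypothetical names, and None stands for "not given in recovery.conf".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryConfig:
    """Hypothetical model of the three recovery.conf options in the proposal."""
    standby_mode: bool = False
    primary_conninfo: Optional[str] = None  # None means "not given"
    restore_command: Optional[str] = None   # None means "not given"


def wal_sources(cfg: RecoveryConfig) -> List[str]:
    """List the WAL sources the proposal would try for this configuration."""
    sources = []
    if cfg.restore_command is not None:
        sources.append("archive")    # restore files with restore_command
    if cfg.primary_conninfo is not None:
        sources.append("streaming")  # connect to the master
    return sources


def keeps_retrying(cfg: RecoveryConfig) -> bool:
    """In the proposal, standby_mode only decides whether recovery keeps
    retrying instead of finishing."""
    return cfg.standby_mode


if __name__ == "__main__":
    # Hypothetical example values, chosen only for the demonstration.
    examples = [
        RecoveryConfig(standby_mode=True, restore_command="cp /archive/%f %p"),
        RecoveryConfig(standby_mode=True, primary_conninfo="host=master"),
        RecoveryConfig(standby_mode=True),  # the 'paused'-like case
    ]
    for cfg in examples:
        print(cfg, wal_sources(cfg), keeps_retrying(cfg))
```

Keeping the two source checks independent of each other is what lets every
archive/streaming combination be expressed without adding a fourth option.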

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Joachim Wieland wrote:
> On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Yeah, even if primary_conninfo is not given, the standby tries to invoke
>> walreceiver by using the another connection settings (environment variables
>> or defaults). This is intentional behavior, and would make the setup of SR
>> easier. So I'd like to leave it be.
> 
> On the other hand, if it has to use defaults for the target host/port,
> chances are high that either it connects to the wrong host/port or
> that SR is just not wanted :-)

Agreed. I've changed it now so that if primary_conninfo is not set, it
doesn't try to establish a streaming connection. If you want to get the
connection information from environment variables, you can use
primary_conninfo=''.
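
The difference between "not set" and "set to an empty string" can be shown
with a small Python sketch. The parser below is a deliberately simplified
stand-in (it only understands lines of the form name = 'value') and is not
how the server reads recovery.conf; parse_settings and should_try_streaming
are hypothetical names.

```python
import re
from typing import Dict

# Matches simple lines of the form: name = 'value'
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*'(.*)'\s*$")


def parse_settings(text: str) -> Dict[str, str]:
    """Parse name = 'value' lines into a dict (simplified illustration)."""
    settings = {}
    for line in text.splitlines():
        # Drop trailing comments; this naive split ignores '#' inside quotes.
        line = line.split("#", 1)[0]
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def should_try_streaming(settings: Dict[str, str]) -> bool:
    """Absent parameter: no streaming connection is attempted.
    Present parameter, even if empty: a connection is attempted."""
    return "primary_conninfo" in settings


if __name__ == "__main__":
    without = parse_settings("standby_mode = 'on'\n")
    empty = parse_settings("standby_mode = 'on'\nprimary_conninfo = ''\n")
    print(should_try_streaming(without), should_try_streaming(empty))
```

The check tests for the presence of the key rather than the truthiness of its
value, because an empty string is a deliberate setting here.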

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Fri, Feb 12, 2010 at 4:59 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Agreed. I've changed it now so that if primary_conninfo is not set, it
> doesn't try to establish a streaming connection. If you want to get the
> connection information from environment variables, you can use
> primary_conninfo=''.

OK, you win. I would live with primary_conninfo=''.

And you need to change the document, recovery.conf.sample and
so on.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Joachim Wieland
Date:
On Fri, Feb 12, 2010 at 8:59 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Agreed. I've changed it now so that if primary_conninfo is not set, it
> doesn't try to establish a streaming connection. If you want to get the
> connection information from environment variables, you can use
> primary_conninfo=''.

Why not just remove the default:

If no primary_conninfo variable is set explicitly in the configuration
file, check the environment variables. If the environment variable is
not set, don't try to establish a connection.

?

Joachim


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Joachim Wieland wrote:
> On Fri, Feb 12, 2010 at 8:59 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Agreed. I've changed it now so that if primary_conninfo is not set, it
>> doesn't try to establish a streaming connection. If you want to get the
>> connection information from environment variables, you can use
>> primary_conninfo=''.
> 
> Why not just remove the default:
> 
> If no primary_conninfo variable is set explicitly in the configuration
> file, check the environment variables. If the environment variable is
> not set, don't try to establish a connection.

The environment variables in question are the libpq environment
variables like PGHOST, PGPORT. The server shouldn't need to know about
them. Besides, there'd still be the corner case that you really want to
use the built-in defaults, ie. connect to a server running in the same
host at the default port, so you'd not set any environment variables either.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Dimitri Fontaine
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> There's yet another mode that would be useful with hot standby: start up
> the standby, but don't poll the archive and don't try to connect to the
> master. Kind of 'paused' mode. Simon had functions to do that and more
> in the original hot standby patch.

And having the pause/resume functions would lower the need for perfect
conflict resolution too. When you want to run this huge reporting query
set and not get interrupted, pause the standby. Afterward, resume it.

Of course, while paused, it's not a good HA standby anymore, but you
just did pause it, so you're not surprised, right?

> I've been thinking that this would work with just the three options we
> have now:

I like that, because it exposes exactly the code logic, and it is not
complex enough to merit being hidden from the users. Also, you depend on
understanding how the server really works to set up a trustworthy HA
solution, so exposing the concepts it actually uses is a win.

> primary_conninfo (string) specifies a connection string to use to
> connect to the master. If not given, don't try to connect.

Would it be possible to expose that at the SQL level, so that you can
easily check in scripts what master you're a slave of? Think nagios
cascading alerts or topology graphs, etc.

Regards,
-- 
dim


Re: Parameter name standby_mode

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Joachim Wieland wrote:
>> If no primary_conninfo variable is set explicitly in the configuration
>> file, check the environment variables. If the environment variable is
>> not set, don't try to establish a connection.

> The environment variables in question are the libpq environment
> variables like PGHOST, PGPORT. The server shouldn't need to know about
> them.

Even more to the point is that some of them, like PGPORT, are highly
likely to be set in a server's environment to point to the server
itself.  It would be extremely dangerous to automatically try to start
replication just because we find those set.  In fact, I would argue that
we should fix things so that any such variables inherited from the
server environment are intentionally *NOT* used for making SR
connections.
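
As a rough illustration of that idea, the Python sketch below drops libpq
variables from an inherited environment before a connection would be made.
It is only a sketch: scrubbed_environment is a hypothetical helper, and the
variable list is a small assumed subset, not a statement of which variables
the server should ignore.

```python
import os
from typing import Dict, Mapping

# A few real libpq environment variables, listed for illustration only.
# Which ones to ignore for SR connections is exactly the open question.
LIBPQ_VARS = ("PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE", "PGUSER", "PGPASSWORD")


def scrubbed_environment(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env without the listed libpq variables."""
    return {name: value for name, value in env.items() if name not in LIBPQ_VARS}


if __name__ == "__main__":
    inherited = dict(os.environ, PGPORT="5432")  # PGPORT pointing at the server itself
    clean = scrubbed_environment(inherited)
    print("PGPORT" in inherited, "PGPORT" in clean)
```
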
        regards, tom lane


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Fri, Feb 12, 2010 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Even more to the point is that some of them, like PGPORT, are highly
> likely to be set in a server's environment to point to the server
> itself.  It would be extremely dangerous to automatically try to start
> replication just because we find those set.  In fact, I would argue that
> we should fix things so that any such variables inherited from the
> server environment are intentionally *NOT* used for making SR
> connections.

There are many environment variables which libpq automatically uses.
Which variables should not be used for SR connection? All?

If both primary_conninfo and environment variables are not given,
the default value (e.g., port = 5432) is automatically used for SR
connection. Is this OK, or is it no good, like the environment variables?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Fri, Feb 12, 2010 at 4:59 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Joachim Wieland wrote:
>> On Fri, Feb 12, 2010 at 7:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Yeah, even if primary_conninfo is not given, the standby tries to invoke
>>> walreceiver by using the another connection settings (environment variables
>>> or defaults). This is intentional behavior, and would make the setup of SR
>>> easier. So I'd like to leave it be.
>>
>> On the other hand, if it has to use defaults for the target host/port,
>> chances are high that either it connects to the wrong host/port or
>> that SR is just not wanted :-)
>
> Agreed. I've changed it now so that if primary_conninfo is not set, it
> doesn't try to establish a streaming connection. If you want to get the
> connection information from environment variables, you can use
> primary_conninfo=''.

If standby_mode is enabled, and neither primary_conninfo nor restore_command
are set, the standby would get stuck. How about forbidding (i.e., causing a
FATAL message) this wrong setting?
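
A minimal Python sketch of the proposed check, assuming None means "not set".
It only illustrates the suggestion: check_recovery_settings,
RecoveryConfigError and the message text are hypothetical, not taken from any
patch.

```python
from typing import Optional


class RecoveryConfigError(Exception):
    """Stand-in for a FATAL error raised at startup."""


def check_recovery_settings(
    standby_mode: bool,
    primary_conninfo: Optional[str],
    restore_command: Optional[str],
) -> None:
    """Reject standby_mode without any configured source of WAL."""
    if standby_mode and primary_conninfo is None and restore_command is None:
        raise RecoveryConfigError(
            "standby_mode is on but neither primary_conninfo "
            "nor restore_command is set"
        )


if __name__ == "__main__":
    check_recovery_settings(True, "", None)  # accepted: primary_conninfo is set
    try:
        check_recovery_settings(True, None, None)
    except RecoveryConfigError as err:
        print("FATAL:", err)
```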

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> If standby_mode is enabled, and neither primary_conninfo nor restore_command
> are set, the standby would get stuck. How about forbidding (i.e., causing a
> FATAL message) this wrong setting?

Here is the patch which forbids that wrong setting of recovery.conf.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> If standby_mode is enabled, and neither primary_conninfo nor restore_command
>> are set, the standby would get stuck. How about forbidding (i.e., causing a
>> FATAL message) this wrong setting?
>
> Here is the patch which forbids that wrong setting of recovery.conf.

I think that this patch should be applied. Otherwise, if you wrongly
set neither primary_conninfo nor restore_command in recovery.conf,
the standby server would do nothing and get stuck because it doesn't
know where to retrieve the WAL files from. Banning the incorrect
setting makes sense to me.

Could someone commit the patch? Does anyone have any comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Tue, Mar 30, 2010 at 12:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> If standby_mode is enabled, and neither primary_conninfo nor restore_command
>>> are set, the standby would get stuck. How about forbidding (i.e., causing a
>>> FATAL message) this wrong setting?
>>
>> Here is the patch which forbids that wrong setting of recovery.conf.
>
> I think that this patch should be applied. Otherwise, if you wrongly
> set neither primary_conninfo nor restore_command in recovery.conf,
> the standby server would do nothing and get stuck because it doesn't
> know where to retrieve the WAL files from. Banning the incorrect
> setting makes sense to me.
>
> Does anyone commit the patch? Does anyone have a say?

I just tested this and it seems to just sit there doing this over and
over again:

LOG:  record with zero length at 0/3006B28

I'm not sure that we should forbid this configuration, but the current
behavior doesn't seem right either.  ISTM that, in the absence of a
way to get any more WAL, it would be reasonable for the standby server
to just start up and sit there in recovery mode but without actually
advancing recovery, but the repeated log messages are pretty annoying.
If we're connected in streaming mode and there is no activity on the
primary, we don't emit logs of this type, so it doesn't seem like we
should do that if there is no primary either.

A related question is... do we ever reload recovery.conf?  I tried
adding the setting to recovery.conf and doing pg_ctl reload, and it
says that it's "reloading configuration files", but doesn't pick up
the new setting.  :-(

...Robert


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I just tested this and it seems to just sit there doing this over and
> over again:
>
> LOG:  record with zero length at 0/3006B28
>
> I'm not sure that we should forbid this configuration, but the current
> behavior doesn't seem right either.  ISTM that, in the absence of a
> way to get any more WAL, it would be reasonable for the standby server
> to just start up and sit there in recovery mode but without actually
> advancing recovery, but the repeated log messages are pretty annoying.

I'm concerned that the configuration might prevent the standby
from accepting connections from clients because it cannot get the WAL
for making the database consistent. So that configuration seems to be
reasonable only when starting the standby from the already-consistent
database or with enough WAL files in pg_xlog. But it seems to me that
the standby often starts from the inconsistent database without enough
WAL in pg_xlog.

> A related question is... do we ever reload recovery.conf?  I tried
> adding the setting to recovery.conf and doing pg_ctl reload, and it
> says that it's "reloading configuration files", but doesn't pick up
> the new setting.  :-(

recovery.conf cannot be reloaded while the server is running. This
restriction should be removed in the future release, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 1:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I just tested this and it seems to just sit there doing this over and
>> over again:
>>
>> LOG:  record with zero length at 0/3006B28
>>
>> I'm not sure that we should forbid this configuration, but the current
>> behavior doesn't seem right either.  ISTM that, in the absence of a
>> way to get any more WAL, it would be reasonable for the standby server
>> to just start up and sit there in recovery mode but without actually
>> advancing recovery, but the repeated log messages are pretty annoying.
>
> I'm concerned about that the configuration might prevent the standby
> from accepting connection from a client because it cannot get the WAL
> for making the database consistent. So that configuration seems to be
> reasonable only when starting the standby from the already-consistent
> database or with enough WAL files in pg_xlog. But it seems to me that
> the standby often starts from the inconsistent database without enough
> WAL in pg_xlog.

Agreed.  I think if the server starts up in standby mode and it is in an
inconsistent state with no source of WAL, then the startup process
should exit with a suitable error message, which AIUI will result in
the whole server shutting down.  However if there is no source of WAL
but the server is in a consistent state, then I think we should allow
it to start up as a read-only standby.

Now, an interesting question is - if the server is in this state, and
somebody manually drops more WAL into pg_xlog, what happens?  And what
happens in the similar case where primary_conninfo is set but we can't
connect to the master at the moment, and someone drops a pile of WAL
on us?

>> A related question is... do we ever reload recovery.conf?  I tried
>> adding the setting to recovery.conf and doing pg_ctl reload, and it
>> says that it's "reloading configuration files", but doesn't pick up
>> the new setting.  :-(
>
> recovery.conf cannot be reloaded while the server is running. This
> restriction should be removed in the future release, I think.

Yes.  If we don't already have a TODO for that, we should definitely
add one.  I found myself annoyed by this several times last night.  I
kept having to restart the master, too, first to fix archive_mode and
then to fix max_wal_senders.  It's far too late to start tinkering
with this stuff now but I am pretty confident there will be a huge
sigh of collective relief out there if we can relax some of these
restrictions for 9.1.  Nobody likes having to shut down the server,
even if it's just for a few seconds.

...Robert


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> Agreed.  I think if the server starts up in standby mode and it is an
> inconsistent state with no source of WAL, then the startup process
> should exit with a suitable error message, which AIUI will result in
> the whole server shutting down.  However if there is no source of WAL
> but the server is in a consistent state, then I think we should allow
> it to start up as a read-only standby.
>
> Now, an interesting question is - if the server is in this state, and
> somebody manually drops more WAL into pg_xlog, what happens? And what
> happens in the similar case where primary_conninfo is set but we can't
> connect to the master at the moment, and someone drops a pile of WAL
> on us?

With the recent changes to the retry logic
(http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
they will be replayed. Even if neither primary_conninfo nor
restore_command is given, the server will still keep polling pg_xlog,
and if you copy a WAL file to standby's pg_xlog directory, it will be
replayed and recovery will make progress.

I wouldn't recommend setting up a standby server like that, but it's not
totally unreasonable. So the standby always has a potential source of
WAL, pg_xlog.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 4:54 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> Agreed.  I think if the server starts up in standby mode and it is an
>> inconsistent state with no source of WAL, then the startup process
>> should exit with a suitable error message, which AIUI will result in
>> the whole server shutting down.  However if there is no source of WAL
>> but the server is in a consistent state, then I think we should allow
>> it to start up as a read-only standby.
>>
>> Now, an interesting question is - if the server is in this state, and
>> somebody manually drops more WAL into pg_xlog, what happens? And what
>> happens in the similar case where primary_conninfo is set but we can't
>> connect to the master at the moment, and someone drops a pile of WAL
>> on us?
>
> With the recent changes to the retry logic
> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> they will be replayed. Even if neither primary_conninfo or
> restore_command is given, the server will still keep polling pg_xlog,
> and if you copy a WAL file to standby's pg_xlog directory, it will be
> replayed and recovery will make progress.
>
> I wouldn't recommend setting up a standby server like that, but it's not
> totally unreasonable. So the standby always has a potential source of
> WAL, pg_xlog.

OK.

Is it reasonable to think that we can find a way to make it not print
the duplicate messages over and over again?

LOG:  record with zero length at 0/3006B28

Maybe only print that if the location has advanced since the last such message?
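
Something along these lines, sketched in Python rather than in the server's C
code. The class and function names are hypothetical, and the sketch assumes
the location is given in the "hi/lo" hexadecimal form shown in the log line
above.

```python
from typing import List, Optional, Tuple


def parse_location(text: str) -> Tuple[int, int]:
    """Turn a location such as '0/3006B28' into a comparable pair of ints."""
    high, low = text.split("/")
    return int(high, 16), int(low, 16)


class RepeatedMessageFilter:
    """Emit a message only if its location is past the last one reported."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[int, int]] = None
        self.emitted: List[str] = []

    def report(self, location: str, message: str) -> bool:
        current = parse_location(location)
        if self._last is not None and current <= self._last:
            return False  # same (or older) location: stay quiet
        self._last = current
        self.emitted.append(f"LOG:  {message} at {location}")
        return True


if __name__ == "__main__":
    log = RepeatedMessageFilter()
    for loc in ("0/3006B28", "0/3006B28", "0/3006B28", "0/3006B30"):
        log.report(loc, "record with zero length")
    print("\n".join(log.emitted))
```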

Should we make it shut down if it can't immediately read enough WAL to
get to a consistent state, or just figure it's the user's job to fix
it?

...Robert


Re: Parameter name standby_mode

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> Is it reasonable to think that we can find a way to make it not print
> the duplicate messages over and over again?
> 
> LOG:  record with zero length at 0/3006B28
> 
> Maybe only print that if the location has advanced since the last such message?

Yeah, seems reasonable.

> Should we make it shut down if it can't immediately read enough WAL to
> get to a consistent state, or just figure it's the user's job to fix
> it?

I'd say no. In testing, I have done this many times:

pg_start_backup()
copy data directory to server
create recovery.conf
Start standby server.
pg_stop_backup()

The standby doesn't reach consistency before it sees the end-of-backup
record written by pg_stop_backup(), but it does replay up to the last
WAL segment, and connect to the master.

Not sure if that's useful in real life, but there could be situations
where restore_command isn't totally reliable, for example, and it's good
to keep trying.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 5:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> Is it reasonable to think that we can find a way to make it not print
>> the duplicate messages over and over again?
>>
>> LOG:  record with zero length at 0/3006B28
>>
>> Maybe only print that if the location has advanced since the last such message?
>
> Yeah, seems reasonable.
>
>> Should we make it shut down if it can't immediately read enough WAL to
>> get to a consistent state, or just figure it's the user's job to fix
>> it?
>
> I'd say no. In testing, I have done this many times:
>
> pg_start_backup()
> copy data directory to server
> create recovery.conf
> Start standby server.
> pg_stop_backup()
>
> The standby doesn't reach consistency before it sees the end-of-backup
> record written by pg_stop_backup(), but it does replay up to the last
> WAL segment, and connect to the master.
>
> Not sure if that's useful in real life, but there could be situations
> where restore_command isn't totally reliable, for example, and it's good
> to keep trying.

I was only thinking of doing it in the case where there's no
primary_conninfo or restore_command.
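
To make the proposal concrete, a sketch of the decision as discussed in this thread: shut down only when there is no configured source of WAL and the server has not reached consistency. This is Python for illustration only; the function name and the returned labels are hypothetical, and it deliberately ignores the point made upthread that pg_xlog is always a potential source.

```python
def startup_action(reached_consistency, primary_conninfo=None,
                   restore_command=None):
    """Decide what a standby should do when it runs out of readable WAL."""
    has_wal_source = bool(primary_conninfo) or bool(restore_command)
    if has_wal_source:
        # More WAL may still arrive, so retrying is worthwhile.
        return "keep_retrying"
    if reached_consistency:
        # No source of WAL, but the data is consistent.
        return "run_as_read_only_standby"
    # Inconsistent and nothing configured that could fix it.
    return "exit_with_error"
```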

...Robert


Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Thu, Apr 1, 2010 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I wouldn't recommend setting up a standby server like that, but it's not
>> totally unreasonable. So the standby always has a potential source of
>> WAL, pg_xlog.
>
> OK.

OK, too. I'll withdraw the patch.

> Is it reasonable to think that we can find a way to make it not print
> the duplicate messages over and over again?
>
> LOG:  record with zero length at 0/3006B28
>
> Maybe only print that if the location has advanced since the last such message?

Agreed. But which log message is repeated depends on the situation,
so a message without any location might be output. BTW, in my testing,
the following message was repeated:
   LOG:  invalid magic number 0000 in log file 0, segment 14, offset 9617408
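
Since the repeated message may not carry a location at all, one alternative is to compare the whole message text. A minimal Python sketch under that assumption follows; the class name is hypothetical and this is not the server's logging code.

```python
class RepeatSuppressor:
    """Pass a message through only if it differs from the previous one."""

    def __init__(self):
        self.last_message = None
        self.suppressed = 0  # how many repeats of last_message were dropped

    def filter(self, message):
        """Return the message to log, or None if it repeats the last one."""
        if message == self.last_message:
            self.suppressed += 1
            return None
        self.last_message = message
        self.suppressed = 0
        return message
```

This works for any message, with or without a location, at the cost of also hiding a genuinely new occurrence that happens to produce identical text.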

> Should we make it shut down if it can't immediately read enough WAL to
> get to a consistent state, or just figure it's the user's job to fix
> it?

I think that it's difficult for the user to fix it. So I agree to shut
down the server in that case, i.e., throw a FATAL when an invalid WAL
record is found and recovery hasn't reached the safe starting point
even if neither primary_conninfo nor restore_command is given.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 9:01 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Agreed. But what log message is repeated depends on the situation.
> So message without any location might be output. BTW, In my testing,
> the following message was repeated.
>
>    LOG:  invalid magic number 0000 in log file 0, segment 14, offset 9617408

Yeah, that's a pain in the neck.  We need to think about a way to
avoid any of these messages repeating.  Not sure how, off the top of
my head.

>> Should we make it shut down if it can't immediately read enough WAL to
>> get to a consistent state, or just figure it's the user's job to fix
>> it?
>
> I think that it's difficult for the user to fix it. So I agree to shut
> down the server in that case, i.e., throw a FATAL when an invalid WAL
> record is found and recovery hasn't reached the safe starting point
> even if neither primary_conninfo nor restore_command is given.

I think that's reasonable.  It's not like this should cause any
problem for the user: they can add the missing WAL while the server is
down just as well as they could if it were up, and Hot Standby isn't
going to come up anyway.  But I could possibly be persuaded to change
my mind on this one, if someone feels strongly otherwise.

...Robert


Re: Parameter name standby_mode

From
Simon Riggs
Date:
On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote:
> Robert Haas wrote:
> > Agreed.  I think if the server starts up in standby mode and it is an
> > inconsistent state with no source of WAL, then the startup process
> > should exit with a suitable error message, which AIUI will result in
> > the whole server shutting down.  However if there is no source of WAL
> > but the server is in a consistent state, then I think we should allow
> > it to start up as a read-only standby.
> >
> > Now, an interesting question is - if the server is in this state, and
> > somebody manually drops more WAL into pg_xlog, what happens? And what
> > happens in the similar case where primary_conninfo is set but we can't
> > connect to the master at the moment, and someone drops a pile of WAL
> > on us?
> 
> With the recent changes to the retry logic
> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> they will be replayed. Even if neither primary_conninfo or
> restore_command is given, the server will still keep polling pg_xlog,
> and if you copy a WAL file to standby's pg_xlog directory, it will be
> replayed and recovery will make progress.
> 
> I wouldn't recommend setting up a standby server like that, but it's not
> totally unreasonable. So the standby always has a potential source of
> WAL, pg_xlog.

I have inadvertently made it impossible to specify
  standby_mode && (!primary_conninfo && !restore_command)

I did that because Robert had, separately from this thread, reported a
hang caused by this specification. I have verified this.

pg_xlog is a *potential* source of WAL, but if the files requested are
not present then the server just sits and waits with *no* messages. That
is unacceptable, IMHO.
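
For illustration, a polling loop that waits for a segment file but says so once, instead of waiting silently or repeating itself on every poll. This is a Python sketch, not the server's code: the function name, the message text, the 2-second default interval, and the `max_polls` cut-off are all assumptions.

```python
import os
import time


def wait_for_segment(wal_dir, segment, poll_interval=2.0, max_polls=None,
                     log=print, sleep=time.sleep):
    """Poll wal_dir until `segment` appears; return its path.

    Logs a single 'waiting' line the first time the file is found missing.
    Returns None if max_polls is set and the file never shows up.
    """
    path = os.path.join(wal_dir, segment)
    announced = False
    polls = 0
    while not os.path.exists(path):
        if not announced:
            # Hypothetical message text, not an actual server message.
            log(f"LOG:  waiting for WAL segment {segment} in {wal_dir}")
            announced = True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return None
        sleep(poll_interval)
    return path
```

Copying the requested file into the directory while the loop is running lets it return on the next poll, which matches the "drop WAL into pg_xlog" case described above.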

What should we do now?

-- Simon Riggs           www.2ndQuadrant.com



Re: Parameter name standby_mode

From
Fujii Masao
Date:
On Mon, Feb 15, 2010 at 3:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Feb 12, 2010 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Even more to the point is that some of them, like PGPORT, are highly
>> likely to be set in a server's environment to point to the server
>> itself.  It would be extremely dangerous to automatically try to start
>> replication just because we find those set.  In fact, I would argue that
>> we should fix things so that any such variables inherited from the
>> server environment are intentionally *NOT* used for making SR
>> connections.

This complaint of Tom's is listed as a TODO item. How should we treat it?

I'm leaning toward postponing the item to v9.1 or later. Currently a
server in recovery doesn't accept replication connections, so I think
it's not very dangerous for walreceiver to use environment variables
that might point to the server itself: that connection is always
refused.

Shall we revisit this issue when we allow a standby server to accept
replication connections from another standby? I also think that we should
prevent the standby from accepting a connection from its own walreceiver,
rather than prevent the standby from using the environment variables.
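
For reference, the alternative quoted above (not using inherited environment variables for SR connections) could be sketched like this in Python. The function name is hypothetical, and the variable list is a deliberately short, non-exhaustive subset of the libpq environment variables.

```python
import os

# A few libpq environment variables that supply connection defaults.
# This list is illustrative, not exhaustive.
LIBPQ_ENV_VARS = ("PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE",
                  "PGUSER", "PGPASSWORD", "PGSERVICE")


def environment_without_libpq_defaults(environ=None):
    """Return a copy of the environment with libpq defaults removed."""
    source = os.environ if environ is None else environ
    return {key: value for key, value in source.items()
            if key not in LIBPQ_ENV_VARS}
```

A connection made with such a scrubbed environment would rely only on what primary_conninfo states explicitly, so a PGPORT pointing at the server itself could not be picked up by accident.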

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Parameter name standby_mode

From
Robert Haas
Date:
On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote:
>> Robert Haas wrote:
>> > Agreed.  I think if the server starts up in standby mode and it is an
>> > inconsistent state with no source of WAL, then the startup process
>> > should exit with a suitable error message, which AIUI will result in
>> > the whole server shutting down.  However if there is no source of WAL
>> > but the server is in a consistent state, then I think we should allow
>> > it to start up as a read-only standby.
>> >
>> > Now, an interesting question is - if the server is in this state, and
>> > somebody manually drops more WAL into pg_xlog, what happens? And what
>> > happens in the similar case where primary_conninfo is set but we can't
>> > connect to the master at the moment, and someone drops a pile of WAL
>> > on us?
>>
>> With the recent changes to the retry logic
>> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
>> they will be replayed. Even if neither primary_conninfo or
>> restore_command is given, the server will still keep polling pg_xlog,
>> and if you copy a WAL file to standby's pg_xlog directory, it will be
>> replayed and recovery will make progress.
>>
>> I wouldn't recommend setting up a standby server like that, but it's not
>> totally unreasonable. So the standby always has a potential source of
>> WAL, pg_xlog.
>
> I have inadvertently made it impossible to specify
>   standby_mode && (!primary_conninfo && !restore_command)
>
> I did that because Robert had separately to this thread reported a hang,
> caused by this specification. I have verified this.

I don't remember reporting this (or maybe you meant the other Robert);
but there are so many threads on this topic that it's hard to keep
track of them all.  Can you refresh my memory?

> pg_xlog is a *potential* source of WAL, but if the files requested are
> not present then the server just sits and waits with *no* messages. That
> is unacceptable, IMHO.
>
> What should we do now?

Well, actually, what it does for me is sits there and prints the last
xlog location over and over again every 2s.  I'd actually like to get
to "sits and waits with no messages", but it's not clear how to do
that.

...Robert


Re: Parameter name standby_mode

From
Simon Riggs
Date:
On Mon, 2010-04-05 at 07:11 -0400, Robert Haas wrote:
> On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote:
> >> Robert Haas wrote:
> >> > Agreed.  I think if the server starts up in standby mode and it is an
> >> > inconsistent state with no source of WAL, then the startup process
> >> > should exit with a suitable error message, which AIUI will result in
> >> > the whole server shutting down.  However if there is no source of WAL
> >> > but the server is in a consistent state, then I think we should allow
> >> > it to start up as a read-only standby.
> >> >
> >> > Now, an interesting question is - if the server is in this state, and
> >> > somebody manually drops more WAL into pg_xlog, what happens? And what
> >> > happens in the similar case where primary_conninfo is set but we can't
> >> > connect to the master at the moment, and someone drops a pile of WAL
> >> > on us?
> >>
> >> With the recent changes to the retry logic
> >> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> >> they will be replayed. Even if neither primary_conninfo or
> >> restore_command is given, the server will still keep polling pg_xlog,
> >> and if you copy a WAL file to standby's pg_xlog directory, it will be
> >> replayed and recovery will make progress.
> >>
> >> I wouldn't recommend setting up a standby server like that, but it's not
> >> totally unreasonable. So the standby always has a potential source of
> >> WAL, pg_xlog.
> >
> > I have inadvertently made it impossible to specify
> >   standby_mode && (!primary_conninfo && !restore_command)
> >
> > I did that because Robert had separately to this thread reported a hang,
> > caused by this specification. I have verified this.
> 
> I don't remember reporting this (or maybe you meant the other Robert);
> but there are so many threads on this topic that it's hard to keep
> track of them all.  Can you refresh my memory?
> 
> > pg_xlog is a *potential* source of WAL, but if the files requested are
> > not present then the server just sits and waits with *no* messages. That
> > is unacceptable, IMHO.
> >
> > What should we do now?
> 
> Well, actually, what it does for me is sits there and prints the last
> xlog location over and over again every 2s.  I'd actually like to get
> to "sits and waits with no messages", but it's not clear how to do
> that.

That's exactly the opposite of your report, in the thread you started on
hackers in the last week or so.

It's not clear to me *why* you would want it to sit there doing nothing,
and even if that has a purpose, saying nothing at all is not useful.
(Note that it cannot enter Hot Standby mode even in that state).

-- Simon Riggs           www.2ndQuadrant.com



Re: Parameter name standby_mode

From
Robert Haas
Date:
On Mon, Apr 5, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2010-04-05 at 07:11 -0400, Robert Haas wrote:
>> On Mon, Apr 5, 2010 at 4:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote:
>> >> Robert Haas wrote:
>> >> > Agreed.  I think if the server starts up in standby mode and it is an
>> >> > inconsistent state with no source of WAL, then the startup process
>> >> > should exit with a suitable error message, which AIUI will result in
>> >> > the whole server shutting down.  However if there is no source of WAL
>> >> > but the server is in a consistent state, then I think we should allow
>> >> > it to start up as a read-only standby.
>> >> >
>> >> > Now, an interesting question is - if the server is in this state, and
>> >> > somebody manually drops more WAL into pg_xlog, what happens? And what
>> >> > happens in the similar case where primary_conninfo is set but we can't
>> >> > connect to the master at the moment, and someone drops a pile of WAL
>> >> > on us?
>> >>
>> >> With the recent changes to the retry logic
>> >> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
>> >> they will be replayed. Even if neither primary_conninfo or
>> >> restore_command is given, the server will still keep polling pg_xlog,
>> >> and if you copy a WAL file to standby's pg_xlog directory, it will be
>> >> replayed and recovery will make progress.
>> >>
>> >> I wouldn't recommend setting up a standby server like that, but it's not
>> >> totally unreasonable. So the standby always has a potential source of
>> >> WAL, pg_xlog.
>> >
>> > I have inadvertently made it impossible to specify
>> >   standby_mode && (!primary_conninfo && !restore_command)
>> >
>> > I did that because Robert had separately to this thread reported a hang,
>> > caused by this specification. I have verified this.
>>
>> I don't remember reporting this (or maybe you meant the other Robert);
>> but there are so many threads on this topic that it's hard to keep
>> track of them all.  Can you refresh my memory?
>>
>> > pg_xlog is a *potential* source of WAL, but if the files requested are
>> > not present then the server just sits and waits with *no* messages. That
>> > is unacceptable, IMHO.
>> >
>> > What should we do now?
>>
>> Well, actually, what it does for me is sits there and prints the last
>> xlog location over and over again every 2s.  I'd actually like to get
>> to "sits and waits with no messages", but it's not clear how to do
>> that.
>
> That's exactly the opposite of your report. Thread you started, on
> hackers, in last week or so.

Which thread?  What was the subject line?  The only thing I remember
saying about this was:

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01247.php

> It's not clear to me *why* you would want it to sit there doing nothing,
> and even if that has a purpose, saying nothing at all is not useful.
> (Note that it cannot enter Hot Standby mode even in that state).

Actually it can, if the database state is consistent.  Anyway, this
was already discussed upthread...  feel free to put in your $0.02.

...Robert


Re: Parameter name standby_mode

From
Simon Riggs
Date:
On Mon, 2010-04-05 at 18:03 +0900, Fujii Masao wrote:

> I'm leaning toward postponing the item to v9.1 or later.

If you want to defer anything, then I'd like to get a summary of what
you are thinking of deferring and why that is acceptable. Right now
there are lots of unfinished items and no movement on them. Yes, I'm
unhappy about that.

My feeling is that "port" is the only part of primary_conninfo that we
should default. All other settings should be required, or we should
reject streaming. Replication connections should be explicit.
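
A rough Python sketch of what "explicit" could mean in practice. Which settings count as required is not spelled out here, so the `REQUIRED_KEYS` tuple below is purely an assumption, as is the function name; the parser handles only simple space-separated keyword=value items without quoting.

```python
DEFAULT_PORT = "5432"              # PostgreSQL's default port
REQUIRED_KEYS = ("host", "user")   # hypothetical choice of required settings


def check_primary_conninfo(conninfo):
    """Parse a simple conninfo string and insist on explicit settings.

    Only "port" is defaulted; a missing required key raises ValueError.
    """
    settings = {}
    for item in conninfo.split():
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed conninfo item: {item!r}")
        settings[key] = value
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("primary_conninfo must set: " + ", ".join(missing))
    settings.setdefault("port", DEFAULT_PORT)
    return settings
```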

-- Simon Riggs           www.2ndQuadrant.com