Re: recovery_connections cannot start (was Re: master in standby mode croaks) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: recovery_connections cannot start (was Re: master in standby mode croaks)
Date
Msg-id x2t603c8f071004230344i5dfce85fzcfd1ef0412879e38@mail.gmail.com
In response to Re: recovery_connections cannot start (was Re: master in standby mode croaks)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: recovery_connections cannot start (was Re: master in standby mode croaks)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Fri, Apr 23, 2010 at 5:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> On Fri, Apr 23, 2010 at 1:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> One way we could fix this is use 2 bits rather than 1 for
>>> XLogStandbyInfoMode.  One bit could indicate that either
>>> archive_mode=on or max_wal_senders>0, and the second bit could
>>> indicate that recovery_connections=on.  If the second bit is unset, we
>>> could emit the existing complaint:
>>>
>>> recovery connections cannot start because the recovery_connections
>>> parameter is disabled on the WAL source server
>>>
>>> If the other bit is unset, then we could instead complain:
>>>
>>> recovery connections cannot start because archive_mode=off and
>>> max_wal_senders=0 on the WAL source server
>>>
>>> If we don't want to use two bits there, it's hard to really describe
>>> all the possibilities in a reasonable number of characters.  The only
>>> thing I can think of is to print a message and a hint:
>>>
>>> recovery_connections cannot start due to incorrect settings on the WAL
>>> source server
>>> HINT: make sure recovery_connections=on and either archive_mode=on or
>>> max_wal_senders>0
>>>
>>> I haven't checked whether the hint would be displayed in the log on
>>> the standby, but presumably we could make that be the case if it's not
>>> already.
>>>
>>> I think the first way is better because it gives the user more
>>> specific information about what they need to fix.  Thinking about how
>>> each case might happen, since the default for recovery_connections is
>>> 'on', it seems that recovery_connections=off will likely only be an
>>> issue if the user has explicitly turned it off.  The other case, where
>>> archive_mode=off and max_wal_senders=0, will likely only occur if
>>> someone takes a snapshot of the master without first setting up
>>> archiving or SR.  Both of these will probably happen relatively
>>> rarely, but since we're burning a whole byte for XLogStandbyInfoMode
>>> (plus 3 more bytes of padding?), it seems like we might as well snag
>>> one more bit for clarity.
>>>
>>> Thoughts?
>>
>> I like the second choice since it's simpler and enough for me.
>> But I have no objection to the first.
>>
>> When we encounter the error, we would need to not only change
>> those parameter values but also take a fresh base backup and
>> restart the standby using it. The description of this required
>> procedure needs to be in the document or error message, I think.
>
> I quite liked Robert's proposal to add an explicit GUC to control what
> extra information is logged
> (http://archives.postgresql.org/pgsql-hackers/2010-04/msg00509.php). It
> is quite difficult to explain the current behavior; a simple explicit
> wal_mode GUC would be a lot simpler. It wouldn't add any extra steps to
> setting the system up; you currently need to set archive_mode='on'
> anyway to enable archiving. You would just set wal_mode='archive' or
> wal_mode='standby' instead, depending on what you want to do with the WAL.
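The two-bit variant of XLogStandbyInfoMode proposed above can be sketched in C roughly as follows. This is a hypothetical illustration, not PostgreSQL source: the macro names, the `StandbyInfoProblem` enum, and the helper functions are invented for the example; only the GUC names and the message texts come from the thread. When both bits are unset, the sketch reports the recovery_connections problem first, which is an arbitrary ordering choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical bit layout for a two-bit XLogStandbyInfoMode byte.
 * These macro names are illustrative only, not PostgreSQL symbols.
 */
#define STANDBY_INFO_WAL_SHIPPING   0x01  /* archive_mode=on or max_wal_senders>0 */
#define STANDBY_INFO_RECOVERY_CONNS 0x02  /* recovery_connections=on */

typedef enum
{
    STANDBY_INFO_OK,                /* recovery connections can start */
    STANDBY_INFO_NO_RECOVERY_CONNS, /* recovery_connections=off on the source */
    STANDBY_INFO_NO_WAL_SHIPPING    /* archive_mode=off and max_wal_senders=0 */
} StandbyInfoProblem;

/* Primary side: build the flag byte from the server's settings. */
static unsigned char
standby_info_flags(int archive_mode, int max_wal_senders, int recovery_connections)
{
    unsigned char flags = 0;

    if (archive_mode || max_wal_senders > 0)
        flags |= STANDBY_INFO_WAL_SHIPPING;
    if (recovery_connections)
        flags |= STANDBY_INFO_RECOVERY_CONNS;
    return flags;
}

/*
 * Standby side: decide which complaint applies.  If both bits are unset,
 * this sketch reports recovery_connections first (an arbitrary choice).
 */
static StandbyInfoProblem
standby_info_check(unsigned char flags)
{
    /* only the two low bits are defined in this sketch */
    assert((flags & ~(STANDBY_INFO_WAL_SHIPPING | STANDBY_INFO_RECOVERY_CONNS)) == 0);

    if (!(flags & STANDBY_INFO_RECOVERY_CONNS))
        return STANDBY_INFO_NO_RECOVERY_CONNS;
    if (!(flags & STANDBY_INFO_WAL_SHIPPING))
        return STANDBY_INFO_NO_WAL_SHIPPING;
    return STANDBY_INFO_OK;
}

/* Map a problem to the message text proposed in the thread (NULL if none). */
static const char *
standby_info_message(StandbyInfoProblem problem)
{
    switch (problem)
    {
        case STANDBY_INFO_NO_RECOVERY_CONNS:
            return "recovery connections cannot start because the "
                   "recovery_connections parameter is disabled on the "
                   "WAL source server";
        case STANDBY_INFO_NO_WAL_SHIPPING:
            return "recovery connections cannot start because "
                   "archive_mode=off and max_wal_senders=0 on the "
                   "WAL source server";
        case STANDBY_INFO_OK:
            break;
    }
    return NULL;
}
```

With this layout, each failure maps to exactly one specific message, which is the advantage the two-bit option has over a single generic message plus a HINT.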

I liked it, too, but I sort of decided it didn't buy much.  There are
three separate sets of things that need to be controlled:

1. What WAL to emit - (a) just enough for crash recovery, (b) enough
for log shipping, (c) enough for log shipping with recovery
connections.

2. Whether to run the archiver.

3. Whether to allow streaming replication connections (and if so, how many).

If the answer to (1) is "just enough for crash recovery", then (2) and
(3) must be "no".  But if (1) is either of the other two options, then
any combination of answers for (2) and (3) is seemingly sensible,
though having both (2) and (3) as no is probably of limited utility.
But at a minimum, you could certainly have:

crash recovery/no archiver/no SR
log shipping/archiver/no SR
log shipping/no archiver/SR
log shipping/archiver/SR
recovery connections/archiver/no SR
recovery connections/no archiver/SR
recovery connections/archiver/SR
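The constraint described above (crash-recovery-only WAL rules out both the archiver and SR, while the other two WAL levels allow any combination) can be checked by brute force. The following C sketch is a hypothetical illustration; the enum and function names are invented, and "useful" here simply means dropping the combinations at the two shipping-capable WAL levels that have neither archiver nor SR, as the text suggests. Under those assumptions it finds 9 valid and 7 useful combinations, matching the list above, which is roughly how many values a single enum-valued GUC would have to carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for the three WAL levels listed in point 1. */
typedef enum
{
    WAL_CRASH_RECOVERY,        /* (a) just enough for crash recovery */
    WAL_LOG_SHIPPING,          /* (b) enough for log shipping */
    WAL_RECOVERY_CONNECTIONS   /* (c) log shipping plus recovery connections */
} WalContent;

static const char *const wal_content_names[] = {
    "crash recovery", "log shipping", "recovery connections"
};

/* Constraint from the text: crash-recovery-only WAL rules out (2) and (3). */
static bool
combination_is_valid(WalContent wal, bool archiver, bool streaming)
{
    if (wal == WAL_CRASH_RECOVERY)
        return !archiver && !streaming;
    return true;
}

/* "Useful" additionally drops shipping-capable WAL with neither archiver
 * nor SR, which the text calls of limited utility. */
static bool
combination_is_useful(WalContent wal, bool archiver, bool streaming)
{
    if (!combination_is_valid(wal, archiver, streaming))
        return false;
    return wal == WAL_CRASH_RECOVERY || archiver || streaming;
}

/* Print every useful combination and return how many there are. */
static int
list_useful_combinations(void)
{
    int count = 0;

    for (int w = WAL_CRASH_RECOVERY; w <= WAL_RECOVERY_CONNECTIONS; w++)
    {
        /* the name table must cover every WalContent value */
        assert(w < (int) (sizeof wal_content_names / sizeof wal_content_names[0]));

        for (int a = 0; a <= 1; a++)
            for (int s = 0; s <= 1; s++)
                if (combination_is_useful((WalContent) w, a, s))
                {
                    printf("%s/%s/%s\n", wal_content_names[w],
                           a ? "archiver" : "no archiver",
                           s ? "SR" : "no SR");
                    count++;
                }
    }
    return count;
}
```

Keeping the three settings separate means the only cross-check needed is the single crash-recovery condition in `combination_is_valid`.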

I don't see any reasonable way to package all of that up in a single
GUC.  Thoughts?

...Robert

