> From: Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
> Sent: Saturday, July 14, 2012 9:36 AM
> On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila <amit.kapila@huawei.com> wrote:
> From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
> Sent: Saturday, July 14, 2012 6:08 AM
> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>
>>>> So how about this for a Postgres TODO:
>>>>
>>>> Add configuration variable to allow Postgres to disable synchronous
>>>> replication after a specified timeout, and add variable to alert
>>>> administrators of the change.
>
>>> I agree we need a TODO for this, but I think timeout-only is not
>>> the best choice. There should be a maximum timeout (as a last
>>> resort: the maximum time we are willing to wait for the standby; this
>>> has to have the option of "forever"), but PostgreSQL certainly has
>>> to detect the *complete* disconnection of the standby (or of all standbys
>>> in synchronous_standby_names). If it detects that no standbys are
>>> eligible for sync standby AND the option to fall back to async is
>>> enabled, it will go into standalone mode (as if
>>> synchronous_standby_names were empty); otherwise (if the option is
>>> disabled) it will just continue to wait forever (the "last resort"
>>> timeout is ignored if the fallback option is disabled). I would
>>> call these "soft_synchronous_standby" and
>>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
>>> value would be ~5 seconds) or something like that (I'm quite bad at
>>> picking names :( ).
>
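For reference, the rule proposed above can be written down as a small decision function. This is only a sketch of my reading of the proposal: "soft_synchronous_standby" and "soft_synchronous_standby_timeout" are the suggested names, not existing settings, and decide(), CommitAction and SoftSyncConfig are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CommitAction(Enum):
    WAIT = "wait"          # keep blocking the commit until a standby acknowledges
    GO_ASYNC = "go_async"  # behave as if synchronous_standby_names were empty


@dataclass
class SoftSyncConfig:
    # Hypothetical settings taken from the proposal; neither exists today.
    soft_synchronous_standby: bool = False
    soft_synchronous_standby_timeout: float = 5.0  # seconds; 0 means wait forever


def decide(cfg: SoftSyncConfig, eligible_standbys: int, waited: float) -> CommitAction:
    """Return what a commit waiting on sync replication should do."""
    if not cfg.soft_synchronous_standby:
        # Fallback disabled: the timeout is ignored and the commit waits forever.
        return CommitAction.WAIT
    if eligible_standbys == 0:
        # Complete disconnection detected: fall back without waiting for the timeout.
        return CommitAction.GO_ASYNC
    timeout = cfg.soft_synchronous_standby_timeout
    if timeout > 0 and waited >= timeout:
        # Last resort: a standby is still connected but has not answered in time.
        return CommitAction.GO_ASYNC
    return CommitAction.WAIT
```

With the fallback disabled the function always returns WAIT, which matches today's behaviour; with it enabled, losing every eligible standby gives GO_ASYNC immediately and the timeout only matters while a standby is still connected.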
> > After it has gone to standalone mode, if the standby comes back, will it be able to return to sync mode with it?
> That's the idea, yes: after the standby comes back, the master would
> act as if the sync standby had connected for the first time, first going
> through "catchup" mode, and "once the lag between standby and
> primary reaches zero (...) we move to real-time streaming state"
> (from the 9.1 docs); at that point, normal sync behavior is restored.
Idea-wise it looks okay, but are you sure that the current code/design can handle it the way you are suggesting?
I am not sure it can work: it might be the case that, due to network instability, the master has gone into
standalone mode, and now that the standby is able to communicate again, the standby might be expecting to get
more data rather than to go into catchup mode.
I believe someone who is an expert in this area of the code can comment here to make this more concrete.
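
To make the question concrete, the sequence being proposed is roughly the following. This is a toy model with made-up names (Master, StandbyState, report_lag), not the walsender code; whether the real code can re-enter catchup in this situation is exactly the part I am unsure about.

```python
from enum import Enum


class StandbyState(Enum):
    DISCONNECTED = "disconnected"
    CATCHUP = "catchup"
    STREAMING = "streaming"


class Master:
    """Toy model of the master's view of a single sync standby."""

    def __init__(self) -> None:
        self.state = StandbyState.DISCONNECTED
        self.standalone = True  # commits do not wait while no sync standby is usable

    def standby_connected(self) -> None:
        # A returning standby is treated like a first-time connection.
        self.state = StandbyState.CATCHUP

    def report_lag(self, lag_bytes: int) -> None:
        # Sync behaviour resumes only once the standby has fully caught up.
        if self.state is StandbyState.CATCHUP and lag_bytes == 0:
            self.state = StandbyState.STREAMING
            self.standalone = False

    def standby_lost(self) -> None:
        # Assumes the fallback option is enabled, so the master goes standalone.
        self.state = StandbyState.DISCONNECTED
        self.standalone = True
```

In this model the master stays in standalone mode throughout catchup and only starts waiting for the standby again after the lag reaches zero.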
With Regards,
Amit Kapila.