Re: Synchronous Standalone Master Redoux - Mailing list pgsql-hackers
From | Jose Ildefonso Camargo Tolosa |
---|---|
Subject | Re: Synchronous Standalone Master Redoux |
Date | |
Msg-id | CAETJ_S_GReZ05SCy=dzAGN5+KAQ5gGmS5q-v2D7fU0_PkGJmtg@mail.gmail.com |
In response to | Re: Synchronous Standalone Master Redoux (Amit kapila <amit.kapila@huawei.com>) |
Responses | Re: Synchronous Standalone Master Redoux |
List | pgsql-hackers |
On Sat, Jul 14, 2012 at 12:42 AM, Amit kapila <amit.kapila@huawei.com> wrote:
>> From: Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
>> Sent: Saturday, July 14, 2012 9:36 AM
>> On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila <amit.kapila@huawei.com> wrote:
>> From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
>> Sent: Saturday, July 14, 2012 6:08 AM
>> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>>
>>>>> So how about this for a Postgres TODO:
>>>>>
>>>>> Add configuration variable to allow Postgres to disable synchronous
>>>>> replication after a specified timeout, and add variable to alert
>>>>> administrators of the change.
>>
>>>> I agree we need a TODO for this, but... I think timeout-only is not
>>>> the best choice, there should be a maximum timeout (as a last
>>>> resource: the maximum time we are willing to wait for standby, this
>>>> have to have the option of "forever"), but certainly PostgreSQL have
>>>> to detect the *complete* disconnection of the standby (or all standbys
>>>> on the synchronous_standby_names), if it detects that no standbys are
>>>> eligible for sync standby AND the option to do fallback to async is
>>>> enabled = it will go into standalone mode (as if
>>>> synchronous_standby_names were empty), otherwise (if option is
>>>> disabled) it will just continue to wait for ever (the "last resource"
>>>> timeout is ignored if the fallback option is disabled).... I would
>>>> call this "soft_synchronous_standby", and
>>>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
>>>> value would be ~5 seconds) or something like that (I'm quite bad at
>>>> picking names :( ).
>>
>> > After it has gone to standalone mode, if the standby came back will it be able to return back to sync mode with it.
>
>> That's the idea, yes, after the standby comes back, the master would
>> act as if the sync standby connected for the first time: first going
>> through the "catchup" mode, and "once the lag between standby and
>> primary reaches zero "(...)" we move to real-time streaming state"
>> (from 9.1 docs), at that point: normal sync behavior is restored.
>
> Idea wise, it looks okay, but are you sure that in the current code/design, it can handle the way you are suggesting.
> I am not sure it can work because it might be the case that due to network instability, the master has gone in standalone mode
> and now after standby is able to communicate back, it might be expecting to get more data rather than go in catchup mode.
> I believe some person who is expert of this code area can comment here to make it more concrete.

Well, I'd need to dive into the code, but as far as I know, it is the
master that decides to be in "catchup" mode, and the standby just takes
care of sending feedback to the master. Also, it has to handle this
situation already: currently, if the master goes away because it
crashed, or because of network issues, the standby doesn't really know
why, and will reconnect to the master and do whatever it needs to do to
get in sync with the master again (be it: try to reconnect several
times while the master is restarting, or just reconnect to a waiting
master and request the pending WAL segments). There has to be code in
place to handle those cases, because it is already working.

I'm trying to get a solution that is as non-intrusive as possible, with
the least amount of code added, so that performance doesn't suffer, by
reusing the current logic and actions with small alterations.

>
> With Regards,
> Amit Kapila.

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
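
To make the proposal discussed above a bit more concrete, here is a rough sketch of the fallback and recovery decisions as a small state machine. It is only an illustration under assumptions, not PostgreSQL code: "soft_synchronous_standby" and "soft_synchronous_standby_timeout" are the names suggested in this thread and do not exist as settings, and Mode, Settings and next_mode are hypothetical helpers made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "sync"              # commits wait for the sync standby
    STANDALONE = "standalone"  # behaves as if synchronous_standby_names were empty
    CATCHUP = "catchup"        # standby is back and replaying missed WAL


@dataclass
class Settings:
    # Both names are the ones proposed in the thread, not real settings.
    soft_synchronous_standby: bool = False
    soft_synchronous_standby_timeout: float = 5.0  # seconds, 0 = wait forever


def next_mode(mode, settings, standbys_connected, wait_seconds, lag_bytes):
    """Return the master's next mode (hypothetical model of the proposal)."""
    if mode is Mode.SYNC:
        if not settings.soft_synchronous_standby:
            return Mode.SYNC  # fallback disabled: keep waiting, timeout ignored
        if standbys_connected == 0:
            return Mode.STANDALONE  # complete disconnection detected
        timeout = settings.soft_synchronous_standby_timeout
        if timeout > 0 and wait_seconds >= timeout:
            return Mode.STANDALONE  # last-resort maximum wait exceeded
        return Mode.SYNC
    if mode is Mode.STANDALONE:
        # A returning standby is treated like a first-time connection.
        return Mode.CATCHUP if standbys_connected > 0 else Mode.STANDALONE
    # Mode.CATCHUP
    if standbys_connected == 0:
        return Mode.STANDALONE
    # Once the lag reaches zero, normal sync behavior is restored.
    return Mode.SYNC if lag_bytes == 0 else Mode.CATCHUP
```

For example, with fallback enabled and no standby connected, next_mode(Mode.SYNC, Settings(True, 5.0), 0, 0.0, 0) gives Mode.STANDALONE, while the same call with fallback disabled stays in Mode.SYNC regardless of how long the master has waited.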