On Fri, Oct 8, 2010 at 8:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Additional code? Yes. Foot-gun? Yes. Timeout should be disabled by
> default so that you get wait forever unless you ask for something different?
> Probably. Unneeded? This is where we don't agree anymore. The example
> that Josh Berkus just sent to the list is a typical example of what I expect
> people to do here. They'll use Sync Rep to maximize the odds a system
> failure doesn't cause any transaction loss. They'll use good quality
> hardware on the master so it's unlikely to fail. But when the database
> finds the standby unreachable, and it's left with the choice between either
> degrading into async rep or coming to a complete halt, you must give people
> the option of choosing to degrade instead after a timeout. Let them set off
> the red flashing lights, sound the alarms, and pray the master doesn't go
> down until you can fix the problem. But the choice to allow uptime concerns
> to win over the normal sync rep preferences, that's a completely valid
> business decision people will absolutely want to make in a way opposite of
> your personal preference here.
Definitely agreed.
> I don't see this as needing any implementation any more complicated than the
> usual way such timeouts are handled. Note how long you've been trying to
> reach the standby. Default to -1 for forever. And if you hit the timeout,
> mark the standby as degraded and force them to do a proper resync when they
> disconnect. Once that's done, then they can re-enter sync rep mode again,
> via the same process a new node would have done so.
Fair enough.
One question is when this timeout is applied. Obviously it should be applied
when the standby goes down. But should the timeout also be applied when we
initially start the master, and when no standby has connected to the new master
yet after failover?
I guess that people who want wait-forever would want to use "timeout = -1"
for all those cases. Otherwise they cannot guarantee no data loss.
OTOH, people who don't want wait-forever would not want to wait for the timeout
in the latter two cases. So ISTM that a parameter such as enable_wait_forever
or reaction_after_timeout is required separately from the timeout.
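To make the three cases concrete, here is a rough sketch in Python (not
PostgreSQL code) of how the decision could look. The function name, the
Situation values, and the exact meaning I give enable_wait_forever here are
assumptions for illustration only; in particular, I assume that with the
parameter off the master does not wait at all when no standby has ever
connected.

```python
from enum import Enum


class Situation(Enum):
    STANDBY_LOST = "standby_lost"      # a connected standby became unreachable
    INITIAL_START = "initial_start"    # master just started, no standby yet
    AFTER_FAILOVER = "after_failover"  # new master, no standby connected yet


WAIT = "wait"
DEGRADE = "degrade_to_async"


def commit_action(situation, waited_secs, timeout_secs, enable_wait_forever):
    """Decide whether a commit keeps waiting for the standby or degrades.

    Hypothetical semantics, assumed for this sketch:
      - timeout_secs = -1 means wait forever in every situation.
      - enable_wait_forever only matters when no standby has connected yet.
    """
    if timeout_secs < 0:
        return WAIT
    if situation is Situation.STANDBY_LOST:
        # Usual timeout handling: wait until the limit, then degrade.
        return WAIT if waited_secs < timeout_secs else DEGRADE
    # Initial start or after failover: no standby has connected so far.
    if enable_wait_forever:
        return WAIT
    return DEGRADE
```

With these assumed semantics, "timeout = -1" waits in all three cases, while
a finite timeout with enable_wait_forever off degrades immediately at initial
startup and after failover, and only waits out the timeout when a connected
standby is lost.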
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center