Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> You're not considering the possibility of a transient communication
> >> failure.
>
> > Can't the master re-send the request after a timeout?
>
> Not "it can", but "it has to". The master *must* keep hold of that
> request forever (or until the slave responds, or until we reconfigure
> the system not to consider that slave valid anymore). Similarly, the
> slave cannot forget the maybe-committed transaction on pain of not being
> a valid slave anymore. You can make this work, but the resource costs
> are steep. For instance, in Postgres, you don't get to truncate the WAL
> log, for what could be a really really long time --- more disk space
> than you wanted to spend on WAL anyway. The locks held by the
> maybe-committed transaction are another potentially unpleasant problem;
> you can't release them, no matter what else they are blocking.
I think we would need a configurable timeout to say a slave is no longer
valid, like 60 seconds, and then let everyone release. We can let the
administrator decide how long he wants to try to keep two hosts
communicating. I don't see this as much different from multi-master
replication problems.
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610)
359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square,
Pennsylvania19073