Re: time-delayed standbys - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: time-delayed standbys
Msg-id BANLkTikSyWhLtK2eUE=e0Sg0dcBRnjongw@mail.gmail.com
In response to Re: time-delayed standbys  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: time-delayed standbys
List pgsql-hackers
On Fri, Jun 17, 2011 at 3:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Even if that were not an issue, I'm still more or less of the opinion
> that trying to solve the time synchronization problem is a rathole
> anyway.  To really solve this problem well, you're going to need the
> standby to send a message containing a timestamp, get a reply back
> from the master that contains that timestamp and a master timestamp,
> and then compute based on those two timestamps plus the reply
> timestamp the maximum and minimum possible lag between the two
> machines.  Then you're going to need to guess, based on several cycles
> of this activity, what the actual lag is, and adjust it over time (but
> not too quickly, unless of course a large manual step has occurred) as
> the clocks potentially drift apart from each other.  This is basically
> what ntpd does, except that it can be virtually guaranteed that our
> implementation will suck by comparison.  Time synchronization is
> neither easy nor our core competency, and I think trying to include it
> in this feature is going to result in a net loss of reliability.

Agreed. You've already added the note about time synchronization into
the document. That's enough, I think.
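For illustration only, here is a minimal Python sketch of the arithmetic Robert outlines. The names are hypothetical and nothing like this is in the patch. One round trip bounds the offset (master clock minus standby clock) between the master's stamp minus the reply-arrival time and the master's stamp minus the send time. Several round trips can be intersected, and a running estimate can be nudged only part of the way toward each new sample. The sketch deliberately ignores asymmetric network delay and drift within an exchange, which are the hard parts ntpd handles.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Exchange:
    """One standby -> master -> standby round trip; all times in seconds."""
    t_send: float    # standby clock when the request was sent
    t_master: float  # master clock when the master stamped its reply
    t_recv: float    # standby clock when the reply arrived


def offset_bounds(ex: Exchange) -> Tuple[float, float]:
    """Return (min, max) possible master_clock - standby_clock for one exchange.

    The master stamped its reply at some standby-clock instant between
    t_send and t_recv, so the offset lies in
    [t_master - t_recv, t_master - t_send].
    """
    if ex.t_recv < ex.t_send:
        raise ValueError("reply arrived before the request was sent")
    return ex.t_master - ex.t_recv, ex.t_master - ex.t_send


def combine(exchanges: Iterable[Exchange]) -> Tuple[float, float]:
    """Intersect the bounds of several exchanges (assumes no drift between them)."""
    bounds = [offset_bounds(ex) for ex in exchanges]
    if not bounds:
        raise ValueError("need at least one exchange")
    lo = max(b[0] for b in bounds)
    hi = min(b[1] for b in bounds)
    if lo > hi:
        # The samples contradict each other: the clocks drifted or were stepped.
        raise ValueError("inconsistent samples; clock drift or step suspected")
    return lo, hi


def smooth(previous: float, sample: float, alpha: float = 0.1) -> float:
    """Move a running offset estimate only part of the way toward a new sample."""
    return previous + alpha * (sample - previous)
```

For example, combining Exchange(100.0, 105.5, 100.25) with Exchange(200.0, 205.5, 200.5) narrows the offset to (5.25, 5.5). The midpoint of that interval is what would be fed to smooth(). Even this toy needs a policy for contradictory samples, which supports the point that the problem belongs to ntpd.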

>>> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
>>
>> Should we s/"a temporal"/"an Integer"/?
>
> It seems strange to ask for an integer when what we want is an amount
> of time in seconds or minutes...

OK.
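Here is a minimal Python sketch of the kind of check that the errmsg() guards. It assumes the parameter accepts PostgreSQL's usual time-unit suffixes (ms, s, min, h, d) and that a bare number means seconds. The function name and the default unit are assumptions for illustration, not what the patch does.

```python
import re

# Multipliers to milliseconds for PostgreSQL's time-unit suffixes.
UNITS_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}

# An integer, optional whitespace, optional unit suffix (case-sensitive).
TIME_VALUE = re.compile(r"^\s*(\d+)\s*([a-z]*)\s*$")


def parse_time_delay(value: str, default_unit: str = "s") -> int:
    """Hypothetical parser: turn '30min' or '90' into milliseconds.

    A bare number is taken in default_unit (an assumption for this sketch).
    """
    m = TIME_VALUE.match(value)
    unit = (m.group(2) or default_unit) if m else None
    if unit not in UNITS_MS:
        # Mirrors the wording of the errmsg() under discussion.
        raise ValueError('parameter "%s" requires a temporal value'
                         % "recovery_time_delay")
    return int(m.group(1)) * UNITS_MS[unit]
```

With this, '30min' and '1800' both come out as 1800000 ms, while 'soon' raises the error. Because units are accepted, "a temporal value" describes the input better than "an integer" would.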

>> http://forge.mysql.com/worklog/task.php?id=344
>> According to the above page, one purpose of time-delayed replication is to
>> protect against user mistakes on the master. But when a user notices a
>> mistaken operation on the master, what should he do next? The WAL records
>> of that operation might have already arrived at the standby, so neither
>> "promote" nor "restart" cancels it. Instead, he should probably shut down
>> the standby, look up the timestamp of the XID of the operation he'd like
>> to cancel, set recovery_target_time, and restart the standby.
>> Should a procedure like this be documented? Or should we implement a new
>> "promote" mode which finishes recovery as soon as "promote" is requested
>> (i.e., without replaying all the available WAL records)?
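Here is a sketch of the documented procedure. Once the timestamp of the mistaken transaction is known (how to find it is left open here), the standby's recovery.conf could get two extra lines before the restart. recovery_target_time and recovery_target_inclusive are existing recovery.conf settings, but the helper below is hypothetical. Setting recovery_target_inclusive = false stops replay just before the target rather than just after it. The approach only helps while the delay has kept the standby from applying the mistake.

```python
from datetime import datetime, timezone


def recovery_target_lines(bad_commit_time: datetime) -> str:
    """Hypothetical helper: recovery.conf lines that stop replay just before
    the transaction that committed at bad_commit_time."""
    if bad_commit_time.tzinfo is None:
        # A naive timestamp would be interpreted in whatever zone the server
        # assumes; require an explicit offset instead.
        raise ValueError("bad_commit_time must be timezone-aware")
    stamp = bad_commit_time.isoformat(sep=" ")
    return (
        f"recovery_target_time = '{stamp}'\n"
        # false = stop just before the target instead of just after it
        "recovery_target_inclusive = false\n"
    )
```

For example, recovery_target_lines(datetime(2011, 6, 16, 18, 29, tzinfo=timezone.utc)) yields a recovery_target_time of '2011-06-16 18:29:00+00:00' followed by the inclusive = false line. Those lines would be appended to the standby's existing recovery.conf while the standby is shut down.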
>
> I like the idea of a new promote mode;

Are you going to implement that mode in this CF, or in the next one?
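To pin down the difference between the two promote behaviours, here is a toy Python model. It is not PostgreSQL code, and the function and its parameters are made up for illustration. The delayed standby holds WAL records that have arrived but may not be due yet. The existing promote replays everything received, while the proposed mode keeps only what the delay had already let through.

```python
from typing import List, Tuple

Record = Tuple[float, str]  # (master commit time in seconds, description)


def applied_at_promotion(received: List[Record], delay: float,
                         promote_at: float, fast: bool) -> List[str]:
    """Toy model: which received WAL records end up applied on promotion.

    received: records that have reached the standby, in WAL order.
    delay: the standby applies a record only once its time + delay has passed.
    fast=False: existing behaviour, replay everything received, then promote.
    fast=True:  proposed mode, stop replay as soon as promote is requested.
    """
    if fast:
        # Proposed mode: keep only what the delay had already let through.
        return [what for ts, what in received if ts + delay <= promote_at]
    # Existing behaviour: every received record is replayed before promotion.
    return [what for _, what in received]
```

With a 60 s delay and a promote request at t = 75 s, a mistake committed at t = 20 s is still pending. The fast mode discards it, while the existing behaviour applies it before the standby comes up.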

> and documenting the other
> approach you mention doesn't sound bad either.  Either one sounds like
> a job for a separate patch, though.
>
> The other option is to pause recovery and run pg_dump...

Yes, please.
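Here is a minimal sketch of the pause-and-dump route. It assumes a hot standby and the 9.1 recovery-control functions pg_xlog_replay_pause() and pg_is_xlog_replay_paused(). The query callable stands in for whatever driver is used, and the function name is hypothetical. Pausing first keeps further WAL, possibly including the mistake, from being applied while pg_dump reads the standby.

```python
import subprocess
from typing import Callable, Sequence


def dump_with_replay_paused(query: Callable[[str], object],
                            dump_argv: Sequence[str],
                            run=subprocess.run) -> None:
    """Pause WAL replay on a hot standby, then run pg_dump against it.

    query: runs one SQL statement on the standby and returns its single value
           (a thin wrapper over any driver; hypothetical here).
    dump_argv: full pg_dump command line pointed at the standby.
    """
    query("SELECT pg_xlog_replay_pause()")
    # Confirm the standby registered the pause request before dumping.
    if not query("SELECT pg_is_xlog_replay_paused()"):
        raise RuntimeError("standby does not report replay as paused")
    run(list(dump_argv), check=True)
    # Replay is deliberately left paused so the dump can be checked first;
    # resume later with SELECT pg_xlog_replay_resume().
```

A call might look like dump_with_replay_paused(q, ["pg_dump", "-h", "standby.example", "-t", "accounts", "mydb"]), where the host, table, and database names are placeholders. Leaving replay paused is a design choice: resuming automatically would let the standby apply the mistake before anyone has verified the dump.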

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

