Re: time-delayed standbys - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: time-delayed standbys
Msg-id BANLkTikFhVOhmf_XKERe=MkU3Ht_K7posQ@mail.gmail.com
In response to Re: time-delayed standbys  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Apr 21, 2011 at 12:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 20, 2011 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I am a bit concerned about the reliability of this approach.  If there
>>> is some network lag, or some lag in processing from the master, we
>>> could easily get the idea that there is time skew between the machines
>>> when there really isn't.  And our perception of the time skew could
>>> easily bounce around from message to message, as the lag varies.  I
>>> think it would be tremendously ironic if the two machines were
>>> actually synchronized to the microsecond, but by trying to be clever
>>> about it we managed to make the lag-time accurate only to within
>>> several seconds.
>>
>> Well, if walreceiver concludes that there is no more than a few seconds'
>> difference between the clocks, it'd probably be OK to take the master
>> timestamps at face value.  The problem comes when the skew gets large
>> (compared to the configured time delay, I guess).
>
> I suppose.  Any bound on how much lag there can be before we start
> applying skew correction is going to be fairly arbitrary.

When the replication connection is terminated, the standby tries to read
WAL files from the archive. In this case, there is no walreceiver process,
so how does the standby calculate the clock difference?
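
To make the question concrete, here is a minimal sketch of how I picture the
skew estimate feeding into the replay decision. All names here (skew_state,
skew_update, replay_not_before) are hypothetical and are not taken from the
patch. The point is only that the estimate exists solely while a walreceiver
is supplying it, so archive recovery needs some explicit fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since some epoch; a stand-in for a timestamp type. */
typedef int64_t ts_us;

/* Hypothetical per-standby state: the latest skew estimate, if any. */
typedef struct
{
    bool  have_skew;   /* false until a walreceiver has reported one */
    ts_us skew_us;     /* standby clock minus master clock */
} skew_state;

/*
 * Record an estimate from one streaming message: local receipt time minus
 * the master's send time. Note that this also absorbs network latency,
 * which is the reliability concern raised upthread.
 */
static void
skew_update(skew_state *st, ts_us master_send, ts_us standby_recv)
{
    st->skew_us = standby_recv - master_send;
    st->have_skew = true;
}

/*
 * Earliest standby-clock time at which a record committed at commit_ts
 * (master clock) may be replayed. With no estimate available, as in
 * archive recovery without a walreceiver, this sketch assumes zero skew
 * and takes the master timestamp at face value.
 */
static ts_us
replay_not_before(const skew_state *st, ts_us commit_ts, ts_us delay_us)
{
    assert(delay_us >= 0);
    ts_us skew = st->have_skew ? st->skew_us : 0;
    return commit_ts + skew + delay_us;
}
```

Whether assuming zero skew is the right fallback, or whether the last known
estimate should be kept after the connection drops, is exactly what I am
asking about.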

> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),

Shouldn't we s/"a temporal"/"an integer"/ here?

Shouldn't time-delayed replication be disabled after we run "pg_ctl promote"?
Otherwise, failover might take a very long time when recovery_time_delay is
set to a high value.
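
A minimal sketch of the behavior I have in mind, assuming the startup process
can see a flag saying that promotion was requested. The function name and its
parameters are hypothetical, not existing code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since some epoch; a stand-in for a timestamp type. */
typedef int64_t ts_us;

/*
 * How long the startup process should still wait before replaying a record
 * committed at commit_ts. Once promotion has been requested the configured
 * delay is ignored, so failover is not held up by a large
 * recovery_time_delay.
 */
static ts_us
delay_remaining(ts_us now, ts_us commit_ts, ts_us delay_us,
                bool promote_requested)
{
    assert(delay_us >= 0);

    if (promote_requested)
        return 0;           /* replay immediately */

    ts_us due = commit_ts + delay_us;
    return (due > now) ? (due - now) : 0;
}
```

With a delay of 500 and a record committed at 900, a standby whose clock
reads 1000 would wait another 400 normally, and 0 once promotion has been
requested.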

http://forge.mysql.com/worklog/task.php?id=344
According to the above page, one purpose of time-delayed replication is to
protect against user mistakes on the master. But when a user notices a
mistaken operation on the master, what should he do next? The WAL records of
that operation might have already arrived at the standby, so neither "promote"
nor "restart" cancels it. Instead, he should probably shut down the standby,
find the timestamp of the XID of the operation he'd like to cancel, set
recovery_target_time, and restart the standby. Should a procedure like this
be documented? Or should we implement a new "promote" mode that finishes
recovery as soon as "promote" is requested (i.e., without replaying all the
available WAL records)?

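For the recovery_target_time route, the stop decision amounts to a timestamp
comparison on each commit record. The following is a simplified sketch of
that check as I understand it, with a hypothetical function name; the
"inclusive" flag is meant to mirror recovery_target_inclusive:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since some epoch; a stand-in for a timestamp type. */
typedef int64_t ts_us;

/*
 * Return true if recovery should stop before replaying a commit record
 * stamped commit_ts. With inclusive set, a commit stamped exactly at the
 * target is still replayed; otherwise recovery stops just before it.
 */
static bool
stops_before(ts_us commit_ts, ts_us target, bool inclusive)
{
    return inclusive ? (commit_ts > target) : (commit_ts >= target);
}
```

So to cancel the wrong operation, the target has to be earlier than its
commit timestamp, which is why the user needs to look that timestamp up
first.
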
Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

