Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id CAB7nPqSvt7HHRouhE2n8y2gFRi6KuVJKiqJO3NzYYVpXMwTC6g@mail.gmail.com
In response to Re: pg_rewind failure by file deletion in source server  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_rewind failure by file deletion in source server
List pgsql-hackers
On Fri, Jun 12, 2015 at 3:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jun 12, 2015 at 3:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Jun 11, 2015 at 5:48 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> Shouldn't pg_rewind ignore that failure of operation? If the file is not
>>>>> found in source server, the file doesn't need to be copied to destination
>>>>> server obviously. So ISTM that pg_rewind safely can skip copying that file.
>>>>> Thought?
>>>>
>>>> I think that you should fail. Let's imagine that the master to be
>>>> rewound has removed a relation file before being stopped cleanly after
>>>> its standby has been promoted that was here at the last checkpoint
>>>> before forking, and that the standby still has the relation file after
>>>> promotion. You should be able to copy it to be able to replay WAL on
>>>> it. If the standby has removed a file in the file map after taking the
>>>> file map, I guess that the best thing to do is fail because the file
>>>> that should be here for the rewound node cannot be fetched.
>>>
>>> In this case, why do you think that the file should exist in the old master?
>>> Even if it doesn't exist, ISTM that the old master can safely replay the WAL
>>> records related to the file when it restarts. So what's the problem
>>> if the file doesn't exist in the old master?
>>
>> Well, some user may want to rewind the master down to the point where
>> WAL forked, and then recover it immediately when a consistent point is
>> reached just at restart instead of replugging it into the cluster. In
>> this case I think that you need the relation file of the dropped
>> relation to get a consistent state. That's still cheaper than
>> recreating a node from a fresh base backup in some cases, particularly
>> if the last base backup taken is far in the past for this cluster.
>
> So it's the case where a user wants to recover old master up to the point
> BEFORE the file in question is deleted in new master. At that point,
> since the file must exist, pg_rewind should fail if the file cannot be copied
> from new master. Is my understanding right?

Yep. We are on the same page.

> As far as I read the code of pg_rewind, ISTM that your scenario never happens.
> Because pg_rewind sets the minimum recovery point to the latest WAL location
> in new master, i.e., AFTER the file is deleted. So old master cannot stop
> recovering before the file is deleted in new master. If the recovery stops
> at that point, it fails because the minimum recovery point is not reached yet.
>
> IOW, after pg_rewind runs, the old master has to replay the WAL records
> which were generated by the deletion of the file in the new master.
> So it's okay if the old master doesn't have the file after pg_rewind runs,
> I think.

Ah, right. I withdraw; indeed, what I thought cannot happen:

    /*
     * Update control file of target. Make it ready to perform archive
     * recovery when restarting.
     *
     * minRecoveryPoint is set to the current WAL insert location in the
     * source server. Like in an online backup, it's important that we recover
     * all the WAL that was generated while we copied the files over.
     */

So a rewound node will replay WAL up to the current insert location of
the source, and recovery will fail if the recovery target is older than
this insert location.
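
To make the reasoning concrete, here is a minimal sketch of the
invariant. It is not pg_rewind or backend code: the type WalLocation,
the function names, and the WAL locations are all hypothetical and only
illustrate why a file deleted on the source before minRecoveryPoint is
not needed on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL location; not PostgreSQL's own type. */
typedef uint64_t WalLocation;

/* Recovery may only end once the minimum recovery point has been replayed. */
static bool
recovery_may_stop_at(WalLocation stop_point, WalLocation min_recovery_point)
{
    return stop_point >= min_recovery_point;
}

/*
 * A file that vanished from the source is harmless for the target if the WAL
 * record removing it lies at or before the minimum recovery point, because
 * replay is then forced to go through that record.
 */
static bool
missing_source_file_is_harmless(WalLocation deletion_point,
                                WalLocation min_recovery_point)
{
    return deletion_point <= min_recovery_point;
}

/* Illustrative scenario with made-up WAL locations. */
static void
example_scenario(void)
{
    WalLocation fork_point = 1000;          /* where the timelines diverged */
    WalLocation deletion_point = 1500;      /* file removed on the new master */
    WalLocation min_recovery_point = 2000;  /* source insert location at rewind */

    /* Stopping at the fork point is refused: minRecoveryPoint not reached. */
    assert(!recovery_may_stop_at(fork_point, min_recovery_point));
    /* Stopping exactly at minRecoveryPoint is allowed. */
    assert(recovery_may_stop_at(min_recovery_point, min_recovery_point));
    /* The deletion is always replayed, so the missing file can be skipped. */
    assert(missing_source_file_is_harmless(deletion_point, min_recovery_point));
}
```

In other words, the scenario I had in mind (stopping recovery at the
fork point while still needing the dropped relation file) is exactly
the case the first assertion rules out.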

Do you want to draft a patch, or should I? I think that we should have
a test case as well in pg_rewind/t/.
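
As a rough idea of the behavior such a patch would introduce, here is a
standalone sketch for the local-file case only. The names FetchResult
and fetch_source_file are hypothetical and do not exist in pg_rewind;
the only assumption about the platform is that fopen() sets errno to
ENOENT for a missing file, as POSIX specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical result codes for fetching one file from the source. */
typedef enum
{
    FETCH_OK,               /* file opened, caller copies and closes it */
    FETCH_SKIPPED_MISSING,  /* file no longer exists in source: skip it */
    FETCH_FAILED            /* any other error: still a hard failure */
} FetchResult;

/*
 * Open a source file for copying. A file that has disappeared from the
 * source is reported as skippable instead of being treated as fatal.
 */
static FetchResult
fetch_source_file(const char *path, FILE **out)
{
    FILE *fp;

    assert(path != NULL && out != NULL);
    *out = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
    {
        /* Only "file not found" is tolerated; other errors still fail. */
        if (errno == ENOENT)
            return FETCH_SKIPPED_MISSING;
        return FETCH_FAILED;
    }

    *out = fp;
    return FETCH_OK;
}
```

A caller would continue with the next entry of the file map on
FETCH_SKIPPED_MISSING and abort on FETCH_FAILED. The libpq source mode
would need an equivalent check on the query result, which this sketch
does not cover.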
-- 
Michael


