Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id CAHGQGwE-ZzSWszxcTQ7MQp6Dv3OQvdTcpFqS+3Xpr5PH3MU8+A@mail.gmail.com
In response to Re: pg_rewind failure by file deletion in source server  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Fri, Jul 17, 2015 at 12:28 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>>
>>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps it
>>>>> would be enough to special-case pg_xlog for now.
>>>>
>>>>
>>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>>> would be nice to have an option to be able to recreate the soft link
>>>> of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all WAL it can find in the target's data
>>> directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in the target server.
>> Don't these old files cause the subsequent startup of the target server as a new
>> standby to fail? That is the case where a WAL file with the same name
>> but different content exists in both target and source. If that's harmful,
>> pg_rewind should also remove the files in pg_xlog of the target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of source node where pg_rewind has been run. Hence if we remove
> completely the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever).

Even if we remove the WAL files in the *target* server, we don't lose any
files in the *source* server that we will need to replay later.

> I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Please imagine the case where WAL files with the same name were
created on both servers after the fork. Their contents may differ.
After pg_rewind has run successfully, the rewound server
(i.e., the target server) should retrieve and replay that WAL file from
the *source* server. But the problem is that the rewound server tries to
replay its local copy of the WAL file, because the file exists locally
(even if primary_conninfo is specified).
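
To make the scenario concrete, here is a minimal sketch of how one could
spot such files. It is not part of pg_rewind. The function name
conflicting_wal_segments is hypothetical, and the sketch assumes that both
pg_xlog directories are reachable as local paths and that WAL segment file
names are 24 uppercase hexadecimal characters.

```python
import filecmp
import os
import re

# Assumption: WAL segment file names are 24 uppercase hex characters
# (timeline ID followed by the segment position). Other files in
# pg_xlog, such as .history files, are ignored by this sketch.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def conflicting_wal_segments(target_xlog, source_xlog):
    """Return the sorted names of WAL segments that exist in both
    directories but whose contents differ byte for byte."""
    target_names = {n for n in os.listdir(target_xlog) if WAL_SEGMENT_RE.match(n)}
    source_names = {n for n in os.listdir(source_xlog) if WAL_SEGMENT_RE.match(n)}

    conflicts = []
    for name in sorted(target_names & source_names):
        # shallow=False forces a real content comparison instead of
        # trusting file size and modification time.
        same = filecmp.cmp(
            os.path.join(target_xlog, name),
            os.path.join(source_xlog, name),
            shallow=False,
        )
        if not same:
            conflicts.append(name)
    return conflicts
```

Any name this returns is a file that the rewound server would find locally
even though the copy it needs to replay is the one in the source server.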

Regards,

-- 
Fujii Masao


