Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id 55BD1788.3090803@iki.fi
In response to Re: pg_rewind failure by file deletion in source server  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: pg_rewind failure by file deletion in source server  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 07/17/2015 06:28 AM, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>>
>>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
>>>>> it would be enough to special-case pg_xlog for now.
>>>>
>>>>
>>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>>> would be nice to have an option to be able to recreate the soft link
>>>> of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all WAL it can find to the target's data
>>> directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in target server.
>> Don't these old files cause the subsequent startup of target server as new
>> standby to fail? That is, it's the case where a WAL file with the same name
>> but different content exists in both target and source. If that's harmful,
>> pg_rewind also should remove the files in pg_xlog of target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of source node where pg_rewind has been run. Hence if we remove
> completely the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever). I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Hmm. My thinking was that you need to set up restore_command or 
primary_conninfo anyway, to fetch the old WAL, so there's no need to 
copy any WAL. But there's a problem with that: you might have WAL files 
in the source server that haven't been archived yet, and you need them 
to recover the rewound target node. That's OK for libpq mode, I think, as 
the server is still running and presumably you can fetch the WAL 
with streaming replication, but for copy-mode, that's not a good 
assumption. You might be relying on a WAL archive, and the file might 
not be archived yet.
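
As a rough illustration of the unarchived-file problem, here is a minimal
Python sketch (not part of pg_rewind) that lists the files in a source
pg_xlog directory still flagged as waiting for archival. It assumes the
archiver's usual convention of a <name>.ready file in
pg_xlog/archive_status for each file not yet archived; the function name
is made up for this example. The segment currently being written has no
status file yet, so the result is only a lower bound on what the archive
is missing.

```python
import os


def unarchived_wal_files(pg_xlog_dir):
    """Return names of files in pg_xlog that are flagged as not yet archived.

    Assumption: a file waiting for archival has a matching
    archive_status/<name>.ready marker (hypothetical helper, for
    illustration only).
    """
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    if not os.path.isdir(status_dir):
        # No status directory at all: nothing is flagged.
        return []

    suffix = ".ready"
    pending = []
    for name in os.listdir(status_dir):
        if name.endswith(suffix):
            # Strip the marker suffix to get the WAL file name.
            pending.append(name[:-len(suffix)])
    return sorted(pending)
```

If this returns anything for the source in copy-mode, a restore_command
pointing at the archive alone would not be able to supply those files.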

Perhaps it's best if we copy all the WAL files from source in copy-mode, 
but not in libpq mode. Regarding old WAL files in the target, it's 
probably best to always leave them alone. They should do no harm, and as 
a general principle it's best to avoid destroying evidence.
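
To make the proposed rule concrete, a small Python sketch of the decision
follows. It is only an illustration of the policy described above, not
pg_rewind's actual code; the function name, the mode strings "copy" and
"libpq", and the shape of the returned dictionary are all assumptions
made for this example.

```python
def plan_wal_handling(mode, source_wal, target_wal):
    """Decide what to do with WAL files under the proposed policy.

    mode        -- "copy" (source is a data directory on disk) or
                   "libpq" (source is a running server); labels are
                   assumptions for this sketch
    source_wal  -- file names found in the source's pg_xlog
    target_wal  -- file names found in the target's pg_xlog
    """
    if mode not in ("copy", "libpq"):
        raise ValueError("unknown mode: %r" % (mode,))

    # Copy-mode: the source may hold WAL that never reached the archive,
    # so take everything. libpq mode: rely on streaming replication.
    copy_from_source = sorted(source_wal) if mode == "copy" else []

    return {
        "copy_from_source": copy_from_source,
        # Old WAL in the target is always left alone.
        "remove_from_target": [],
        "keep_in_target": sorted(target_wal),
    }
```

For example, plan_wal_handling("libpq", ["A", "B"], ["C"]) copies nothing
and removes nothing, while the same call with "copy" would schedule both
source files for copying.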

It'd be nice to get some fix for this for alpha2, so I'll commit a fix 
to do that on Monday, unless we come to a different conclusion before that.

- Heikki


