Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: pg_rewind failure by file deletion in source server |
Date | |
Msg-id | CAHGQGwE-ZzSWszxcTQ7MQp6Dv3OQvdTcpFqS+3Xpr5PH3MU8+A@mail.gmail.com |
In response to | Re: pg_rewind failure by file deletion in source server (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-hackers |
On Fri, Jul 17, 2015 at 12:28 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>>
>>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps it
>>>>> would be enough to special-case pg_xlog for now.
>>>>
>>>>
>>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>>> would be nice to have an option to be able to recreate the soft link
>>>> of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore. And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all the WAL it can find into the target's
>>> data directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in the target server.
>> Don't these old files cause the subsequent startup of the target server as a new
>> standby to fail? That is, it's the case where a WAL file with the same name
>> but different content exists in both the target and the source. If that's harmful,
>> pg_rewind should also remove the files in pg_xlog of the target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of the source node where pg_rewind has been run. Hence if we completely
> remove the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until the timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever).

Even if we remove the WAL files in the *target server*, we don't lose any files in the *source server* that we will need to replay later.

> I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Please imagine the case where WAL files with the same name were created in both servers after the fork. Their contents may be different. After pg_rewind is executed successfully, the rewound server (i.e., the target server) should retrieve and replay that WAL file from the *source* server. But the problem is that the rewound server tries to replay the WAL file from its local pg_xlog, since the file exists locally (even if primary_conninfo is specified).

Regards,

-- 
Fujii Masao
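The name-collision scenario described above can be made concrete by comparing directory listings. The following Python sketch is a hypothetical helper, not part of pg_rewind or PostgreSQL: given the file names in the target's and the source's pg_xlog and the LSN at which the histories diverged, it lists segment names present on both servers at or after the segment containing the fork point, i.e. the files whose local copy on the rewound server might shadow the source's version. It assumes the default 16 MB segment size, compares names only (not contents), and the listings in the usage section are made up for illustration.

```python
import re

# Hypothetical helper, not part of pg_rewind. A WAL segment file name is 24 hex
# digits: 8 for the timeline ID, then the segment number split into a high
# ("log") and a low ("seg") part of 8 digits each.
WAL_NAME_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")

# Assumption: the default 16 MB segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 segments per "log" id


def parse_lsn(text: str) -> int:
    """Parse an LSN written as 'X/Y' (two hex halves), e.g. '0/3000060'."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the segment file holding byte position `lsn` on timeline `tli`."""
    segno = lsn // WAL_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)


def split_wal_name(name: str):
    """Return (timeline, segment number), or None for non-segment files."""
    m = WAL_NAME_RE.match(name)
    if m is None:
        return None  # e.g. *.history files or the archive_status directory
    tli, log, seg = (int(g, 16) for g in m.groups())
    return tli, log * SEGS_PER_XLOGID + seg


def colliding_segments(target_names, source_names, fork_lsn: int) -> list:
    """Segment names present on both servers at or after the fork segment.

    Only names are compared: a flagged file *may* differ in content between
    the two servers, which is the case raised in this thread.
    """
    fork_segno = fork_lsn // WAL_SEG_SIZE
    flagged = []
    for name in sorted(set(target_names) & set(source_names)):
        parsed = split_wal_name(name)
        if parsed is not None and parsed[1] >= fork_segno:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Made-up directory listings for illustration only.
    target = ["000000010000000000000002", "000000010000000000000003",
              "000000010000000000000004"]
    source = ["000000010000000000000002", "000000010000000000000003",
              "000000020000000000000003", "00000002.history"]
    fork = parse_lsn("0/3000060")
    print(wal_file_name(1, fork))                    # 000000010000000000000003
    print(colliding_segments(target, source, fork))  # ['000000010000000000000003']
```

With these listings only the segment containing the fork point is flagged: segment 2 predates the fork, segment 4 exists only on the target, and the `.history` file is skipped because it is not a segment name.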