Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files |
Date | |
Msg-id | 20180125123837.GH2416@tamriel.snowman.net Whole thread Raw |
In response to | Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
|
List | pgsql-hackers |
Michael, all, * Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote: > > * chenhj (chjischj@163.com) wrote: > >> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: > >>> I've only read through the thread to try and understand what's going on > >>> and the first thing that comes to mind is that you're changing > >>> pg_rewind to not remove the WAL from before the divergence (split) > >>> point, but I'm not sure why. As noted, that WAL isn't needed for > >>> anything (it's from before the split, after all), so why keep it? Is > >>> there something in this optimization that depends on the old WAL being > >>> there and, if so, what and why? > >> > >> After run pg_rewind, the first startup of postgres will do crash recovery. > >> And crash recovery will begin from the previous redo point preceding the divergence. > >> So, the WAL after the redo point and before the divergence is needed. > > > > Right. > > Most of the time, and particularly since v11 has removed the need to > retain more past segments than one completed checkpoint, those segments > have less chances to be on the source server, limiting more the impact > of the patch discussed on this thread. Good point. > >>> That's also different from how pg_basebackup works, which I don't think > >>> is good (seems like pg_rewind should operate in a pretty similar manner > >>> to pg_basebackup). > >> > >> Thanks for your comments! > >> I also considered copy WAL just like how pg_basebackup does,but a > >> implement similar to pg_basebackup's manner may be not so simple. > > > > Using the replication protocol to fetch WAL would be a good thing to do > > (actually, making pg_rewind entirely work through a connection to the > > current primary would be great) but that's independent of what I'm > > asking for here. Here I'm just suggesting that we not change what > > pg_rewind is doing today when it comes to the existing WAL on the > > old-primary. > > Yes, superuser is necessary now, if we could get to a point where only a > replication permission is needed that would be nice. Now we could do > things differently. We could have a system role dedicated to pg_rewind > which works only on the functions from genfile.c that pg_rewind needs, > in order to leverage the need of a superuser. Actually, the other work I'm doing nearby wrt removing the explicit superuser() checks in those functions would allow a non-superuser role to be created which could be used by pg_rewind. We could even add an explicit 'pg_rewind' default role if we wanted to, but is that the route we want to go with this or should we be thinking about changing pg_rewind to use the replication protocol and a replication user instead..? The only reason that it doesn't today is that there isn't an easy way for it to do so with the existing replication protocol, as I understand it. Then again, there's this whole question of if we should even keep the replication protocol or if we should be getting rid of it in favor of have regular connections that can support replication. There's been discussion of that recently, as I recall, though I can't remember where. > >> And the WAL which contains the previous redo point preceding the > >> divergence may be only exists in target server and had been recycled > >> in source. That's different between pg_rewind and pg_basebackup. > > > > Hm, pg_rewind was removing that and expecting it to be on the new > > primary? If that's the case then I could see an argument for keeping > > WAL that's from the divergence point onward, but I still don't think > > we should have pg_rewind just leave all of the prior WAL in place. > > Another thing that we could as well do is simply not fetching any WAL > files at all during a rewind, then let the startup process of the > rewound server decide by itself what it needs. This would leverage the > data transfered in all cases. It is easy to define the start point of > WAL segments needed for a rewound server because the last checkpoint > record before WAL forked is calculated before transferring any data. The > finish point cannot be exact though because you don't know up to which > point you should transfer it. In some ways, this is close to a base > backup. We could as well define an end point to minimize the amount of > WAL as the last completed segment before data transfer begins, but then > you need to worry about WAL segment holes and such. At the end of the > day, just not transferring any data from pg_wal looks more solid to me > as a first step if we need to worry about data that is transferred but > finishes by being useless. Having the rewound server decide by itself what it needs actually sounds like it would be better and would allow whatever the restore command is to fetch any missing WAL necessary (which it might be able to do more efficiently, from an archive server and possibly even in parallel as compared to pg_rewind doing it serially from the primary...). We don't have any particularly easy way to prevent needed WAL from disappearing off of the primary anyway, so it seems like it would be necessary to have a working restore command for pg_rewind to be reliable. While we could create a physical replication slot for pg_rewind when it starts, that may already be too late. Thanks! Stephen
Attachment
pgsql-hackers by date: