Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files |
Date | |
Msg-id | 20180125010513.GC23081@paquier.xyz |
In response to | Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
|
List | pgsql-hackers |
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
> * chenhj (chjischj@163.com) wrote:
>> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
>>> I've only read through the thread to try and understand what's going on
>>> and the first thing that comes to mind is that you're changing
>>> pg_rewind to not remove the WAL from before the divergence (split)
>>> point, but I'm not sure why. As noted, that WAL isn't needed for
>>> anything (it's from before the split, after all), so why keep it? Is
>>> there something in this optimization that depends on the old WAL being
>>> there and, if so, what and why?
>>
>> After run pg_rewind, the first startup of postgres will do crash recovery.
>> And crash recovery will begin from the previous redo point preceding the divergence.
>> So, the WAL after the redo point and before the divergence is needed.
>
> Right.

Most of the time, and particularly since v11 has removed the need to
retain more past segments than one completed checkpoint, those segments
are less likely to be on the source server, which further limits the
impact of the patch discussed on this thread.

>> Of course, the WAL before the redo point is not needed, but in my point of view,
>> recycling unwanted WAL does not have to be done by pg_rewind.
>
> That's what pg_rewind has been doing though, isn't it? And it's not
> like that WAL is useful for anything, is it? That's also how
> pg_basebackup works.

As of HEAD, pg_rewind handles data in pg_wal similarly to other paths
which are not relation files: the files from the source are just
blindly copied to the target. After the rewind and once recovery
begins, we just let the startup process do the cleanup instead of
pg_rewind. pg_basebackup is different in that regard: you get to choose
what you want using --wal-method, and you actually get only the
segments that you need for a self-contained base backup.

>>> That's also different from how pg_basebackup works, which I don't think
>>> is good (seems like pg_rewind should operate in a pretty similar manner
>>> to pg_basebackup).
>>
>> Thanks for your comments!
>> I also considered copy WAL just like how pg_basebackup does,but a
>> implement similar to pg_basebackup's manner may be not so simple.
>
> Using the replication protocol to fetch WAL would be a good thing to do
> (actually, making pg_rewind entirely work through a connection to the
> current primary would be great) but that's independent of what I'm
> asking for here. Here I'm just suggesting that we not change what
> pg_rewind is doing today when it comes to the existing WAL on the
> old-primary.

Yes, superuser is necessary now; if we could get to a point where only
a replication permission is needed, that would be nice. Now we could do
things differently. We could have a system role dedicated to pg_rewind
which works only on the functions from genfile.c that pg_rewind needs,
in order to lift the need for a superuser.

>> And the WAL which contains the previous redo point preceding the
>> divergence may be only exists in target server and had been recycled
>> in source. That's different between pg_rewind and pg_basebackup.
>
> Hm, pg_rewind was removing that and expecting it to be on the new
> primary? If that's the case then I could see an argument for keeping
> WAL that's from the divergence point onward, but I still don't think
> we should have pg_rewind just leave all of the prior WAL in place.

Another thing that we could do as well is simply not fetch any WAL
files at all during a rewind, then let the startup process of the
rewound server decide by itself what it needs. This would reduce the
data transferred in all cases. It is easy to define the start point of
WAL segments needed for a rewound server because the last checkpoint
record before WAL forked is calculated before transferring any data.
The finish point cannot be exact though because you don't know up to
which point you should transfer it. In some ways, this is close to a
base backup. We could as well define an end point to minimize the
amount of WAL as the last completed segment before data transfer
begins, but then you need to worry about WAL segment holes and such. At
the end of the day, just not transferring any data from pg_wal looks
more solid to me as a first step if we need to worry about data that is
transferred but ends up being useless.
--
Michael
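
Editorial illustration of the start-point computation discussed in the message above: a minimal Python sketch, not pg_rewind code. It maps an LSN to its WAL segment file name using PostgreSQL's usual naming scheme (timeline, then the segment number split in two 8-digit hex fields). It assumes the default 16MB segment size and a single timeline. The function names are hypothetical, and the end LSN is passed in explicitly because, as noted above, the real finish point cannot be determined exactly.

```python
# Sketch only: assumes the default wal_segment_size of 16MB and one timeline.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex halves) to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-character segment file name for a timeline and segment number."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def segments_needed(tli: int, redo_lsn: str, end_lsn: str,
                    seg_size: int = WAL_SEGMENT_SIZE) -> list[str]:
    """List segment names from the one holding redo_lsn up to the one holding end_lsn.

    redo_lsn stands for the redo point of the last checkpoint before the fork;
    end_lsn is a hypothetical, caller-chosen upper bound.
    """
    first = parse_lsn(redo_lsn) // seg_size
    last = parse_lsn(end_lsn) // seg_size
    return [wal_file_name(tli, s, seg_size) for s in range(first, last + 1)]


if __name__ == "__main__":
    for name in segments_needed(1, "0/3000028", "0/5000060"):
        print(name)
```

Everything before the first name in that list is WAL from before the redo point, which is the part the thread agrees is not needed for crash recovery after a rewind.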