Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
Date
Msg-id 20180125010513.GC23081@paquier.xyz
In response to Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files  (Stephen Frost <sfrost@snowman.net>)
Responses Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
List pgsql-hackers
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
> * chenhj (chjischj@163.com) wrote:
>> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
>>> I've only read through the thread to try and understand what's going on
>>> and the first thing that comes to mind is that you're changing
>>> pg_rewind to not remove the WAL from before the divergence (split)
>>> point, but I'm not sure why.  As noted, that WAL isn't needed for
>>> anything (it's from before the split, after all), so why keep it?  Is
>>> there something in this optimization that depends on the old WAL being
>>> there and, if so, what and why?
>>
>> After run pg_rewind, the first startup of postgres will do crash recovery.
>> And crash recovery will begin from the previous redo point preceding the divergence.
>> So, the WAL after the redo point and before the divergence is needed.
>
> Right.

Most of the time, and particularly since v11 has removed the need to
retain more past segments than one completed checkpoint, those segments
are less likely to be present on the source server, which further limits
the impact of the patch discussed on this thread.

>> Of course, the WAL before the redo point is not needed, but in my point of view,
>> recycling unwanted WAL does not have to be done by pg_rewind.
>
> That's what pg_rewind has been doing though, isn't it?  And it's not
> like that WAL is useful for anything, is it?  That's also how
> pg_basebackup works.

As of HEAD, pg_rewind handles data in pg_wal the same way as other
paths which are not relation files: the files from the source are just
blindly copied to the target. After the rewind and once recovery begins,
we just let the startup process do the cleanup instead of
pg_rewind. pg_basebackup is different: you get to choose what you want
using --wal-method, and you get only the segments needed for a
self-contained base backup.

>>> That's also different from how pg_basebackup works, which I don't think
>>> is good (seems like pg_rewind should operate in a pretty similar manner
>>> to pg_basebackup).
>>
>> Thanks for your comments!
>> I also considered copy WAL just like how pg_basebackup does,but a
>> implement similar to pg_basebackup's manner may be not so simple.
>
> Using the replication protocol to fetch WAL would be a good thing to do
> (actually, making pg_rewind entirely work through a connection to the
> current primary would be great) but that's independent of what I'm
> asking for here.  Here I'm just suggesting that we not change what
> pg_rewind is doing today when it comes to the existing WAL on the
> old-primary.

Yes, superuser is necessary now; if we could get to a point where only a
replication permission is needed, that would be nice. We could also do
things differently: we could have a system role dedicated to pg_rewind
which is allowed to run only the functions from genfile.c that pg_rewind
needs, in order to remove the need for a superuser.

>> And the WAL which contains the previous redo point preceding the
>> divergence may be only exists in target server and had been recycled
>> in source. That's different between pg_rewind and pg_basebackup.
>
> Hm, pg_rewind was removing that and expecting it to be on the new
> primary?  If that's the case then I could see an argument for keeping
> WAL that's from the divergence point onward, but I still don't think
> we should have pg_rewind just leave all of the prior WAL in place.

Another thing we could do is simply not fetch any WAL files at all
during a rewind, and let the startup process of the rewound server
decide by itself what it needs. This would reduce the amount of data
transferred in all cases. It is easy to define the start point of the
WAL segments needed by a rewound server because the last checkpoint
record before WAL forked is calculated before transferring any data. The
end point cannot be exact though, because you don't know up to which
point WAL should be transferred. In some ways, this is close to a base
backup. We could also define an end point to minimize the amount of
WAL, say the last completed segment before data transfer begins, but
then you need to worry about holes in the WAL segments and such. At the
end of the day, just not transferring any data from pg_wal looks more
solid to me as a first step if we need to worry about data that is
transferred but ends up being useless.
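
As a minimal sketch of why the start point is easy to define: given the
redo LSN of the last checkpoint before the fork, the first WAL segment
needed can be derived with simple arithmetic. This assumes the default
16MB segment size, and the function names, timeline ID and LSN shown are
illustrative only, not taken from pg_rewind.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16MB)


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(timeline: int, lsn: str,
                     seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file containing the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    # Number of segments that fit in one 4GB "xlog id".
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


# Hypothetical example: redo point at 0/3000060 on timeline 1.
first_needed = wal_segment_name(1, "0/3000060")
```

The end point has no such simple derivation, which is the difficulty
described above.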
--
Michael
