Re: Making pg_rewind faster - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Making pg_rewind faster
Date
Msg-id CA+TgmoZkLC_oZifggwfTcZ7JhVJSK2+BUMfQPJ03VGhxsB=yww@mail.gmail.com
Whole thread Raw
In response to Re: Making pg_rewind faster  (John H <johnhyvr@gmail.com>)
List pgsql-hackers
On Mon, Jul 7, 2025 at 2:46 PM John H <johnhyvr@gmail.com> wrote:
> I've created the commitfest entry here:
> https://commitfest.postgresql.org/patch/5902/
>
> I realized I wasn't testing properly in 0006. I've updated the latest
> one to move it into a separate test file to make the results cleaner.

Thanks to everyone who has worked on this patch. I think this is a
real problem about which we ought to try to do something. I think you
could even defend calling this a back-patchable bug fix, since copying
a lot of files that we know have to be identical is a serious
efficiency issue that can affect the usability of pg_upgrade. There
are more aggressive things that we could try to do here that might be
even better, e.g. we could copy just the WAL segment or segments that
contain the last checkpoint record and let the server fetch the rest
as needed after it starts up. However, that would be a definitional
change, which we certainly wouldn't want to back-patch, and which
might have downsides for some people, so perhaps it would need to be
optional behavior. But the patch as proposed is just an optimization
that shouldn't change the outcome for anyone. Perhaps it's still best
to treat it as a master-only improvement, but calling it a fix for a
performance bug isn't completely crazy either, IMHO.

Moving on to the patch contents, the approach taken makes sense to me.
Any WAL files that exist in both source and target from prior to the
point of divergence should not need to be copied, because they should
necessarily be identical. If anyone sees a flaw in this reasoning,
please point it out! While I agree with the basic approach, I think
that the patch could be hardened in two ways. First, isRelDataFile
tests the complete pathname to decide whether it has a relation file,
but getFileContentType only tests the final component of the pathname.
I think it should test the whole pathname also, in case there's a
stray file with a file name that looks like a WAL file in some other
directory. In fact, I think that these two functions should be
integrated. Let's rename the isRelDataFile() to getFileContentType()
and put all the logic in that function, including appropriate tests of
the path name. Second, in decide_file_action(), I think we should
check that entry->target_size == entry->source_size and return
FILE_ACTION_COPY if not. This case really shouldn't happen, but it's
very cheap to check and might offer a bit of protection in some weird
scenario.

I am not a huge fan of the way that the patch stuffs last_common_segno
into a global variable that is then accessed by
decide_wal_file_action. I think it would be better to find a way to
pass this information down through the call stack, either as a
separate argument or as part of some struct we're already passing, if
there's a reasonable and not-too-invasive way to do that. If it looks
too ugly, maybe this is the best option, but I think we should try to
find something better.

It does not appear to me that the test case tests what it purports to
test, and I don't think that what it purports to test is the right
thing anyway. It claims that it is testing that a certain WAL file
(which it erroneously calls a "WAL file entry") is not copied to the
target, but I see no logic to actually check this. I also don't think
it's the correct behavior. I think what the patch is doing and should
do is ensure that we don't copy the file when it's already present. If
the file isn't present in the target cluster, it should still be
copied. Granted, I haven't verified that this is the actual behavior,
but the "Handle cases where the file is missing from one of the
systems" code in decide_file_actiomn() seems to be higher up than the
code the patch changes, so I assume that logic should be applied
first. If that's correct, I don't quite know what a test case here can
usefully verify, since the only difference would be performance, but
maybe I'm misunderstanding something?

As an administrative note, the naming convention for the patches
posted to the thread is not consistent and not correct. Use "git
format-patch -v$VERSION" to generate patches. In a name like
v7-0003-do-something.patch, "v7" is the version of the patch set as a
whole, and "0003" identifies a specific patch within that version of
the patch set. In other words, that would be the third patch in the
series from the seventh time that the patch set was posted. This patch
set so far contains only one patch, so the file name should be
vX-0001-whatever.patch, where whatever is the subject of the patch and
X is a number that increases by one every time it's reposted. Let git
format-patch do the work for you, though.

On a related note, something a committer would need to do if they
wanted to commit this patch is figure out who should be credited with
writing it. I see that a number of different people have posted
versions of the patch; it would be nice if "Author:" or
"Co-authored-by:" tags were added to the commit message to reflect
whose code is present in a given version.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: delimiter inconsistency in generate-wait_event_types.pl
Next
From: Tom Lane
Date:
Subject: Avoiding roundoff error in pg_sleep()