Re: Making pg_rewind faster - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Making pg_rewind faster |
Date | |
Msg-id | CA+TgmoZkLC_oZifggwfTcZ7JhVJSK2+BUMfQPJ03VGhxsB=yww@mail.gmail.com Whole thread Raw |
In response to | Re: Making pg_rewind faster (John H <johnhyvr@gmail.com>) |
List | pgsql-hackers |
On Mon, Jul 7, 2025 at 2:46 PM John H <johnhyvr@gmail.com> wrote: > I've created the commitfest entry here: > https://commitfest.postgresql.org/patch/5902/ > > I realized I wasn't testing properly in 0006. I've updated the latest > one to move it into a separate test file to make the results cleaner. Thanks to everyone who has worked on this patch. I think this is a real problem about which we ought to try to do something. I think you could even defend calling this a back-patchable bug fix, since copying a lot of files that we know have to be identical is a serious efficiency issue that can affect the usability of pg_upgrade. There are more aggressive things that we could try to do here that might be even better, e.g. we could copy just the WAL segment or segments that contain the last checkpoint record and let the server fetch the rest as needed after it starts up. However, that would be a definitional change, which we certainly wouldn't want to back-patch, and which might have downsides for some people, so perhaps it would need to be optional behavior. But the patch as proposed is just an optimization that shouldn't change the outcome for anyone. Perhaps it's still best to treat it as a master-only improvement, but calling it a fix for a performance bug isn't completely crazy either, IMHO. Moving on to the patch contents, the approach taken makes sense to me. Any WAL files that exist in both source and target from prior to the point of divergence should not need to be copied, because they should necessarily be identical. If anyone sees a flaw in this reasoning, please point it out! While I agree with the basic approach, I think that the patch could be hardened in two ways. First, isRelDataFile tests the complete pathname to decide whether it has a relation file, but getFileContentType only tests the final component of the pathname. I think it should test the whole pathname also, in case there's a stray file with a file name that looks like a WAL file in some other directory. In fact, I think that these two functions should be integrated. Let's rename the isRelDataFile() to getFileContentType() and put all the logic in that function, including appropriate tests of the path name. Second, in decide_file_action(), I think we should check that entry->target_size == entry->source_size and return FILE_ACTION_COPY if not. This case really shouldn't happen, but it's very cheap to check and might offer a bit of protection in some weird scenario. I am not a huge fan of the way that the patch stuffs last_common_segno into a global variable that is then accessed by decide_wal_file_action. I think it would be better to find a way to pass this information down through the call stack, either as a separate argument or as part of some struct we're already passing, if there's a reasonable and not-too-invasive way to do that. If it looks too ugly, maybe this is the best option, but I think we should try to find something better. It does not appear to me that the test case tests what it purports to test, and I don't think that what it purports to test is the right thing anyway. It claims that it is testing that a certain WAL file (which it erroneously calls a "WAL file entry") is not copied to the target, but I see no logic to actually check this. I also don't think it's the correct behavior. I think what the patch is doing and should do is ensure that we don't copy the file when it's already present. If the file isn't present in the target cluster, it should still be copied. Granted, I haven't verified that this is the actual behavior, but the "Handle cases where the file is missing from one of the systems" code in decide_file_actiomn() seems to be higher up than the code the patch changes, so I assume that logic should be applied first. If that's correct, I don't quite know what a test case here can usefully verify, since the only difference would be performance, but maybe I'm misunderstanding something? As an administrative note, the naming convention for the patches posted to the thread is not consistent and not correct. Use "git format-patch -v$VERSION" to generate patches. In a name like v7-0003-do-something.patch, "v7" is the version of the patch set as a whole, and "0003" identifies a specific patch within that version of the patch set. In other words, that would be the third patch in the series from the seventh time that the patch set was posted. This patch set so far contains only one patch, so the file name should be vX-0001-whatever.patch, where whatever is the subject of the patch and X is a number that increases by one every time it's reposted. Let git format-patch do the work for you, though. On a related note, something a committer would need to do if they wanted to commit this patch is figure out who should be credited with writing it. I see that a number of different people have posted versions of the patch; it would be nice if "Author:" or "Co-authored-by:" tags were added to the commit message to reflect whose code is present in a given version. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: