Thread: Making pg_rewind faster
Hi Hackers,
I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.
However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.
1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?
Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.
Thanks,
Vignesh
Hi everyone!
Here's the attached patch submission to optimize pg_rewind performance when many WAL files are retained on server. This patch avoids replaying (copying over) older WAL segment files that fall before the point of divergence between the source and target servers.
Thanks,
Justin
From: Justin Kwan <jkwan@cloudflare.com>
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
Looping in my other email.
On Thu, Jun 30, 2022 at 6:22 AM vignesh ravichandran <admin@viggy28.dev> wrote:
Hi Hackers,I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.Thanks,Vignesh
Attachment
Hi everyone!
I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.
Thanks,
Justin
From: Justin Kwan <jkwan@cloudflare.com>
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
Looping in my other email.
On Thu, Jun 30, 2022 at 6:22 AM vignesh ravichandran <admin@viggy28.dev> wrote:
Hi Hackers,I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.Thanks,Vignesh
Attachment
Justin Kwan <justinpkwan@outlook.com> writes: > I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets versionPostgres version 15 Beta 1/2. It's very unlikely that we would consider committing such changes into released branches. In fact, it's too late even for v15. You should be submitting non-bug-fix patches against master (v16-to-be). regards, tom lane
Hi Tom,
Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16.
Justin
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Justin Kwan <justinpkwan@outlook.com> writes:
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.
It's very unlikely that we would consider committing such changes into
released branches. In fact, it's too late even for v15. You should
be submitting non-bug-fix patches against master (v16-to-be).
regards, tom lane
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.
It's very unlikely that we would consider committing such changes into
released branches. In fact, it's too late even for v15. You should
be submitting non-bug-fix patches against master (v16-to-be).
regards, tom lane
On Mon, Jul 18, 2022 at 05:14:00PM +0000, Justin Kwan wrote: > Thank you for taking a look at this and that sounds good. I will > send over a patch compatible with Postgres v16. +$node_2->psql( + 'postgres', + "SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/000000010000000000000003');", + stdout => \my $last_common_tli1_wal_last_modified_at); Please note that you should not rely on the FS-level stats for anything that touches the WAL segments. A rough guess about what you could here to make sure that only the set of WAL segments you are looking for is being copied over would be to either: - Scan the logs produced by pg_rewind and see if the segments are copied or not, depending on the divergence point (aka the last checkpoint before WAL forked). - Clean up pg_wal/ in the target node before running pg_rewind, checking that only the segments you want are available once the operation completes. -- Michael
Attachment
Hi Michael,
Thank you for your feedback, I've incorporated your suggestions by scanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied over to the target server.
I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached.
Thanks,
Justin
From: Michael Paquier
Sent: Tuesday, July 19, 2022 1:36 AM
To: Justin Kwan
Cc: Tom Lane; pgsql-hackers; vignesh; jkwan@cloudflare.com; vignesh ravichandran; hlinnaka@iki.fi
Subject: Re: Making pg_rewind faster
On Mon, Jul 18, 2022 at 05:14:00PM +0000, Justin Kwan wrote:
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.
+$node_2->psql(
+ 'postgres',
+ "SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/000000010000000000000003');",
+ stdout => \my $last_common_tli1_wal_last_modified_at);
Please note that you should not rely on the FS-level stats for
anything that touches the WAL segments. A rough guess about what you
could here to make sure that only the set of WAL segments you are
looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are
copied or not, depending on the divergence point (aka the last
checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind,
checking that only the segments you want are available once the
operation completes.
--
Michael
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.
+$node_2->psql(
+ 'postgres',
+ "SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/000000010000000000000003');",
+ stdout => \my $last_common_tli1_wal_last_modified_at);
Please note that you should not rely on the FS-level stats for
anything that touches the WAL segments. A rough guess about what you
could here to make sure that only the set of WAL segments you are
looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are
copied or not, depending on the divergence point (aka the last
checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind,
checking that only the segments you want are available once the
operation completes.
--
Michael
Attachment
Hi Michael,
Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions by scanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied over to the target server.
I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached.
Thanks,
From: Justin Kwan <justinpkwan@outlook.com>
Sent: July 18, 2022 1:14 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Sent: July 18, 2022 1:14 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Hi Tom,
Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16.
Justin
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
Justin Kwan <justinpkwan@outlook.com> writes:
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.
It's very unlikely that we would consider committing such changes into
released branches. In fact, it's too late even for v15. You should
be submitting non-bug-fix patches against master (v16-to-be).
regards, tom lane
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.
It's very unlikely that we would consider committing such changes into
released branches. In fact, it's too late even for v15. You should
be submitting non-bug-fix patches against master (v16-to-be).
regards, tom lane
Attachment
Hi, Justin! On Fri, Jul 29, 2022 at 1:05 PM Justin Kwan <justinpkwan@outlook.com> wrote: > Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions by scanningthe logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied overto the target server. > > I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached. Thank you for the revision. I've taken a look at this patch. Overall it looks good to me. I also don't see any design objections in the thread. A couple of points from me: 1) I would prefer to evade hard-coded names for WAL segments in the tap tests. Could we calculate the names in the tap tests based on the diverge point, etc.? 2) Patch contains some indentation with spaces, which should be done in tabs. Please consider either manually fixing this or running pgindent over modified files. ------ Regards, Alexander Korotkov
Hi, On 2022-09-13 20:50:20 +0300, Alexander Korotkov wrote: > On Fri, Jul 29, 2022 at 1:05 PM Justin Kwan <justinpkwan@outlook.com> wrote: > > Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions byscanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copiedover to the target server. > > > > I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached. > > Thank you for the revision. > > I've taken a look at this patch. Overall it looks good to me. I also > don't see any design objections in the thread. > > A couple of points from me: > 1) I would prefer to evade hard-coded names for WAL segments in the > tap tests. Could we calculate the names in the tap tests based on the > diverge point, etc.? > 2) Patch contains some indentation with spaces, which should be done > in tabs. Please consider either manually fixing this or running > pgindent over modified files. This patch currently fails because it hasn't been adjusted for commit c47885bd8b6 Author: Andres Freund <andres@anarazel.de> Date: 2022-09-19 18:03:17 -0700 Split TESTDIR into TESTLOGDIR and TESTDATADIR The adjustment is trivial. Attached, together with also producing an error message rather than just dying wordlessly. It doesn't seem quite right to read pg_rewind's logs by reading regress_log_001_basic. Too easy to confuse different runs of pg_rewind etc. I'd suggest trying to redirect the log to a different file. With regard to Alexander's point about whitespace: .git/rebase-apply/patch:25: indent with spaces. /* Handle WAL segment file. */ .git/rebase-apply/patch:26: indent with spaces. const char *fname; .git/rebase-apply/patch:27: indent with spaces. char *slash; .git/rebase-apply/patch:29: indent with spaces. /* Split filepath into directory & filename. */ .git/rebase-apply/patch:30: indent with spaces. slash = strrchr(path, '/'); warning: squelched 29 whitespace errors warning: 34 lines add whitespace errors. Greetings, Andres Freund
Attachment
On Sun, Oct 02, 2022 at 10:44:25AM -0700, Andres Freund wrote: > It doesn't seem quite right to read pg_rewind's logs by reading > regress_log_001_basic. Too easy to confuse different runs of pg_rewind > etc. I'd suggest trying to redirect the log to a different file. Hardcoding log file names in the test increases the overall maintenance, even if renaming these would be easy to track and fix if the naming convention is changed. Anyway, I think that what this patch should do is to use command_checks_all() in RewindTest.pm as it is the test API able to check after a status and multiple expected outputs, which is what the changes in 001 and 008 are doing. RewindTest::run_pg_rewind() needs to be a bit tweaked to accept these regex patterns in input. + if (file_segno < last_common_segno) + { + pg_log_debug("WAL file entry \"%s\" not copied to target", fname); + return FILE_ACTION_NONE; + } There may be something I am missing here, but there is no need to care about segments with a TLI older than lastcommontliIndex, no? decide_wal_file_action() assumes that the WAL segment exists on the target and the source. This looks bug-prone to me without at least an assertion. file_entry_t has an entry to track if a file is a relation file. I think that it would be much cleaner to track if we are handling a WAL segment when inserting an entry in insert_filehash_entry(), so isrelfile could be replaced by an enum with three values: relation file, WAL segment and the rest. -- Michael
Attachment
On Thu, Oct 06, 2022 at 04:08:45PM +0900, Michael Paquier wrote: > file_entry_t has an entry to track if a file is a relation file. I > think that it would be much cleaner to track if we are handling a WAL > segment when inserting an entry in insert_filehash_entry(), so > isrelfile could be replaced by an enum with three values: relation > file, WAL segment and the rest. This review has been done a few weeks ago, and there has been no update since, so I am marking this entry as returned with feedback in the CF app. -- Michael