Thread: Making pg_rewind faster

Making pg_rewind faster

From
vignesh ravichandran
Date:
Hi Hackers,

I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.

However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.

1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?

Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.

Thanks,
Vignesh

Re: Making pg_rewind faster

From
Justin Kwan
Date:
Hi everyone!

Here's the attached patch submission to optimize pg_rewind performance when many WAL files are retained on server. This patch avoids replaying (copying over) older WAL segment files that fall before the point of divergence between the source and target servers.

Thanks,
Justin

From: Justin Kwan <jkwan@cloudflare.com>
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
 
Looping in my other email.

On Thu, Jun 30, 2022 at 6:22 AM vignesh ravichandran <admin@viggy28.dev> wrote:
Hi Hackers,

I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.

However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.

1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?

Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.

Thanks,
Vignesh

Attachment

Re: Making pg_rewind faster

From
Justin Kwan
Date:
Hi everyone!

I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.

Thanks,
Justin

From: Justin Kwan <jkwan@cloudflare.com>
Sent: July 15, 2022 6:13 PM
To: vignesh ravichandran <admin@viggy28.dev>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; justinpkwan@outlook.com <justinpkwan@outlook.com>
Subject: Re: Making pg_rewind faster
 
Looping in my other email.

On Thu, Jun 30, 2022 at 6:22 AM vignesh ravichandran <admin@viggy28.dev> wrote:
Hi Hackers,

I have been using pg_rewind in production for 2 years. One of the things that I noticed in pg_rewind is if it doesn't know what to do with a file "it copies". I understand it's the more safer option. After all, the alternative, pg_basebackup copies all the files from source to target.

However, this is making pg_rewind inefficient when we have a high number of WAL files. Majority of the data (in most of my cases 95%+) that it copies are WAL files which are anyway same between the source and target. Skipping those same WAL files from copying will improve the speed of pg_rewind a lot.

1. Does pg_rewind need to copy WAL files before the WAL that contains the last common check point?

Heikki's presentation https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me a good overview and also explained the behavior what I mentioned.

Thanks,
Vignesh

Attachment

Re: Making pg_rewind faster

From
Tom Lane
Date:
Justin Kwan <justinpkwan@outlook.com> writes:
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets
versionPostgres version 15 Beta 1/2. 

It's very unlikely that we would consider committing such changes into
released branches.  In fact, it's too late even for v15.  You should
be submitting non-bug-fix patches against master (v16-to-be).

            regards, tom lane



Re: Making pg_rewind faster

From
Justin Kwan
Date:
Hi Tom,

Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16.

Justin

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
 
Justin Kwan <justinpkwan@outlook.com> writes:
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.

It's very unlikely that we would consider committing such changes into
released branches.  In fact, it's too late even for v15.  You should
be submitting non-bug-fix patches against master (v16-to-be).

                        regards, tom lane

Re: Making pg_rewind faster

From
Michael Paquier
Date:
On Mon, Jul 18, 2022 at 05:14:00PM +0000, Justin Kwan wrote:
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.

+$node_2->psql(
+       'postgres',
+       "SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/000000010000000000000003');",
+       stdout => \my $last_common_tli1_wal_last_modified_at);
Please note that you should not rely on the FS-level stats for
anything that touches the WAL segments.  A rough guess about what you
could here to make sure that only the set of WAL segments you are
looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are
copied or not, depending on the divergence point (aka the last
checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind,
checking that only the segments you want are available once the
operation completes.
--
Michael

Attachment

Re: Making pg_rewind faster

From
Justin Kwan
Date:
Hi Michael,

Thank you for your feedback, I've incorporated your suggestions by scanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied over to the target server.

I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached.

Thanks,
Justin




From: Michael Paquier
Sent: Tuesday, July 19, 2022 1:36 AM
To: Justin Kwan
Cc: Tom Lane; pgsql-hackers; vignesh; jkwan@cloudflare.com; vignesh ravichandran; hlinnaka@iki.fi
Subject: Re: Making pg_rewind faster

On Mon, Jul 18, 2022 at 05:14:00PM +0000, Justin Kwan wrote:
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.

+$node_2->psql(
+       'postgres',
+       "SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/000000010000000000000003');",
+       stdout => \my $last_common_tli1_wal_last_modified_at);
Please note that you should not rely on the FS-level stats for
anything that touches the WAL segments.  A rough guess about what you
could here to make sure that only the set of WAL segments you are
looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are
copied or not, depending on the divergence point (aka the last
checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind,
checking that only the segments you want are available once the
operation completes.
--
Michael
Attachment

Re: Making pg_rewind faster

From
Justin Kwan
Date:
Hi Michael,

Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions by scanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied over to the target server.

I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see attached.

Thanks,
Justin


From: Justin Kwan <justinpkwan@outlook.com>
Sent: July 18, 2022 1:14 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
 
Hi Tom,

Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16.

Justin

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: July 17, 2022 2:40 PM
To: Justin Kwan <justinpkwan@outlook.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; vignesh <vignesh@cloudflare.com>; jkwan@cloudflare.com <jkwan@cloudflare.com>; vignesh ravichandran <admin@viggy28.dev>; hlinnaka@iki.fi <hlinnaka@iki.fi>
Subject: Re: Making pg_rewind faster
 
Justin Kwan <justinpkwan@outlook.com> writes:
> I've also attached the pg_rewind optimization patch file for Postgres version 14.4. The previous patch file targets version Postgres version 15 Beta 1/2.

It's very unlikely that we would consider committing such changes into
released branches.  In fact, it's too late even for v15.  You should
be submitting non-bug-fix patches against master (v16-to-be).

                        regards, tom lane
Attachment

Re: Making pg_rewind faster

From
Alexander Korotkov
Date:
Hi, Justin!

On Fri, Jul 29, 2022 at 1:05 PM Justin Kwan <justinpkwan@outlook.com> wrote:
> Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions by
scanningthe logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being copied
overto the target server. 
>
> I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see
attached.

Thank you for the revision.

I've taken a look at this patch. Overall it looks good to me. I also
don't see any design objections in the thread.

A couple of points from me:
1) I would prefer to evade hard-coded names for WAL segments in the
tap tests. Could we calculate the names in the tap tests based on the
diverge point, etc.?
2) Patch contains some indentation with spaces, which should be done
in tabs. Please consider either manually fixing this or running
pgindent over modified files.

------
Regards,
Alexander Korotkov



Re: Making pg_rewind faster

From
Andres Freund
Date:
Hi,

On 2022-09-13 20:50:20 +0300, Alexander Korotkov wrote:
> On Fri, Jul 29, 2022 at 1:05 PM Justin Kwan <justinpkwan@outlook.com> wrote:
> > Not sure if this email went through previously but thank you for your feedback, I've incorporated your suggestions
byscanning the logs produced from pg_rewind when asserting that certain WAL segment files were skipped from being
copiedover to the target server.
 
> >
> > I've also updated the pg_rewind patch file to target the Postgres master branch (version 16 to be). Please see
attached.
> 
> Thank you for the revision.
> 
> I've taken a look at this patch. Overall it looks good to me. I also
> don't see any design objections in the thread.
> 
> A couple of points from me:
> 1) I would prefer to evade hard-coded names for WAL segments in the
> tap tests. Could we calculate the names in the tap tests based on the
> diverge point, etc.?
> 2) Patch contains some indentation with spaces, which should be done
> in tabs. Please consider either manually fixing this or running
> pgindent over modified files.

This patch currently fails because it hasn't been adjusted for
commit c47885bd8b6
Author: Andres Freund <andres@anarazel.de>
Date:   2022-09-19 18:03:17 -0700
 
    Split TESTDIR into TESTLOGDIR and TESTDATADIR

The adjustment is trivial. Attached, together with also producing an error
message rather than just dying wordlessly.


It doesn't seem quite right to read pg_rewind's logs by reading
regress_log_001_basic. Too easy to confuse different runs of pg_rewind
etc. I'd suggest trying to redirect the log to a different file.


With regard to Alexander's point about whitespace:

.git/rebase-apply/patch:25: indent with spaces.
                /* Handle WAL segment file. */
.git/rebase-apply/patch:26: indent with spaces.
                const char  *fname;
.git/rebase-apply/patch:27: indent with spaces.
                char        *slash;
.git/rebase-apply/patch:29: indent with spaces.
                /* Split filepath into directory & filename. */
.git/rebase-apply/patch:30: indent with spaces.
                slash = strrchr(path, '/');
warning: squelched 29 whitespace errors
warning: 34 lines add whitespace errors.

Greetings,

Andres Freund

Attachment

Re: Making pg_rewind faster

From
Michael Paquier
Date:
On Sun, Oct 02, 2022 at 10:44:25AM -0700, Andres Freund wrote:
> It doesn't seem quite right to read pg_rewind's logs by reading
> regress_log_001_basic. Too easy to confuse different runs of pg_rewind
> etc. I'd suggest trying to redirect the log to a different file.

Hardcoding log file names in the test increases the overall
maintenance, even if renaming these would be easy to track and fix if
the naming convention is changed.  Anyway, I think that what this
patch should do is to use command_checks_all() in RewindTest.pm as it
is the test API able to check after a status and multiple expected
outputs, which is what the changes in 001 and 008 are doing.
RewindTest::run_pg_rewind() needs to be a bit tweaked to accept these
regex patterns in input.

+    if (file_segno < last_common_segno)
+    {
+        pg_log_debug("WAL file entry \"%s\" not copied to target", fname);
+        return FILE_ACTION_NONE;
+    }
There may be something I am missing here, but there is no need to care
about segments with a TLI older than lastcommontliIndex, no?

decide_wal_file_action() assumes that the WAL segment exists on the
target and the source.  This looks bug-prone to me without at least an
assertion.

file_entry_t has an entry to track if a file is a relation file.  I
think that it would be much cleaner to track if we are handling a WAL
segment when inserting an entry in insert_filehash_entry(), so
isrelfile could be replaced by an enum with three values: relation
file, WAL segment and the rest.
--
Michael

Attachment

Re: Making pg_rewind faster

From
Michael Paquier
Date:
On Thu, Oct 06, 2022 at 04:08:45PM +0900, Michael Paquier wrote:
> file_entry_t has an entry to track if a file is a relation file.  I
> think that it would be much cleaner to track if we are handling a WAL
> segment when inserting an entry in insert_filehash_entry(), so
> isrelfile could be replaced by an enum with three values: relation
> file, WAL segment and the rest.

This review has been done a few weeks ago, and there has been no
update since, so I am marking this entry as returned with feedback in
the CF app.
--
Michael

Attachment