Thread: pgsql: Wait for WAL summarization to catch up before creating .partial

Wait for WAL summarization to catch up before creating .partial file.

When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.
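
For readers unfamiliar with the file naming involved, here is a small illustration in Python. It is not PostgreSQL code. It assumes the default 16 MB WAL segment size and the usual 24-hex-digit segment name layout (timeline, then the two halves of the segment number); the helper names wal_file_name and partial_name are made up for this sketch.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; the real value is set at initdb time


def wal_file_name(tli: int, segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit WAL segment file name for a timeline and segment number."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def partial_name(tli: int, segno: int) -> str:
    """Name the last, incomplete segment of an old timeline gets after promotion."""
    return wal_file_name(tli, segno) + ".partial"


print(wal_file_name(1, 3))
print(partial_name(1, 3))
```

Once the file carries the ".partial" suffix, anything that looks for the segment under its plain name no longer finds it, which is why the order of "summarize" and "rename" matters here.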

To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.
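
The interaction described above can be sketched as a toy model in Python. This is only an illustration of the reasoning, not the code in walsummarizer.c or xlog.c: the class ToySummarizer, the function wait_for_summarization, the "fixed" flag, and the short timeout are all hypothetical. In particular, the timeout exists only so the sketch terminates, and the model simplifies the summarizer to "advance to the end of the old timeline only once that end is known".

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySummarizer:
    """Toy stand-in for the summarizer's view of the old timeline (hypothetical)."""
    summarized_upto: int           # LSN up to which summaries exist
    old_tli_end: int               # LSN at which the old timeline ends
    promoted: bool = False         # has promotion occurred?
    new_tli_has_wal: bool = False  # has WAL been written on the new timeline?

    def known_end_of_old_timeline(self, fixed: bool) -> Optional[int]:
        """Return the old timeline's end LSN if the summarizer may rely on it."""
        if self.new_tli_has_wal:
            return self.old_tli_end
        if fixed and self.promoted:
            # Fixed behaviour: promotion alone means no more WAL can appear here.
            return self.old_tli_end
        return None

    def step(self, fixed: bool) -> None:
        """Summarize up to the end of the old timeline, if that end is known."""
        end = self.known_end_of_old_timeline(fixed)
        if end is not None and self.summarized_upto < end:
            self.summarized_upto = end


def wait_for_summarization(s: ToySummarizer, target_lsn: int, fixed: bool,
                           timeout_s: float = 0.05, poll_s: float = 0.01) -> bool:
    """Poll until summaries reach target_lsn; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        s.step(fixed)
        if s.summarized_upto >= target_lsn:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

With promoted=True and no WAL yet on the new timeline, the wait never succeeds when fixed=False, because the toy summarizer is waiting for new-timeline WAL while the caller is waiting for the summarizer. With fixed=True the summarizer advances to the switch point and the wait returns, after which the rename would be safe.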

In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this code refactors things to make
it easier.

Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.

Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8a53539bd603e5fe8fa52bdbb7277f6f49724522

Modified Files
--------------
src/backend/access/transam/xlog.c           |  33 +++++++
src/backend/backup/basebackup_incremental.c |  90 ++----------------
src/backend/postmaster/walsummarizer.c      | 142 +++++++++++++++++++++++-----
src/bin/pg_combinebackup/meson.build        |   1 +
src/bin/pg_combinebackup/t/008_promote.pl   |  81 ++++++++++++++++
src/include/access/xlog.h                   |   1 +
src/include/postmaster/walsummarizer.h      |   3 +-
7 files changed, 241 insertions(+), 110 deletions(-)


Re: pgsql: Wait for WAL summarization to catch up before creating .partial

From: Alexander Korotkov
On Fri, Jul 26, 2024 at 10:01 PM Robert Haas <rhaas@postgresql.org> wrote:
> Wait for WAL summarization to catch up before creating .partial file.
>
> When a standby is promoted, CleanupAfterArchiveRecovery() may decide
> to rename the final WAL file from the old timeline by adding ".partial"
> to the name. If WAL summarization is enabled and this file is renamed
> before its partial contents are summarized, WAL summarization breaks:
> the summarizer gets stuck at that point in the WAL stream and just
> errors out.
>
> To fix that, first make the startup process wait for WAL summarization
> to catch up before renaming the file. Generally, this should be quick,
> and if it's not, the user can shut off summarize_wal and try again.
> To make this fix work, also teach the WAL summarizer that after a
> promotion has occurred, no more WAL can appear on the previous
> timeline: previously, the WAL summarizer wouldn't switch to the new
> timeline until we actually started writing WAL there, but that meant
> that when the startup process was waiting for the WAL summarizer, it
> was waiting for an action that the summarizer wasn't yet prepared to
> take.
>
> In the process of fixing these bugs, I realized that the logic to wait
> for WAL summarization to catch up was spread out in a way that made
> it difficult to reuse properly, so this code refactors things to make
> it easier.
>
> Finally, add a test case that would have caught this bug and the
> previously-fixed bug that WAL summarization sometimes needs to back up
> when the timeline changes.

It appears that I was late with my review [1].  But the new tap test
could still use pgperltidy.

Links.
1. https://www.postgresql.org/message-id/CAPpHfduW3du0W%3D3noztdaJ6evGP9gqT1AGk_rwXrqDyus1zZoQ%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase



Alexander Korotkov <aekorotkov@gmail.com> writes:
> It appears that I was late with my review [1].  But the new tap test
> could still use pgperltidy.

I believe our current policy is that we're asking committers to
maintain pgindent cleanliness, but not pgperltidy (which is why
BF member koel isn't checking pgperltidy).  We might get to that
eventually, but we're not there yet.

            regards, tom lane



Re: pgsql: Wait for WAL summarization to catch up before creating .partial

From: Alexander Korotkov
On Sun, Jul 28, 2024 at 7:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > It appears that I was late with my review [1].  But the new tap test
> > could still use pgperltidy.
>
> I believe our current policy is that we're asking committers to
> maintain pgindent cleanliness, but not pgperltidy (which is why
> BF member koel isn't checking pgperltidy).  We might get to that
> eventually, but we're not there yet.

Got it, thank you.  Sorry for the noise, then.

------
Regards,
Alexander Korotkov
Supabase