Thread: pgsql: Wait for WAL summarization to catch up before creating .partial
Wait for WAL summarization to catch up before creating .partial file.

When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.

To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.

In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this code refactors things to make
it easier.

Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.

Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8a53539bd603e5fe8fa52bdbb7277f6f49724522

Modified Files
--------------
src/backend/access/transam/xlog.c           |  33 +++++++
src/backend/backup/basebackup_incremental.c |  90 ++----------------
src/backend/postmaster/walsummarizer.c      | 142 +++++++++++++++++++++++-----
src/bin/pg_combinebackup/meson.build        |   1 +
src/bin/pg_combinebackup/t/008_promote.pl   |  81 ++++++++++++++++
src/include/access/xlog.h                   |   1 +
src/include/postmaster/walsummarizer.h      |   3 +-
7 files changed, 241 insertions(+), 110 deletions(-)
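
The interaction described in the commit message can be illustrated with a small toy simulation. This is only a sketch and not PostgreSQL code: the names ToySummarizer and wait_then_rename, the integer "LSNs", the 16-unit chunk size, and the rule by which the unaware summarizer stalls are all assumptions made up for illustration. It shows why the wait in the startup process is only useful once the summarizer knows that no more WAL can appear on the old timeline.

```python
from dataclasses import dataclass


@dataclass
class ToySummarizer:
    """Hypothetical stand-in for the WAL summarizer (not the real structure)."""
    summarized_upto: int    # toy LSN up to which WAL has been summarized
    knows_promotion: bool   # True models the second half of the fix

    def step(self, old_timeline_end: int, chunk: int = 16) -> None:
        remaining = old_timeline_end - self.summarized_upto
        if remaining <= 0:
            return
        if remaining >= chunk:
            # A full chunk of old-timeline WAL is available: summarize it.
            self.summarized_upto += chunk
        elif self.knows_promotion:
            # No more WAL can appear on the old timeline, so summarize the tail.
            self.summarized_upto = old_timeline_end
        # Otherwise keep waiting for WAL that will never arrive (assumed stall).


def wait_then_rename(summarizer: ToySummarizer, old_timeline_end: int,
                     max_steps: int = 50) -> str:
    """Toy startup process: wait for the summarizer, then rename the file.

    max_steps only bounds the simulation; it does not model a real timeout.
    """
    for _ in range(max_steps):
        if summarizer.summarized_upto >= old_timeline_end:
            return "renamed to .partial"
        summarizer.step(old_timeline_end)
    return "gave up: summarizer never caught up"


if __name__ == "__main__":
    print(wait_then_rename(ToySummarizer(0, knows_promotion=True), 40))
    print(wait_then_rename(ToySummarizer(0, knows_promotion=False), 40))
```

In this model the promotion-aware summarizer reaches the end of the old timeline and the rename proceeds, while the unaware one stops short of the partial tail and the waiter never makes progress.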
Re: pgsql: Wait for WAL summarization to catch up before creating .partial
From: Alexander Korotkov
On Fri, Jul 26, 2024 at 10:01 PM Robert Haas <rhaas@postgresql.org> wrote:
> Wait for WAL summarization to catch up before creating .partial file.
>
> When a standby is promoted, CleanupAfterArchiveRecovery() may decide
> to rename the final WAL file from the old timeline by adding ".partial"
> to the name. If WAL summarization is enabled and this file is renamed
> before its partial contents are summarized, WAL summarization breaks:
> the summarizer gets stuck at that point in the WAL stream and just
> errors out.
>
> To fix that, first make the startup process wait for WAL summarization
> to catch up before renaming the file. Generally, this should be quick,
> and if it's not, the user can shut off summarize_wal and try again.
> To make this fix work, also teach the WAL summarizer that after a
> promotion has occurred, no more WAL can appear on the previous
> timeline: previously, the WAL summarizer wouldn't switch to the new
> timeline until we actually started writing WAL there, but that meant
> that when the startup process was waiting for the WAL summarizer, it
> was waiting for an action that the summarizer wasn't yet prepared to
> take.
>
> In the process of fixing these bugs, I realized that the logic to wait
> for WAL summarization to catch up was spread out in a way that made
> it difficult to reuse properly, so this code refactors things to make
> it easier.
>
> Finally, add a test case that would have caught this bug and the
> previously-fixed bug that WAL summarization sometimes needs to back up
> when the timeline changes.

It appears that I was late with my review [1]. But the new tap test
could still use pgperltidy.

Links.
1. https://www.postgresql.org/message-id/CAPpHfduW3du0W%3D3noztdaJ6evGP9gqT1AGk_rwXrqDyus1zZoQ%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Alexander Korotkov <aekorotkov@gmail.com> writes:
> It appears that I was late with my review [1]. But the new tap test
> could still use pgperltidy.

I believe our current policy is that we're asking committers to
maintain pgindent cleanliness, but not pgperltidy (which is why
BF member koel isn't checking pgperltidy). We might get to that
eventually, but we're not there yet.

regards, tom lane

Re: pgsql: Wait for WAL summarization to catch up before creating .partial
From: Alexander Korotkov
On Sun, Jul 28, 2024 at 7:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > It appears that I was late with my review [1]. But the new tap test
> > could still use pgperltidy.
>
> I believe our current policy is that we're asking committers to
> maintain pgindent cleanliness, but not pgperltidy (which is why
> BF member koel isn't checking pgperltidy). We might get to that
> eventually, but we're not there yet.

Got it, thank you. Sorry for noise, then.

------
Regards,
Alexander Korotkov
Supabase