Re: standby promotion can create unreadable WAL - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: standby promotion can create unreadable WAL
Date
Msg-id CAFiTN-s0FGJTs7GYm-uw5bzhRZ=d1m0jQ_zruAw2jYHBgP5G1w@mail.gmail.com
In response to Re: standby promotion can create unreadable WAL  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Fri, Aug 26, 2022 at 6:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 12:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > However, if anything
> > did try to look at file #4 it would get confused. Maybe that can
> > happen if this is a streaming standby, where we only write an
> > end-of-recovery record upon promotion, rather than a checkpoint, or
> > maybe if there are cascading standbys someone could try to actually
> > use the 000000020000000000000004 file for something. I'm not sure. But
> > unless I'm missing something, that file is bogus, and our only hope of
> > not having problems is that perhaps no one will ever look at it.

I tried to reproduce the problem with a cascading standby. The setup
is as follows:
pgprimary -> pgstandby (archive only) -> pgcascade (streaming + archive).

The second node has to be archive only because the zero-filled gap is
only created in archive-only mode.  With this setup, I noticed that
when the cascading standby receives the zero-filled gap, it reports
the same error that we saw with pg_waldump [1], and then it keeps
waiting forever on that file.  I have attached a test case, but I
think the timing in the test is not quite right: before the cascading
standby is set up, some of the WAL files get removed by pgstandby.  So,
to test this, I just put a direct return in RemoveOldXlogFiles() [2].
The problem is resolved by the patch Robert posted upthread.
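
For anyone following along, the segment file that a given LSN falls in
can be worked out with a little arithmetic. The following is only a
standalone sketch of that arithmetic as I understand it, assuming the
default 16MB wal_segment_size; wal_segno_for_lsn() and wal_file_name()
are hypothetical helper names for illustration, not PostgreSQL
functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default WAL segment size of 16MB. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical helper: segment number that contains the given LSN. */
static uint64_t
wal_segno_for_lsn(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/*
 * Hypothetical helper: build the 24-character WAL file name from a
 * timeline ID and a segment number.  The name is the timeline followed
 * by the segment number split into a high and a low part, each printed
 * as 8 hex digits.
 */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    uint64_t    segs_per_xlogid = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

    assert(buflen >= 25);       /* 24 hex digits plus terminating NUL */
    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_xlogid),
             (unsigned) (segno % segs_per_xlogid));
}
```

Under these assumptions, timeline 2 and segment 4 give
000000020000000000000004, the file name from Robert's example, and the
LSN 0/FFFFEA8 from [1] lands in segment 15 (0xF), 344 bytes before the
end of that segment.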

[1]
2022-08-25 16:21:26.413 IST [18235] LOG:  invalid record length at 0/FFFFEA8: wanted 24, got 0
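
To make the message in [1] easier to read: as far as I understand,
"wanted 24" is the minimum size of a WAL record header, and "got 0" is
what a reader sees when the total-length field at that position lies
in zero-filled space. The following is only a standalone sketch of
that kind of check, not the actual xlogreader code;
MIN_RECORD_HEADER_SIZE and check_record_length() are names made up for
the illustration, and the value 24 is taken from the log line above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: minimum record header size, taken from "wanted 24" in [1]. */
#define MIN_RECORD_HEADER_SIZE 24

/*
 * Hypothetical helper: read the 4-byte total-length field at the given
 * offset and decide whether a record could start there.  Returns 1 if
 * the length is at least the minimum header size, 0 otherwise; on
 * failure, msg describes the problem.
 */
static int
check_record_length(const unsigned char *buf, size_t offset,
                    char *msg, size_t msglen)
{
    uint32_t    tot_len;

    assert(buf != NULL && msg != NULL && msglen > 0);

    /* memcpy avoids an unaligned read of the length field */
    memcpy(&tot_len, buf + offset, sizeof(tot_len));

    if (tot_len < MIN_RECORD_HEADER_SIZE)
    {
        snprintf(msg, msglen, "invalid record length: wanted %u, got %u",
                 (unsigned) MIN_RECORD_HEADER_SIZE, (unsigned) tot_len);
        return 0;
    }

    msg[0] = '\0';
    return 1;
}
```

Called on a zero-filled buffer, this sketch returns 0 with a
"wanted 24, got 0" message. A reader that hits this in the middle of
the WAL stream has no valid record to continue from at that position.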

[2]
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index eb5115f..990a879 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3558,6 +3558,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
        XLogSegNo       endlogSegNo;
        XLogSegNo       recycleSegNo;

+       return;
        /* Initialize info about where to try to recycle to */
        XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
        recycleSegNo = XLOGfileslop(lastredoptr);

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
