Re: Add two missing tests in 035_standby_logical_decoding.pl - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Add two missing tests in 035_standby_logical_decoding.pl
Date
Msg-id CAA4eK1+-uuH+sC42_=wA=xPyzCN+U3VNeW_d-BRGvt-pGkNT+Q@mail.gmail.com
Whole thread Raw
In response to Re: Add two missing tests in 035_standby_logical_decoding.pl  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Add two missing tests in 035_standby_logical_decoding.pl
List pgsql-hackers
On Fri, May 5, 2023 at 7:53 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
>
> On 5/5/23 2:28 PM, Amit Kapila wrote:
> > On Fri, May 5, 2023 at 5:36 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >
> > It seems due to some reason the current wal file is not switched due
> > to some reason.
>
> Oh wait, here is a NON failing one: https://cirrus-ci.com/task/5086849685782528 (I modified the
> .cirrus.yml so that we can download the "testrun.zip" file even if the test is not failing).
>
> So, in this testrun.zip we can see, that the test is ok:
>
> $ grep -i retain ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.000s) ok 19 - invalidated logical slots do not lead to retaining WAL
>
> and that the WAL file we expect to be removed is:
>
> $ grep "WAL file is"
./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.925s) # BDT WAL file is
/Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003
>
> This WAL file has been removed by the standby:
>
> $ grep -i 000000010000000000000003
./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log| grep -i recy 
> 2023-05-05 10:06:08.787 UTC [17521][checkpointer] DEBUG:  00000: recycled write-ahead log file
"000000010000000000000003"
>
> But on the primary, it has been recycled way after that time:
>
> $ grep -i 000000010000000000000003
./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log| grep -i recy 
> 2023-05-05 10:06:13.370 UTC [16785][checkpointer] DEBUG:  00000: recycled write-ahead log file
"000000010000000000000003"
>
> As, the checkpoint on the primary after the WAL file switch only recycled (001 and 002):
>
> $ grep -i recycled ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log
> 2023-05-05 10:05:57.196 UTC [16785][checkpointer] LOG:  00000: checkpoint complete: wrote 4 buffers (3.1%); 0 WAL
file(s)added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s,
average=0.000s; distance=11219 kB, estimate=11219 kB; lsn=0/2000060, redo lsn=0/2000028 
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG:  00000: recycled write-ahead log file
"000000010000000000000001"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG:  00000: recycled write-ahead log file
"000000010000000000000002"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] LOG:  00000: checkpoint complete: wrote 20 buffers (15.6%); 0 WAL
file(s)added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s,
average=0.000s; distance=32768 kB, estimate=32768 kB; lsn=0/40000D0, redo lsn=0/4000098 
>
>
> So, even on a successful test, we can see that the WAL file we expect to be removed on the standby has not been
recycledon the primary before the test. 
>

Okay, one possibility of not removing on primary is that at the time
of checkpoint (when we compute RedoRecPtr), the wal_swtich and insert
is not yet performed because in that case it will compute the
RedoRecPtr as a location before those operations which would be 0000*3
file. However, it is not clear how is that possible except from a
background checkpoint happening at that point but from LOGs, it
appears that the checkpoint triggered by test has recycled the wal
files.

> > I think we need to add more DEBUG info to find that
> > out. Can you please try to print 'RedoRecPtr', '_logSegNo', and
> > recptr?
> >
>
> Yes, will do.
>

Okay, thanks, please try to print similar locations on standby in
CreateRestartPoint().

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Order changes in PG16 since ICU introduction
Next
From: Michael Paquier
Date:
Subject: Re: Autogenerate some wait events code and documentation