A failure in t/038_save_logical_slots_shutdown.pl - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject A failure in t/038_save_logical_slots_shutdown.pl
Date
Msg-id CALj2ACVLzH5CN-h9=S26mdRHPuJ9yDLUw70yh4JOiPw03WL0CQ@mail.gmail.com
Whole thread Raw
Responses Re: A failure in t/038_save_logical_slots_shutdown.pl
Re: A failure in t/038_save_logical_slots_shutdown.pl
List pgsql-hackers
Hi,

I've been observing a failure in t/038_save_logical_slots_shutdown.pl
of late on my developer system:

t/038_save_logical_slots_shutdown.pl .. 1/?
#   Failed test 'Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location'
#   at t/038_save_logical_slots_shutdown.pl line 35.
# Looks like you failed 1 test of 2.
t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
(wstat 256, 0x100)
Failed 1/2 subtests

I did a quick analysis of the failure and commit
https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
that introduced this test. I think the issue is that the slot's
confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
(0/1508018) are not the same:

tmp_check/log/038_save_logical_slots_shutdown_pub.log:

2024-01-10 07:55:44.539 UTC [57621] sub LOG:  starting logical
decoding for slot "sub"
2024-01-10 07:55:44.539 UTC [57621] sub DETAIL:  Streaming
transactions committing after 0/1508000, reading WAL from 0/1507FC8.
2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT:  START_REPLICATION
SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
publication_names '"pub"')

ubuntu:~/postgres$ pg17/bin/pg_controldata -D
src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
Database cluster state:               in production
pg_control last modified:             Wed Jan 10 07:55:44 2024
Latest checkpoint location:           0/1508018
Latest checkpoint's REDO location:    0/1508018

But the tests added by t/038_save_logical_slots_shutdown.pl expects
both LSNs to be same:

sub compare_confirmed_flush
{
    # Is it same as the value read from log?
    ok( $latest_checkpoint eq $confirmed_flush_from_log,
        "Check that the slot's confirmed_flush LSN is the same as the
latest_checkpoint location"
    );

I suspect that it's quite not right to expect the slot's
confirmed_flush and latest checkpoint location to be same in the test.
This is because the shutdown checkpoint gets an LSN that's greater
than the slot's confirmed_flush LSN - see the shutdown checkpoint
record getting inserted into WAL after the slot is marked dirty in
CheckPointReplicationSlots().

With this analysis in mind, I think the tests need to do something
like the following:

diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
b/src/test/recovery/t/038_save_logical_slots_shut
down.pl
index 5a4f5dc1d4..d49e6014fc 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -32,7 +32,7 @@ sub compare_confirmed_flush
          unless defined($latest_checkpoint);

        # Is it same as the value read from log?
-       ok( $latest_checkpoint eq $confirmed_flush_from_log,
+       ok( $latest_checkpoint ge $confirmed_flush_from_log,
                "Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location"
        );

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Commitfest 2024-01 first week update
Next
From: vignesh C
Date:
Subject: Re: Track in pg_replication_slots the reason why slots conflict?