Re: Catalog_xmin is not advanced when a logical slot is lost - Mailing list pgsql-hackers

From sirisha chamarthi
Subject Re: Catalog_xmin is not advanced when a logical slot is lost
Date
Msg-id CAKrAKeVW2uw70aXCANWz16QCVY38kOzCj7uhmm7HhqaSWiGspA@mail.gmail.com
In response to Re: Catalog_xmin is not advanced when a logical slot is lost  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Catalog_xmin is not advanced when a logical slot is lost
List pgsql-hackers


On Mon, Nov 21, 2022 at 9:12 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > On Mon, Nov 21, 2022 at 8:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org>
> > wrote:
>
> > > Thank you.  I had pushed mine for CirrusCI to test, and it failed the
> > > assert I added in slot.c:
> > > https://cirrus-ci.com/build/4786354503548928
> > > Not yet sure why, looking into it.
> >
> > Can this be because restart_lsn is not set to InvalidXLogRecPtr for the
> > physical slots?
>
> Hmm, that makes no sense.  Is that yet another bug?  Looking.

It appears to be. The walsender sets restart_lsn to a valid LSN even when the slot has been invalidated. To reproduce:

postgres=# select pg_Create_physical_replication_slot('s1');
 pg_create_physical_replication_slot
-------------------------------------
 (s1,)
(1 row)

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8254390272 | f
(1 row)

postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8374095064 | f
(1 row)

postgres=# \q
postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s1 -D .
pg_receivewal: error: unexpected termination of replication stream: ERROR:  requested WAL segment 0000000100000000000000EB has already been removed
pg_receivewal: disconnected; waiting 5 seconds to try again
^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
psql (16devel)
Type "help" for help.

postgres=# select * from pg_replication_slots;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> ^C
!?> 


In the log:
2022-11-21 17:31:48.159 UTC [3953664] STATEMENT:  START_REPLICATION SLOT "s1" 0/EB000000 TIMELINE 1
TRAP: failed Assert("XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)"), File: "slotfuncs.c", Line: 371, PID: 3953707
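
For readers matching the two messages above: the start position 0/EB000000 in the START_REPLICATION statement and the segment 0000000100000000000000EB in the pg_receivewal error refer to the same WAL file. The sketch below shows the arithmetic, assuming the default 16 MB wal_segment_size (the session above does not state it). The helper names parse_lsn and wal_segment_name are made up for illustration and are not PostgreSQL functions.

```python
# Assumption: default wal_segment_size of 16 MB; the session above does not show it.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (both hexadecimal) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(lsn, timeline, segment_size=WAL_SEGMENT_SIZE):
    """Return the 24-character WAL file name that contains the given LSN."""
    # Number of segments that fit in one 4 GB "xlogid" range.
    segments_per_xlogid = 0x100000000 // segment_size
    segno = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


# Start position and timeline taken from the START_REPLICATION log line.
print(wal_segment_name(parse_lsn("0/EB000000"), timeline=1))
# prints 0000000100000000000000EB
```

So the walsender was asked to stream from the very start of a segment that had already been removed.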
 

> --
> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> "No es bueno caminar con un hombre muerto"
