Thread: Logical replication slot wal_status "lost" with max_slot_wal_keep_size = -1


My project's replication is failing with the following error:

2024-10-15 14:03:38.446 UTC [2840947] STATEMENT:  SELECT pg_catalog.set_config('search_path', '', false);
2024-10-15 14:03:38.446 UTC [2840947] ERROR:  cannot read from logical replication slot "track_subscription"
2024-10-15 14:03:38.446 UTC [2840947] DETAIL:  This slot has been invalidated because it exceeded the maximum reserved size.
2024-10-15 14:03:38.446 UTC [2840947] STATEMENT:  START_REPLICATION SLOT "track_subscription" LOGICAL 1380B/CBFAEFF0 (proto_version '2', publication_names '"track_ingestion"')


trackdb=# select * from pg_replication_slots;
     slot_name      |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 track_subscription | pgoutput | logical   |  16402 | trackdb  | f         | f      |            |      |    406428081 |             | 1380B/BAB7B328      | lost       |               | f

Publisher and Subscriber DB versions:
PostgreSQL 14.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit

Publisher System settings:
max_slot_wal_keep_size = -1
max_wal_size = 12GB
wal_keep_size = 0
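
To rule out a mismatch between the configuration file and the running server, the effective values can be read from pg_settings. This is only a sketch using the three setting names listed above; it should be run on the publisher:

```sql
-- Show the values the running server is actually using, where each one
-- came from, and whether a change is still waiting for a restart.
SELECT name, setting, unit, source, pending_restart
FROM pg_settings
WHERE name IN ('max_slot_wal_keep_size', 'max_wal_size', 'wal_keep_size');
```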

I have controls in place to keep replication lag from growing too large, so I was surprised to see wal_status become "lost", given what I have read about the default value of max_slot_wal_keep_size (-1, which should place no limit on the WAL a slot can retain).
My searching on this problem suggests I should increase max_wal_size to 96GB and perhaps set max_slot_wal_keep_size = 0.
Is this correct or is there something else I should do to prevent this from ever happening again?
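
For context, a lag check of the kind mentioned above could look roughly like the query below. It is an illustrative sketch meant to run on the publisher, not necessarily the exact control in place here; the retained_wal column alias is made up for the example.

```sql
-- Per-slot view of how much WAL the publisher is holding back.
-- retained_wal is NULL once restart_lsn is gone, as with a "lost" slot.
SELECT slot_name,
       active,
       wal_status,
       safe_wal_size,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```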

Thanks,
Dennis