Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error - Mailing list pgsql-admin

From Mohan NBSPS
Subject Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
Msg-id CAPCvfWcm0JDC+q54MSW7N90PYvh+PefaP6SxfonbkGcUwpS1+g@mail.gmail.com
Responses Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
List pgsql-admin
Dear Community,

I am trying to understand why all the secondary databases failed to start
after they had been logging a WAL-related error for some time.

Timeline:

2024-04-19: WAL errors appear on the secondary database nodes:

```
LOG: invalid resource manager ID 55 at 40/F46CBCA8
```

- the secondaries did not lag in replication
  - monitored via the query:
```
SELECT pg_last_xact_replay_timestamp();
```

2024-05-02: the secondaries reboot and fail to start up:

```
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000004100000049 has already been removed
FATAL:  the database system is starting up
```
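The lag check above uses `pg_last_xact_replay_timestamp()`, which reports when the last transaction replayed on the standby was committed (or aborted) on the primary, so it measures staleness in time. Lag can also be measured in bytes by subtracting LSNs; in 9.5 the server does this with `pg_xlog_location_diff()`, for example comparing `pg_current_xlog_location()` on the primary with `pg_last_xlog_replay_location()` on the standby. The sketch below only reproduces that arithmetic in Python; the helper names and the LSN values are hypothetical, not taken from the affected servers.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '40/F46CBCA8' into a 64-bit integer.

    The part before the slash holds the high 32 bits and the part after it
    the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Return a - b in bytes (the same arithmetic as pg_xlog_location_diff)."""
    return parse_lsn(a) - parse_lsn(b)


if __name__ == "__main__":
    # Hypothetical locations: primary's current LSN vs. standby's replay LSN.
    primary_lsn = "41/49000000"
    standby_lsn = "41/48000000"
    print(lsn_diff(primary_lsn, standby_lsn))  # prints 16777216 (16 MiB)
```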

From my understanding, WAL is streamed over the network (the secondary's WAL receiver pulls it from the primary's WAL sender) and written to WAL files on the secondary.
A separate process (the startup process) then replays the copied WAL.
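The WAL files mentioned above are named after the range of LSNs they hold, which is how the LSN in the `LOG` line and the segment name in the `FATAL` line relate. A minimal Python sketch of that naming scheme, assuming the default 16 MB segment size (fixed at build time in 9.5) and timeline 1 (the leading `00000001` of the segment name in the error; the `LOG` line itself does not show a timeline). The helper names are illustrative only.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default 16 MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 with 16 MB segments


def parse_lsn(lsn: str) -> int:
    """'40/F46CBCA8' -> 64-bit integer (high 32 bits / low 32 bits, hex)."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """64-bit integer -> textual LSN in PostgreSQL's %X/%X form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_name(lsn: str, timeline: int = 1) -> str:
    """24-character name of the WAL segment file that contains `lsn`."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def segment_start_lsn(name: str) -> str:
    """First LSN stored in the segment file called `name`."""
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return format_lsn((log_id * SEGMENTS_PER_XLOGID + seg) * WAL_SEGMENT_SIZE)


if __name__ == "__main__":
    print(segment_name("40/F46CBCA8"))  # segment holding the LSN from the LOG line
    print(segment_start_lsn("000000010000004100000049"))  # start of the missing segment
```

With a non-default build-time segment size, `WAL_SEGMENT_SIZE` would have to change accordingly.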

For the local WAL to go out of sync, I can think of these possibilities:

1. the primary removed a WAL file that the secondary was still streaming
2. the WAL file on the secondary got corrupted
3. ...
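Possibility 1 can be illustrated with a toy model: a primary that keeps only its newest `keep` segments (loosely inspired by the `wal_keep_segments` setting) and a standby that was offline while the primary kept writing. This is a hypothetical simplification, not PostgreSQL's retention logic: real servers also keep segments needed by checkpoints, pending archiving, or replication slots, and segments are named with hex strings rather than plain integers. `ToyPrimary` and `simulate` are illustrative names only.

```python
from collections import deque
from typing import Deque


class ToyPrimary:
    """Hypothetical primary that retains only the newest `keep` WAL segments."""

    def __init__(self, keep: int) -> None:
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.keep = keep
        self.retained: Deque[int] = deque()  # segment numbers still on disk, oldest first
        self.next_segno = 0

    def write_segment(self) -> None:
        self.retained.append(self.next_segno)
        self.next_segno += 1
        while len(self.retained) > self.keep:
            self.retained.popleft()  # oldest segment removed once no longer retained

    def stream(self, segno: int) -> int:
        if segno < self.retained[0]:
            raise LookupError(f"requested WAL segment {segno} has already been removed")
        if segno >= self.next_segno:
            raise LookupError(f"WAL segment {segno} has not been written yet")
        return segno


def simulate(keep: int, primary_segments: int, standby_segments: int) -> str:
    """Primary writes `primary_segments` segments while the standby, offline,
    has received only `standby_segments`; the standby then reconnects."""
    primary = ToyPrimary(keep)
    for _ in range(primary_segments):
        primary.write_segment()
    try:
        for segno in range(standby_segments, primary.next_segno):
            primary.stream(segno)
    except LookupError as err:
        return f"FATAL: {err}"
    return "caught up"


if __name__ == "__main__":
    print(simulate(keep=8, primary_segments=20, standby_segments=5))
    print(simulate(keep=8, primary_segments=20, standby_segments=15))
```

In the first call the standby asks for segment 5 after the primary has kept only segments 12 to 19, which mirrors the shape of the "has already been removed" error; in the second the standby is close enough to catch up.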

Questions:

- What do those error messages mean?
- How can I prevent this from happening?
- Are there any references I should read?

Any advice or information is highly appreciated.

Thank you,
Mohan
