Thread: 2nd PostgreSQL server in WAL shipping cluster fails to start

From: Samuel VISCAPI
Date:

Dear all,

Some years ago I set up a two-node PostgreSQL 13 cluster on Debian 11. This cluster uses the Write-Ahead Log Shipping method. I’ve just been told that the second server (in standby mode) is down and refuses to start again, with the following error message (roughly translated from French):

2025-01-09 09:46:35.742 CET [3147382] FATAL:  could not receive data from WAL stream: ERROR:  segment requested from transaction journal, 000000010000000D000000D6, has already been removed
2025-01-09 09:46:40.745 CET [3147395] LOG:  journal flow started from primary at D/D6000000 on timeline 1
2025-01-09 09:46:40.745 CET [3147395] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from transaction journal, 000000010000000D000000D6, has already been removed
2025-01-09 09:46:45.749 CET [3147397] LOG: journal flow started from primary at D/D6000000 on timeline 1
2025-01-09 09:46:45.749 CET [3147397] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from transaction journal, 000000010000000D000000D6, has already been removed
2025-01-09 09:46:50.753 CET [3147424] LOG:  journal flow started from primary at D/D6000000 on timeline 1
2025-01-09 09:46:50.753 CET [3147424] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from transaction journal, 000000010000000D000000D6, has already been removed

Would you have any idea how to fix this issue?

Best regards,

Samuel

Re: 2nd PostgreSQL server in WAL shipping cluster fails to start

From: Alvaro Herrera
Date:
On 2025-Jan-09, Samuel VISCAPI wrote:

> Dear all,
> 
> Some years ago I set up a two PostgreSQL 13 nodes cluster on Debian 11. This cluster uses the Write-Ahead Log
> Shipping method. I've just been told the second server (in standby mode) is down and refuses to start again with the
> following error message (roughly translated from French) :
> 
> 2025-01-09 09:46:35.742 CET [3147382] FATAL:  could not receive data from WAL stream: ERROR:  segment requested from
> transaction journal, 000000010000000D000000D6, has already been removed
> 2025-01-09 09:46:40.745 CET [3147395] LOG:  journal flow started from primary at D/D6000000 on timeline 1
> 2025-01-09 09:46:40.745 CET [3147395] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from
> transaction journal, 000000010000000D000000D6, has already been removed
> 2025-01-09 09:46:45.749 CET [3147397] LOG: journal flow started from primary at D/D6000000 on timeline 1
> 2025-01-09 09:46:45.749 CET [3147397] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from
> transaction journal, 000000010000000D000000D6, has already been removed
> 2025-01-09 09:46:50.753 CET [3147424] LOG:  journal flow started from primary at D/D6000000 on timeline 1
> 2025-01-09 09:46:50.753 CET [3147424] FATAL:  could not receive data from WAL stream : ERROR:  segment requested from
> transaction journal, 000000010000000D000000D6, has already been removed

This means the standby is requesting a WAL segment that the primary has
already removed.  You may be able to find those files in a WAL archive,
if you have archive_command set in the primary.  If you do, copying them
into the standby's pg_wal/ subdirectory should work.  If you don't have
them, then the replica must be rebuilt.
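For anyone trying the archive route, here is a minimal sketch of the two steps: work out which segment file the standby wants, then copy that segment and the later ones from the archive into pg_wal/. The segment name follows from the timeline and LSN in the log (timeline 1, D/D6000000 gives 000000010000000D000000D6). Assumptions, all hypothetical: the default 16 MiB WAL segment size; an archive directory holding plain, uncompressed copies under their original names (as a simple cp-style archive_command would produce); the helper names parse_lsn, segment_name and copy_missing_segments, and the paths in the comments. Run it as the postgres user so the copied files get the right owner, since shutil.copy2 preserves permissions and timestamps but not ownership.

```python
import re
import shutil
from pathlib import Path

# Assumption: default 16 MiB WAL segments (initdb --wal-segsize not changed).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

# A plain WAL segment name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the "log id", 8 for the segment within it.
SEGMENT_NAME_RE = re.compile(r"^[0-9A-F]{24}$")


def parse_lsn(lsn: str) -> int:
    """Turn an LSN written as 'D/D6000000' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file containing `lsn` on `timeline`."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_log_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_log_id, segno % segs_per_log_id)


def copy_missing_segments(archive_dir: str, pg_wal_dir: str, first_needed: str) -> list[str]:
    """Copy archived segments from `first_needed` onwards (same timeline) into pg_wal.

    Files already present in pg_wal are left alone. Non-segment files such as
    *.history or *.backup are skipped. Returns the names that were copied.
    """
    pg_wal = Path(pg_wal_dir)
    timeline = first_needed[:8]
    copied = []
    for src in sorted(Path(archive_dir).iterdir()):
        name = src.name
        if not SEGMENT_NAME_RE.match(name):
            continue
        # Fixed-width uppercase hex names on one timeline sort in WAL order,
        # so a string comparison is enough to find "this segment or later".
        if name[:8] != timeline or name < first_needed:
            continue
        dst = pg_wal / name
        if dst.exists():
            continue
        shutil.copy2(src, dst)
        copied.append(name)
    return copied


if __name__ == "__main__":
    # Timeline 1 and LSN D/D6000000 come from the log lines above.
    print(segment_name(1, "D/D6000000"))  # prints 000000010000000D000000D6
    # Hypothetical paths; /var/lib/postgresql/13/main is the Debian default data directory:
    # copy_missing_segments("/path/to/wal_archive",
    #                       "/var/lib/postgresql/13/main/pg_wal",
    #                       segment_name(1, "D/D6000000"))
```

If the archive does not go back as far as 000000010000000D000000D6, the copy cannot close the gap, and the replica has to be rebuilt as described above.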


Note that it's not a good idea to translate the error messages when
posting; it's better to post exactly what the log file contains.  To be
helpful, you could post a translation of those lines separately.  The
developers can look up the translated messages in the source code if
they need to, but if you translate them yourself, there's no way to know
exactly which messages they are.  Subtle differences are sometimes
important cues.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/