Thread: WAL replay asking for very old WAL

WAL replay asking for very old WAL

From
Stephen Kuntz
Date:

Hello,

 

I have a 2 servers that are geographically separated be about 570KM which are connected via LAN extension. The slave at location 2 fell behind and I’ve had to restore it with pg_basebackup then replay my WAL files. Part way through the replay I get an error and then it starts asking for an old out of sequence WAL file that I no longer have. It continues to restore the WAL files in the sequence but continually askes for the old one, even after it has caught up to the live environment. I’ve tried it three separate times and it is always the same file name. There is un unlisten command that seems to trigger this to happen, but I’m not sure where it came from.

 

Any help is appreciated.

 

PostgreSQL 9.3.3

 

2016-03-06 00:25:45 UTC LOG:  restored log file "000000010000578800000038" from archive

2016-03-06 00:25:45 UTC FATAL:  the database system is starting up

2016-03-06 00:25:45 UTC FATAL:  the database system is starting up

2016-03-06 00:25:45 UTC FATAL:  the database system is starting up

2016-03-06 00:25:45 UTC FATAL:  the database system is starting up

2016-03-06 00:25:45 UTC FATAL:  the database system is starting up

2016-03-06 00:25:47 UTC LOG:  restored log file "000000010000578800000039" from archive

2016-03-06 00:25:48 UTC LOG:  restored log file "00000001000057880000003A" from archive

2016-03-06 00:25:49 UTC FATAL:  the database system is starting up

2016-03-06 00:25:50 UTC LOG:  restored log file "00000001000057880000003B" from archive

2016-03-06 00:25:50 UTC FATAL:  the database system is starting up

2016-03-06 00:25:51 UTC LOG:  restored log file "00000001000057880000003C" from archive

2016-03-06 00:25:53 UTC LOG:  restored log file "00000001000057880000003D" from archive

2016-03-06 00:25:53 UTC LOG:  consistent recovery state reached at 5788/3DCB5EF8

2016-03-06 00:25:53 UTC LOG:  database system is ready to accept read only connections

2016-03-06 00:25:54 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:54 UTC STATEMENT:  unlisten *

2016-03-06 00:25:54 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:54 UTC STATEMENT:  unlisten *

2016-03-06 00:25:54 UTC LOG:  restored log file "00000001000057880000003E" from archive

2016-03-06 00:25:54 UTC ERROR:  requested WAL segment 00000001000054540000007E has already been removed

2016-03-06 00:25:55 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:55 UTC STATEMENT:  unlisten *

2016-03-06 00:25:55 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:55 UTC STATEMENT:  unlisten *

2016-03-06 00:25:55 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:55 UTC STATEMENT:  unlisten *

2016-03-06 00:25:55 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:25:55 UTC STATEMENT:  unlisten *

2016-03-06 00:25:56 UTC LOG:  restored log file "00000001000057880000003F" from archive

2016-03-06 00:25:57 UTC LOG:  restored log file "000000010000578800000040" from archive

2016-03-06 00:25:58 UTC LOG:  restored log file "000000010000578800000041" from archive

2016-03-06 00:26:00 UTC ERROR:  requested WAL segment 00000001000054540000007E has already been removed

2016-03-06 00:26:00 UTC LOG:  restored log file "000000010000578800000042" from archive

2016-03-06 00:26:00 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:26:00 UTC STATEMENT:  unlisten *

2016-03-06 00:26:01 UTC LOG:  restored log file "000000010000578800000043" from archive

2016-03-06 00:26:02 UTC LOG:  restored log file "000000010000578800000044" from archive

2016-03-06 00:26:03 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:26:03 UTC STATEMENT:  unlisten *

2016-03-06 00:26:03 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:26:03 UTC STATEMENT:  unlisten *

2016-03-06 00:26:03 UTC ERROR:  cannot execute UNLISTEN during recovery

2016-03-06 00:26:03 UTC STATEMENT:  unlisten *

2016-03-06 00:26:03 UTC LOG:  restored log file "000000010000578800000045" from archive

2016-03-06 00:26:04 UTC LOG:  restored log file "000000010000578800000046" from archive

2016-03-06 00:26:05 UTC ERROR:  requested WAL segment 00000001000054540000007E has already been removed

2016-03-06 00:26:06 UTC LOG:  restored log file "000000010000578800000047" from archive

2016-03-06 00:26:07 UTC LOG:  restored log file "000000010000578800000048" from archive

2016-03-06 00:26:07 UTC LOG:  restored log file "000000010000578800000049" from archive

2016-03-06 00:26:09 UTC LOG:  restored log file "00000001000057880000004A" from archive

2016-03-06 00:26:10 UTC ERROR:  requested WAL segment 00000001000054540000007E has already been removed….

Re: WAL replay asking for very old WAL

From
MirrorX
Date:
could it be that you have some other slave trying to replicate from this one
(that was last built) and it has fallen behind and is asking for that old
WAL file?






--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891066.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: WAL replay asking for very old WAL

From
Stephen Kuntz
Date:
I think you're correct, I did have another server replication off of this one, but could this cause the server to stay
inrecover mode? 

In the log below it has caught up but doesn't not leave recovery mode even after a restart of the service.

2016-03-07 16:04:52 UTC FATAL: the database system is starting up
2016-03-07 16:04:52 UTC LOG: restored log file "000000010000581D00000082" from archive
2016-03-07 16:04:53 UTC FATAL: the database system is starting up
2016-03-07 16:04:53 UTC FATAL: the database system is starting up
2016-03-07 16:04:54 UTC LOG: restored log file "000000010000581D00000083" from archive
2016-03-07 16:04:55 UTC FATAL: the database system is starting up
2016-03-07 16:04:55 UTC LOG: restored log file "000000010000581D00000084" from archive
2016-03-07 16:04:55 UTC LOG: consistent recovery state reached at 581D/84D5FD40
2016-03-07 16:04:55 UTC LOG: database system is ready to accept read only connections
2016-03-07 16:04:56 UTC LOG: unexpected pageaddr 581B/44000000 in log segment 000000010000581D00000085, offset 0
2016-03-07 16:04:56 UTC LOG: started streaming WAL from primary at 581D/84000000 on timeline 1
2016-03-07 16:04:56 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:56 UTC STATEMENT: unlisten *
2016-03-07 16:04:56 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:56 UTC STATEMENT: unlisten *
2016-03-07 16:04:57 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:57 UTC STATEMENT: unlisten *
2016-03-07 16:04:57 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:57 UTC STATEMENT: unlisten *
2016-03-07 16:04:57 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:57 UTC STATEMENT: unlisten *
2016-03-07 16:04:58 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:58 UTC STATEMENT: unlisten *
2016-03-07 16:04:58 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:04:58 UTC STATEMENT: unlisten *
2016-03-07 16:05:03 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:05:03 UTC STATEMENT: unlisten *
2016-03-07 16:05:06 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:05:06 UTC STATEMENT: unlisten *
2016-03-07 16:05:08 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:05:08 UTC STATEMENT: unlisten *
2016-03-07 16:05:08 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:05:08 UTC STATEMENT: unlisten *
2016-03-07 16:05:13 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:05:13 UTC STATEMENT: unlisten *

Stephen Kuntz | Systems Administrator
Pelmorex Media Inc.
T: 905.829.1159 x1376

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of MirrorX
Sent: Monday, March 7, 2016 10:48 AM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] WAL replay asking for very old WAL

could it be that you have some other slave trying to replicate from this one (that was last built) and it has fallen
behindand is asking for that old WAL file? 






--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891066.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: WAL replay asking for very old WAL

From
MirrorX
Date:
sorry i don't follow.
the server will stay in recovery mode as long as a recovery.conf file exists
and the host is set up to receive xlogs via streaming from the master. this
seems to be the case according to the logs, if i am not wrong.

what it is you want to happen? the server to stop applying records and come
online in r/w mode?
in this case you could promote the server. or if you want just to apply some
wal files and then come online then i think you have misconfigured your
recovery.conf. i would suggest you paste here what your recovery.conf
contains and what the objective is



Best,



--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891074.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: WAL replay asking for very old WAL

From
Stephen Kuntz
Date:
Sorry for the confusion, I am by no means a postgresql guru.

In my other slave servers I don't have the message:

2016-03-07 16:23:36 UTC ERROR: cannot execute UNLISTEN during recovery
2016-03-07 16:23:36 UTC STATEMENT: unlisten *

In the logs once they are caught up to the master. It also doesn't seem to switch to standby mode properly and stream
fromthe master. When I restart, it replays the WAL files since the last restart from the archive. This data should have
beenstreamed from the master, no? 

standby_mode = 'on'
primary_conninfo = 'user=<replicationuser> password=<mypassword> host=masterserver port=5432 sslmode=prefer
sslcompression=1krbsrvname=postgres' 
restore_command = 'copy "\\\\archivefileserver\\BACKUP\\DatabaseBackups\\wal_archive\\%f" "%p"'


Stephen Kuntz | Systems Administrator
Pelmorex Media Inc.
T: 905.829.1159 x1376

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of MirrorX
Sent: Monday, March 7, 2016 11:13 AM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] WAL replay asking for very old WAL

sorry i don't follow.
the server will stay in recovery mode as long as a recovery.conf file exists and the host is set up to receive xlogs
viastreaming from the master. this seems to be the case according to the logs, if i am not wrong. 

what it is you want to happen? the server to stop applying records and come online in r/w mode?
in this case you could promote the server. or if you want just to apply some wal files and then come online then i
thinkyou have misconfigured your recovery.conf. i would suggest you paste here what your recovery.conf contains and
whatthe objective is 



Best,



--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891074.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: WAL replay asking for very old WAL

From
MirrorX
Date:
this message ->
--
2016-03-07 16:04:56 UTC LOG: started streaming WAL from primary at
581D/84000000 on timeline 1
--

that the server has connected to the master via streaming replication so it
should get the new xlogs without restarting. you can check that this is
happening if you check the pg_last_xlog_receive_location() and
pg_last_xlog_replay_location().

if the only problem is that the database is not online in read-only mode,
then maybe the only thing missing is to enable the 'hot_standby = on' in the
postgresql.conf






--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891083.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: WAL replay asking for very old WAL

From
Stephen Kuntz
Date:
I guess the only issue I have existed between the keyboard and chair. Thanks for your time and patients.

Stephen Kuntz | Systems Administrator
Pelmorex Media Inc.
T: 905.829.1159 x1376

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of MirrorX
Sent: Monday, March 7, 2016 12:01 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] WAL replay asking for very old WAL

this message ->
--
2016-03-07 16:04:56 UTC LOG: started streaming WAL from primary at
581D/84000000 on timeline 1
--

that the server has connected to the master via streaming replication so it should get the new xlogs without
restarting.you can check that this is happening if you check the pg_last_xlog_receive_location() and
pg_last_xlog_replay_location().

if the only problem is that the database is not online in read-only mode, then maybe the only thing missing is to
enablethe 'hot_standby = on' in the postgresql.conf 






--
View this message in context: http://postgresql.nabble.com/WAL-replay-asking-for-very-old-WAL-tp5891050p5891083.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin