The following bug has been logged on the website:
Bug reference: 14538
Logged by: Vladimir Svedov
Email address: vodevsh@gmail.com
PostgreSQL version: 9.3.11
Operating system: Linux
Description:
Hi,
I posted formatted text as a question
here:http://dba.stackexchange.com/questions/163735/streaming-replication-same-wal-missing
Several days ago I saw this in master log:
replica 2017-02-06 16:10:55 UTC ERROR: requested WAL segment
000000030000096D00000052 has already been removed
This message was repeating until I stopped slave.
Checking pg_stat_activity on prod showed autovacuum task on table ~250GB, I
decided that it produced huge amount of WALs. Checking them confirmed my
assumption - all wal_keep_segments=500 were last minute. So I thought that
network bandwidth was not sufficient to send WALs fast enough to slave to
replay them. Master has archive_command = cd ., so no way back.
I did not panic, I recreate replication. I:
delete all data files on slave
pg_start_backup() on prod
rsync -zahv data
pg_start slave
pg_stop_backup() on prod and check logs.
Imagine my desperation to see that slave has this:
-bash-4.2$ head -n 20 /pg/data93/pg_log/postgresql-2017-02-09_121514.log
LOG: could not open usermap file "/pg/data93/pg_ident.conf": No such file
or directory
LOG: database system was shut down in recovery at 2017-02-09 12:15:07 UTC
LOG: entering standby mode
LOG: redo starts at 982/39074640
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
LOG: consistent recovery state reached at 982/3AEC60F8
LOG: unexpected pageaddr 97C/83EC8000 in log segment
00000003000009820000003A, offset 15499264
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 982/3A000000 on timeline 3
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
ERROR: requested WAL segment 000000030000096D00000052 has already been
removed
and so on... 3 days passed! That segment has gone, but why is it looking for
it??? I copied whole data directory.
All pages are readable (pg_dump works on all dbs), changes from master go to
slave. ERROR keeps appearing each several seconds on slave.
Please, tell me I did something stupid and it is not a bug. It happened on
prod.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs