SR Slave leaves off every 32nd wal segment ?! - Mailing list pgsql-general
From:    hubert depesz lubaczewski
Subject: SR Slave leaves off every 32nd wal segment ?!
Msg-id:  20130521201744.GA13666@depesz.com
List:    pgsql-general
Hi,

We have the following situation:

Pg master on US east coast. Pg slave on US west coast. Replication is set up
using omnipitr + streaming replication.

Setup on slave:

postgres=# select name, setting from pg_settings where name ~ '(checkpo|wal)';
             name             |   setting
------------------------------+-------------
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 30
 checkpoint_timeout           | 300
 checkpoint_warning           | 30
 log_checkpoints              | on
 max_wal_senders              | 3
 wal_block_size               | 8192
 wal_buffers                  | 2048
 wal_keep_segments            | 0
 wal_level                    | hot_standby
 wal_receiver_status_interval | 10
 wal_segment_size             | 2048
 wal_sync_method              | fdatasync
 wal_writer_delay             | 200
(14 rows)

Replication is usually working well:

$ grep -c 'FATAL:..could not receive data from WAL stream:' postgresql-2013-05-*
postgresql-2013-05-14.log:7
postgresql-2013-05-15.log:7
postgresql-2013-05-16.log:9
postgresql-2013-05-17.log:9
postgresql-2013-05-18.log:8
postgresql-2013-05-19.log:8
postgresql-2013-05-20.log:7
postgresql-2013-05-21.log:7

After each such break, Pg switches to omnipitr, which recovers some WAL files,
and then it switches back to SR. All looks fine.

But: in pg_xlog on the slave we have lots of files - almost 500 now - i.e. WAL
segments. What's weird is their numbering. The full list of ~500 files is too
big to post, so here are just the last 50:

pg_xlog$ ls -l | tail -n 50
-rw------- 1 postgres postgres 16777216 May 16 22:41 000000010000008B00000009
-rw------- 1 postgres postgres 16777216 May 17 01:16 000000010000008B00000029
-rw------- 1 postgres postgres 16777216 May 17 03:56 000000010000008B00000049
-rw------- 1 postgres postgres 16777216 May 17 06:36 000000010000008B00000069
-rw------- 1 postgres postgres 16777216 May 17 09:16 000000010000008B00000089
-rw------- 1 postgres postgres 16777216 May 17 11:56 000000010000008B000000A9
-rw------- 1 postgres postgres 16777216 May 17 14:36 000000010000008B000000C9
-rw------- 1 postgres postgres 16777216 May 17 17:16 000000010000008B000000E9
-rw------- 1 postgres postgres 16777216 May 17 19:56 000000010000008C0000000A
-rw------- 1 postgres postgres 16777216 May 17 22:36 000000010000008C0000002A
-rw------- 1 postgres postgres 16777216 May 18 01:12 000000010000008C0000004A
-rw------- 1 postgres postgres 16777216 May 18 03:52 000000010000008C0000006A
-rw------- 1 postgres postgres 16777216 May 18 06:32 000000010000008C0000008A
-rw------- 1 postgres postgres 16777216 May 18 09:13 000000010000008C000000AA
-rw------- 1 postgres postgres 16777216 May 18 14:33 000000010000008C000000EA
-rw------- 1 postgres postgres 16777216 May 18 17:13 000000010000008D0000000B
-rw------- 1 postgres postgres 16777216 May 18 19:53 000000010000008D0000002B
-rw------- 1 postgres postgres 16777216 May 18 22:33 000000010000008D0000004B
-rw------- 1 postgres postgres 16777216 May 19 01:05 000000010000008D0000006B
-rw------- 1 postgres postgres 16777216 May 19 03:45 000000010000008D0000008B
-rw------- 1 postgres postgres 16777216 May 19 06:25 000000010000008D000000AB
-rw------- 1 postgres postgres 16777216 May 19 09:05 000000010000008D000000CB
-rw------- 1 postgres postgres 16777216 May 19 11:45 000000010000008D000000EB
-rw------- 1 postgres postgres 16777216 May 19 14:25 000000010000008E0000000C
-rw------- 1 postgres postgres 16777216 May 19 17:05 000000010000008E0000002C
-rw------- 1 postgres postgres 16777216 May 19 19:45 000000010000008E0000004C
-rw------- 1 postgres postgres 16777216 May 19 22:25 000000010000008E0000006C
-rw------- 1 postgres postgres 16777216 May 20 01:01 000000010000008E0000008C
-rw------- 1 postgres postgres 16777216 May 20 03:41 000000010000008E000000AC
-rw------- 1 postgres postgres 16777216 May 20 06:21 000000010000008E000000CC
-rw------- 1 postgres postgres 16777216 May 20 09:01 000000010000008E000000EC
-rw------- 1 postgres postgres 16777216 May 20 11:41 000000010000008F0000000D
-rw------- 1 postgres postgres 16777216 May 20 14:21 000000010000008F0000002D
-rw------- 1 postgres postgres 16777216 May 20 17:01 000000010000008F0000004D
-rw------- 1 postgres postgres 16777216 May 20 19:41 000000010000008F0000006D
-rw------- 1 postgres postgres 16777216 May 20 22:21 000000010000008F0000008D
-rw------- 1 postgres postgres 16777216 May 21 01:00 000000010000008F000000AD
-rw------- 1 postgres postgres 16777216 May 21 03:35 000000010000008F000000CD
-rw------- 1 postgres postgres 16777216 May 21 06:15 000000010000008F000000ED
-rw------- 1 postgres postgres 16777216 May 21 08:55 00000001000000900000000E
-rw------- 1 postgres postgres 16777216 May 21 11:35 00000001000000900000002E
-rw------- 1 postgres postgres 16777216 May 21 14:15 00000001000000900000004E
-rw------- 1 postgres postgres 16777216 May 21 16:55 00000001000000900000006E
-rw------- 1 postgres postgres 16777216 May 21 19:35 00000001000000900000008E
-rw------- 1 postgres postgres 16777216 May 21 20:00 000000010000009000000093
-rw------- 1 postgres postgres 16777216 May 21 20:05 000000010000009000000094
-rw------- 1 postgres postgres 16777216 May 21 20:10 000000010000009000000095
-rw------- 1 postgres postgres 16777216 May 21 20:14 000000010000009000000096
-rw------- 1 postgres postgres 16777216 May 21 19:55 000000010000009000000097
drwx------ 2 postgres postgres    28672 May 21 20:10 archive_status

Aside from a couple of the newest ones, every 32nd WAL segment is kept for some
reason. Pg is 9.2.4.

All of these "older" segments (that is, all aside from the 5 newest) have
0-byte *.ready files in archive_status. Archiving on the slave is /bin/true
only:

postgres=# select name, setting from pg_settings where name ~ '(archive)';
            name            |  setting
----------------------------+-----------
 archive_command            | /bin/true
 archive_mode               | on
 archive_timeout            | 3600
 max_standby_archive_delay  | 30
(4 rows)

I am at a loss - no idea what could be wrong. I mean - sure, I can remove all
those old files and we'll be fine, but new ones will most likely accumulate,
and removing them by hand, or from a cronjob, is kind of pointless.

Is there anything I could check that could lead to finding the cause of this
weird behavior?

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/
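One way to confirm that the kept segments are exactly the ones still flagged as
pending archiving is to cross-check pg_xlog against archive_status on the
slave. This is only a sketch - $PGDATA below is a placeholder, not the actual
data directory from the setup above:

$ PGDATA=/path/to/slave/data     # placeholder - adjust to the real location
$ ls "$PGDATA/pg_xlog" | grep -E '^[0-9A-F]{24}$' | sort > /tmp/in_pg_xlog
$ ls "$PGDATA/pg_xlog/archive_status" | grep '\.ready$' | sed 's/\.ready$//' | sort > /tmp/ready
$ comm -12 /tmp/in_pg_xlog /tmp/ready | wc -l

If the count matches the number of "old" segments (everything except the few
newest), the files being kept are exactly the ones still carrying a 0-byte
.ready marker, i.e. segments PostgreSQL still considers waiting to be archived.
The stride is also consistent with the file names: consecutive kept segments
differ by 0x20 = 32, i.e. 32 * 16 MB = 512 MB of WAL between them.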
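It is also worth confirming that streaming actually resumes after each of those
FATAL disconnects, rather than the slave staying on archive recovery. A minimal
check on the slave, using only functions that exist in 9.2:

postgres=# select pg_is_in_recovery(),
                  pg_last_xlog_receive_location(),
                  pg_last_xlog_replay_location(),
                  pg_last_xact_replay_timestamp();

pg_last_xlog_receive_location() advances only while the WAL receiver is
streaming, so if it keeps moving between checks, SR is back; if only the replay
location moves, the slave is most likely still feeding from the omnipitr
archive.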