SR Slave leaves off every 32nd wal segment ?! - Mailing list pgsql-general

From:    hubert depesz lubaczewski
Subject: SR Slave leaves off every 32nd wal segment ?!
Msg-id:  20130521201744.GA13666@depesz.com
List:    pgsql-general

Hi,
We have the following situation:
Pg master on the US east coast.
Pg slave on the US west coast.

Replication is set up using omnipitr + streaming replication.
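
For completeness, recovery.conf on the slave is the usual standby setup - roughly this shape (actual conninfo, paths and omnipitr-restore options elided, just to show how omnipitr and SR are combined):

standby_mode     = 'on'
primary_conninfo = 'host=... user=...'
restore_command  = '.../omnipitr-restore ... %f %p'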

Setup on slave:
postgres=# select name, setting from pg_settings where name ~ '(checkpo|wal)';
             name             |   setting
------------------------------+-------------
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 30
 checkpoint_timeout           | 300
 checkpoint_warning           | 30
 log_checkpoints              | on
 max_wal_senders              | 3
 wal_block_size               | 8192
 wal_buffers                  | 2048
 wal_keep_segments            | 0
 wal_level                    | hot_standby
 wal_receiver_status_interval | 10
 wal_segment_size             | 2048
 wal_sync_method              | fdatasync
 wal_writer_delay             | 200
(14 rows)

Replication usually works well, though the streaming connection does break a handful of times per day:
$ grep -c 'FATAL:..could not receive data from WAL stream:' postgresql-2013-05-*
postgresql-2013-05-14.log:7
postgresql-2013-05-15.log:7
postgresql-2013-05-16.log:9
postgresql-2013-05-17.log:9
postgresql-2013-05-18.log:8
postgresql-2013-05-19.log:8
postgresql-2013-05-20.log:7
postgresql-2013-05-21.log:7

After each such break, Pg falls back to omnipitr, which restores some WAL files, and then it switches back to SR. All looks fine.

But: pg_xlog on the slave keeps accumulating files - almost 500 WAL segments now.

What's weird is their numbering. The full list of ~500 files is too long to paste, so here are just the last 50:

pg_xlog$ ls -l | tail -n 50
-rw------- 1 postgres postgres 16777216 May 16 22:41 000000010000008B00000009
-rw------- 1 postgres postgres 16777216 May 17 01:16 000000010000008B00000029
-rw------- 1 postgres postgres 16777216 May 17 03:56 000000010000008B00000049
-rw------- 1 postgres postgres 16777216 May 17 06:36 000000010000008B00000069
-rw------- 1 postgres postgres 16777216 May 17 09:16 000000010000008B00000089
-rw------- 1 postgres postgres 16777216 May 17 11:56 000000010000008B000000A9
-rw------- 1 postgres postgres 16777216 May 17 14:36 000000010000008B000000C9
-rw------- 1 postgres postgres 16777216 May 17 17:16 000000010000008B000000E9
-rw------- 1 postgres postgres 16777216 May 17 19:56 000000010000008C0000000A
-rw------- 1 postgres postgres 16777216 May 17 22:36 000000010000008C0000002A
-rw------- 1 postgres postgres 16777216 May 18 01:12 000000010000008C0000004A
-rw------- 1 postgres postgres 16777216 May 18 03:52 000000010000008C0000006A
-rw------- 1 postgres postgres 16777216 May 18 06:32 000000010000008C0000008A
-rw------- 1 postgres postgres 16777216 May 18 09:13 000000010000008C000000AA
-rw------- 1 postgres postgres 16777216 May 18 14:33 000000010000008C000000EA
-rw------- 1 postgres postgres 16777216 May 18 17:13 000000010000008D0000000B
-rw------- 1 postgres postgres 16777216 May 18 19:53 000000010000008D0000002B
-rw------- 1 postgres postgres 16777216 May 18 22:33 000000010000008D0000004B
-rw------- 1 postgres postgres 16777216 May 19 01:05 000000010000008D0000006B
-rw------- 1 postgres postgres 16777216 May 19 03:45 000000010000008D0000008B
-rw------- 1 postgres postgres 16777216 May 19 06:25 000000010000008D000000AB
-rw------- 1 postgres postgres 16777216 May 19 09:05 000000010000008D000000CB
-rw------- 1 postgres postgres 16777216 May 19 11:45 000000010000008D000000EB
-rw------- 1 postgres postgres 16777216 May 19 14:25 000000010000008E0000000C
-rw------- 1 postgres postgres 16777216 May 19 17:05 000000010000008E0000002C
-rw------- 1 postgres postgres 16777216 May 19 19:45 000000010000008E0000004C
-rw------- 1 postgres postgres 16777216 May 19 22:25 000000010000008E0000006C
-rw------- 1 postgres postgres 16777216 May 20 01:01 000000010000008E0000008C
-rw------- 1 postgres postgres 16777216 May 20 03:41 000000010000008E000000AC
-rw------- 1 postgres postgres 16777216 May 20 06:21 000000010000008E000000CC
-rw------- 1 postgres postgres 16777216 May 20 09:01 000000010000008E000000EC
-rw------- 1 postgres postgres 16777216 May 20 11:41 000000010000008F0000000D
-rw------- 1 postgres postgres 16777216 May 20 14:21 000000010000008F0000002D
-rw------- 1 postgres postgres 16777216 May 20 17:01 000000010000008F0000004D
-rw------- 1 postgres postgres 16777216 May 20 19:41 000000010000008F0000006D
-rw------- 1 postgres postgres 16777216 May 20 22:21 000000010000008F0000008D
-rw------- 1 postgres postgres 16777216 May 21 01:00 000000010000008F000000AD
-rw------- 1 postgres postgres 16777216 May 21 03:35 000000010000008F000000CD
-rw------- 1 postgres postgres 16777216 May 21 06:15 000000010000008F000000ED
-rw------- 1 postgres postgres 16777216 May 21 08:55 00000001000000900000000E
-rw------- 1 postgres postgres 16777216 May 21 11:35 00000001000000900000002E
-rw------- 1 postgres postgres 16777216 May 21 14:15 00000001000000900000004E
-rw------- 1 postgres postgres 16777216 May 21 16:55 00000001000000900000006E
-rw------- 1 postgres postgres 16777216 May 21 19:35 00000001000000900000008E
-rw------- 1 postgres postgres 16777216 May 21 20:00 000000010000009000000093
-rw------- 1 postgres postgres 16777216 May 21 20:05 000000010000009000000094
-rw------- 1 postgres postgres 16777216 May 21 20:10 000000010000009000000095
-rw------- 1 postgres postgres 16777216 May 21 20:14 000000010000009000000096
-rw------- 1 postgres postgres 16777216 May 21 19:55 000000010000009000000097
drwx------ 2 postgres postgres    28672 May 21 20:10 archive_status

Aside from the couple of newest ones, every 32nd WAL segment is kept for some reason.

Pg is 9.2.4.
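
To double-check that it really is every 32nd segment, I can convert the file names to absolute segment numbers and diff consecutive entries - quick sketch (GNU awk for strtonum; pre-9.3 numbering, i.e. 255 segments per "xlogid" because the FF segment is never used):

pg_xlog$ ls | grep -E '^[0-9A-F]{24}$' | awk '
    {
        xlogid = strtonum("0x" substr($0, 9, 8))    # middle 8 hex digits
        segid  = strtonum("0x" substr($0, 17, 8))   # last 8 hex digits
        segno  = xlogid * 255 + segid               # absolute segment number
        if (NR > 1) print segno - prev
        prev = segno
    }' | sort -n | uniq -c

For the listing above this should come out as 32 almost everywhere, apart from the newest few files (and one gap of 64 where a segment seems to be missing entirely).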

All of these "older" segments (that is, all but the 5 newest) have corresponding 0-byte *.ready files in archive_status.
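
A quick count of the status markers, just to confirm it's basically all .ready:

pg_xlog$ ls archive_status | awk -F. '{ print $NF }' | sort | uniq -c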

Archiving on the slave is just /bin/true:

postgres=# select name, setting from pg_settings where name ~ '(archive)';
           name            |  setting
---------------------------+-----------
 archive_command           | /bin/true
 archive_mode              | on
 archive_timeout           | 3600
 max_standby_archive_delay | 30
(4 rows)
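
One more thing I can check is whether the archiver process is even running on the slave, and what it last did (iirc its ps title shows the last archived segment):

$ ps -u postgres -o pid,cmd | grep '[a]rchiver'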

I am at a loss - no idea what could be wrong.
I mean - sure, I can remove all those old files and we'll be fine, but new
ones will most likely accumulate, and removing them by hand or from a cronjob
is just papering over the problem.

Is there anything I could check that could help find the cause of this weird behavior?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

