> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.
The first crash message:
Oct 5 15:00:21 db01 postgres[75852]: [6406-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/18770": Too many open files in system
Oct 5 15:00:22 db01 postgres[75852]: [6413-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:23 db01 postgres[75852]: [6424-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:23 db01 postgres[75852]: [6431-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6442-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6449-1] failjava@dbcluster 75852 0: ERROR: could not open file "base/16387/20302": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6456-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20013": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6463-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6470-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6480-1] failjava@dbcluster 75852 0: ERROR: could not open file "base/16387/20275": Too many open files in system
Oct 5 15:00:25 db01 postgres[75852]: [6481-1] failjava@dbcluster 75852 0: ERROR: current transaction is aborted, commands ignored until end of transaction block
Oct 5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76407]: [5894-1] failjava@dbcluster 76407 0: WARNING: terminating connection because of crash of another server process
For the sake of completeness, I tried to correlate the relations named above with the tables that had problems, and there is zero correlation between the seek(2) errors and the tables for which VACUUM reported invalid pages.
nspname | relname | relnamespace | relfilenode | relkind | oid
-----------------+-----------------+--------------+-------------+---------+-------
failjava_master | tbl_F | 16388 | 16402 | r | 16402
failjava_db1 | tbl_D | 16417 | 18473 | r | 18473
failjava_db1 | tbl_E | 16417 | 20179 | r | 20179
failjava_db1 | tbl_C | 16417 | 20187 | r | 20187
failjava_db1 | tbl_B | 16417 | 20200 | r | 20200
failjava_db1 | tbl_A | 16417 | 41731 | r | 19533
But I'm wondering if it died a second time before the system had finished its recovery:
Oct 5 15:01:31 db01 postgres[76888]: [6503-1] @ 76888 0: WARNING: could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct 5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5980-1] failjava@dbcluster 77572 0: WARNING: terminating connection because of crash of another server process
Oct 5 15:01:31 db01 postgres[77544]: [7490-1] failjava@dbcluster 77544 0: WARNING: terminating connection because of crash of another server process
Unfortunately the logs I have for the first set of crashes are limited to WARNING and above and don't include the INFO level (though I do have INFO-level logs from later on; due to various IRL constraints - a hardware failure of the intended host - the provisioning of the master was a bit of a cannonball dash).
> I think the uninitialized pages are showing up because you had crashes
> in the midst of relation-extension operations, ie, some other backend
> had successfully done an smgrextend but hadn't yet laid down any valid
> data in the new page. However, this theory would not explain more than
> one uninitialized page per crash, and your previous message seems to
> show rather a lot of uninitialized pages. How many pipe-failed crashes
> did you have?
Assuming ~1min per recovery, it looks like around 38.
Oct 5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:02:20 db01 postgres[78473]: [5987-1] failjava@dbcluster 78473 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:03:47 db01 postgres[79408]: [6005-1] failjava@dbcluster 79408 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:05:06 db01 postgres[80440]: [6112-1] failjava@dbcluster 80440 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:06:09 db01 postgres[81276]: [6117-1] @ 81276 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:07:04 db01 postgres[82102]: [6131-1] failjava@dbcluster 82102 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:08:19 db01 postgres[82336]: [6136-1] @ 82336 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:08:20 db01 postgres[83084]: [6161-1] @ 83084 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:08:22 db01 postgres[83091]: [6184-1] failjava@dbcluster 83091 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:09:36 db01 postgres[84096]: [6218-1] failjava@dbcluster 84096 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:10:50 db01 postgres[85064]: [6226-1] failjava@dbcluster 85064 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:12:05 db01 postgres[86106]: [6233-1] failjava@dbcluster 86106 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86107]: [6233-1] failjava@dbcluster 86107 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86108]: [6233-1] failjava@dbcluster 86108 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86109]: [6233-1] failjava@dbcluster 86109 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86105]: [6234-1] failjava@dbcluster 86105 0: FATAL: could not open file "base/16387/PG_VERSION": Too many open files in system
Oct 5 15:12:08 db01 postgres[86119]: [6252-1] failjava@dbcluster 86119 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:13:36 db01 postgres[87126]: [6295-1] failjava@dbcluster 87126 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:14:23 db01 postgres[88122]: [6364-1] failjava@dbcluster 88122 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:15:31 db01 postgres[88370]: [6369-1] @ 88370 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:15:31 db01 postgres[89057]: [6398-1] @ 89057 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:15:34 db01 postgres[89067]: [6426-1] failjava@dbcluster 89067 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:16:38 db01 postgres[90080]: [6430-1] @ 90080 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:17:19 db01 postgres[90941]: [6446-1] failjava@dbcluster 90941 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:18:43 db01 postgres[91898]: [6484-1] failjava@dbcluster 91898 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:20:02 db01 postgres[93068]: [6514-1] failjava@dbcluster 93068 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:22:03 db01 postgres[94275]: [6519-1] failjava@dbcluster 94275 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:22:03 db01 postgres[94276]: [6519-1] failjava@dbcluster 94276 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:23:39 db01 postgres[95392]: [6525-1] failjava@dbcluster 95392 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:24:52 db01 postgres[96371]: [6533-1] failjava@dbcluster 96371 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:26:22 db01 postgres[97337]: [6570-1] failjava@dbcluster 97337 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:27:48 db01 postgres[98419]: [6607-1] failjava@dbcluster 98419 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:29:02 db01 postgres[99379]: [6706-1] failjava@dbcluster 99379 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:30:20 db01 postgres[467]: [6719-1] failjava@dbcluster 467 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:32:08 db01 postgres[1479]: [6769-1] failjava@dbcluster 1479 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:33:28 db01 postgres[2714]: [6776-1] failjava@dbcluster 2714 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:34:18 db01 postgres[3600]: [6870-1] failjava@dbcluster 3600 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:36:07 db01 postgres[4608]: [6896-1] failjava@dbcluster 4608 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:37:18 db01 postgres[5652]: [6944-1] failjava@dbcluster 5652 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:38:56 db01 postgres[6640]: [7018-1] failjava@dbcluster 6640 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:39:46 db01 postgres[7680]: [7025-1] failjava@dbcluster 7680 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:41:08 db01 postgres[8609]: [7081-1] failjava@dbcluster 8609 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:42:20 db01 postgres[9546]: [7095-1] failjava@dbcluster 9546 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:43:53 db01 postgres[10468]: [7127-1] failjava@dbcluster 10468 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:45:32 db01 postgres[11634]: [7159-1] failjava@dbcluster 11634 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:47:10 db01 postgres[12721]: [7184-1] failjava@dbcluster 12721 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:48:47 db01 postgres[13831]: [7194-1] failjava@dbcluster 13831 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:49:47 db01 postgres[14912]: [7217-1] failjava@dbcluster 14912 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:51:32 db01 postgres[15928]: [7223-1] failjava@dbcluster 15928 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:52:34 db01 postgres[16954]: [7231-1] failjava@dbcluster 16954 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:53:49 db01 postgres[17963]: [7247-1] failjava@dbcluster 17963 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:55:36 db01 postgres[19309]: [7269-1] failjava@dbcluster 19309 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:57:08 db01 postgres[20273]: [7286-1] failjava@dbcluster 20273 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:58:34 db01 postgres[20616]: [7291-1] @ 20616 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:58:34 db01 postgres[21404]: [7293-1] @ 21404 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:58:47 db01 postgres[21442]: [7294-1] @ 21442 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:58:49 db01 postgres[21450]: [7318-1] failjava@dbcluster 21450 0: FATAL: pipe() failed: Too many open files in system
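For anyone who wants to cross-check the "around 38" figure mechanically, here is a rough sketch of how I'd cluster the FATAL pipe() lines into distinct crash events. The 30-second gap threshold and the year are my assumptions about the log format above, not anything authoritative:

```python
from datetime import datetime

def count_crash_events(lines, gap_seconds=30):
    """Cluster 'pipe() failed' FATALs into distinct crash events: FATALs that
    land within gap_seconds of the previous one are blamed on the same crash."""
    events = 0
    last = None
    for line in lines:
        if "FATAL: pipe() failed" not in line:
            continue
        # Syslog timestamps ("Oct  5 15:00:25") carry no year; assume 2012.
        mon, day, hms = line.split()[:3]
        ts = datetime.strptime(f"2012 {mon} {day} {hms}", "%Y %b %d %H:%M:%S")
        if last is None or (ts - last).total_seconds() > gap_seconds:
            events += 1
        last = ts
    return events

sample = [
    "Oct  5 15:00:25 db01 postgres[76648]: FATAL: pipe() failed: Too many open files in system",
    "Oct  5 15:00:25 db01 postgres[76649]: FATAL: pipe() failed: Too many open files in system",
    "Oct  5 15:01:31 db01 postgres[77571]: FATAL: pipe() failed: Too many open files in system",
]
print(count_crash_events(sample))  # → 2
```

Run against the full syslog, this lands in the same ballpark as eyeballing ~1 min per recovery cycle.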
[ Once I set up monitoring, I became aware that the app (in a 100% idle state) was consuming ~80K fds because the app's initialization procedure touches nearly all 500 tables across its ~2K connections ]
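To put the ~80K figure in perspective, a back-of-the-envelope model (the per-backend average below is an illustrative assumption of mine, not a measurement; each backend caches descriptors for relations it has touched, bounded by max_files_per_process):

```python
# Rough model of descriptor pressure: each PostgreSQL backend keeps cached
# file descriptors for relations it has touched. With ~2K mostly idle
# connections whose startup path touches most of the ~500 tables, even a
# modest per-backend average blows past default kernel limits.
connections = 2_000      # app connections (per the monitoring data above)
fds_per_backend = 40     # ASSUMPTION: average cached relation fds per backend
total_fds = connections * fds_per_backend
print(total_fds)  # → 80000
```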
<rant>It would be exceptionally nice if JDBC grew a brain and were written intelligently enough that pgbouncer could be used in transaction mode, thereby allowing exceptionally poorly written apps - database connection code that establishes 5-20 connections per thread - to be parked in front of pgbouncer instead of needing to talk directly to a backend process. Use of duplicate names for prepared statements, or preparing statements outside of a transaction, really ties the hands of folks attempting to insert additional layers into the stack to scale up applications that are effectively in an immutable and broken state of disrepair. <pipedream>A more realistic but disgusting solution would be to teach pgbouncer to intercept all prepared statements from a given client and rewrite the names on the fly into something that acts as a unique cookie and could be reused across backends if a given backend hasn't seen a given prepared statement before.</pipedream></rant>
I'm trying to determine that now. I actually only included a small snippet of the uninitialized pages. It looks like there are 28 total uninitialized pages that VACUUM discovered:
[ autovacuum appears to have caught these two, however the slave wasn't turned up at this point ]
Oct 5 19:42:44 db01 postgres[37748]: [8908-1] @ 37748 0: WARNING: relation "tbl_E" page 4263 is uninitialized --- fixing
Oct 5 19:43:03 db01 postgres[37324]: [8908-1] @ 37324 0: WARNING: relation "tbl_B" page 2628 is uninitialized --- fixing
[ And the remaining 26 were caught via an explicitly executed `vacuumdb -az` ]
Oct 6 17:03:24 db01 postgres[73864]: [8911-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_F" page 65722 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8912-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3563 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8913-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3564 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8914-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3565 is uninitialized --- fixing
Oct 6 17:03:49 db01 postgres[73864]: [8915-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 65305 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8916-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 70927 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8917-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 75286 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8918-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 86601 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8919-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 91434 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8920-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 91435 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8921-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 95639 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8922-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 105638 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8923-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 114550 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8924-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 118380 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8925-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 121478 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8926-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 122098 is uninitialized --- fixing
Oct 6 17:03:52 db01 postgres[73864]: [8927-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 127195 is uninitialized --- fixing
Oct 6 17:03:53 db01 postgres[73864]: [8928-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 152416 is uninitialized --- fixing
Oct 6 17:03:55 db01 postgres[73864]: [8929-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 230192 is uninitialized --- fixing
Oct 6 17:03:56 db01 postgres[73864]: [8930-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 241091 is uninitialized --- fixing
Oct 6 17:20:31 db01 postgres[73864]: [8931-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32379133 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8932-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32385954 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8933-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32393740 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8934-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32394599 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8935-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32402510 is uninitialized --- fixing
Oct 6 17:20:33 db01 postgres[73864]: [8936-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32409328 is uninitialized --- fixing
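Tallying those warnings per relation is easy to do mechanically; a sketch (the regex is my assumption based on the message format above):

```python
import re
from collections import Counter

UNINIT = re.compile(r'relation "([^"]+)" page (\d+) is uninitialized')

def tally_uninitialized(lines):
    """Count VACUUM's 'page N is uninitialized' warnings, grouped by relation."""
    counts = Counter()
    for line in lines:
        m = UNINIT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    'WARNING: relation "tbl_D" page 3563 is uninitialized --- fixing',
    'WARNING: relation "tbl_D" page 3564 is uninitialized --- fixing',
    'WARNING: relation "tbl_A" page 32379133 is uninitialized --- fixing',
]
per_rel = tally_uninitialized(sample)
print(sum(per_rel.values()))  # → 3 warnings across 2 relations
```

Against the full syslog this is how I arrived at the 28-page total.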
>> What's odd to me is not the failure scenarios that come from a system running out of FDs (though seek(2)'ing consuming an FD seems odd), it's more that it's still possible for a master DB's VACUUM to clean up a bogus or partial page write, and have the slave crash when the WAL entry is shipped over.
>
> It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
> all when fixing an all-zeroes heap page. I'm not sure if that's a
> problem or not. The first actual use of such a page ought to result in
> re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
> so right offhand I don't see a path from this to the slave-side failures
> you saw. (But on the other hand I'm only firing on one cylinder today
> because of a head cold, so maybe I'm just missing it.)
(Good luck w/ the recovery)
> Do the slave-side failures correspond to pages that were reported as
> "fixed" on the master?
It doesn't appear so.
Oct 6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized
I do see a few other odd or related messages on the slave, however:
[ from a recovery ]
Oct 6 07:14:50 db02 postgres[11972]: [12-1] @ 11972 0: LOG: record with zero length at 9E/699842C0
Oct 6 07:14:50 db02 postgres[11972]: [13-1] @ 11972 0: LOG: redo done at 9E/69984290
Oct 6 07:14:50 db02 postgres[11972]: [14-1] @ 11972 0: LOG: last completed transaction was at log time 2012-10-06 07:12:11.87694+00
Oct 6 07:14:50 db02 postgres[11972]: [15-1] @ 11972 0: LOG: checkpoint starting: end-of-recovery immediate
[ Best to let sleeping dragons lie; I shut the slave down for the night in case there was some form of feedback that could be sent back that could hang the master ]
Oct 6 07:51:32 db02 postgres[15505]: [16-1] @ 15505 0: LOG: database system is shut down
[ I think this is from trying to connect to an IPv6 addr for localhost, but IPv6 has been disabled. Including it because the verbosity level seems excessive. ]
Oct 6 16:36:06 db02 postgres[4529]: [7-1] @ 4529 0: LOG: could not create socket for statistics collector: Protocol not supported
Oct 6 16:36:06 db02 postgres[4529]: [8-1] @ 4529 0: LOG: trying another address for the statistics collector
[ Here the database noted the bogus page, but it was still able to start up ]
Oct 6 16:36:06 db02 postgres[4535]: [9-1] @ 4535 0: LOG: database system was interrupted; last known up at 2012-10-06 08:07:41 UTC
Oct 6 16:36:06 db02 postgres[4535]: [10-1] @ 4535 0: LOG: entering standby mode
Oct 6 16:36:06 db02 postgres[4535]: [11-1] @ 4535 0: LOG: unexpected pageaddr 88/EE000000 in log file 158, segment 110, offset 0
[ Including this as a small nit: two lines are emitted for the following log entry, which seems incorrect ]
Oct 6 16:41:31 db02 postgres[6416]: [9-1] @ 6416 0: FATAL: could not connect to the primary server: fe_sendauth: no password supplied
Oct 6 16:41:31 db02 postgres[6416]: [9-2]
Oct 6 16:41:36 db02 postgres[6445]: [9-1] @ 6445 0: LOG: streaming replication successfully connected to primary
Oct 6 16:41:37 db02 postgres[4535]: [12-1] @ 4535 0: LOG: redo starts at 9E/6E000020
Oct 6 16:41:37 db02 postgres[4535]: [13-1] @ 4535 0: LOG: consistent recovery state reached at 9E/6E003E10
Oct 6 16:41:37 db02 postgres[4529]: [9-1] @ 4529 0: LOG: database system is ready to accept read only connections
[ Here's what the slave sees on a crash ]
Oct 6 17:09:17 db02 postgres[15561]: [10-1] pgsql@postgres 15561 0: FATAL: the database system is starting up
Oct 6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG: consistent recovery state reached at 9E/E3DDE2B0
Oct 6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG: database system is ready to accept read only connections
Oct 6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized
Oct 6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC: WAL contains references to invalid pages
Oct 6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG: startup process (PID 15191) was terminated by signal 6: Abort trap
[ /me scratches head for a bit ]
Oct 6 17:21:34 db02 postgres[19614]: [9-1] @ 19614 0: LOG: database system was interrupted while in recovery at log time 2012-10-06 08:07:24 UTC
Oct 6 17:21:34 db02 postgres[19614]: [9-2] @ 19614 0: HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
Oct 6 17:21:34 db02 postgres[19614]: [10-1] @ 19614 0: LOG: entering standby mode
Oct 6 17:21:34 db02 postgres[19614]: [11-1] @ 19614 0: LOG: redo starts at 9E/6E000020
[ This is the point that I came up with my hypothesis and enabled full_page_writes ]
Oct 6 17:24:14 db02 postgres[19608]: [9-1] @ 19608 0: LOG: received SIGHUP, reloading configuration files
Oct 6 17:24:14 db02 postgres[19608]: [10-1] @ 19608 0: LOG: parameter "full_page_writes" changed to "on"
Oct 6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG: consistent recovery state reached at 9E/E3DDE2B0
Oct 6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG: database system is ready to accept read only connections
Oct 6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized
Oct 6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC: WAL contains references to invalid pages
Oct 6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG: startup process (PID 19614) was terminated by signal 6: Abort trap
Oct 6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG: terminating any other active server processes
[ Given things were still panicking on the same page of data, I enabled full_page_writes on the master, performed a VACUUM and a CHECKPOINT, and began resyncing data to the slave ]
[ And here I started up the slave again ]
Oct 7 00:32:00 db02 postgres[66904]: [9-1] @ 66904 0: LOG: database system was interrupted; last known up at 2012-10-06 17:48:10 UTC
Oct 7 00:32:00 db02 postgres[66904]: [10-1] @ 66904 0: LOG: entering standby mode
Oct 7 00:32:00 db02 postgres[66904]: [11-1] @ 66904 0: LOG: unexpected pageaddr 89/B9000000 in log file 159, segment 55, offset 0
Oct 7 00:32:00 db02 postgres[66906]: [9-1] [unknown]@[unknown] 66906 0: LOG: connection received: host=[local]
Oct 7 00:32:00 db02 postgres[66906]: [10-1] pgsql@postgres 66906 0: FATAL: the database system is starting up
Oct 7 00:32:00 db02 postgres[66905]: [9-1] @ 66905 0: LOG: streaming replication successfully connected to primary
Oct 7 00:32:01 db02 postgres[66913]: [9-1] [unknown]@[unknown] 66913 0: LOG: connection received: host=[local]
Oct 7 00:32:01 db02 postgres[66913]: [10-1] pgsql@postgres 66913 0: FATAL: the database system is starting up
Oct 7 00:32:01 db02 postgres[66904]: [12-1] @ 66904 0: LOG: redo starts at 9F/36000020
Oct 7 00:32:01 db02 postgres[66904]: [13-1] @ 66904 0: LOG: consistent recovery state reached at 9F/36003FD8
Oct 7 00:32:02 db02 postgres[66897]: [9-1] @ 66897 0: LOG: database system is ready to accept read only connections
[ At this point I initiated a VACUUM and rebuilt a 400GB table using pg_reorg(1) to evict all WAL files that would contain possibly poisonous data ]
And that's about all I know at this point in time. Things have been up and running successfully and without issue ever since. I'm tempted to pg_reorg(1) all of the tables in order to guarantee that they are 100% intact, since I can't easily VACUUM FULL them (several of the largest, most frequently used tables are several TB in size). LMK if there's anything else you're interested in seeing or having outlined. Again, the master recovered gracefully, didn't require any manual intervention other than increasing kern.maxfiles, and hasn't skipped a beat since kern.maxfiles was increased. Having the slaves crap out when the master executed a VACUUM, however, is of interest. When running the manual VACUUM, I had a window open to both db01 and db02, and db02's backend panicking was clearly linked to the VACUUM warnings on db01.
In the regression testing environments, is it possible to enable a test mode that runs with full_page_writes disabled, in order to push the boundaries of disabling full_page_writes? The IO bandwidth savings are substantial for small mutations, and for folks lucky enough to run on ZFS (i.e. not ufs, ext*), it'd be really cool to be able to realize those savings. As is, it appears that standalone instances of Pg are full_page_writes safe, but that slaves are still making assumptions about full-page WAL entries being the MTU in some corner cases. My observations lead me to believe that replication works fine with full_page_writes disabled until the master cleans up a zero-filled or damaged page.
For the archives (if anyone makes it this far to the bottom of the note), what I did to operationally "fix" things:
1) On the master, re-enabled full_page_writes and HUP'ed pg to reload the setting.
2) Fired off a VACUUM ANALYZE to generate some WAL data. This step isn't strictly necessary, but I wanted to make sure that there was something to CHECKPOINT.
3) Issued a CHECKPOINT and waited until it completed.
4) Re-synced data to the slave (via rsync(1) or whatever your tool of choice is).
5) Restarted the slave.
6) Verified that the slave is in fact replicating via pg_stat_replication.
-sc
--
Sean Chittenden
sean@chittenden.org