Thread: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Sean Chittenden
Date:
Master/slave replication; the master and slave both have full_page_writes disabled because the system is running on
ZFS.

While the master was still being set up and data was being loaded, the system ran out of open file descriptors and Pg
crashed a few times until this was realized and corrected. Streaming replication was set up and was correctly passing
data to the slave:

repmgr@db01=# select * FROM pg_stat_replication ;
  pid  | usesysid | usename | application_name |   client_addr   | client_hostname | client_port |         backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 68057 |    16384 | repmgr  | walreceiver      | 192.168.100.183 |                 |       60743 | 2012-10-06 16:41:36.963769+00 | streaming | 9E/E2C81210   | 9E/E2C81210    | 9E/E2C81210    | 9E/E2C80C68     |             0 | async
(1 row)
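
(For context, the replication-relevant settings look roughly like this; a sketch checked via SHOW, with the wal_level/hot_standby values being assumptions based on the behavior described rather than copied from the config:)

repmgr@db01=# SHOW full_page_writes;   -- off on both nodes, because the filesystem is ZFS
repmgr@db01=# SHOW wal_level;          -- presumably hot_standby, since the slave serves read-only queries
repmgr@db02=# SHOW hot_standby;        -- presumably on, on the slave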

Once the host was up and running, I kicked off a manual VACUUM, at which point I see a bunch of the following errors on
the master:

INFO:  vacuuming "schma.tbl"
WARNING:  relation "tbl" page 65305 is uninitialized --- fixing
WARNING:  relation "tbl" page 70927 is uninitialized --- fixing
WARNING:  relation "tbl" page 75286 is uninitialized --- fixing
WARNING:  relation "tbl" page 86601 is uninitialized --- fixing
WARNING:  relation "tbl" page 91434 is uninitialized --- fixing
WARNING:  relation "tbl" page 91435 is uninitialized --- fixing
WARNING:  relation "tbl" page 95639 is uninitialized --- fixing
WARNING:  relation "tbl" page 105638 is uninitialized --- fixing
WARNING:  relation "tbl" page 114550 is uninitialized --- fixing
WARNING:  relation "tbl" page 118380 is uninitialized --- fixing
WARNING:  relation "tbl" page 121478 is uninitialized --- fixing
WARNING:  relation "tbl" page 122098 is uninitialized --- fixing
WARNING:  relation "tbl" page 127195 is uninitialized --- fixing
WARNING:  relation "tbl" page 152416 is uninitialized --- fixing
WARNING:  relation "tbl" page 230192 is uninitialized --- fixing
WARNING:  relation "tbl" page 241091 is uninitialized --- fixing
INFO:  scanned index "tbl_pkey" to remove 447 row versions

and on the slave I see:

Oct  6 17:04:02 db02 postgres[4535]: [14-1] @ 4535 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
Oct  6 17:04:02 db02 postgres[4535]: [14-2] @ 4535 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:04:02 db02 postgres[4535]: [15-1] @ 4535 0: PANIC:  WAL contains references to invalid pages
Oct  6 17:04:02 db02 postgres[4535]: [15-2] @ 4535 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:04:02 db02 postgres[4529]: [10-1] @ 4529 0: LOG:  startup process (PID 4535) was terminated by signal 6: Abort trap
Oct  6 17:04:02 db02 postgres[4529]: [11-1] @ 4529 0: LOG:  terminating any other active server processes
Oct  6 17:04:02 db02 postgres[8345]: [26-1] pgsql@repmgr 8345 0: WARNING:  terminating connection because of crash of another server process
Oct  6 17:04:02 db02 postgres[8345]: [26-2] pgsql@repmgr 8345 0: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Oct  6 17:04:02 db02 postgres[8345]: [26-3] pgsql@repmgr 8345 0: HINT:  In a moment you should be able to reconnect to the database and repeat your command.

Restarting the slave, I see:

Oct  6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG:  consistent recovery state reached at 9E/E3DDE2B0
Oct  6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG:  database system is ready to accept read only connections
Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
Oct  6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC:  WAL contains references to invalid pages
Oct  6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG:  startup process (PID 15191) was terminated by signal 6: Abort trap
Oct  6 17:12:03 db02 postgres[15185]: [11-1] @ 15185 0: LOG:  terminating any other active server processes

At this point in time, I suspect full_page_writes=off is the culprit and that it's violating logic in several places. I
haven't tracked things down further, but I thought I'd at least bring this to people's attention so that the appropriate
warnings can be put in place. The slave cannot be recovered with the corrupted WAL entries, however, and it needs to
be resynced with full_page_writes enabled (at least that's my assumption at present):

Oct  6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG:  consistent recovery state reached at 9E/E3DDE2B0
Oct  6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG:  database system is ready to accept read only connections
Oct  6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
Oct  6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC:  WAL contains references to invalid pages
Oct  6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG:  startup process (PID 19614) was terminated by signal 6: Abort trap
Oct  6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG:  terminating any other active server processes


Cheers. I'll report back if I dig anything else up. -sc


[ please CC me as I'm not subscribed to the list ]


--
Sean Chittenden
sean@chittenden.org



Re: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> While the master was still being setup and data was being loaded, the
> system ran out of open file descriptors and Pg crashed a few times
> until this was realized and corrected.

Could you provide a bit more detail than that?  Usually the behavior we
see under out-of-FDs conditions is that everything else on the system
starts to fall over badly, but PG keeps soldiering along to some extent.
I would not expect data corruption to result, anyway.

If you've got the postmaster logs from this episode, it would be useful
to see what complaints got logged.

            regards, tom lane


Re: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Sean Chittenden
Date:
>> While the master was still being setup and data was being loaded, the
>> system ran out of open file descriptors and Pg crashed a few times
>> until this was realized and corrected.
>
> Could you provide a bit more detail than that?  Usually the behavior we
> see under out-of-FDs conditions is that everything else on the system
> starts to fall over badly, but PG keeps soldiering along to some extent.
> I would not expect data corruption to result, anyway.

Yup, I ran out of time earlier and wasn't able to dig them out, but I just recovered them. It looks like there are two
error modes that I ran into as a result of running out of available FDs.

> If you've got the postmaster logs from this episode, it would be useful
> to see what complaints got logged.


These failures are against 9.2.1, btw.

The first crash scenario:

Oct  5 15:00:24 db01 postgres[75852]: [6449-2] javafail@dbcluster 75852 0: STATEMENT:  SELECT /* query */ FROM tbl AS this_ WHERE this_.user_id=$1
Oct  5 15:00:24 db01 postgres[75852]: [6456-1] javafail@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20013": Too many open files in system
[snip - lots of could not seek to end of file errors. How does seek(2) consume a file descriptor??? ]
Oct  5 15:00:24 db01 postgres[75852]: [6480-1] javafail@dbcluster 75852 0: ERROR:  could not open file "base/16387/20275": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6480-2] javafail@dbcluster 75852 0: STATEMENT:  SELECT /* some other query */ FROM tbl2 AS this_ WHERE this_.user_id=$1 limit $2
Oct  5 15:00:25 db01 postgres[75852]: [6481-1] javafail@dbcluster 75852 0: ERROR:  current transaction is aborted, commands ignored until end of transaction block
Oct  5 15:00:25 db01 postgres[75852]: [6481-2] javafail@dbcluster 75852 0: STATEMENT:  SELECT 1
Oct  5 15:00:25 db01 postgres[76648]: [5944-1] javafail@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76649]: [5944-1] javafail@dbcluster 76649 0: FATAL:  pipe() failed: Too many open files in system
[ snip - this makes sense, pipe(2)'s able to return ENFILE ]
Oct  5 15:00:25 db01 postgres[76407]: [5894-1] javafail@dbcluster 76407 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76312]: [6295-1] javafail@dbcluster 76312 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76331]: [8612-1] javafail@dbcluster 76331 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76407]: [5894-2] javafail@dbcluster 76407 0: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Oct  5 15:00:25 db01 postgres[76312]: [6295-2] javafail@dbcluster 76312 0: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Oct  5 15:00:25 db01 postgres[76315]: [7317-1] javafail@dbcluster 76315 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76407]: [5894-3] javafail@dbcluster 76407 0: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Oct  5 15:00:25 db01 postgres[76302]: [10249-1] javafail@dbcluster 76302 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76331]: [8612-2] javafail@dbcluster 76331 0: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Oct  5 15:00:25 db01 postgres[76286]: [7471-1] javafail@dbcluster 76286 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76307]: [7628-1] javafail@dbcluster 76307 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76285]: [7034-1] javafail@dbcluster 76285 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:00:25 db01 postgres[76331]: [8612-3] javafail@dbcluster 76331 0: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[snip]
Oct  5 15:00:28 db01 postgres[76875]: [5949-1] javafail@dbcluster 76875 0: FATAL:  the database system is in recovery mode
[snip]
Oct  5 15:01:28 db01 postgres[76888]: [5951-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:28 db01 postgres[76888]: [5953-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:28 db01 postgres[76888]: [5955-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:28 db01 postgres[76888]: [5957-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:28 db01 postgres[76888]: [5959-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
[snip]
Oct  5 15:01:31 db01 postgres[76888]: [6503-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5979-1] javafail@dbcluster 77572 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5980-1] javafail@dbcluster 77572 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:01:31 db01 postgres[77572]: [5980-2] javafail@dbcluster 77572 0: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Oct  5 15:01:31 db01 postgres[77572]: [5980-3] javafail@dbcluster 77572 0: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Oct  5 15:01:31 db01 postgres[77544]: [7490-1] javafail@dbcluster 77544 0: WARNING:  terminating connection because of crash of another server process
[snip]
Oct  5 15:01:34 db01 postgres[77811]: [5983-1] javafail@dbcluster 77811 0: FATAL:  the database system is in recovery mode
Oct  5 15:02:20 db01 postgres[78473]: [5987-1] javafail@dbcluster 78473 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:02:20 db01 postgres[78466]: [5989-1] javafail@dbcluster 78466 0: WARNING:  terminating connection because of crash of another server process



What's odd to me is not the failure scenarios that come from a system running out of FDs (though seek(2)'ing consuming
an FD seems odd), it's more that it's still possible for a master DB's VACUUM to clean up a bogus or partial page write,
and have the slave crash when the WAL entry is shipped over.

Are there any other bits of info that you want me to hold on to, extract, or things you want me to test? -sc


--
Sean Chittenden
sean@chittenden.org



Re: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
>> If you've got the postmaster logs from this episode, it would be useful
>> to see what complaints got logged.

> The first crash scenario:

> Oct  5 15:00:24 db01 postgres[75852]: [6449-2] javafail@dbcluster 75852 0: STATEMENT:  SELECT /* query */ FROM tbl AS this_ WHERE this_.user_id=$1
> Oct  5 15:00:24 db01 postgres[75852]: [6456-1] javafail@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20013": Too many open files in system
> [snip - lots of could not seek to end of file errors. How does seek(2) consume a file descriptor??? ]

It doesn't, but FileSeek() might need to do an open if the file wasn't
currently open.  This isn't that surprising.

> Oct  5 15:00:25 db01 postgres[76648]: [5944-1] javafail@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system

This message must be coming from initSelfPipe(), and after poking around
a bit I think the failure must be occurring while a new backend is
attempting to do "OwnLatch(&MyProc->procLatch)" in InitProcess.  The
reason the postmaster treats this as a crash is that the new backend
just armed the dead-man switch (MarkPostmasterChildActive) but it exits
without doing ProcKill which would disarm it.  So this is just an
order-of-operations bug in InitProcess: we're assuming that it can't
fail before reaching "on_shmem_exit(ProcKill, 0)", and the latch
additions broke that.  (Though looking at it, assuming that the
PGSemaphoreReset call cannot fail seems a tad risky too.)

So that explains the crashes, but it doesn't (directly) explain why you
had data corruption.

I think the uninitialized pages are showing up because you had crashes
in the midst of relation-extension operations, ie, some other backend
had successfully done an smgrextend but hadn't yet laid down any valid
data in the new page.  However, this theory would not explain more than
one uninitialized page per crash, and your previous message seems to
show rather a lot of uninitialized pages.  How many pipe-failed crashes
did you have?

> What's odd to me is not the failure scenarios that come from a system running out of FDs (though seek(2)'ing consuming an FD seems odd), it's more that it's still possible for a master DB's VACUUM to clean up a bogus or partial page write, and have the slave crash when the WAL entry is shipped over.

It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
all when fixing an all-zeroes heap page.  I'm not sure if that's a
problem or not.  The first actual use of such a page ought to result in
re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
so right offhand I don't see a path from this to the slave-side failures
you saw.  (But on the other hand I'm only firing on one cylinder today
because of a head cold, so maybe I'm just missing it.)

Do the slave-side failures correspond to pages that were reported as
"fixed" on the master?

            regards, tom lane


Re: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Sean Chittenden
Date:
> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.

The first crash message:

Oct  5 15:00:21 db01 postgres[75852]: [6406-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/18770": Too many open files in system
Oct  5 15:00:22 db01 postgres[75852]: [6413-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20259": Too many open files in system
Oct  5 15:00:23 db01 postgres[75852]: [6424-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20218": Too many open files in system
Oct  5 15:00:23 db01 postgres[75852]: [6431-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20218": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6442-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20218": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6449-1] failjava@dbcluster 75852 0: ERROR:  could not open file "base/16387/20302": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6456-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20013": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6463-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20259": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6470-1] failjava@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20259": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6480-1] failjava@dbcluster 75852 0: ERROR:  could not open file "base/16387/20275": Too many open files in system
Oct  5 15:00:25 db01 postgres[75852]: [6481-1] failjava@dbcluster 75852 0: ERROR:  current transaction is aborted, commands ignored until end of transaction block
Oct  5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76407]: [5894-1] failjava@dbcluster 76407 0: WARNING:  terminating connection because of crash of another server process

For the sake of completeness, I looked to correlate the above relations with the tables that had problems, and there is
zero correlation between the seek(2) errors and the tables for which VACUUM reported invalid pages (a sketch of this kind of catalog lookup follows the table):

     nspname     |     relname     | relnamespace | relfilenode | relkind |  oid
-----------------+-----------------+--------------+-------------+---------+-------
 failjava_master | tbl_F           |        16388 |       16402 | r       | 16402
 failjava_db1    | tbl_D           |        16417 |       18473 | r       | 18473
 failjava_db1    | tbl_E           |        16417 |       20179 | r       | 20179
 failjava_db1    | tbl_C           |        16417 |       20187 | r       | 20187
 failjava_db1    | tbl_B           |        16417 |       20200 | r       | 20200
 failjava_db1    | tbl_A           |        16417 |       41731 | r       | 19533
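
(A catalog lookup along these lines produces the mapping above; this is a sketch of one way to do it rather than a transcript of the exact query. The idea is to compare relfilenode/oid against the "base/16387/NNNNN" filenodes from the seek(2)/open errors:)

SELECT n.nspname, c.relname, c.relnamespace, c.relfilenode, c.relkind, c.oid
  FROM pg_class AS c
  JOIN pg_namespace AS n ON n.oid = c.relnamespace
 WHERE c.relname IN ('tbl_A', 'tbl_B', 'tbl_C', 'tbl_D', 'tbl_E', 'tbl_F');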


But I'm wondering if it died a second time before the system had finished its recovery:

Oct  5 15:01:31 db01 postgres[76888]: [6503-1] @ 76888 0: WARNING:  could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5980-1] failjava@dbcluster 77572 0: WARNING:  terminating connection because of crash of another server process
Oct  5 15:01:31 db01 postgres[77544]: [7490-1] failjava@dbcluster 77544 0: WARNING:  terminating connection because of crash of another server process

Unfortunately the logs that I have for the first set of crashes are limited to WARNING and above and don't include
INFO-level messages (though I do have those later on; due to various IRL constraints - hardware failure of the intended
host - the provisioning of the master was a bit of a cannonball dash).

> I think the uninitialized pages are showing up because you had crashes
> in the midst of relation-extension operations, ie, some other backend
> had successfully done an smgrextend but hadn't yet laid down any valid
> data in the new page.  However, this theory would not explain more than
> one uninitialized page per crash, and your previous message seems to
> show rather a lot of uninitialized pages.  How many pipe-failed crashes
> did you have?

Assuming ~1min per recovery, it looks like around 38.

Oct  5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:02:20 db01 postgres[78473]: [5987-1] failjava@dbcluster 78473 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:03:47 db01 postgres[79408]: [6005-1] failjava@dbcluster 79408 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:05:06 db01 postgres[80440]: [6112-1] failjava@dbcluster 80440 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:06:09 db01 postgres[81276]: [6117-1] @ 81276 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:07:04 db01 postgres[82102]: [6131-1] failjava@dbcluster 82102 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:08:19 db01 postgres[82336]: [6136-1] @ 82336 0: FATAL:  could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct  5 15:08:20 db01 postgres[83084]: [6161-1] @ 83084 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:08:22 db01 postgres[83091]: [6184-1] failjava@dbcluster 83091 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:09:36 db01 postgres[84096]: [6218-1] failjava@dbcluster 84096 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:10:50 db01 postgres[85064]: [6226-1] failjava@dbcluster 85064 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:12:05 db01 postgres[86106]: [6233-1] failjava@dbcluster 86106 0: FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct  5 15:12:05 db01 postgres[86107]: [6233-1] failjava@dbcluster 86107 0: FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct  5 15:12:05 db01 postgres[86108]: [6233-1] failjava@dbcluster 86108 0: FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct  5 15:12:05 db01 postgres[86109]: [6233-1] failjava@dbcluster 86109 0: FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct  5 15:12:05 db01 postgres[86105]: [6234-1] failjava@dbcluster 86105 0: FATAL:  could not open file "base/16387/PG_VERSION": Too many open files in system
Oct  5 15:12:08 db01 postgres[86119]: [6252-1] failjava@dbcluster 86119 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:13:36 db01 postgres[87126]: [6295-1] failjava@dbcluster 87126 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:14:23 db01 postgres[88122]: [6364-1] failjava@dbcluster 88122 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:15:31 db01 postgres[88370]: [6369-1] @ 88370 0: FATAL:  could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct  5 15:15:31 db01 postgres[89057]: [6398-1] @ 89057 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:15:34 db01 postgres[89067]: [6426-1] failjava@dbcluster 89067 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:16:38 db01 postgres[90080]: [6430-1] @ 90080 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:17:19 db01 postgres[90941]: [6446-1] failjava@dbcluster 90941 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:18:43 db01 postgres[91898]: [6484-1] failjava@dbcluster 91898 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:20:02 db01 postgres[93068]: [6514-1] failjava@dbcluster 93068 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:22:03 db01 postgres[94275]: [6519-1] failjava@dbcluster 94275 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:22:03 db01 postgres[94276]: [6519-1] failjava@dbcluster 94276 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:23:39 db01 postgres[95392]: [6525-1] failjava@dbcluster 95392 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:24:52 db01 postgres[96371]: [6533-1] failjava@dbcluster 96371 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:26:22 db01 postgres[97337]: [6570-1] failjava@dbcluster 97337 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:27:48 db01 postgres[98419]: [6607-1] failjava@dbcluster 98419 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:29:02 db01 postgres[99379]: [6706-1] failjava@dbcluster 99379 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:30:20 db01 postgres[467]: [6719-1] failjava@dbcluster 467 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:32:08 db01 postgres[1479]: [6769-1] failjava@dbcluster 1479 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:33:28 db01 postgres[2714]: [6776-1] failjava@dbcluster 2714 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:34:18 db01 postgres[3600]: [6870-1] failjava@dbcluster 3600 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:36:07 db01 postgres[4608]: [6896-1] failjava@dbcluster 4608 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:37:18 db01 postgres[5652]: [6944-1] failjava@dbcluster 5652 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:38:56 db01 postgres[6640]: [7018-1] failjava@dbcluster 6640 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:39:46 db01 postgres[7680]: [7025-1] failjava@dbcluster 7680 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:41:08 db01 postgres[8609]: [7081-1] failjava@dbcluster 8609 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:42:20 db01 postgres[9546]: [7095-1] failjava@dbcluster 9546 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:43:53 db01 postgres[10468]: [7127-1] failjava@dbcluster 10468 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:45:32 db01 postgres[11634]: [7159-1] failjava@dbcluster 11634 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:47:10 db01 postgres[12721]: [7184-1] failjava@dbcluster 12721 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:48:47 db01 postgres[13831]: [7194-1] failjava@dbcluster 13831 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:49:47 db01 postgres[14912]: [7217-1] failjava@dbcluster 14912 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:51:32 db01 postgres[15928]: [7223-1] failjava@dbcluster 15928 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:52:34 db01 postgres[16954]: [7231-1] failjava@dbcluster 16954 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:53:49 db01 postgres[17963]: [7247-1] failjava@dbcluster 17963 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:55:36 db01 postgres[19309]: [7269-1] failjava@dbcluster 19309 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:57:08 db01 postgres[20273]: [7286-1] failjava@dbcluster 20273 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:58:34 db01 postgres[20616]: [7291-1] @ 20616 0: FATAL:  could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct  5 15:58:34 db01 postgres[21404]: [7293-1] @ 21404 0: FATAL:  could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct  5 15:58:47 db01 postgres[21442]: [7294-1] @ 21442 0: FATAL:  pipe() failed: Too many open files in system
Oct  5 15:58:49 db01 postgres[21450]: [7318-1] failjava@dbcluster 21450 0: FATAL:  pipe() failed: Too many open files in system
[ Once I set up monitoring, I became aware that the app (in a 100% idle state) was consuming ~80K fds because the app's
initialization procedure touches nearly all 500 tables across its ~2K connections ]
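
(As an aside, a connection-count query along these lines makes the ~2K-connection multiplier visible on the Pg side; a sketch, not necessarily how the monitoring was actually done:)

SELECT application_name, count(*) AS connections
  FROM pg_stat_activity
 GROUP BY application_name
 ORDER BY connections DESC;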

<rant>It would be exceptionally nice if JDBC grew a brain and were written in an intelligent manner such that pgbouncer
could be used in transaction mode, thereby allowing exceptionally poorly written apps and database connection code which
establish 5-20 connections per thread to be parked on the front of pgbouncer instead of needing to talk directly to a
backend process. Use of duplicate names for prepared statements, or preparing statements outside of a transaction, really
ties the hands of folk who are attempting to insert additional layers in the stack to scale up applications that are
effectively in an immutable and broken state of disrepair. <pipedream>A more realistic but disgusting solution would be
to teach pgbouncer to intercept all prepared statements from a given client and on-the-fly rewrite the names to
something that acts as a unique cookie and could be reused across backends if a given backend hasn't seen a unique
prepared statement before.</pipedream></rant>


I'm trying to determine that now. I actually only included a small snippet of the uninitialized pages. It looks like
there are 28 total uninitialized pages that VACUUM discovered:

[ autovacuum appears to have caught these two, however the slave wasn't turned up at this point ]
Oct  5 19:42:44 db01 postgres[37748]: [8908-1] @ 37748 0: WARNING:  relation "tbl_E" page 4263 is uninitialized --- fixing
Oct  5 19:43:03 db01 postgres[37324]: [8908-1] @ 37324 0: WARNING:  relation "tbl_B" page 2628 is uninitialized --- fixing

[ And the remaining 26 were caught via an explicitly executed `vacuumdb -az` ]
Oct  6 17:03:24 db01 postgres[73864]: [8911-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_F" page 65722 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8912-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_D" page 3563 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8913-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_D" page 3564 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8914-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_D" page 3565 is uninitialized --- fixing
Oct  6 17:03:49 db01 postgres[73864]: [8915-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 65305 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8916-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 70927 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8917-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 75286 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8918-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 86601 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8919-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 91434 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8920-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 91435 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8921-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 95639 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8922-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 105638 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8923-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 114550 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8924-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 118380 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8925-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 121478 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8926-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 122098 is uninitialized --- fixing
Oct  6 17:03:52 db01 postgres[73864]: [8927-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 127195 is uninitialized --- fixing
Oct  6 17:03:53 db01 postgres[73864]: [8928-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 152416 is uninitialized --- fixing
Oct  6 17:03:55 db01 postgres[73864]: [8929-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 230192 is uninitialized --- fixing
Oct  6 17:03:56 db01 postgres[73864]: [8930-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_C" page 241091 is uninitialized --- fixing
Oct  6 17:20:31 db01 postgres[73864]: [8931-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32379133 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8932-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32385954 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8933-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32393740 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8934-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32394599 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8935-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32402510 is uninitialized --- fixing
Oct  6 17:20:33 db01 postgres[73864]: [8936-1] pgsql@dbcluster 73864 0: WARNING:  relation "tbl_A" page 32409328 is uninitialized --- fixing


>> What's odd to me is not the failure scenarios that come from a system running out of FDs (though seek(2)'ing consuming an FD seems odd), it's more that it's still possible for a master DB's VACUUM to clean up a bogus or partial page write, and have the slave crash when the WAL entry is shipped over.
>
> It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
> all when fixing an all-zeroes heap page.  I'm not sure if that's a
> problem or not.  The first actual use of such a page ought to result in
> re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
> so right offhand I don't see a path from this to the slave-side failures
> you saw.  (But on the other hand I'm only firing on one cylinder today
> because of a head cold, so maybe I'm just missing it.)

(Good luck w/ the recovery)

> Do the slave-side failures correspond to pages that were reported as
> "fixed" on the master?


It doesn't appear so.

Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
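
One way to make that comparison (a sketch; not necessarily how it was actually checked): resolve the slave's failing filenode back to a relation name, then compare the block numbers from the CONTEXT line against the pages VACUUM reported as fixed on the master.

-- 20196 is the filenode from "base/16387/20196" in the slave's PANIC context
SELECT n.nspname, c.relname
  FROM pg_class AS c
  JOIN pg_namespace AS n ON n.oid = c.relnamespace
 WHERE c.relfilenode = 20196;
-- then compare blocks 2943/2944 from the CONTEXT line against the
-- "page NNN is uninitialized --- fixing" warnings for that relation on the master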

I do see a few other odd or related messages on the slave, however:

[ from a recovery ]
Oct  6 07:14:50 db02 postgres[11972]: [12-1] @ 11972 0: LOG:  record with zero length at 9E/699842C0
Oct  6 07:14:50 db02 postgres[11972]: [13-1] @ 11972 0: LOG:  redo done at 9E/69984290
Oct  6 07:14:50 db02 postgres[11972]: [14-1] @ 11972 0: LOG:  last completed transaction was at log time 2012-10-06 07:12:11.87694+00
Oct  6 07:14:50 db02 postgres[11972]: [15-1] @ 11972 0: LOG:  checkpoint starting: end-of-recovery immediate

[ Best to let sleeping dragons lie; I shut the slave down for the night in case there was some form of feedback that
could be sent back that could hang the master ]
Oct  6 07:51:32 db02 postgres[15505]: [16-1] @ 15505 0: LOG:  database system is shut down

[ I think this is from trying to connect to an ipv6 addr for localhost, but IPv6 has been disabled. Including it because
the verbosity level seems excessive. ]
Oct  6 16:36:06 db02 postgres[4529]: [7-1] @ 4529 0: LOG:  could not create socket for statistics collector: Protocol not supported
Oct  6 16:36:06 db02 postgres[4529]: [8-1] @ 4529 0: LOG:  trying another address for the statistics collector

[ Here the database noted the bogus page, but it was still able to start up ]
Oct  6 16:36:06 db02 postgres[4535]: [9-1] @ 4535 0: LOG:  database system was interrupted; last known up at 2012-10-06 08:07:41 UTC
Oct  6 16:36:06 db02 postgres[4535]: [10-1] @ 4535 0: LOG:  entering standby mode
Oct  6 16:36:06 db02 postgres[4535]: [11-1] @ 4535 0: LOG:  unexpected pageaddr 88/EE000000 in log file 158, segment 110, offset 0

[ Including this as a small nit: there are two lines emitted for the following log entry, which seems incorrect ]
Oct  6 16:41:31 db02 postgres[6416]: [9-1] @ 6416 0: FATAL:  could not connect to the primary server: fe_sendauth: no password supplied
Oct  6 16:41:31 db02 postgres[6416]: [9-2]

Oct  6 16:41:36 db02 postgres[6445]: [9-1] @ 6445 0: LOG:  streaming replication successfully connected to primary
Oct  6 16:41:37 db02 postgres[4535]: [12-1] @ 4535 0: LOG:  redo starts at 9E/6E000020
Oct  6 16:41:37 db02 postgres[4535]: [13-1] @ 4535 0: LOG:  consistent recovery state reached at 9E/6E003E10
Oct  6 16:41:37 db02 postgres[4529]: [9-1] @ 4529 0: LOG:  database system is ready to accept read only connections


[ Here's what the slave sees on a crash ]
Oct  6 17:09:17 db02 postgres[15561]: [10-1] pgsql@postgres 15561 0: FATAL:  the database system is starting up
Oct  6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG:  consistent recovery state reached at 9E/E3DDE2B0
Oct  6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG:  database system is ready to accept read only connections
Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
Oct  6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC:  WAL contains references to invalid pages
Oct  6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG:  startup process (PID 15191) was terminated by signal 6: Abort trap
[ /me scratches head for a bit ]
Oct  6 17:21:34 db02 postgres[19614]: [9-1] @ 19614 0: LOG:  database system was interrupted while in recovery at log time 2012-10-06 08:07:24 UTC
Oct  6 17:21:34 db02 postgres[19614]: [9-2] @ 19614 0: HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
Oct  6 17:21:34 db02 postgres[19614]: [10-1] @ 19614 0: LOG:  entering standby mode
Oct  6 17:21:34 db02 postgres[19614]: [11-1] @ 19614 0: LOG:  redo starts at 9E/6E000020

[ This is the point that I came up with my hypothesis and enabled full_page_writes ]
Oct  6 17:24:14 db02 postgres[19608]: [9-1] @ 19608 0: LOG:  received SIGHUP, reloading configuration files
Oct  6 17:24:14 db02 postgres[19608]: [10-1] @ 19608 0: LOG:  parameter "full_page_writes" changed to "on"
Oct  6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG:  consistent recovery state reached at 9E/E3DDE2B0
Oct  6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG:  database system is ready to accept read only connections
Oct  6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING:  page 2943 of relation base/16387/20196 is uninitialized
Oct  6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC:  WAL contains references to invalid pages
Oct  6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG:  startup process (PID 19614) was terminated by signal 6: Abort trap
Oct  6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG:  terminating any other active server processes
[ Given things were still panic'ing on the same page of data, I enabled full_page_writes on the master, performed a
VACUUM and CHECKPOINT, and began resync'ing data to the slave ]

[ And here I started up the slave again ]
Oct  7 00:32:00 db02 postgres[66904]: [9-1] @ 66904 0: LOG:  database system was interrupted; last known up at 2012-10-06 17:48:10 UTC
Oct  7 00:32:00 db02 postgres[66904]: [10-1] @ 66904 0: LOG:  entering standby mode
Oct  7 00:32:00 db02 postgres[66904]: [11-1] @ 66904 0: LOG:  unexpected pageaddr 89/B9000000 in log file 159, segment 55, offset 0
Oct  7 00:32:00 db02 postgres[66906]: [9-1] [unknown]@[unknown] 66906 0: LOG:  connection received: host=[local]
Oct  7 00:32:00 db02 postgres[66906]: [10-1] pgsql@postgres 66906 0: FATAL:  the database system is starting up
Oct  7 00:32:00 db02 postgres[66905]: [9-1] @ 66905 0: LOG:  streaming replication successfully connected to primary
Oct  7 00:32:01 db02 postgres[66913]: [9-1] [unknown]@[unknown] 66913 0: LOG:  connection received: host=[local]
Oct  7 00:32:01 db02 postgres[66913]: [10-1] pgsql@postgres 66913 0: FATAL:  the database system is starting up
Oct  7 00:32:01 db02 postgres[66904]: [12-1] @ 66904 0: LOG:  redo starts at 9F/36000020
Oct  7 00:32:01 db02 postgres[66904]: [13-1] @ 66904 0: LOG:  consistent recovery state reached at 9F/36003FD8
Oct  7 00:32:02 db02 postgres[66897]: [9-1] @ 66897 0: LOG:  database system is ready to accept read only connections

[ At this point I initiated a VACUUM and rebuilt a 400GB table using pg_reorg(1) to evict all WAL files that would contain possibly poisonous data ]


And that's about all I know at this point in time. Things have been up and running successfully and without issue ever
since. I'm tempted to pg_reorg(1) all of the tables in order to guarantee that all tables are 100% intact, since I can't
easily VACUUM FULL these tables (several of the largest, most frequently used are several TB in size). LMK if there's
anything else you're interested in seeing or having outlined. Again, the master recovered gracefully, didn't require any
manual intervention other than increasing kern.maxfiles, and hasn't skipped a beat once since kern.maxfiles was
increased. Having the slaves crap out when the master executed a VACUUM, however, is of interest. When running the
manual VACUUM, I had a window open to both db01 and db02, and db02's backend panic'ing was clearly linked to the VACUUM
warnings on db01.

In the regression testing environments, is it possible to enable a test mode that runs with full_page_writes disabled
in order to push the boundaries of disabling full_page_writes? The IO bandwidth savings are substantial for small
mutations, and for folks lucky enough to run on ZFS (i.e. not ufs, ext*), it'd be really cool to be able to realize
those bandwidth savings. As is, it appears that standalone instances of Pg are full_page_writes safe, but that slaves
are still making assumptions about full-page WAL entries being the MTU during some corner cases. My observations lead me
to believe that replication works fine with full_page_writes disabled until the master cleans up a zero-filled or
damaged page.
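
(For anyone wanting to put a number on that bandwidth claim for their own workload, a rough before/after measurement of WAL volume can be done from SQL; a sketch, assuming 9.2's pg_xlog_location_diff(), run once with full_page_writes on and once with it off:)

-- note the WAL position before running the workload
SELECT pg_current_xlog_location();
-- ... run the workload ...
-- bytes of WAL generated since the noted position ('9E/6E000020' is just an example LSN)
SELECT pg_xlog_location_diff(pg_current_xlog_location(), '9E/6E000020');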



For the archives (if anyone makes it this far to the bottom of the note), here's what I did to operationally "fix" things; a rough SQL sketch of the sequence follows the list:

1) On the master I re-enabled full_page_writes and HUP'ed Pg to reload the setting.
2) Fired off a VACUUM ANALYZE to generate some WAL data. This step isn't necessary, but I wanted to make sure that there was something to CHECKPOINT.
3) Issued a CHECKPOINT and waited until it completed.
4) Re-synced the data to the slave (via rsync(1) or whatever your tool of choice is).
5) Restarted the slave.
6) Verified that the slave is in fact replicating via pg_stat_replication.
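
Roughly, from psql on the master (a sketch; the postgresql.conf edit, rsync, and slave restart happen outside SQL and are only noted as comments, since 9.2 has no ALTER SYSTEM):

-- 1) after setting full_page_writes = on in postgresql.conf:
SELECT pg_reload_conf();
SHOW full_page_writes;
-- 2) generate some WAL so the checkpoint has something to flush
VACUUM ANALYZE;
-- 3) force a checkpoint and wait for it to return
CHECKPOINT;
-- 4) re-sync the data directory to the slave (rsync or similar), then
-- 5) restart the slave, and
-- 6) confirm replication is flowing again:
SELECT pid, state, sent_location, replay_location, sync_state FROM pg_stat_replication;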


-sc



--
Sean Chittenden
sean@chittenden.org



Re: 'full_page_writes=off' , VACUUM and crashing streaming slaves...

From
Sean Chittenden
Date:
>> Oct  5 15:00:25 db01 postgres[76648]: [5944-1] javafail@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system
>
> This message must be coming from initSelfPipe(), and after poking around
> a bit I think the failure must be occurring while a new backend is
> attempting to do "OwnLatch(&MyProc->procLatch)" in InitProcess.  The
> reason the postmaster treats this as a crash is that the new backend
> just armed the dead-man switch (MarkPostmasterChildActive) but it exits
> without doing ProcKill which would disarm it.  So this is just an
> order-of-operations bug in InitProcess: we're assuming that it can't
> fail before reaching "on_shmem_exit(ProcKill, 0)", and the latch
> additions broke that.  (Though looking at it, assuming that the
> PGSemaphoreReset call cannot fail seems a tad risky too.)
>
> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.

I've since been able to repeat this with full_page_writes=on and have had identical corruption, so I don't think this
is full_page_writes related any more. I can also confirm that there was just one crash by the master database, because it
required manual intervention to bring back up (a backup label was sitting stale in the data dir[1]).

The slaves died during the post-crash VACUUM just as earlier. I'm completing the VACUUM now and am seeing multiple
warnings from VACUUM.

WARNING:  relation "tbl_a" page 2115352 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115353 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115354 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115355 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115356 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115357 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115358 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115359 is uninitialized --- fixing
WARNING:  relation "tbl_a" page 2115360 is uninitialized --- fixing

On one of the slaves:

Oct 11 00:17:36 db02 postgres[66904]: [21-1] @ 66904 0: WARNING:  page 120547 of relation base/16387/20196 is uninitialized
Oct 11 00:17:36 db02 postgres[66904]: [21-2] @ 66904 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 125016, lastBlockVacuumed 0
Oct 11 00:17:36 db02 postgres[66904]: [22-1] @ 66904 0: PANIC:  WAL contains references to invalid pages
Oct 11 00:17:36 db02 postgres[66904]: [22-2] @ 66904 0: CONTEXT:  xlog redo vacuum: rel 1663/16387/20196; blk 125016, lastBlockVacuumed 0
Oct 11 00:17:36 db02 postgres[66897]: [10-1] @ 66897 0: LOG:  startup process (PID 66904) was terminated by signal 6: Abort trap
Oct 11 00:17:36 db02 postgres[66897]: [11-1] @ 66897 0: LOG:  terminating any other active server processes
Oct 11 00:17:36 db02 postgres[66897]: [11-1] @ 66897 0: LOG:  terminating any other active server processes

Hopefully this is helpful information. -sc



[1] The backup label was residual from the initial sync and wasn't cleaned up for some reason during pg_stop_backup().
For now I'm chalking this up as a bug in repmgr, even though repmgr completed cloning the slave successfully (supposedly).

START WAL LOCATION: 9F/36000020 (file 000000010000009F00000036)
CHECKPOINT LOCATION: 9F/37E9D6D8
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-10-06 17:48:10 UTC
LABEL: repmgr_standby_clone_1349545601

--
Sean Chittenden
sean@chittenden.org