From: Tom Lane
Subject: Re: 'full_page_writes=off', VACUUM and crashing streaming slaves...
Msg-id: 12579.1349626116@sss.pgh.pa.us
In response to: Re: 'full_page_writes=off', VACUUM and crashing streaming slaves... (Sean Chittenden <sean@chittenden.org>)
Responses: Re: 'full_page_writes=off', VACUUM and crashing streaming slaves... (Sean Chittenden <sean@chittenden.org>)
           Re: 'full_page_writes=off', VACUUM and crashing streaming slaves... (Sean Chittenden <sean@chittenden.org>)
List: pgsql-general

Sean Chittenden <sean@chittenden.org> writes:
>> If you've got the postmaster logs from this episode, it would be useful
>> to see what complaints got logged.

> The first crash scenario:

> Oct  5 15:00:24 db01 postgres[75852]: [6449-2] javafail@dbcluster 75852 0: STATEMENT:  SELECT /* query */ FROM tbl AS this_ WHERE this_.user_id=$1
> Oct  5 15:00:24 db01 postgres[75852]: [6456-1] javafail@dbcluster 75852 0: ERROR:  could not seek to end of file "base/16387/20013": Too many open files in system
> [snip - lots of could not seek to end of file errors. How does seek(2) consume a file descriptor??? ]

It doesn't, but FileSeek() might need to do an open if the file wasn't
currently open.  This isn't that surprising.
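
The pattern looks roughly like this (a self-contained sketch of the virtual-FD idea, not the real fd.c code; the struct and function names here are invented for illustration):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* A "virtual" file handle: the pathname is kept so the real OS
     * descriptor can be closed when we bump against the FD limit and
     * transparently re-opened later.  That re-open is why something as
     * cheap as a seek can report "Too many open files in system". */
    typedef struct VFile
    {
        const char *path;
        int         fd;             /* -1: kernel descriptor was released */
    } VFile;

    static int
    vfile_access(VFile *f)
    {
        if (f->fd < 0)
        {
            f->fd = open(f->path, O_RDWR);
            if (f->fd < 0)
                return -1;          /* open() failed, e.g. errno == ENFILE */
        }
        return 0;
    }

    static off_t
    vfile_seek_end(VFile *f)
    {
        if (vfile_access(f) < 0)    /* may need to re-open, consuming an FD */
            return (off_t) -1;
        return lseek(f->fd, 0, SEEK_END);
    }

    int
    main(void)
    {
        VFile f = {"base/16387/20013", -1};     /* segment name from the log */

        if (vfile_seek_end(&f) < 0)
            fprintf(stderr, "could not seek to end of file \"%s\": %s\n",
                    f.path, strerror(errno));
        return 0;
    }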

> Oct  5 15:00:25 db01 postgres[76648]: [5944-1] javafail@dbcluster 76648 0: FATAL:  pipe() failed: Too many open files in system

This message must be coming from initSelfPipe(), and after poking around
a bit I think the failure must be occurring while a new backend is
attempting to do "OwnLatch(&MyProc->procLatch)" in InitProcess.  The
reason the postmaster treats this as a crash is that the new backend
just armed the dead-man switch (MarkPostmasterChildActive) but it exits
without doing ProcKill which would disarm it.  So this is just an
order-of-operations bug in InitProcess: we're assuming that it can't
fail before reaching "on_shmem_exit(ProcKill, 0)", and the latch
additions broke that.  (Though looking at it, assuming that the
PGSemaphoreReset call cannot fail seems a tad risky too.)
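
The unlucky ordering amounts to something like this (an illustrative stand-alone sketch, not the actual InitProcess code; the names below are stand-ins):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Stand-in for the shared-memory dead-man switch the postmaster checks. */
    static int child_marked_active = 0;

    static void mark_postmaster_child_active(void) { child_marked_active = 1; }
    static void proc_kill(void)                    { child_marked_active = 0; }

    /* The latch self-pipe needs two descriptors; pipe() fails with
     * EMFILE/ENFILE when the system is out of them. */
    static int
    init_self_pipe(void)
    {
        int pipefd[2];

        return pipe(pipefd);
    }

    static void
    init_process(void)
    {
        mark_postmaster_child_active();     /* dead-man switch armed */

        if (init_self_pipe() < 0)           /* ... this can fail right here ... */
        {
            fprintf(stderr, "FATAL:  pipe() failed: %s\n", strerror(errno));
            exit(1);                        /* ... and we exit still "armed",
                                             * so the postmaster sees a crash */
        }

        atexit(proc_kill);                  /* stand-in for on_shmem_exit(ProcKill, 0),
                                             * registered too late to help above */
    }

    int
    main(void)
    {
        init_process();
        return 0;
    }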

So that explains the crashes, but it doesn't (directly) explain why you
had data corruption.

I think the uninitialized pages are showing up because you had crashes
in the midst of relation-extension operations, ie, some other backend
had successfully done an smgrextend but hadn't yet laid down any valid
data in the new page.  However, this theory would not explain more than
one uninitialized page per crash, and your previous message seems to
show rather a lot of uninitialized pages.  How many pipe-failed crashes
did you have?
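
To make the window concrete, the extension sequence is roughly this (a stand-alone sketch, not smgrextend/mdextend themselves; the file name is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    int
    main(void)
    {
        char page[BLCKSZ];
        int  fd = open("relation.segment", O_RDWR | O_CREAT, 0600);

        if (fd < 0)
            return 1;

        /* Step 1: physically extend the relation by one zero-filled block
         * (the effect of a successful smgrextend). */
        memset(page, 0, sizeof(page));
        if (write(fd, page, sizeof(page)) != (ssize_t) sizeof(page))
            return 1;

        /* A crash here leaves a legal-looking all-zero page on disk,
         * which a later VACUUM reports as "uninitialized --- fixing". */

        /* Step 2, never reached after a crash: initialize the page header
         * and write valid tuple data (plus a WAL record) into the page. */

        close(fd);
        return 0;
    }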

> What's odd to me is not the failure scenarios that come from a system running out of FDs (though seek(2)'ing consuming an FD seems odd), it's more that it's still possible for a master DB's VACUUM to clean up a bogus or partial page write, and have the slave crash when the WAL entry is shipped over.

It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
all when fixing an all-zeroes heap page.  I'm not sure if that's a
problem or not.  The first actual use of such a page ought to result in
re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
so right offhand I don't see a path from this to the slave-side failures
you saw.  (But on the other hand I'm only firing on one cylinder today
because of a head cold, so maybe I'm just missing it.)
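
The contrast boils down to something like this (a much-simplified stand-alone sketch; the real logic lives in vacuumlazy.c and heap_insert, and none of these helper names are PostgreSQL's):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192
    typedef char Page[BLCKSZ];

    /* An all-zero page is what PageIsNew() detects. */
    static bool
    page_is_new(const char *p)
    {
        for (int i = 0; i < BLCKSZ; i++)
            if (p[i] != 0)
                return false;
        return true;
    }

    static void page_init(char *p)          { p[0] = 1; /* pretend: valid header */ }
    static void xlog_heap_init_page(void)   { puts("WAL: XLOG_HEAP_INIT_PAGE"); }

    /* VACUUM's heap scan: fix the page locally, but emit no WAL record. */
    static void
    vacuum_scan_page(char *p, int blkno)
    {
        if (page_is_new(p))
        {
            printf("WARNING: page %d is uninitialized --- fixing\n", blkno);
            page_init(p);
            /* no WAL emitted here, so the standby never hears about it */
        }
    }

    /* First insertion into a new page: re-initialize it *and* WAL-log that,
     * so the standby replays the same re-init. */
    static void
    heap_insert_into(char *p)
    {
        if (page_is_new(p))
        {
            page_init(p);
            xlog_heap_init_page();
        }
        /* ... place the tuple ... */
    }

    int
    main(void)
    {
        Page pg;

        memset(pg, 0, sizeof(pg));
        vacuum_scan_page(pg, 13);
        heap_insert_into(pg);       /* already fixed, so nothing to re-init */
        return 0;
    }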

Do the slave-side failures correspond to pages that were reported as
"fixed" on the master?

            regards, tom lane

