Re: All-zero page in GIN index causes assertion failure - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: All-zero page in GIN index causes assertion failure
Date
Msg-id 55AD0266.4050300@iki.fi
Whole thread Raw
In response to All-zero page in GIN index causes assertion failure  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: All-zero page in GIN index causes assertion failure
Re: All-zero page in GIN index causes assertion failure
Re: All-zero page in GIN index causes assertion failure
List pgsql-hackers
On 07/20/2015 11:14 AM, Heikki Linnakangas wrote:
> ISTM ginvacuumcleanup should check for PageIsNew, and put the page to
> the FSM. That's what btvacuumpage() gistvacuumcleanup() do.
> spgvacuumpage() seems to also check for PageIsNew(), but it seems broken
> in a different way: it initializes the page and marks the page as dirty,
> but it is not WAL-logged. That is a problem at least if checksums are
> enabled: if you crash you might have a torn page on disk, with invalid
> checksum.

Looking closer, heap vacuum does a similar thing, but there are two 
mitigating factors that make it safe in practice, I think:

1. An empty heap page is all-zeroes except for the small page header in 
the beginning of the page. For a torn page to matter, the page would 
need to be torn in the header, but we rely elsewhere (control file) 
anyway that a 512-byte sector update is atomic, so that shouldn't 
happen. Note that this hinges on the fact that there is no special area 
on heap pages, so you cannot rely on this for index pages.

2. The redo of the first insert/update on a heap page will always 
re-initialize the page, even when full-page-writes are disabled. This is 
the XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization, 
it's required for correctness.


Heap update can also leave behind a page in the buffer cache that's been 
initialized by RelationGetBufferForTuple but not yet WAL-logged. 
However, it doesn't mark the buffer dirty, so the torn-page problem 
cannot happen because the page will not be flushed to disk if nothing 
else touches it. The XLOG_HEAP_INIT_PAGE optimization is needed in that 
case too, however.

B-tree, GiST, and SP-GiST's relation extension work similarly, but they 
have other mitigating factors. If a newly-initialized B-tree page is 
left behind in the relation, it won't be reused for anything, and vacuum 
will ignore it (by accident, I think; there is no explicit comment on 
what will happen to such pages, but it will be treated like an internal 
page and ignored). Eventually the buffer will be evicted from cache, and 
because it's not marked as dirty, it will not be flushed to disk, and 
will later be read back as all-zeros and vacuum will recycle it.

BRIN update is not quite right, however. brin_getinsertbuffer() can 
initialize a page, but the caller might bail out without using the page 
and WAL-logging the change. If that happens, the next update that uses 
the same page will WAL-log the change but it will not use the 
XLOG_BRIN_INIT_PAGE option, and redo will not initialize the page. Redo 
will fail.

BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN 
pages? If an empty page is missing from the FSM for any reason, there's 
nothing to add it there.

This is all very subtle. The whole business of leaving behind an 
already-initialized page in the buffer cache, without marking the buffer 
as dirty, is pretty ugly. I wish we had a more robust pattern to handle 
all-zero pages and relation extension.

Thoughts? As a minimal backpatchable fix, I think we should add the 
check in ginvacuumpage() to initialize any all-zeros pages it 
encounters. That needs to be WAL-logged, and WAL-logging needs to be 
added to the page initialization in spgvacuumpage too.

- Heikki




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] object_classes array is broken, again
Next
From: Alexander Korotkov
Date:
Subject: Re: pg_trgm version 1.2