Re: All-zero page in GIN index causes assertion failure - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: All-zero page in GIN index causes assertion failure |
Date | |
Msg-id | 55AD0266.4050300@iki.fi |
In response to | All-zero page in GIN index causes assertion failure (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
On 07/20/2015 11:14 AM, Heikki Linnakangas wrote:
> ISTM ginvacuumcleanup should check for PageIsNew, and put the page to
> the FSM. That's what btvacuumpage() and gistvacuumcleanup() do.
> spgvacuumpage() seems to also check for PageIsNew(), but it seems broken
> in a different way: it initializes the page and marks the page as dirty,
> but it is not WAL-logged. That is a problem at least if checksums are
> enabled: if you crash you might have a torn page on disk, with invalid
> checksum.

Looking closer, heap vacuum does a similar thing, but there are two mitigating factors that make it safe in practice, I think:

1. An empty heap page is all-zeroes except for the small page header at the beginning of the page. For a torn page to matter, the page would need to be torn in the header, but we already rely elsewhere (for the control file) on a 512-byte sector update being atomic, so that shouldn't happen. Note that this hinges on the fact that there is no special area on heap pages, so you cannot rely on this for index pages.

2. The redo of the first insert/update on a heap page will always re-initialize the page, even when full-page-writes are disabled. This is the XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization, it's required for correctness.

Heap update can also leave behind a page in the buffer cache that's been initialized by RelationGetBufferForTuple but not yet WAL-logged. However, it doesn't mark the buffer dirty, so the torn-page problem cannot happen because the page will not be flushed to disk if nothing else touches it. The XLOG_HEAP_INIT_PAGE optimization is needed in that case too, however.

B-tree, GiST, and SP-GiST's relation extension work similarly, but they have other mitigating factors.
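
The torn-page argument in point 1 can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the page is a plain byte string, a torn write is modeled as any subset of 512-byte sectors reaching disk, and the names `empty_page`, `torn_outcomes` and `mixed_outcomes`, as well as the 16-byte special area, are made up for illustration.

```python
from itertools import product

PAGE_SIZE = 8192    # default PostgreSQL block size
SECTOR_SIZE = 512   # sector write assumed atomic, as in the argument above
HEADER_SIZE = 24    # header length used by this toy model


def empty_page(special_size=0):
    """Toy initialized page: nonzero header at the start, optional special area at the end."""
    page = bytearray(PAGE_SIZE)
    page[:HEADER_SIZE] = b"\x01" * HEADER_SIZE
    if special_size:
        page[-special_size:] = b"\x02" * special_size
    return bytes(page)


def sector(image, i):
    """Return the i-th sector of a page image."""
    return image[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]


def torn_outcomes(old, new):
    """All distinct on-disk images if any subset of the sectors of `new` is written over `old`."""
    count = PAGE_SIZE // SECTOR_SIZE
    # Only sectors whose contents differ can change the result.
    differing = [i for i in range(count) if sector(old, i) != sector(new, i)]
    outcomes = set()
    for written in product([False, True], repeat=len(differing)):
        image = bytearray(old)
        for i, was_written in zip(differing, written):
            if was_written:
                image[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE] = sector(new, i)
        outcomes.add(bytes(image))
    return outcomes


def mixed_outcomes(old, new):
    """Images that are neither the old page nor the new page, i.e. genuinely torn."""
    return torn_outcomes(old, new) - {old, new}


zero_page = bytes(PAGE_SIZE)
print(len(mixed_outcomes(zero_page, empty_page())))                 # heap-like page
print(len(mixed_outcomes(zero_page, empty_page(special_size=16))))  # index-like page
```

In this model the heap-like page has no torn image at all, because only the first sector differs from the all-zeros page. The index-like page has two, because both the first and the last sector differ.
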
If a newly-initialized B-tree page is left behind in the relation, it won't be reused for anything, and vacuum will ignore it (by accident, I think; there is no explicit comment on what will happen to such pages, but it will be treated like an internal page and ignored). Eventually the buffer will be evicted from cache, and because it's not marked as dirty, it will not be flushed to disk, and will later be read back as all-zeros and vacuum will recycle it.

BRIN update is not quite right, however. brin_getinsertbuffer() can initialize a page, but the caller might bail out without using the page and WAL-logging the change. If that happens, the next update that uses the same page will WAL-log the change but it will not use the XLOG_BRIN_INIT_PAGE option, and redo will not initialize the page. Redo will fail.

BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN pages? If an empty page is missing from the FSM for any reason, there's nothing to add it there.

This is all very subtle. The whole business of leaving behind an already-initialized page in the buffer cache, without marking the buffer as dirty, is pretty ugly. I wish we had a more robust pattern to handle all-zero pages and relation extension. Thoughts?

As a minimal backpatchable fix, I think we should add the check in ginvacuumpage() to initialize any all-zeros pages it encounters. That needs to be WAL-logged, and WAL-logging needs to be added to the page initialization in spgvacuumpage too.

- Heikki
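
The difference between the heap case and the BRIN case described above can be sketched with a toy redo loop. This is an illustrative model only, not PostgreSQL's WAL format or redo code: `redo`, `RedoError`, the `init_page` flag and both scenario helpers are hypothetical names, and an all-zeros page on disk is represented by `None`.

```python
class RedoError(Exception):
    """Raised when a WAL record is replayed against an uninitialized page."""


def redo(disk_page, wal):
    """Replay WAL records onto the page image found on disk after a crash.

    disk_page is None for an all-zeros page, or a list of tuples for an
    initialized page. Each record is a dict with 'init_page' and 'tuple'.
    """
    page = None if disk_page is None else list(disk_page)
    for record in wal:
        if record["init_page"]:
            page = []  # the record itself re-initializes the page
        if page is None:
            raise RedoError("record expects an initialized page, found all zeros")
        page.append(record["tuple"])
    return page


def first_insert_always_inits(tuples):
    """Heap-like behaviour: the first record that touches the page carries the init flag."""
    return [{"init_page": i == 0, "tuple": t} for i, t in enumerate(tuples)]


def insert_after_abandoned_init(tuples):
    """BRIN-like problem: an earlier caller initialized the buffer without WAL-logging it,
    so the next insert sees an initialized page and logs no init flag."""
    return [{"init_page": False, "tuple": t} for t in tuples]


# The initialized buffer never reached disk, so redo starts from an all-zeros page.
print(redo(None, first_insert_always_inits(["a", "b"])))
try:
    redo(None, insert_after_abandoned_init(["a"]))
except RedoError as err:
    print("redo failed:", err)
```

The first replay succeeds because the init flag rebuilds the page regardless of what is on disk. The second fails because no record in the WAL ever initializes the page.
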