Thread: All-zero page in GIN index causes assertion failure

All-zero page in GIN index causes assertion failure

From
Heikki Linnakangas
Date:
This is a continuation of the discussion at 
http://www.postgresql.org/message-id/CAMkU=1zUc=h0oCZntaJaqqW7gxxVxCWsYq8DD2t7oHgsgVEsgA@mail.gmail.com, 
I'm starting a new thread as this is a separate issue than the original 
LWLock bug.

> On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
>> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi>
>> wrote:
>>
>> I don't see how this is related to the LWLock issue, but I didn't see it
>> without your patch.  Perhaps the system just didn't survive long enough to
>> uncover it without the patch (although it shows up pretty quickly).  It
>> could just be an overzealous Assert, since the casserts off didn't show
>> problems.
>
>> bt and bt full are shown below.
>>
>> Cheers,
>>
>> Jeff
>>
>> #0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
>> #1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
>> #2  0x0000000000930b7a in ExceptionalCondition (
>>     conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
>> (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
>> "FailedAssertion",
>>     fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
>> #3  0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
>> ginvacuum.c:713
>>
>
> It now looks like this *is* unrelated to the LWLock issue.  The assert that
> it is tripping over was added just recently (302ac7f27197855afa8c) and so I
> had not been testing under its presence until now.  It looks like it is
> finding all-zero pages (index extended but then a crash before initializing
> the page?) and it doesn't like them.
>
> (gdb) f 3
> (gdb) p *(char[8192]*)(page)
> $11 = '\000' <repeats 8191 times>
>
> Presumably before this assert, such pages would just be permanently
> orphaned.

Yeah, so it seems. It's normal to have all-zero pages in the index, if 
you crash immediately after the relation has been extended, but before 
the new page has been WAL-logged. What is your test case like; did you 
do crash-testing?

ISTM ginvacuumcleanup should check for PageIsNew, and put the page to 
the FSM. That's what btvacuumpage() gistvacuumcleanup() do. 
spgvacuumpage() seems to also check for PageIsNew(), but it seems broken 
in a different way: it initializes the page and marks the page as dirty, 
but it is not WAL-logged. That is a problem at least if checksums are 
enabled: if you crash you might have a torn page on disk, with invalid 
checksum.

- Heikki



Re: All-zero page in GIN index causes assertion failure

From
Heikki Linnakangas
Date:
On 07/20/2015 11:14 AM, Heikki Linnakangas wrote:
> ISTM ginvacuumcleanup should check for PageIsNew, and put the page to
> the FSM. That's what btvacuumpage() gistvacuumcleanup() do.
> spgvacuumpage() seems to also check for PageIsNew(), but it seems broken
> in a different way: it initializes the page and marks the page as dirty,
> but it is not WAL-logged. That is a problem at least if checksums are
> enabled: if you crash you might have a torn page on disk, with invalid
> checksum.

Looking closer, heap vacuum does a similar thing, but there are two 
mitigating factors that make it safe in practice, I think:

1. An empty heap page is all-zeroes except for the small page header in 
the beginning of the page. For a torn page to matter, the page would 
need to be torn in the header, but we rely elsewhere (control file) 
anyway that a 512-byte sector update is atomic, so that shouldn't 
happen. Note that this hinges on the fact that there is no special area 
on heap pages, so you cannot rely on this for index pages.

2. The redo of the first insert/update on a heap page will always 
re-initialize the page, even when full-page-writes are disabled. This is 
the XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization, 
it's required for correctness.


Heap update can also leave behind a page in the buffer cache that's been 
initialized by RelationGetBufferForTuple but not yet WAL-logged. 
However, it doesn't mark the buffer dirty, so the torn-page problem 
cannot happen because the page will not be flushed to disk if nothing 
else touches it. The XLOG_HEAP_INIT_PAGE optimization is needed in that 
case too, however.

B-tree, GiST, and SP-GiST's relation extension work similarly, but they 
have other mitigating factors. If a newly-initialized B-tree page is 
left behind in the relation, it won't be reused for anything, and vacuum 
will ignore it (by accident, I think; there is no explicit comment on 
what will happen to such pages, but it will be treated like an internal 
page and ignored). Eventually the buffer will be evicted from cache, and 
because it's not marked as dirty, it will not be flushed to disk, and 
will later be read back as all-zeros and vacuum will recycle it.

BRIN update is not quite right, however. brin_getinsertbuffer() can 
initialize a page, but the caller might bail out without using the page 
and WAL-logging the change. If that happens, the next update that uses 
the same page will WAL-log the change but it will not use the 
XLOG_BRIN_INIT_PAGE option, and redo will not initialize the page. Redo 
will fail.

BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN 
pages? If an empty page is missing from the FSM for any reason, there's 
nothing to add it there.

This is all very subtle. The whole business of leaving behind an 
already-initialized page in the buffer cache, without marking the buffer 
as dirty, is pretty ugly. I wish we had a more robust pattern to handle 
all-zero pages and relation extension.

Thoughts? As a minimal backpatchable fix, I think we should add the 
check in ginvacuumpage() to initialize any all-zeros pages it 
encounters. That needs to be WAL-logged, and WAL-logging needs to be 
added to the page initialization in spgvacuumpage too.

- Heikki




Re: All-zero page in GIN index causes assertion failure

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> Looking closer, heap vacuum does a similar thing, but there are two
> mitigating factors that make it safe in practice, I think:
> 
> 1. An empty heap page is all-zeroes except for the small page header in the
> beginning of the page. For a torn page to matter, the page would need to be
> torn in the header, but we rely elsewhere (control file) anyway that a
> 512-byte sector update is atomic, so that shouldn't happen. Note that this
> hinges on the fact that there is no special area on heap pages, so you
> cannot rely on this for index pages.
> 
> 2. The redo of the first insert/update on a heap page will always
> re-initialize the page, even when full-page-writes are disabled. This is the
> XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization, it's
> required for correctness.

Hmm, I guess this is a case of an optimization hiding a bug :-(  I mean,
if we didn't have that, we would have noticed this a lot sooner, I
imagine.


> BRIN update is not quite right, however. brin_getinsertbuffer() can
> initialize a page, but the caller might bail out without using the page and
> WAL-logging the change. If that happens, the next update that uses the same
> page will WAL-log the change but it will not use the XLOG_BRIN_INIT_PAGE
> option, and redo will not initialize the page. Redo will fail.

Hmm, interesting.

> BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN pages?
> If an empty page is missing from the FSM for any reason, there's nothing to
> add it there.

Probably, yeah, makes sense.  If you recall, the original design I
proposed had a scan of the whole index (though for other reasons I am
thankful we got rid of that).

> This is all very subtle.

No kidding ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: All-zero page in GIN index causes assertion failure

From
Heikki Linnakangas
Date:
On 07/20/2015 07:11 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> Looking closer, heap vacuum does a similar thing, but there are two
>> mitigating factors that make it safe in practice, I think:
>>
>> 1. An empty heap page is all-zeroes except for the small page header in the
>> beginning of the page. For a torn page to matter, the page would need to be
>> torn in the header, but we rely elsewhere (control file) anyway that a
>> 512-byte sector update is atomic, so that shouldn't happen. Note that this
>> hinges on the fact that there is no special area on heap pages, so you
>> cannot rely on this for index pages.
>>
>> 2. The redo of the first insert/update on a heap page will always
>> re-initialize the page, even when full-page-writes are disabled. This is the
>> XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization, it's
>> required for correctness.
>
> Hmm, I guess this is a case of an optimization hiding a bug :-(  I mean,
> if we didn't have that, we would have noticed this a lot sooner, I
> imagine.

Yeah, probably.

I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging
the page initialization in SP-GiST vacuum, I changed it so that it
simply leaves the page as all-zeroes, and adds it to the FSM. The code
that grabs a target page from the FSM handles that, and initializes the
page anyway, so that was simpler. This made the SP-GiST is-deleted flag
obsolete, it's no longer set, although the code still checks for it for
backwards-compatibility. (even that may actually be unnecessary, as a
page that's marked as deleted must also be empty, and wherever we check
for the deleted-flag, we also check for PageIsEmpty()))

- Heikki


Attachment

Re: All-zero page in GIN index causes assertion failure

From
Jeff Janes
Date:
On Mon, Jul 20, 2015 at 1:23 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging the page initialization in SP-GiST vacuum, I changed it so that it simply leaves the page as all-zeroes, and adds it to the FSM. The code that grabs a target page from the FSM handles that, and initializes the page anyway, so that was simpler. This made the SP-GiST is-deleted flag obsolete, it's no longer set, although the code still checks for it for backwards-compatibility. (even that may actually be unnecessary, as a page that's marked as deleted must also be empty, and wherever we check for the deleted-flag, we also check for PageIsEmpty()))

This patch, in conjunction with the LWLock deadlock patch, fixes all the issues I was having with GIN indexes in 9.5.

I haven't tried SP-GiST.

Cheers,

Jeff

Re: All-zero page in GIN index causes assertion failure

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> BRIN update is not quite right, however. brin_getinsertbuffer() can
> initialize a page, but the caller might bail out without using the page and
> WAL-logging the change. If that happens, the next update that uses the same
> page will WAL-log the change but it will not use the XLOG_BRIN_INIT_PAGE
> option, and redo will not initialize the page. Redo will fail.

There's even a worse case.  In brin_getinsertbuffer, if an old buffer is
passed, and the function extends the index, and then
brin_getinsertbuffer finds that the old buffer has been used to extend
the revmap, then we don't even return the newly extended page.  We do
enter it into the FSM, though, and the FSM is vacuumed.

So next time around somebody requests a page from FSM, it might return
that page; but since it wasn't initialized, the insertion will fail.
Everybody expects the page to have been initialized previously, but this
will not be the case here.

I guess that function needs some restructuring, but it's pretty hairy
already ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: All-zero page in GIN index causes assertion failure

From
Heikki Linnakangas
Date:
On 07/23/2015 07:43 PM, Jeff Janes wrote:
> On Mon, Jul 20, 2015 at 1:23 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
>> I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging
>> the page initialization in SP-GiST vacuum, I changed it so that it simply
>> leaves the page as all-zeroes, and adds it to the FSM. The code that grabs
>> a target page from the FSM handles that, and initializes the page anyway,
>> so that was simpler. This made the SP-GiST is-deleted flag obsolete, it's
>> no longer set, although the code still checks for it for
>> backwards-compatibility. (even that may actually be unnecessary, as a page
>> that's marked as deleted must also be empty, and wherever we check for the
>> deleted-flag, we also check for PageIsEmpty()))
>
> This patch, in conjunction with the LWLock deadlock patch, fixes all the
> issues I was having with GIN indexes in 9.5.

Thanks, I've pushed this, as well a fix to similar failure from B-tree 
vacuum that Amit Langote reported in the other thread.

- Heikki




Re: All-zero page in GIN index causes assertion failure

From
Amit Langote
Date:
On Tue, Jul 28, 2015 at 12:21 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Thanks, I've pushed this, as well a fix to similar failure from B-tree
> vacuum that Amit Langote reported in the other thread.
>

Thanks Heikki and sorry I didn't notice this new thread.

Regards,
Amit



Re: All-zero page in GIN index causes assertion failure

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> BRIN update is not quite right, however. brin_getinsertbuffer() can
> initialize a page, but the caller might bail out without using the page and
> WAL-logging the change. If that happens, the next update that uses the same
> page will WAL-log the change but it will not use the XLOG_BRIN_INIT_PAGE
> option, and redo will not initialize the page. Redo will fail.

Here's a fix for BRIN: when brin_getinsertbuffer extends the relation
and doesn't return the page, it initializes and logs the page as an
empty regular page before returning, and also records it into the FSM.
That way, some future process that gets a page from FSM will use it,
preventing this type of problem from bloating the index forever.  Also,
this function no longer initializes the page when it returns it to the
caller: instead the caller (brin_doinsert or brin_doupdate) must do
that.  (Previously, the code was initializing the page outside the
critical section that WAL-logs this action).

> BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN pages?
> If an empty page is missing from the FSM for any reason, there's nothing to
> add it there.

Probably.  I didn't change this part yet.  There are two things to fix:
1. since we use log_newpage_buffer(), we log the initialization but not
the recording into FSM, so the page would be forgotten about.  This can
be tested with PageIsEmpty().  An alternative to the vacuum scan is to
use our own WAL record that not only logs the initialization itself but
also the FSM update.  Not sure this is worth the trouble.

2. additionally, if brin_getinsertbuffer extends the relation but we
crash before the caller initializes it, the page would be detected by
PageIsNew instead and would also need initialization.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: All-zero page in GIN index causes assertion failure

From
Alvaro Herrera
Date:

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: All-zero page in GIN index causes assertion failure

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> > BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN pages?
> > If an empty page is missing from the FSM for any reason, there's nothing to
> > add it there.
>
> Probably.  I didn't change this part yet.  There are two things to fix:
> 1. since we use log_newpage_buffer(), we log the initialization but not
> the recording into FSM, so the page would be forgotten about.  This can
> be tested with PageIsEmpty().  An alternative to the vacuum scan is to
> use our own WAL record that not only logs the initialization itself but
> also the FSM update.  Not sure this is worth the trouble.
>
> 2. additionally, if brin_getinsertbuffer extends the relation but we
> crash before the caller initializes it, the page would be detected by
> PageIsNew instead and would also need initialization.

Added this part.  It's using log_newpage_buffer too.  The vacuum scan
fixes the whole FSM, though, so after vacuum the FSM is up to date.
I think we could shave off a few bytes by using a separate WAL record,
but I'm not sure it's worth the trouble.

I intend to push this tomorrow.

I now think the free space calculations are broken, but I'll leave that
for later.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment