Re: PANIC in GIN code - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: PANIC in GIN code
Date
Msg-id 55904B7F.9030000@iki.fi
Whole thread Raw
In response to Re: PANIC in GIN code  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: PANIC in GIN code  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 06/26/2015 10:53 PM, Jeff Janes wrote:
> On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka@iki.fi>
> wrote:
>
>> The page is being split (that's evident from "info=48" above).
>> ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That
>> finds a page that can be recycled, and marks it as used.
>> RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls
>> XLogSaveBufferForHint() to create a full-page image record of the page.
>> That calls XLogBeginInsert() + XLogInsert(), and leaves the
>> begininsert_called == false.
>>
>> If you had assertions enabled, you'd see the assertion in
>> XLogBeginInsert() to fail.
>>
>> I'll look into that over the weekend..

Committed a fix, fortunately it was easy to rearrange that code so that 
XLogBeginInsert() is called after the GinNewBuffer() calls. Can you 
verify that it fixed your test case, please?

> Should the assertion in XLogBeginInsert() be a elog(FATAL,...) instead?
> The performance impact should be negligible (touches process-local memory
> which is already being touched a few instructions later) and it will
> produce better bug reports, and backtraces, should there be more issues.

Hmm, my original thinking was "no", so that if there is a spurious 
XLogBeginInsert() call somewhere without a corresponding XLogInsert() 
call, that wouldn't be unnecessarily promoted into an error in a 
non-assertions-enabled build. But thinking a bit more, if any data has 
already been registered as well, the WAL record will be badly corrupt, 
which is much worse. Also, if we look at this GIN bug, if 
XLogBeginInsert() had caught the error instead of XLogInsert(), it 
would've been caught outside the critical section, which is again better.

So yeah, I turned that into an elog(ERROR). (I don't see why it would 
need to be FATAL.)

- Heikki




pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Solaris testers wanted for strxfrm() behavior
Next
From: Heikki Linnakangas
Date:
Subject: Re: pg_rewind failure by file deletion in source server