Thread: PANIC in GIN code

PANIC in GIN code

From
Jeff Janes
Date:
Under high load against f7bb7f0625771bc71869cda, I occasionally get:

PANIC:  XLogBeginInsert was not called

It seems to only come from vacuuming.

Here is an example back-trace:

#0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x000000000079a39c in errfinish (dummy=<value optimized out>) at elog.c:551
#3  0x000000000079b0e4 in elog_finish (elevel=<value optimized out>, fmt=<value optimized out>) at elog.c:1368
#4  0x00000000004dd1fe in XLogInsert (rmid=13 '\r', info=48 '0') at xloginsert.c:412
#5  0x0000000000478e13 in ginPlaceToPage (btree=0x7fff1fbb7f60, stack=0x2878760, insertdata=<value optimized out>, updateblkno=<value optimized out>,
    childbuf=0, buildStats=<value optimized out>) at ginbtree.c:582
#6  0x0000000000479a33 in ginInsertValue (btree=0x7fff1fbb7f60, stack=0x2878760, insertdata=0x7fff1fbb7fe0, buildStats=0x0) at ginbtree.c:751
#7  0x000000000047364a in ginEntryInsert (ginstate=0x7fff1fbb8280, attnum=54624, key=1, category=<value optimized out>, items=0x287d3c0, nitem=1,
    buildStats=0x0) at gininsert.c:234
#8  0x000000000047ef4b in ginInsertCleanup (ginstate=0x7fff1fbb8280, vac_delay=<value optimized out>, stats=0x2868d90) at ginfast.c:843
#9  0x000000000047e024 in ginbulkdelete (fcinfo=<value optimized out>) at ginvacuum.c:547


From the code, it seems obvious that XLogBeginInsert is getting called at ginbtree.c line 373, so I think that some obscure code path of btree->placeToPage must be either consuming or resetting that XLogBeginInsert before it returns control to ginbtree.c.


Cheers,

Jeff

Re: PANIC in GIN code

From
Heikki Linnakangas
Date:
On 06/26/2015 08:02 PM, Jeff Janes wrote:
> Under high load against f7bb7f0625771bc71869cda, I occasionally get:
>
> PANIC:  XLogBeginInsert was not called
>
> It seems to only come from vacuuming.
>
> Here is an example back-trace:
>
> #0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
> #1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
> #2  0x000000000079a39c in errfinish (dummy=<value optimized out>) at
> elog.c:551
> #3  0x000000000079b0e4 in elog_finish (elevel=<value optimized out>,
> fmt=<value optimized out>) at elog.c:1368
> #4  0x00000000004dd1fe in XLogInsert (rmid=13 '\r', info=48 '0') at
> xloginsert.c:412
> #5  0x0000000000478e13 in ginPlaceToPage (btree=0x7fff1fbb7f60,
> stack=0x2878760, insertdata=<value optimized out>, updateblkno=<value
> optimized out>,
>      childbuf=0, buildStats=<value optimized out>) at ginbtree.c:582
> #6  0x0000000000479a33 in ginInsertValue (btree=0x7fff1fbb7f60,
> stack=0x2878760, insertdata=0x7fff1fbb7fe0, buildStats=0x0) at
> ginbtree.c:751
> #7  0x000000000047364a in ginEntryInsert (ginstate=0x7fff1fbb8280,
> attnum=54624, key=1, category=<value optimized out>, items=0x287d3c0,
> nitem=1,
>      buildStats=0x0) at gininsert.c:234
> #8  0x000000000047ef4b in ginInsertCleanup (ginstate=0x7fff1fbb8280,
> vac_delay=<value optimized out>, stats=0x2868d90) at ginfast.c:843
> #9  0x000000000047e024 in ginbulkdelete (fcinfo=<value optimized out>) at
> ginvacuum.c:547
>
>
>  From the code, it seems obvious that XLogBeginInsert is getting called at
> ginbtree.c line 373, so I think that some obscure code path of
> btree->placeToPage must be either consuming or resetting that
> XLogBeginInsert before it returns control to ginbtree.c

Here's a theory:

First of all, you have checksums or wal_log_hints enabled.

The page is being split (that's evident from "info=48" above). 
ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That 
finds a page that can be recycled, and marks it as used. 
RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls 
XLogSaveBufferForHint() to create a full-page image record of the page. 
That calls XLogBeginInsert() + XLogInsert(), and leaves 
begininsert_called == false.
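
To make the failure mode concrete, here's a minimal sketch of the 
pattern (a hypothetical function, not the actual ginPlaceToPage code, 
though XLogBeginInsert/XLogInsert/GinNewBuffer are the real 9.5 APIs). 
XLogBeginInsert() sets a process-local begininsert_called flag, and 
XLogInsert() checks it and clears it:

static void
sketch_of_the_bug(GinBtree btree)
{
    Buffer      rbuffer;
    XLogRecPtr  recptr;

    XLogBeginInsert();              /* begininsert_called = true */

    /*
     * GinNewBuffer() can reach MarkBufferDirtyHint() ->
     * XLogSaveBufferForHint(), which runs its own XLogBeginInsert() +
     * XLogInsert() pair to log a full-page image of the recycled page.
     * That inner XLogInsert() resets begininsert_called to false.
     */
    rbuffer = GinNewBuffer(btree->index);

    /* ... register buffers and data for the split record ... */

    /* This now fails with PANIC: XLogBeginInsert was not called */
    recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT);
    PageSetLSN(BufferGetPage(rbuffer), recptr);
}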

If you had assertions enabled, you'd see the assertion in 
XLogBeginInsert() fail.

I'll look into that over the weekend.

- Heikki




Re: PANIC in GIN code

From
Jeff Janes
Date:
On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 06/26/2015 08:02 PM, Jeff Janes wrote:
Under high load against f7bb7f0625771bc71869cda, I occasionally get:

PANIC:  XLogBeginInsert was not called

It seems to only come from vacuuming.

Here is an example back-trace:

#0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x000000000079a39c in errfinish (dummy=<value optimized out>) at
elog.c:551
#3  0x000000000079b0e4 in elog_finish (elevel=<value optimized out>,
fmt=<value optimized out>) at elog.c:1368
#4  0x00000000004dd1fe in XLogInsert (rmid=13 '\r', info=48 '0') at
xloginsert.c:412
#5  0x0000000000478e13 in ginPlaceToPage (btree=0x7fff1fbb7f60,
stack=0x2878760, insertdata=<value optimized out>, updateblkno=<value
optimized out>,
     childbuf=0, buildStats=<value optimized out>) at ginbtree.c:582
#6  0x0000000000479a33 in ginInsertValue (btree=0x7fff1fbb7f60,
stack=0x2878760, insertdata=0x7fff1fbb7fe0, buildStats=0x0) at
ginbtree.c:751
#7  0x000000000047364a in ginEntryInsert (ginstate=0x7fff1fbb8280,
attnum=54624, key=1, category=<value optimized out>, items=0x287d3c0,
nitem=1,
     buildStats=0x0) at gininsert.c:234
#8  0x000000000047ef4b in ginInsertCleanup (ginstate=0x7fff1fbb8280,
vac_delay=<value optimized out>, stats=0x2868d90) at ginfast.c:843
#9  0x000000000047e024 in ginbulkdelete (fcinfo=<value optimized out>) at
ginvacuum.c:547


 From the code, it seems obvious that XLogBeginInsert is getting called at
ginbtree.c line 373, so I think that some obscure code path of
btree->placeToPage must be either consuming or resetting that
XLogBeginInsert before it returns control to ginbtree.c

Here's a theory:

First of all, you have checksums or wal_log_hints enabled.

Correct, checksums are on.  I forgot all about that.
 

The page is being split (that's evident from "info=48" above). ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That finds a page that can be recycled, and marks it as used. RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls XLogSaveBufferForHint() to create a full-page image record of the page. That calls XLogBeginInsert() + XLogInsert(), and leaves begininsert_called == false.

If you had assertions enabled, you'd see the assertion in XLogBeginInsert() fail.

I'll look into that over the weekend.


Should the assertion in XLogBeginInsert() be an elog(FATAL, ...) instead?  The performance impact should be negligible (it touches process-local memory which is already being touched a few instructions later) and it will produce better bug reports, and backtraces, should there be more issues.

I thought it would take days to reproduce this with an enable-cassert build, but it actually reproduced much faster that way.  Your theory looks completely correct.


#0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x00000000007d18bd in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>,
    lineNumber=<value optimized out>) at assert.c:54
#3  0x00000000004f8067 in XLogBeginInsert () at xloginsert.c:125
#4  0x00000000004f83d5 in XLogSaveBufferForHint (buffer=241, buffer_std=<value optimized out>) at xloginsert.c:907
#5  0x00000000006b64af in MarkBufferDirtyHint (buffer=241, buffer_std=<value optimized out>) at bufmgr.c:3085
#6  0x00000000006be859 in fsm_set_and_search (rel=<value optimized out>, addr=..., slot=1160, newValue=0 '\000', minValue=0 '\000') at freespace.c:621
#7  0x00000000006be8f6 in RecordPageWithFreeSpace (rel=0x7f01e7fe0560, heapBlk=1160, spaceAvail=<value optimized out>) at freespace.c:188
#8  0x00000000006bf105 in GetFreeIndexPage (rel=0x7f01e7fe0560) at indexfsm.c:43
#9  0x00000000004750b2 in GinNewBuffer (index=0x7f01e7fe0560) at ginutil.c:218
#10 0x000000000047d5c4 in ginPlaceToPage (btree=0x7fffe3a55840, stack=0x1307b78, insertdata=<value optimized out>, updateblkno=<value optimized out>,
    childbuf=0, buildStats=0x0) at ginbtree.c:442
#11 0x000000000047f2dd in ginInsertValue (btree=0x7fffe3a55840, stack=0x1307b78, insertdata=0x7fffe3a558c0, buildStats=0x0) at ginbtree.c:751
#12 0x0000000000475b7b in ginEntryInsert (ginstate=0x7fffe3a55b80, attnum=58216, key=1, category=<value optimized out>, items=0x7f01e7f61120, nitem=11,
    buildStats=0x0) at gininsert.c:234
#13 0x0000000000485dfc in ginInsertCleanup (ginstate=0x7fffe3a55b80, vac_delay=<value optimized out>, stats=0x12bbbc8) at ginfast.c:843
#14 0x0000000000484d83 in ginbulkdelete (fcinfo=<value optimized out>) at ginvacuum.c:547

Cheers,

Jeff

Re: PANIC in GIN code

From
Heikki Linnakangas
Date:
On 06/26/2015 10:53 PM, Jeff Janes wrote:
> On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka@iki.fi>
> wrote:
>
>> The page is being split (that's evident from "info=48" above).
>> ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That
>> finds a page that can be recycled, and marks it as used.
>> RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls
>> XLogSaveBufferForHint() to create a full-page image record of the page.
>> That calls XLogBeginInsert() + XLogInsert(), and leaves
>> begininsert_called == false.
>>
>> If you had assertions enabled, you'd see the assertion in
>> XLogBeginInsert() fail.
>>
>> I'll look into that over the weekend.

Committed a fix, fortunately it was easy to rearrange that code so that 
XLogBeginInsert() is called after the GinNewBuffer() calls. Can you 
verify that it fixed your test case, please?
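
For reference, the shape of the rearranged code is roughly this (a 
sketch, not the committed diff):

    /* Get any new buffers first; this may WAL-log a hint-bit FPI. */
    rbuffer = GinNewBuffer(btree->index);

    START_CRIT_SECTION();

    /* Safe now: nothing between here and XLogInsert() emits WAL. */
    XLogBeginInsert();
    /* ... register buffers and data, modify the pages ... */
    recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT);

    END_CRIT_SECTION();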

> Should the assertion in XLogBeginInsert() be an elog(FATAL, ...) instead?
> The performance impact should be negligible (touches process-local memory
> which is already being touched a few instructions later) and it will
> produce better bug reports, and backtraces, should there be more issues.

Hmm, my original thinking was "no", so that if there is a spurious 
XLogBeginInsert() call somewhere without a corresponding XLogInsert() 
call, that wouldn't be unnecessarily promoted into an error in a 
non-assertions-enabled build. But thinking a bit more, if any data has 
already been registered as well, the WAL record will be badly corrupt, 
which is much worse. Also, if we look at this GIN bug, if 
XLogBeginInsert() had caught the error instead of XLogInsert(), it 
would've been caught outside the critical section, which is again better.

So yeah, I turned that into an elog(ERROR). (I don't see why it would 
need to be FATAL.)
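
For the archives, the check now looks about like this (a sketch; the 
real function also does some other sanity checks):

void
XLogBeginInsert(void)
{
    /* detect an insertion that was started but never finished */
    if (begininsert_called)
        elog(ERROR, "XLogBeginInsert was already called");

    begininsert_called = true;
}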

- Heikki




Re: PANIC in GIN code

From
Jeff Janes
Date:
On Sun, Jun 28, 2015 at 12:31 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 06/26/2015 10:53 PM, Jeff Janes wrote:
On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka@iki.fi>
wrote:

The page is being split (that's evident from "info=48" above).
ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That
finds a page that can be recycled, and marks it as used.
RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls
XLogSaveBufferForHint() to create a full-page image record of the page.
That calls XLogBeginInsert() + XLogInsert(), and leaves
begininsert_called == false.

If you had assertions enabled, you'd see the assertion in
XLogBeginInsert() fail.

I'll look into that over the weekend.

Committed a fix, fortunately it was easy to rearrange that code so that XLogBeginInsert() is called after the GinNewBuffer() calls. Can you verify that it fixed your test case, please?

Now I'm getting a different error, with or without checksums.

ERROR:  invalid page in block 0 of relation base/16384/16420
CONTEXT:  automatic vacuum of table "jjanes.public.foo"

16420 is the gin index.  I can't even get the page with pageinspect:

jjanes=# SELECT * FROM get_raw_page('foo_text_array_idx', 0);
ERROR:  invalid page in block 0 of relation base/16384/16420

These are the last few gin entries from pg_xlogdump:


rmgr: Gin         len (rec/tot):      0/  3893, tx:          0, lsn: 0/77270E90, prev 0/77270E68, desc: VACUUM_PAGE , blkref #0: rel 1663/16384/16420 blk 27 FPW
rmgr: Gin         len (rec/tot):      0/  3013, tx:          0, lsn: 0/77272080, prev 0/77272058, desc: VACUUM_PAGE , blkref #0: rel 1663/16384/16420 blk 6904 FPW
rmgr: Gin         len (rec/tot):      0/  3093, tx:          0, lsn: 0/77272E08, prev 0/77272DE0, desc: VACUUM_PAGE , blkref #0: rel 1663/16384/16420 blk 1257 FPW
rmgr: Gin         len (rec/tot):      8/  4662, tx:  318119897, lsn: 0/77A2CF10, prev 0/77A2CEC8, desc: INSERT_LISTPAGE , blkref #0: rel 1663/16384/16420 blk 22184
rmgr: Gin         len (rec/tot):     88/   134, tx:  318119897, lsn: 0/77A2E188, prev 0/77A2E160, desc: UPDATE_META_PAGE , blkref #0: rel 1663/16384/16420 blk 0


And the metapage is mostly zeros:
 
head -c 8192 /tmp/data2_invalid_page/base/16384/16420 | od
0000000 000000 000000 161020 073642 000000 000000 000000 000000
0000020 000000 000000 000000 000000 053250 000000 053250 000000
0000040 006140 000000 000001 000000 000001 000000 000000 000000
0000060 031215 000000 000452 000000 000000 000000 000000 000000
0000100 025370 000000 000000 000000 000002 000000 000000 000000
0000120 000000 000000 000000 000000 000000 000000 000000 000000
*
0020000


Thanks,

Jeff

Re: PANIC in GIN code

From
Heikki Linnakangas
Date:
On 06/29/2015 01:12 AM, Jeff Janes wrote:
> Now I'm getting a different error, with or without checksums.
>
> ERROR:  invalid page in block 0 of relation base/16384/16420
> CONTEXT:  automatic vacuum of table "jjanes.public.foo"
>
> 16420 is the gin index.  I can't even get the page with pageinspect:
>
> jjanes=# SELECT * FROM get_raw_page('foo_text_array_idx', 0);
> ERROR:  invalid page in block 0 of relation base/16384/16420
>
> These are the last few gin entries from pg_xlogdump:
>
>
> rmgr: Gin         len (rec/tot):      0/  3893, tx:          0, lsn:
> 0/77270E90, prev 0/77270E68, desc: VACUUM_PAGE , blkref #0: rel
> 1663/16384/16420 blk 27 FPW
> rmgr: Gin         len (rec/tot):      0/  3013, tx:          0, lsn:
> 0/77272080, prev 0/77272058, desc: VACUUM_PAGE , blkref #0: rel
> 1663/16384/16420 blk 6904 FPW
> rmgr: Gin         len (rec/tot):      0/  3093, tx:          0, lsn:
> 0/77272E08, prev 0/77272DE0, desc: VACUUM_PAGE , blkref #0: rel
> 1663/16384/16420 blk 1257 FPW
> rmgr: Gin         len (rec/tot):      8/  4662, tx:  318119897, lsn:
> 0/77A2CF10, prev 0/77A2CEC8, desc: INSERT_LISTPAGE , blkref #0: rel
> 1663/16384/16420 blk 22184
> rmgr: Gin         len (rec/tot):     88/   134, tx:  318119897, lsn:
> 0/77A2E188, prev 0/77A2E160, desc: UPDATE_META_PAGE , blkref #0: rel
> 1663/16384/16420 blk 0
>
>
> And the metapage is mostly zeros:
>
> head -c 8192 /tmp/data2_invalid_page/base/16384/16420 | od
> 0000000 000000 000000 161020 073642 000000 000000 000000 000000
> 0000020 000000 000000 000000 000000 053250 000000 053250 000000
> 0000040 006140 000000 000001 000000 000001 000000 000000 000000
> 0000060 031215 000000 000452 000000 000000 000000 000000 000000
> 0000100 025370 000000 000000 000000 000002 000000 000000 000000
> 0000120 000000 000000 000000 000000 000000 000000 000000 000000
> *
> 0020000

Hmm. Looking at ginRedoUpdateMetapage, I think I see the problem: it 
doesn't initialize the page. It copies the metapage data, but it doesn't 
touch the page headers. The only way I can see that that would cause 
trouble is if the index somehow got truncated away or removed in the 
standby. That could happen in crash recovery, if you drop the index and 
then crash, but that should be harmless, because crash recovery doesn't 
try to read the metapage, only update it (by overwriting it), and by the 
time crash recovery has completed, the index drop is replayed too.
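
The problematic pattern is roughly this (a sketch of the redo logic as 
described above, using names from the 9.5 tree; not the exact source):

    metabuffer = XLogInitBufferForRedo(record, 0);
    metapage = BufferGetPage(metabuffer);

    /* copies the logged metadata into the page ... */
    memcpy(GinPageGetMeta(metapage), &data->metadata,
           sizeof(GinMetaPageData));
    /* ... but pd_lower, pd_upper and pd_special are never set up */
    PageSetLSN(metapage, lsn);
    MarkBufferDirty(metabuffer);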

But AFAICS that bug is present in earlier versions too. Can you 
reproduce this easily? How?

- Heikki




Re: PANIC in GIN code

From
Jeff Janes
Date:
On Mon, Jun 29, 2015 at 1:37 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 06/29/2015 01:12 AM, Jeff Janes wrote:
Now I'm getting a different error, with or without checksums.

ERROR:  invalid page in block 0 of relation base/16384/16420
CONTEXT:  automatic vacuum of table "jjanes.public.foo"

16420 is the gin index.  I can't even get the page with pageinspect:

jjanes=# SELECT * FROM get_raw_page('foo_text_array_idx', 0);
ERROR:  invalid page in block 0 of relation base/16384/16420

These are the last few gin entries from pg_xlogdump:


rmgr: Gin         len (rec/tot):      0/  3893, tx:          0, lsn:
0/77270E90, prev 0/77270E68, desc: VACUUM_PAGE , blkref #0: rel
1663/16384/16420 blk 27 FPW
rmgr: Gin         len (rec/tot):      0/  3013, tx:          0, lsn:
0/77272080, prev 0/77272058, desc: VACUUM_PAGE , blkref #0: rel
1663/16384/16420 blk 6904 FPW
rmgr: Gin         len (rec/tot):      0/  3093, tx:          0, lsn:
0/77272E08, prev 0/77272DE0, desc: VACUUM_PAGE , blkref #0: rel
1663/16384/16420 blk 1257 FPW
rmgr: Gin         len (rec/tot):      8/  4662, tx:  318119897, lsn:
0/77A2CF10, prev 0/77A2CEC8, desc: INSERT_LISTPAGE , blkref #0: rel
1663/16384/16420 blk 22184
rmgr: Gin         len (rec/tot):     88/   134, tx:  318119897, lsn:
0/77A2E188, prev 0/77A2E160, desc: UPDATE_META_PAGE , blkref #0: rel
1663/16384/16420 blk 0

Another piece of info here that might be relevant: almost all of the UPDATE_META_PAGE xlog records have two backup blocks, but the last UPDATE_META_PAGE record only has one backup block.

And the metapage is mostly zeros:

head -c 8192 /tmp/data2_invalid_page/base/16384/16420 | od
0000000 000000 000000 161020 073642 000000 000000 000000 000000
0000020 000000 000000 000000 000000 053250 000000 053250 000000
0000040 006140 000000 000001 000000 000001 000000 000000 000000
0000060 031215 000000 000452 000000 000000 000000 000000 000000
0000100 025370 000000 000000 000000 000002 000000 000000 000000
0000120 000000 000000 000000 000000 000000 000000 000000 000000
*
0020000

Hmm. Looking at ginRedoUpdateMetapage, I think I see the problem: it doesn't initialize the page. It copies the metapage data, but it doesn't touch the page headers. The only way I can see that that would cause trouble is if the index somehow got truncated away or removed in the standby. That could happen in crash recovery, if you drop the index and then crash, but that should be harmless, because crash recovery doesn't try to read the metapage, only update it (by overwriting it), and by the time crash recovery has completed, the index drop is replayed too.

But AFAICS that bug is present in earlier versions too.

Yes, I did see this error reported previously but it was always after the first appearance of the PANIC, so I assumed it was a sequela to that and didn't investigate it further at that time.

 
Can you reproduce this easily? How?

I can reproduce it fairly easily.

I apply the attached patch and compile with enable-cassert (full list '--enable-debug' '--with-libxml' '--with-perl' '--with-python' '--with-ldap' '--with-openssl' '--with-gssapi' '--prefix=/home/jjanes/pgsql/torn_bisect/' '--enable-cassert')

Then edit do.sh to point to the data directory and installation directory you want, and run that.  It calls count.pl from the same directory.  I started getting the errors after about 10 minutes on an 8-core Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz.

sh do.sh >& do_cassert_fix.out2 &

The output is quite a mess, mingling log output from PostgreSQL and from Perl together.  Since I already know what I'm looking for, I use:

tail -f do_cassert_fix.out2 |fgrep ERROR

Cheers,

Jeff




Re: PANIC in GIN code

From
Heikki Linnakangas
Date:
On 06/29/2015 07:20 PM, Jeff Janes wrote:
> On Mon, Jun 29, 2015 at 1:37 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Another piece of info here that might be relevant: almost all of the
> UPDATE_META_PAGE xlog records have two backup blocks, but the last
> UPDATE_META_PAGE record only has one backup block.
>
> And the metapage is mostly zeros:
>>>
>>> head -c 8192 /tmp/data2_invalid_page/base/16384/16420 | od
>>> 0000000 000000 000000 161020 073642 000000 000000 000000 000000
>>> 0000020 000000 000000 000000 000000 053250 000000 053250 000000
>>> 0000040 006140 000000 000001 000000 000001 000000 000000 000000
>>> 0000060 031215 000000 000452 000000 000000 000000 000000 000000
>>> 0000100 025370 000000 000000 000000 000002 000000 000000 000000
>>> 0000120 000000 000000 000000 000000 000000 000000 000000 000000
>>> *
>>> 0020000
>>>
>>
>> Hmm. Looking at ginRedoUpdateMetapage, I think I see the problem: it
>> doesn't initialize the page. It copies the metapage data, but it doesn't
>> touch the page headers. The only way I can see that that would cause
>> trouble is if the index somehow got truncated away or removed in the
>> standby. That could happen in crash recovery, if you drop the index and
>> then crash, but that should be harmless, because crash recovery doesn't try to
>> read the metapage, only update it (by overwriting it), and by the time
>> crash recovery has completed, the index drop is replayed too.
>>
>> But AFAICS that bug is present in earlier versions too.

Nope, looking closer, in previous versions the page was always read from 
disk, even though we're overwriting it. That was made smarter in 9.5, by 
using the RBM_ZERO_AND_LOCK mode, but that means that the page headers 
indeed need to be initialized.
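
So the redo routine needs to rebuild the whole page, headers included 
(a sketch of the fix, assuming the GinInitMetabuffer() helper from the 
9.5 tree; the committed code may differ in detail):

    /* With RBM_ZERO_AND_LOCK the buffer comes back zero-filled. */
    metabuffer = XLogInitBufferForRedo(record, 0);
    metapage = BufferGetPage(metabuffer);

    /* Initialize the page header and special space first ... */
    GinInitMetabuffer(metabuffer);
    /* ... then overwrite the metadata with the logged copy. */
    memcpy(GinPageGetMeta(metapage), &data->metadata,
           sizeof(GinMetaPageData));
    PageSetLSN(metapage, lsn);
    MarkBufferDirty(metabuffer);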

> Yes, I did see this error reported previously but it was always after the
> first appearance of the PANIC, so I assumed it was a sequela to that and
> didn't investigate it further at that time.
>
>> Can you reproduce this easily? How?
>
> I can reproduce it fairly easily.
>
> [instructions to reproduce]

I was actually not able to reproduce it that way, but came up with a 
much simpler method. The problem occurs when the metapage-update WAL 
record is replayed and the metapage is not in the page cache yet. 
Usually it is: if you do pretty much anything at all with the index, 
the metapage stays in cache. But VACUUM produces a metapage update 
record to update the stats, and it's pretty easy to arrange things so 
that that's the first record after a checkpoint, and then start 
recovery from that checkpoint:

postgres=# create table foo (t text[]);
CREATE TABLE
postgres=# insert into foo values ('{foo}');
INSERT 0 1
postgres=# create index i_foo on foo using gin (t);
CREATE INDEX
postgres=# vacuum foo;
VACUUM
postgres=# vacuum foo;
VACUUM
postgres=# checkpoint;
CHECKPOINT
postgres=# vacuum foo;
VACUUM

Now kill -9 postmaster, and restart. Voila, the page headers are all zeros:

postgres=# select * from page_header(get_raw_page('i_foo', 0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/1891270 |        0 |     0 |     0 |     0 |       0 |        0 |       0 |         0
(1 row)

postgres=# select * from gin_metapage_info(get_raw_page('i_foo', 0));
ERROR:  input page is not a GIN metapage
DETAIL:  Flags 0189, expected 0008

I just pushed a fix for this, but unfortunately it didn't make it into 9.5alpha1.

- Heikki



Re: PANIC in GIN code

From
Jeff Janes
Date:
On Mon, Jun 29, 2015 at 2:08 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
 

Now kill -9 postmaster, and restart. Voila, the page headers are all zeros:

postgres=# select * from page_header(get_raw_page('i_foo', 0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/1891270 |        0 |     0 |     0 |     0 |       0 |        0 |       0 |         0
(1 row)

postgres=# select * from gin_metapage_info(get_raw_page('i_foo', 0));
ERROR:  input page is not a GIN metapage
DETAIL:  Flags 0189, expected 0008

I just pushed a fix for this, but unfortunately it didn't make it into 9.5alpha1.

Thanks.  I think that that fixed it.  It survived for over an hour this time.  Then it hit a different problem (an apparent deadlock in the LWLock code, but that is probably a topic for another thread; I'm working on reproducing it).

Cheers,

Jeff

Re: PANIC in GIN code

From
Heikki Linnakangas
Date:
On 06/30/2015 02:18 AM, Jeff Janes wrote:
> On Mon, Jun 29, 2015 at 2:08 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
>> I just pushed a fix for this, but unfortunately it didn't make it
>> into 9.5alpha1.
>
> Thanks.  I think that that fixed it.  It survived for over an hour this
> time.

Thanks. I grepped through all the other call sites of 
XLogInitBufferForRedo() to look for similar bugs, and sure enough, there 
was one more bug of the same sort, in B-tree page deletion. That one was 
in 9.4 as well. Fixed that too, and added an assertion to 
PageGetSpecialPointer() that would help catch this sort of bug 
earlier in the future.
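
The assertion is along these lines (a sketch; the exact macro may 
differ): a page that was never initialized still has pd_special == 0, 
so the check fires as soon as anything asks for the special space, 
instead of a corrupt page being written out silently:

#define PageGetSpecialPointer(page) \
( \
    AssertMacro(((PageHeader) (page))->pd_special <= BLCKSZ), \
    AssertMacro(((PageHeader) (page))->pd_special >= SizeOfPageHeaderData), \
    (char *) ((char *) (page) + ((PageHeader) (page))->pd_special) \
)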

- Heikki