Thread: GIN stuck in loop during PITR

GIN stuck in loop during PITR

From
Andreas Seltenreich
Date:
I'm just experimenting a bit with GIN, and it is occasionally getting
stuck looping in findParents() during WAL replay.

The attached patch seems to fix it. I also had to set ptr->off as
advertised in the comment above the function to avoid triggering
assertions.

GIN isn't fully transparent to me yet, so it is quite likely that I am
missing something...

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.1
diff -c -r1.1 ginbtree.c
*** ginbtree.c    2 May 2006 11:28:54 -0000    1.1
--- ginbtree.c    25 May 2006 18:12:13 -0000
***************
*** 202,208 ****
      for(;;) {
          buffer = ReadBuffer(btree->index, blkno);
          LockBuffer(buffer, GIN_EXCLUSIVE);
!         page = BufferGetPage(root->buffer);
          if ( GinPageIsLeaf(page) )
              elog(ERROR, "Lost path");

--- 202,208 ----
      for(;;) {
          buffer = ReadBuffer(btree->index, blkno);
          LockBuffer(buffer, GIN_EXCLUSIVE);
!         page = BufferGetPage(buffer);
          if ( GinPageIsLeaf(page) )
              elog(ERROR, "Lost path");

***************
*** 224,229 ****
--- 224,230 ----
              ptr->blkno = blkno;
              ptr->buffer = buffer;
              ptr->parent = root; /* it'smay be wrong, but in next call we will correct */
+             ptr->off = offset;
              stack->parent = ptr;
              return;
          }
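
Why reading the page from root->buffer can keep the loop from making progress is easier to see in a toy model. The sketch below is not the GIN code and simplifies the real loop considerably: `ToyPage`, `toy_descend` and the `max_steps` guard are hypothetical names, and each toy page records only its leftmost child.

```c
/* Hypothetical toy page: it knows its leftmost child and whether it is a leaf. */
typedef struct
{
    int leftmost_child;   /* block number of the leftmost child */
    int is_leaf;          /* nonzero for a leaf page */
} ToyPage;

/*
 * Walk down from block 0 and return the number of steps taken to reach a
 * leaf, or -1 if max_steps is exhausted first.
 *
 * use_root_page != 0 mimics the pre-patch pattern: every iteration inspects
 * the root page instead of the page for the current block number.
 */
static int toy_descend(const ToyPage *pages, int use_root_page, int max_steps)
{
    int blkno = 0;

    for (int step = 0; step < max_steps; step++)
    {
        const ToyPage *page = use_root_page ? &pages[0] : &pages[blkno];

        if (page->is_leaf)
            return step;

        /* With the root page, this always yields the same block number. */
        blkno = page->leftmost_child;
    }
    return -1;   /* no progress within the step budget */
}
```

On a three-level toy tree the walk that inspects the current page ends after two steps, while the walk that keeps inspecting the root page only stops because of the step budget.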


Re: GIN stuck in loop during PITR

From
Teodor Sigaev
Date:
Thanks a lot, applied. Can you describe your test suite? It may be useful for
further testing...

GIN is young code and it needs independent testing.

Andreas Seltenreich wrote:
> I'm just experimenting a bit with GIN, and it is occasionally getting
> stuck looping in findParents() during WAL replay.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIN stuck in loop during PITR

From
Andreas Seltenreich
Date:
Teodor Sigaev wrote:

> Thanks a lot, applied. Can you describe your test suite? It may be useful
> for further testing...

Here's a shell script that triggers the bug when I revert the patch.

regards,
andreas

#!/bin/sh

set -x
set -e

# Export the port so that createdb and psql connect to the test cluster.
export PGPORT=5434
CLUSTER="gintest/"
ARCHIVE="gintest-archive/"

[ -d "$ARCHIVE" ] || mkdir "$ARCHIVE"

initdb -D $CLUSTER
cat >> $CLUSTER/postgresql.conf <<EOF
port = $PGPORT
archive_command = 'cp %p $PWD/$ARCHIVE/%f'
EOF

pg_ctl -D $CLUSTER start
sleep 5
createdb
psql <<EOF
create table t(a text);
create index i on t using gin (string_to_array(a, ' '));
checkpoint;
select pg_start_backup('test');
EOF

tar cf gintest.tar $CLUSTER

psql <<EOF
select pg_stop_backup();
insert into t select generate_series(1,1000000);
EOF

pg_ctl -D "$CLUSTER" stop
sleep 5
cp "$CLUSTER/pg_xlog/0"* "$ARCHIVE"
rm -r "$CLUSTER"
tar xf gintest.tar
cat >> "$CLUSTER/recovery.conf" <<EOF
restore_command = 'cp "$PWD/$ARCHIVE"/%f %p'
EOF

pg_ctl -D "$CLUSTER" start

# LOG:  redo starts at 0/3D1740
# LOG:  restored log file "000000010000000000000001" from archive
# LOG:  restored log file "000000010000000000000002" from archive
# [...]
# LOG:  restored log file "00000001000000000000000D" from archive
# LOG:  record with zero length at 0/D085D50
# LOG:  redo done at 0/D085D0C
# LOG:  restored log file "00000001000000000000000D" from archive
# LOG:  archive recovery complete
#
# at this point the startup process is looping in ginbtree.c:findParents()
# 
# (gdb) where
# #0  0x080d9db9 in entryFindChildPtr (btree=0xbfbbba90, page=0xb605a680 "", blkno=1672, storedOff=0) at ginentrypage.c:246
# #1  0x080dfc31 in findParents (btree=0xbfbbba90, stack=0xbfbbba7c, rootBlkno=0) at ginbtree.c:211
# #2  0x080d8167 in ginContinueSplit (split=0x8469bd8) at ginxlog.c:522
# #3  0x080d81c1 in gin_xlog_cleanup () at ginxlog.c:537
# #4  0x080c9731 in StartupXLOG () at xlog.c:4846
# #5  0x080e9f8e in BootstrapMain (argc=4, argv=0xbfbbbca4) at bootstrap.c:419
# #6  0x0820a484 in StartChildProcess (xlop=2) at postmaster.c:3671
# #7  0x08206add in PostmasterMain (argc=3, argv=0x840cff0) at postmaster.c:968
# #8  0x081b0e32 in main (argc=3, argv=0x840cff0) at main.c:254


Re: GIN stuck in loop during PITR

From
Andreas Seltenreich
Date:
Andreas Seltenreich wrote:

> Teodor Sigaev wrote:
>
>> Thanks a lot, applied. Can you describe your test suite? It may be useful
>> for further testing...
>
> Here's a shell script that triggers the bug when I revert the patch.

Just tried the script on HEAD, and it was triggering an assertion. I
guess it is because we are still returning InvalidOffsetNumber in the
trivial case (looks like a typo to me). I've attached a patch.

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.2
diff -c -r1.2 ginbtree.c
*** ginbtree.c    26 May 2006 08:01:17 -0000    1.2
--- ginbtree.c    26 May 2006 20:09:45 -0000
***************
*** 189,195 ****
      Assert( !GinPageIsLeaf(page) );

      /* check trivial case */
!     if ( (root->off != btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidBuffer ) {
          stack->parent = root;
          return;
      }
--- 189,195 ----
      Assert( !GinPageIsLeaf(page) );

      /* check trivial case */
!     if ( (root->off = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidOffsetNumber ) {
          stack->parent = root;
          return;
      }
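
The effect of that one-character typo can be reproduced outside PostgreSQL. The following is a minimal sketch, not the GIN code: `OffsetNumber` and `InvalidOffsetNumber` are re-declared locally as stand-ins, and `toy_find_child_ptr`, `check_with_comparison`, `check_with_assignment` and `toy_use_offset` are hypothetical names made up for illustration.

```c
#include <assert.h>

/* Local stand-ins for the PostgreSQL definitions, for illustration only. */
typedef unsigned short OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)

/* Toy lookup: pretend the child pointer is always found at offset 3. */
static OffsetNumber toy_find_child_ptr(void)
{
    return (OffsetNumber) 3;
}

/* Comparison variant: *off is only compared with the result, never updated. */
static int check_with_comparison(OffsetNumber *off)
{
    return (*off != toy_find_child_ptr()) != 0;
}

/* Assignment variant: *off receives the result before the validity test. */
static int check_with_assignment(OffsetNumber *off)
{
    return (*off = toy_find_child_ptr()) != InvalidOffsetNumber;
}

/* Stand-in for a later consumer that asserts the offset is valid. */
static void toy_use_offset(OffsetNumber off)
{
    assert(off != InvalidOffsetNumber);
    (void) off;
}
```

Starting from an invalid offset, both variants take the branch, but only the assignment variant leaves a valid offset behind, so only its result can be handed to toy_use_offset() without tripping the assertion.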