Thread: GIN stuck in loop during PITR
I'm just experimenting a bit with GIN, and it is occasionally getting stuck looping in findParents() during WAL replay. The attached patch seems to fix it. I also had to set ptr->off as advertised in the comment above the function to avoid triggering assertions.

GIN isn't fully transparent to me yet, so it is quite likely that I am missing something...

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.1
diff -c -r1.1 ginbtree.c
*** ginbtree.c	2 May 2006 11:28:54 -0000	1.1
--- ginbtree.c	25 May 2006 18:12:13 -0000
***************
*** 202,208 ****
  	for(;;) {
  		buffer = ReadBuffer(btree->index, blkno);
  		LockBuffer(buffer, GIN_EXCLUSIVE);
! 		page = BufferGetPage(root->buffer);
  		if ( GinPageIsLeaf(page) )
  			elog(ERROR, "Lost path");
--- 202,208 ----
  	for(;;) {
  		buffer = ReadBuffer(btree->index, blkno);
  		LockBuffer(buffer, GIN_EXCLUSIVE);
! 		page = BufferGetPage(buffer);
  		if ( GinPageIsLeaf(page) )
  			elog(ERROR, "Lost path");
***************
*** 224,229 ****
--- 224,230 ----
  			ptr->blkno = blkno;
  			ptr->buffer = buffer;
  			ptr->parent = root; /* it'smay be wrong, but in next call we will correct */
+ 			ptr->off = offset;
  			stack->parent = ptr;
  			return;
  		}
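Editor's illustration: the first hunk of the patch above replaces a page lookup on root->buffer with one on the buffer that was just read. The toy model below is a minimal sketch of why inspecting a stale page can keep a sibling walk from ever making progress. It is not the GIN code: ToyPage, toy_find_parent, TOY_INVALID_BLOCK and TOY_STEP_LIMIT are hypothetical names, and the step cap exists only so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INVALID_BLOCK (-1)  /* hypothetical "no such block" marker */
#define TOY_STEP_LIMIT    (-2)  /* hypothetical "gave up, would loop" marker */

/* Hypothetical stand-in for an internal index page: it lists its children
 * and links to its right sibling. This is not the real GIN page layout. */
typedef struct ToyPage {
    int children[4];
    int nchildren;
    int rightlink;
} ToyPage;

/* Return true if the page holds a downlink to the given child block. */
static bool toy_has_child(const ToyPage *page, int child)
{
    for (int i = 0; i < page->nchildren; i++)
        if (page->children[i] == child)
            return true;
    return false;
}

/*
 * Walk right from block `start`, looking for the page that points at `child`.
 * With stale_page = true the walk mimics the kind of mistake fixed above:
 * it keeps inspecting the start page instead of the block it just moved to.
 * Returns the parent's block number, TOY_INVALID_BLOCK if the sibling chain
 * ends, or TOY_STEP_LIMIT if max_steps iterations pass without a result.
 */
int toy_find_parent(const ToyPage *pages, int start, int child,
                    bool stale_page, int max_steps)
{
    int blkno = start;

    assert(pages != NULL && max_steps > 0);

    for (int step = 0; step < max_steps; step++) {
        const ToyPage *page = stale_page ? &pages[start] : &pages[blkno];

        if (toy_has_child(page, child))
            return blkno;

        blkno = page->rightlink;  /* the stale variant never gets past here */
        if (blkno == TOY_INVALID_BLOCK)
            return TOY_INVALID_BLOCK;
    }
    return TOY_STEP_LIMIT;
}
```

In this model the stale variant still succeeds when the start page itself holds the downlink, which is one way such a mistake can go unnoticed until a deeper tree is replayed.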
Thanks a lot, applied. Can you describe your test suite? It may be useful for further testing... GIN is young code and it needs independent testing.

Andreas Seltenreich wrote:
> I'm just experimenting a bit with GIN, and it is occasionally getting
> stuck looping in findParents() during WAL replay.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
> Thanks a lot, applied. Can you describe your test suite? It may be useful
> for further testing...

Here's a shell script that triggers the bug when I revert the patch.

regards,
andreas

#!/bin/sh

set -x
set -e

# Export PGPORT so that createdb and psql connect to the test cluster.
export PGPORT=5434
CLUSTER="gintest/"
ARCHIVE="gintest-archive/"

[ -d "$ARCHIVE" ] || mkdir "$ARCHIVE"

initdb -D "$CLUSTER"

cat >> "$CLUSTER/postgresql.conf" <<EOF
port = $PGPORT
archive_command = 'cp %p $PWD/$ARCHIVE/%f'
EOF

pg_ctl -D "$CLUSTER" start
sleep 5

createdb

# Create the GIN index, then take the base backup.
psql <<EOF
create table t(a text);
create index i on t using gin (string_to_array(a, ' '));
checkpoint;
select pg_start_backup('test');
EOF

tar cf gintest.tar "$CLUSTER"

# Generate WAL that recovery will have to replay.
psql <<EOF
select pg_stop_backup();
insert into t select generate_series(1,1000000);
EOF

pg_ctl -D "$CLUSTER" stop
sleep 5

# Archive the remaining WAL segments, then restore the base backup.
cp "$CLUSTER/pg_xlog/0"* "$ARCHIVE"
rm -r "$CLUSTER"
tar xf gintest.tar

cat >> "$CLUSTER/recovery.conf" <<EOF
restore_command = 'cp "$PWD/$ARCHIVE"/%f %p'
EOF

pg_ctl -D "$CLUSTER" start

# LOG: redo starts at 0/3D1740
# LOG: restored log file "000000010000000000000001" from archive
# LOG: restored log file "000000010000000000000002" from archive
# [...]
# LOG: restored log file "00000001000000000000000D" from archive
# LOG: record with zero length at 0/D085D50
# LOG: redo done at 0/D085D0C
# LOG: restored log file "00000001000000000000000D" from archive
# LOG: archive recovery complete
#
# at this point the startup process is looping in ginbtree.c:findParents()
#
# (gdb) where
# #0 0x080d9db9 in entryFindChildPtr (btree=0xbfbbba90, page=0xb605a680 "", blkno=1672, storedOff=0) at ginentrypage.c:246
# #1 0x080dfc31 in findParents (btree=0xbfbbba90, stack=0xbfbbba7c, rootBlkno=0) at ginbtree.c:211
# #2 0x080d8167 in ginContinueSplit (split=0x8469bd8) at ginxlog.c:522
# #3 0x080d81c1 in gin_xlog_cleanup () at ginxlog.c:537
# #4 0x080c9731 in StartupXLOG () at xlog.c:4846
# #5 0x080e9f8e in BootstrapMain (argc=4, argv=0xbfbbbca4) at bootstrap.c:419
# #6 0x0820a484 in StartChildProcess (xlop=2) at postmaster.c:3671
# #7 0x08206add in PostmasterMain (argc=3, argv=0x840cff0) at postmaster.c:968
# #8 0x081b0e32 in main (argc=3, argv=0x840cff0) at main.c:254
Andreas Seltenreich wrote:
> Teodor Sigaev wrote:
>
>> Thanks a lot, applied. Can you describe your test suite? It may be useful
>> for further testing...
>
> Here's a shell script that triggers the bug when I revert the patch.

Just tried the script on HEAD, and it was triggering an assertion. I guess it is because we are still returning InvalidOffsetNumber in the trivial case (looks like a typo to me). I've attached a patch.

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.2
diff -c -r1.2 ginbtree.c
*** ginbtree.c	26 May 2006 08:01:17 -0000	1.2
--- ginbtree.c	26 May 2006 20:09:45 -0000
***************
*** 189,195 ****
  	Assert( !GinPageIsLeaf(page) );
  	/* check trivial case */
! 	if ( (root->off != btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidBuffer ) {
  		stack->parent = root;
  		return;
  	}
--- 189,195 ----
  	Assert( !GinPageIsLeaf(page) );
  	/* check trivial case */
! 	if ( (root->off = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidOffsetNumber ) {
  		stack->parent = root;
  		return;
  	}
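Editor's illustration: the one-character difference in the patch above (`!=` versus `=` inside the condition) is easy to miss, so here is a minimal sketch of its effect. All names (ToyStack, toy_find_child, toy_trivial_case, toy_trivial_case_typo, TOY_INVALID_OFFSET) are hypothetical stand-ins and not the GIN API. The sketch assumes, as in PostgreSQL, that the invalid offset and the invalid buffer are both represented by 0.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ToyOffset;
#define TOY_INVALID_OFFSET ((ToyOffset) 0)  /* offsets are 1-based; 0 means "none" */

/* Hypothetical stand-in for a stack entry that remembers an offset. */
typedef struct ToyStack {
    ToyOffset off;
} ToyStack;

/* Return the 1-based position of `child`, or TOY_INVALID_OFFSET if absent. */
static ToyOffset toy_find_child(const int *children, int n, int child)
{
    for (int i = 0; i < n; i++)
        if (children[i] == child)
            return (ToyOffset) (i + 1);
    return TOY_INVALID_OFFSET;
}

/* Intended form: store the lookup result in root->off, then test it. */
int toy_trivial_case(ToyStack *root, const int *children, int n, int child)
{
    assert(root != NULL);
    if ((root->off = toy_find_child(children, n, child)) != TOY_INVALID_OFFSET)
        return 1;
    return 0;
}

/* Typo form: compares root->off with the result but never stores it, so the
 * branch can be taken while root->off still holds its old (invalid) value. */
int toy_trivial_case_typo(ToyStack *root, const int *children, int n, int child)
{
    assert(root != NULL);
    if ((root->off != toy_find_child(children, n, child)) != 0)
        return 1;
    return 0;
}
```

Starting from an invalid offset, both variants report success when the child is present, but only the first leaves a usable offset behind. That matches the symptom described above: a later assertion on the offset fails even though the trivial case appeared to succeed.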