Thread: GiST: Bad newtup On Exit From gistSplit() ?
Just before the return from gistSplit() I see this: (gdb) p (*newtup)[0] $147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136} (gdb) p (*newtup)[1]$148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575} (gdb) p nlen $149= 2 (*newtup)[1] doesn't look right, correct? Everything *seems* OK. I need to go back and trace gistFormTuple() and ItemPointerSet(), maybe? -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
> (gdb) p (*newtup)[0] > $147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136} > (gdb) p (*newtup)[1] > $148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575} > (gdb) p nlen > $149 = 2 > > (*newtup)[1] doesn't look right, correct? Sorry, my fault, that should've been *(newtup[1]), which indeed looks fine. I'm still not having any luck tracking down my SEGV. If anyone is interested in debugging or even just reproducing this, I'd be happy to post or E-mail you my code. It looks like shortly after calling gistSplit() there's maybe a double-free() somewhere? Here's a typical session; the NOTICEs are generated by my picksplit function: $ gdb ./postgres (gdb) run -D /var/lib/postgres/data test <x [... output as many tuples are inserted, and finally ...] NOTICE: g_sig_picksplit: <-- NOTICE: g_sig_picksplit: left ( 9) *.*.***..*..**....*....*.***.**.*..*.*.****....*..**..**.....*****...***..*.*.*.***..*.**..**.*..*.**.*..**.***.*...*.**.**.......*......***....**......*.***.*......*.*..**....**.*..*..**.*..*.*.***.*.*.**.*....*.**..*...*..*.*****.*.*...*..*.*...*..**..*..***....***.*.*.*..*...**.......*....**.*.*.*..*.*..***.*.****.....*.....**..**...*.*....*..**..**.***...*......*..*....*.*.*.....*.*..*....**......****.*.***.*........***..*....*.*....**.*..***.*.*...*..**.*...*...*...*.......*.*.....**......*............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ NOTICE: g_sig_picksplit: right ( 50) .**..********************************************************.*.********.****************************************************************.********.***************************************.**********************************.********************.*.*************.********.****************************.***.**.*******************.*.************************************************.******************************.*****.*******************************************************************************.********............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ Program received signal SIGSEGV, Segmentation fault. 0x4028e2ff in mallopt () from /lib/libc.so.6 (gdb) bt #0 0x4028e2ff in mallopt () from /lib/libc.so.6 #1 0x4028d2ca in free () from /lib/libc.so.6 #2 0x081d05e5 in AllocSetDelete (context=0x82ae098) at aset.c:460 #3 0x081d0f03 in MemoryContextDelete (context=0x82ae098) at mcxt.c:188 #4 0x081d0f36 in MemoryContextDeleteChildren (context=0x82ae010) at mcxt.c:207 #5 0x081d0e8f in MemoryContextDelete (context=0x82ae010) at mcxt.c:161 #6 0x08094070 in AtCommit_Memory () at xact.c:685 #7 0x080943e6 in CommitTransaction () at xact.c:1033 #8 0x080946b6 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304 #9 0x08160966 in finish_xact_command (forceCommit=0 '\0') at postgres.c:978 #10 0x081607ef in pg_exec_query_string (query_string=0x82e4570, dest=Debug, parse_context=0x82adf88) at postgres.c:897 #11 0x081619f0 in PostgresMain (argc=4, argv=0x82960c0, username=0x82966c0 "postgres") at postgres.c:2013 #12 0x08110625 in main (argc=4, argv=0xbffffc14) at main.c:235 -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Itai Zukerman <zukerman@math-hat.com> writes: > I'm still not having any luck tracking down my SEGV. If anyone is > interested in debugging or even just reproducing this, I'd be happy to > post or E-mail you my code. It looks like shortly after calling > gistSplit() there's maybe a double-free() somewhere? It might help to build with --enable-cassert, if you didn't already. regards, tom lane
>> I'm still not having any luck tracking down my SEGV. If anyone is >> interested in debugging or even just reproducing this, I'd be happy to >> post or E-mail you my code. It looks like shortly after calling >> gistSplit() there's maybe a double-free() somewhere? > > It might help to build with --enable-cassert, if you didn't already. After recompiling with --enable-cassert, nserting into an empty table, I get: TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113) Program received signal SIGABRT, Aborted. 0x400eda51 in kill () from /lib/libc.so.6 (gdb) bt #0 0x400eda51 in kill () from/lib/libc.so.6 #1 0x400ed872 in raise () from /lib/libc.so.6 #2 0x400ee986 in abort () from /lib/libc.so.6 #3 0x081dc4d7in ExceptionalCondition (conditionName=0x8262d92 "!((VfdCache[0].fd == (-1)))", errorType=0x8262a13 "FailedAssertion",fileName=0x8262a0e "fd.c", lineNumber=1113) at assert.c:46 #4 0x0816a085 in AtEOXact_Files () atfd.c:1113 #5 0x080a1921 in CommitTransaction () at xact.c:1031 #6 0x080a1c85 in CommitTransactionCommand (forceCommit=0'\0') at xact.c:1304 #7 0x08177aaa in finish_xact_command (forceCommit=0 '\0') at postgres.c:978 #8 0x08177933in pg_exec_query_string (query_string=0x8314054, dest=Debug, parse_context=0x82ddb18) at postgres.c:897 #9 0x08178ba7in PostgresMain (argc=4, argv=0x82c5e00, username=0x82c6400 "postgres") at postgres.c:2013 #10 0x081233b9 in main(argc=4, argv=0xbffffc14) at main.c:235 But if I restart postgres and start with some data in the table, I get, as before: Program received signal SIGSEGV, Segmentation fault. 0x401312ff in mallopt () from /lib/libc.so.6 (gdb) bt #0 0x401312ffin mallopt () from /lib/libc.so.6 #1 0x401302ca in free () from /lib/libc.so.6 #2 0x081ea205 in AllocSetDelete(context=0x82efffc) at aset.c:460 #3 0x081eb107 in MemoryContextDelete (context=0x82efffc) at mcxt.c:188 #4 0x081eb172 in MemoryContextDeleteChildren (context=0x82eff70) at mcxt.c:207 #5 0x081eb093 in MemoryContextDelete (context=0x82eff70)at mcxt.c:161 #6 0x080a1587 in AtCommit_Memory () at xact.c:685 #7 0x080a192b in CommitTransaction ()at xact.c:1033 #8 0x080a1c85 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304 #9 0x08177aaa in finish_xact_command(forceCommit=0 '\0') at postgres.c:978 #10 0x08177933 in pg_exec_query_string (query_string=0x8313fcc,dest=Debug, parse_context=0x82efee4) at postgres.c:897 #11 0x08178ba7 in PostgresMain (argc=4, argv=0x82c5e00,username=0x82c6400 "postgres") at postgres.c:2013 #12 0x081233b9 in main (argc=4, argv=0xbffffc14) at main.c:235 Since this only happens after a call to gistSplit(), I still suspect GiST. But I've been going over the GiST code the past 3 days and I still can't find anything substantially wrong. -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Itai Zukerman <zukerman@math-hat.com> writes: > After recompiling with --enable-cassert, nserting into an empty table, > I get: > TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113) Begins to look like a plain old wild store: ain't *nothing* should ever write into VfdCache[0].fd. If that's repeatable, you could try homing in on the place that is clobbering that variable by stepping through major routines with gdb. regards, tom lane
Tom Lane wrote: > Itai Zukerman <zukerman@math-hat.com> writes: > > After recompiling with --enable-cassert, nserting into an empty table, > > I get: > > > TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113) > > Begins to look like a plain old wild store: ain't *nothing* should ever > write into VfdCache[0].fd. If that's repeatable, you could try homing > in on the place that is clobbering that variable by stepping through > major routines with gdb. If this is a system that supports hardware watchpoints (like Linux on x86), then you should be able to do a "watch VfdCache[0].fd". In fact, if you know the specific value it's going to be set to, you can do a conditional watchpoint: "watch VfdCache[0].fd if VfdCache[0] == <value>" (you can set the condition to be pretty much anything). The program will stop on the statement immediately following the one that scribbled on the area you're watching when the condition you specify is met. Note, though, that the condition is tested *after* the modification happens. -- Kevin Brown kevin@sysexperts.com
>> > TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113) > If this is a system that supports hardware watchpoints (like Linux on > x86), then you should be able to do a "watch VfdCache[0].fd". That's exactly what I did, and tracked the problem down to a typo in my code *blush*. Thanks for all the great suggestions! -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Send me 'x' file and I'll try to help you. Itai Zukerman wrote: >> (gdb) p (*newtup)[0] >> $147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136} >> (gdb) p (*newtup)[1] >> $148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575} >> (gdb) p nlen >> $149 = 2 >> >>(*newtup)[1] doesn't look right, correct? > > > Sorry, my fault, that should've been *(newtup[1]), which indeed looks > fine. > > I'm still not having any luck tracking down my SEGV. If anyone is > interested in debugging or even just reproducing this, I'd be happy to > post or E-mail you my code. It looks like shortly after calling > gistSplit() there's maybe a double-free() somewhere? Here's a typical > session; the NOTICEs are generated by my picksplit function: > > $ gdb ./postgres > (gdb) run -D /var/lib/postgres/data test <x > > [... output as many tuples are inserted, and finally ...] > > NOTICE: g_sig_picksplit: <-- > NOTICE: g_sig_picksplit: left ( 9) *.*.***..*..**....*....*.***.**.*..*.*.****....*..**..**.....*****...***..*.*.*.***..*.**..**.*..*.**.*..**.***.*...*.**.**.......*......***....**......*.***.*......*.*..**....**.*..*..**.*..*.*.***.*.*.**.*....*.**..*...*..*.*****.*.*...*..*.*...*..**..*..***....***.*.*.*..*...**.......*....**.*.*.*..*.*..***.*.****.....*.....**..**...*.*....*..**..**.***...*......*..*....*.*.*.....*.*..*....**......****.*.***.*........***..*....*.*....**.*..***.*.*...*..**.*...*...*...*.......*.*.....**......*................................................................................................................................................................................................................................................................................................................................................................................................................................................................. ........................................................................... > NOTICE: g_sig_picksplit: right ( 50) .**..********************************************************.*.********.****************************************************************.********.***************************************.**********************************.********************.*.*************.********.****************************.***.**.*******************.*.************************************************.******************************.*****.*******************************************************************************.********................................................................................................................................................................................................................................................................................................................................................................................................................................................................. ........................................................................... > > Program received signal SIGSEGV, Segmentation fault. > 0x4028e2ff in mallopt () from /lib/libc.so.6 > (gdb) bt > #0 0x4028e2ff in mallopt () from /lib/libc.so.6 > #1 0x4028d2ca in free () from /lib/libc.so.6 > #2 0x081d05e5 in AllocSetDelete (context=0x82ae098) at aset.c:460 > #3 0x081d0f03 in MemoryContextDelete (context=0x82ae098) at mcxt.c:188 > #4 0x081d0f36 in MemoryContextDeleteChildren (context=0x82ae010) at mcxt.c:207 > #5 0x081d0e8f in MemoryContextDelete (context=0x82ae010) at mcxt.c:161 > #6 0x08094070 in AtCommit_Memory () at xact.c:685 > #7 0x080943e6 in CommitTransaction () at xact.c:1033 > #8 0x080946b6 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304 > #9 0x08160966 in finish_xact_command (forceCommit=0 '\0') at postgres.c:978 > #10 0x081607ef in pg_exec_query_string (query_string=0x82e4570, dest=Debug, parse_context=0x82adf88) at postgres.c:897 > #11 0x081619f0 in PostgresMain (argc=4, argv=0x82960c0, username=0x82966c0 "postgres") at postgres.c:2013 > #12 0x08110625 in main (argc=4, argv=0xbffffc14) at main.c:235 > -- Teodor Sigaev teodor@stack.net