Thread: pg_autovacuum w/ dbt2
After all this time I finally got around to vacuuming the database with dbt2 with pg_autovacuum. :)http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ Doesn't look so good though, probably because I'm not using optimal settings with pg_autovacuum. So far I have only tried the default settings (running without any arguments, except -D). The only thing that's peculiar is a number of unexpected rollbacks across all of the transactions. I suspect it was something to do with these messages coming from pg_autovacuum: [2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" ] This is with 8.0rc1. I can get rc2 installed since it just came out. So let me know what I can try and what not. Mark
Mark Wong <markw@osdl.org> writes: > [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > ] Yikes. Is this reproducible? regards, tom lane
Mark Wong wrote: >After all this time I finally got around to vacuuming the database >with dbt2 with pg_autovacuum. :) > http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ > > Thanks! >Doesn't look so good though, probably because I'm not using optimal >settings with pg_autovacuum. So far I have only tried the default >settings (running without any arguments, except -D). > > I don't know what you mean by "Not Good" since I don't have graphs from a similar test without pg_autovacuum handy. Do you have a link to such a test? As for better pg_autovacuum settings, It appears that the little performance dips are happening about once every 5 minutes, which if I remember correctly is the default sleep time. You might try playing with the lazy vacuum settings to see if that smooths out the curve. Beyond that all you can do is play with the thresholds to see if there is a better sweet spot than the defaults (which by the way I have no confidence in, they were just conservative guesses) >The only thing that's peculiar is a number of unexpected rollbacks >across all of the transactions. I suspect it was something to do with >these messages coming from pg_autovacuum: > >[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. >[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" >] > > Not sure what this is all about, but if you turn up the debug level to 4 or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is causing the problems, that would be helpful output to have. >This is with 8.0rc1. I can get rc2 installed since it just came out. >So let me know what I can try and what not. > > I don't think anything has changed for pg_autovacuum between rc1 and rc2. thanks again for the good work!!!
On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote: > Mark Wong <markw@osdl.org> writes: > > [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > > ] > > Yikes. Is this reproducible? > > regards, tom lane Yes, and I think there is one for each of the rollbacks that are occuring in the workload. Except for the 1% that's supposed to happen for the new-order transaction. Mark
The overall throughput is better for a run like this:http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/ A drop from 3865 to 2679 (31%) by just adding pg_autovacuum. That's what I meant by "not good". :) I'll start with the additional debug messages, with 8.0rc2, before I start changing the other settings, if that sounds good. Mark On Tue, Dec 21, 2004 at 02:33:57PM -0500, Matthew T. O'Connor wrote: > Mark Wong wrote: > > >After all this time I finally got around to vacuuming the database > >with dbt2 with pg_autovacuum. :) > > http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ > > > > > Thanks! > > >Doesn't look so good though, probably because I'm not using optimal > >settings with pg_autovacuum. So far I have only tried the default > >settings (running without any arguments, except -D). > > > > > I don't know what you mean by "Not Good" since I don't have graphs from > a similar test without pg_autovacuum handy. Do you have a link to such > a test? > > As for better pg_autovacuum settings, It appears that the little > performance dips are happening about once every 5 minutes, which if I > remember correctly is the default sleep time. You might try playing > with the lazy vacuum settings to see if that smooths out the curve. > Beyond that all you can do is play with the thresholds to see if there > is a better sweet spot than the defaults (which by the way I have no > confidence in, they were just conservative guesses) > > >The only thing that's peculiar is a number of unexpected rollbacks > >across all of the transactions. I suspect it was something to do with > >these messages coming from pg_autovacuum: > > > >[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. > >[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > >] > > > > > Not sure what this is all about, but if you turn up the debug level to 4 > or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is > causing the problems, that would be helpful output to have. > > >This is with 8.0rc1. I can get rc2 installed since it just came out. > >So let me know what I can try and what not. > > > > > I don't think anything has changed for pg_autovacuum between rc1 and rc2. > > > thanks again for the good work!!!
Mark Wong <markw@osdl.org> writes: > On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote: >> Mark Wong <markw@osdl.org> writes: >>> [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" >> >> Yikes. Is this reproducible? > Yes, and I think there is one for each of the rollbacks that are > occuring in the workload. Except for the 1% that's supposed to happen > for the new-order transaction. Well, we need to find out what's causing that. There are two possible sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c, and one in src/backend/access/nbtree/nbtpage.c) and neither of them should ever fire. If you want to track it yourself, please change those elog(ERROR)s to elog(PANIC) so that they'll generate core dumps, then build with --enable-debug if you didn't already (--enable-cassert would be good too) and get a debugger stack trace from the core dump. Otherwise, can you extract a test case that causes this without needing vast resources to run? regards, tom lane
On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote: > Mark Wong <markw@osdl.org> writes: > > On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote: > >> Mark Wong <markw@osdl.org> writes: > >>> [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > >> > >> Yikes. Is this reproducible? > > > Yes, and I think there is one for each of the rollbacks that are > > occuring in the workload. Except for the 1% that's supposed to happen > > for the new-order transaction. > > Well, we need to find out what's causing that. There are two possible > sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c, > and one in src/backend/access/nbtree/nbtpage.c) and neither of them > should ever fire. > > If you want to track it yourself, please change those elog(ERROR)s to > elog(PANIC) so that they'll generate core dumps, then build with > --enable-debug if you didn't already (--enable-cassert would be good too) > and get a debugger stack trace from the core dump. > > Otherwise, can you extract a test case that causes this without needing > vast resources to run? > > regards, tom lane I was going to try Matthew's suggestion of turning up the debug on pg_autovacuum, unless you don't that'll help find the cause. I'm not sure if I can more easily reproduce the problem but i can try. I'll go ahead and make the elog() changes you recommended and do a run overnight either way. Mark
Mark Wong <markw@osdl.org> writes: > I was going to try Matthew's suggestion of turning up the debug on > pg_autovacuum, unless you don't that'll help find the cause. It won't help --- this is a backend-internal bug of some kind. regards, tom lane
Mark Wong wrote: >The overall throughput is better for a run like this: > http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/ > >A drop from 3865 to 2679 (31%) by just adding pg_autovacuum. That's >what I meant by "not good". :) > > I would agree that is "not good" :-) It sounds like pg_autovacuum is being to aggressive for this type of load, that is vacuuming more often than needed, however the lazy vacuum options were added so as to reduce the performance impact of running a vacuum while doing other things, so, I would recommend both higher autovacuum thresholds and trying out some of the lazy vacuum settings. >I'll start with the additional debug messages, with 8.0rc2, before >I start changing the other settings, if that sounds good. > Sounds fine. From Tom Lane's response, we have a backend bug that needs to be resolved and I think that is the priority.
I apologize for the significant delay, here's a link to results to a test with 8.0rc3:http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/ These are the same parameters with as run 215, listed below with the but with --enable-debug --enable-cassert. I also ran pg_autovacuum with -d4, where the output can be seen here:http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out I, uh, wasn't able to reproduce the previous errors after repairing my filesystems after a power outage. So I figure that might be good news. The performance is up from run 215 with the errors, so I'll continue with trying to tune some of the pg_autovacuum values. Mark On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote: > After all this time I finally got around to vacuuming the database > with dbt2 with pg_autovacuum. :) > http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ > > Doesn't look so good though, probably because I'm not using optimal > settings with pg_autovacuum. So far I have only tried the default > settings (running without any arguments, except -D). > > The only thing that's peculiar is a number of unexpected rollbacks > across all of the transactions. I suspect it was something to do with > these messages coming from pg_autovacuum: > > [2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. > [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > ] > > This is with 8.0rc1. I can get rc2 installed since it just came out. > So let me know what I can try and what not. > > Mark
I'm curious, the original run you posted with 3825 NOTPM is still 17% faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is this on the same hardware? Also, did the original non-pg_autovacuum run any manual vacuum commands? Also, does the non-pg_autovacuum run start slowing down after a while? The graphs look like there is a slight decline in performance as time goes on, what happens if you double the length of the test? Thanks for doing the testing! Matthew Mark Wong wrote: >I apologize for the significant delay, here's a link to results to a >test with 8.0rc3: > http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/ > >These are the same parameters with as run 215, listed below with the >but with --enable-debug --enable-cassert. I also ran pg_autovacuum >with -d4, where the output can be seen here: > http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out > >I, uh, wasn't able to reproduce the previous errors after repairing my >filesystems after a power outage. So I figure that might be good news. >The performance is up from run 215 with the errors, so I'll continue >with trying to tune some of the pg_autovacuum values. > >Mark > > >On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote: > > >>After all this time I finally got around to vacuuming the database >>with dbt2 with pg_autovacuum. :) >> http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ >> >>Doesn't look so good though, probably because I'm not using optimal >>settings with pg_autovacuum. So far I have only tried the default >>settings (running without any arguments, except -D). >> >>The only thing that's peculiar is a number of unexpected rollbacks >>across all of the transactions. I suspect it was something to do with >>these messages coming from pg_autovacuum: >> >>[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. >>[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" >>] >> >>This is with 8.0rc1. I can get rc2 installed since it just came out. >>So let me know what I can try and what not. >> >>Mark >> >> > > >
Yeah, same hardware and database configuration. No manual vacuum commands before. The decline in performance has been pretty consistent in all my previous tests and people have told me on many occasions that the decline in performance was probably because I was never using vacuum... Mark On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote: > I'm curious, the original run you posted with 3825 NOTPM is still 17% > faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is > this on the same hardware? Also, did the original non-pg_autovacuum run > any manual vacuum commands? Also, does the non-pg_autovacuum run start > slowing down after a while? The graphs look like there is a slight > decline in performance as time goes on, what happens if you double the > length of the test? > > Thanks for doing the testing! > > Matthew > > > Mark Wong wrote: > > >I apologize for the significant delay, here's a link to results to a > >test with 8.0rc3: > > http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/ > > > >These are the same parameters with as run 215, listed below with the > >but with --enable-debug --enable-cassert. I also ran pg_autovacuum > >with -d4, where the output can be seen here: > > http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out > > > >I, uh, wasn't able to reproduce the previous errors after repairing my > >filesystems after a power outage. So I figure that might be good news. > >The performance is up from run 215 with the errors, so I'll continue > >with trying to tune some of the pg_autovacuum values. > > > >Mark > > > > > >On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote: > > > > > >>After all this time I finally got around to vacuuming the database > >>with dbt2 with pg_autovacuum. :) > >> http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/ > >> > >>Doesn't look so good though, probably because I'm not using optimal > >>settings with pg_autovacuum. So far I have only tried the default > >>settings (running without any arguments, except -D). > >> > >>The only thing that's peculiar is a number of unexpected rollbacks > >>across all of the transactions. I suspect it was something to do with > >>these messages coming from pg_autovacuum: > >> > >>[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2. > >>[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district" > >>] > >> > >>This is with 8.0rc1. I can get rc2 installed since it just came out. > >>So let me know what I can try and what not. > >> > >>Mark > >> > >> > > > > > > -- Mark Wong - - markw@osdl.org Open Source Development Lab Inc - A non-profit corporation 12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005 (503) 906-1932 (office) (503) 626-2436 (fax) http://developer.osdl.org/markw/
Ok, but what I'm curious to do is see if you run the non-pg_autovacuum test for a "long time" (4 hours? more?) when does it get slower that running with pg_autovacuum. And, can you demonstrate that running the tests with pg_autovacuum for a long time (say 4 hours) that the performance stays steady. Also, I would very much like to see this test run with pg_autovacuum and it's vacuum delay settings enabled. Matthew ps: I know time is limited and these tests take a lot of time to run, so please take my requests with a grain of salt, all I'm saying is that I think these would be interesting results to see. Mark Wong wrote: >Yeah, same hardware and database configuration. > >No manual vacuum commands before. The decline in performance has been >pretty consistent in all my previous tests and people have told me on >many occasions that the decline in performance was probably because I >was never using vacuum... > >Mark > >On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote: > > >>I'm curious, the original run you posted with 3825 NOTPM is still 17% >>faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is >>this on the same hardware? Also, did the original non-pg_autovacuum run >>any manual vacuum commands? Also, does the non-pg_autovacuum run start >>slowing down after a while? The graphs look like there is a slight >>decline in performance as time goes on, what happens if you double the >>length of the test? >> >>Thanks for doing the testing! >> >>Matthew >> >>
Mark, > No manual vacuum commands before. The decline in performance has been > pretty consistent in all my previous tests and people have told me on > many occasions that the decline in performance was probably because I > was never using vacuum... Hmmm ... what autovacuum params are you using? From the look of the graphs, VACUUM is running every 6 minutes or so, and having a huge impact on performance when it does run. Give me a bit of time to crunch the database log, and we'll see which table vacuums were taking the most time. Can you re-run this test with a fairly generous vacuum_delay setting? Given that the performance drops seem to correspond to wait times, I think this could mean a big difference. -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, Jan 07, 2005 at 09:58:47AM -0800, Josh Berkus wrote: > Mark, > > > No manual vacuum commands before. The decline in performance has been > > pretty consistent in all my previous tests and people have told me on > > many occasions that the decline in performance was probably because I > > was never using vacuum... > > Hmmm ... what autovacuum params are you using? From the look of the graphs, > VACUUM is running every 6 minutes or so, and having a huge impact on > performance when it does run. Give me a bit of time to crunch the database > log, and we'll see which table vacuums were taking the most time. > > Can you re-run this test with a fairly generous vacuum_delay setting? Given > that the performance drops seem to correspond to wait times, I think this > could mean a big difference. > All default parameters. Matthew also recommended using the vacuum_delay setting so I was about to try that. Mark
Mark, > All default parameters. Matthew also recommended using the > vacuum_delay setting so I was about to try that. Interesting ... the default parameters are quite conservative, running only when the table has doubled in new rows. So if those spikes are vacuums, then the DBT2 test is updating or adding 100% of the rows in the orders table every 6-8 minutes? Does that seem right? -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote: > Mark, > > > All default parameters. Matthew also recommended using the > > vacuum_delay setting so I was about to try that. > > Interesting ... the default parameters are quite conservative, running only > when the table has doubled in new rows. So if those spikes are vacuums, > then the DBT2 test is updating or adding 100% of the rows in the orders table > every 6-8 minutes? Does that seem right? > Some hand-wavey math here: We start with 18,000,000 rows in the orders table. Based one 3281 transactions per minute for just the new-order transaction:288 transactions per minute for delivery New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.) Delivery updates 288 rows per minute in orders. So no, we're not change that many rows per minute... Mark
Mark Wong wrote: >On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote: > > >>Mark, >> >> >>>All default parameters. Matthew also recommended using the >>>vacuum_delay setting so I was about to try that. >>> >>> >>Interesting ... the default parameters are quite conservative, running only >>when the table has doubled in new rows. So if those spikes are vacuums, >>then the DBT2 test is updating or adding 100% of the rows in the orders table >>every 6-8 minutes? Does that seem right? >> >> > >Some hand-wavey math here: > >We start with 18,000,000 rows in the orders table. > >Based one 3281 transactions per minute for just the new-order transaction: > 288 transactions per minute for delivery > >New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.) >Delivery updates 288 rows per minute in orders. > >So no, we're not change that many rows per minute... > > Except the thing pg_autovacuum is actually doing is Analyzing the district table (4800 rows, takes ~15 sec) and vacuuming the warehouse table (66,000, takes ~50sec) which takes about 50 seconds. Since the default vacuum sleep time is 5 minutes, that would explain the dip every 6 minutes. Matthew
On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote: > If you want to track it yourself, please change those elog(ERROR)s to > elog(PANIC) so that they'll generate core dumps, then build with > --enable-debug if you didn't already (--enable-cassert would be good too) > and get a debugger stack trace from the core dump. Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's exactly the same problem. I have the postgres binary and the core here:http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2 But it's for ia64, if you got one. Otherwise, this is what gdb is telling me with a bt: (gdb) bt #0 FunctionCall2 (flinfo=0x6000000000187850, arg1=16, arg2=1043) at fmgr.c:1141 #1 0x400000000007a320 in _bt_checkkeys (scan=0x60000000000c2d80, tuple=0x200000000101f660, dir=ForwardScanDirection, continuescan=0x60000fffffff8ae0 "\001") at nbtutils.c:542 #2 0x4000000000078eb0 in _bt_endpoint (scan=0x6000000000187690, dir=ForwardScanDirection) at nbtsearch.c:1309 #3 0x40000000000771e0 in _bt_first (scan=0x6000000000187690, dir=ForwardScanDirection) at nbtsearch.c:482 #4 0x4000000000074350 in btgettuple (fcinfo=0x1) at nbtree.c:265 #5 0x40000000003bd430 in FunctionCall2 (flinfo=0x6000000000187700, arg1=6917529027642685072, arg2=1) at fmgr.c:1141 #6 0x400000000006b3a0 in index_getnext (scan=0x6000000000187690, direction=ForwardScanDirection) at indexam.c:429 #7 0x400000000006a1e0 in systable_getnext (sysscan=0x6000000000187668) at genam.c:253 #8 0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0, v2=6917529027641871376, v3=4294966252, v4=6917546619827097184) at catcache.c:1217 #9 0x40000000003a9ee0 in SearchSysCache (cacheId=33, key1=1043, key2=0, key3=0, key4=0) at syscache.c:524 #10 0x4000000000049110 in TupleDescInitEntry (desc=0x60000000001872c8, attributeNumber=4, attributeName=0x6000000000187614"\023\004", oidtypeid=1043, typmod=28, attdim=0) at tupdesc.c:444 #11 0x40000000001b5fc0 in ExecTypeFromTLInternal ( targetList=0x6000000000135d40, hasoid=-64 'À', skipjunk=1 '\001') at execTuples.c:570 #12 0x40000000001a4a20 in ExecInitJunkFilter (targetList=0x6000000000135b38, hasoid=-64 'À', slot=0x60000000001258a0)at execJunk.c:76 #13 0x40000000001a6890 in InitPlan (queryDesc=0x6000000000177ed0, explainOnly=0 '\0') at execMain.c:456 #14 0x40000000001a5800 in ExecutorStart (queryDesc=0x6000000000177ed0, explainOnly=0 '\0') at execMain.c:160 #15 0x40000000001d6ab0 in _SPI_pquery (queryDesc=0x6000000000177ed0, tcount=0) at spi.c:1521 #16 0x40000000001d6390 in _SPI_execute_plan (plan=0x60000fffffff9380, Values=0x0, Nulls=0x0, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\0', tcount=0) at spi.c:1452
Mark Wong <markw@osdl.org> writes: > Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's > exactly the same problem. I have the postgres binary and the core > here: > http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2 > But it's for ia64, if you got one. Poking around with gdb, it seems that the scankey structure being used by SearchCatCache got clobbered; which is a bit surprising because that's just a local variable in that function, and hence isn't really very exposed. The contents of cache->cc_skey are okay, but cur_skey[0] and cur_skey[1] don't match, which implies the clobber happened somewhere between lines 1110 and 1217 of catcache.c. (gdb) f 8 #8 0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0, v2=6917529027641871376, v3=4294966252, v4=6917546619827097184) at catcache.c:1217 1217 in catcache.c (gdb) p cache->cc_skey $7 = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3, sk_subtype = 0, sk_func = {fn_addr = 0x200000000003a9c8, fn_oid =184, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x600000000009e550, fn_expr= 0x0}, sk_argument = 0}, { sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = { fn_addr= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr= 0x0}, sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0,fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument= 0}} (gdb) p cur_skey $8 = {{sk_flags = 0, sk_attno = 1, sk_strategy = 24932, sk_subtype = 24948, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs= 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 1043},{sk_flags = 0, sk_attno = 1043, sk_strategy = 0, sk_subtype = 4294967295, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0},{sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0,fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, { sk_flags= 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = { fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict= 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}} The core dump happens because we eventually try to jump through the zeroed-out fn_addr function pointer. Not sure what to make of this. That's extremely heavily used, well-debugged code; it's hard to believe that there are any intermittent bugs in it. I notice that the backend seems to have been using some nonstandard C code: Error while reading shared library symbols: /home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory. What is that, and how much confidence have you got in it? regards, tom lane
On Wed, Jan 12, 2005 at 09:17:33PM -0500, Tom Lane wrote: > I notice that the backend seems to have been using some nonstandard C > code: > > Error while reading shared library symbols: > /home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory. > > What is that, and how much confidence have you got in it? That's my C stored function library. I'll attached it if anyone wants to take a persusal. Well, it was my first attempt with C stored functions and SPI calls, so it wouldn't surprise me if it was flawed. Would supplying the .so help the debugging? Mark