Thread: SRF's + SPI
With pg v7.4.7 I've written an SRF that uses SPI to return the results of a query. It's one of those functions that works perfectly for me in development but randomly crashes in production. Thus far I've been unable to reproduce the crash. The problem is surely in my code. And before I dig into debugging the hard stuff I would love a sanity check to verify that I'm using SPI_connect()/SPI_finish() correctly within the context of the SRF. The backtrace from a core dump: (gdb) bt #0 0x0820ee29 in pfree () #1 0x08204a18 in end_MultiFuncCall () #2 0x4cee0f99 in my_src (fcinfo=0xbfffccd0) at foo.c:93 #3 0x0811058b in ExecMakeTableFunctionResult () #4 0x08750c78 in ?? () #5 0x00004000 in ?? () #6 0xbfffccac in ?? () #7 0xbfffcd88 in ?? () And the code at foo.c:93SRF_RETURN_DONE(funcctx); I'm wondering if I'm using SPI incorrectly. In the past I've had issues with incorrectly using SPI (esp with recursion), and I'm not entirely sure how it should work with SRF's. Like I said, everything usually works without problems, but from time to time it crashes. Below is the basic outline of my code. Any input will be greatly appreciated. thanks! eric ------------------------------ Datum my_srf (PG_FUNCTION_ARGS) {FuncCallContext *funcctx; if (SRF_IS_FIRSTCALL()) { MemoryContext oldcontext; char *query; if(PG_ARGISNULL(0)) SRF_RETURN_DONE(NULL); /* nothing to expand when arg[0] is null */ funcctx = SRF_FIRSTCALL_INIT(); oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); query = < function call to build a query string >; SPI_connect(); if (SPI_exec(query, 0) != SPI_OK_SELECT) elog(ERROR, "unable to execute fulltext query"); funcctx->slot = TupleDescGetSlot(SPI_tuptable->tupdesc); funcctx->attinmeta = TupleDescGetAttInMetadata(SPI_tuptable->tupdesc); funcctx->user_fctx = SPI_tuptable; funcctx->call_cntr = 0; funcctx->max_calls= SPI_processed; MemoryContextSwitchTo(oldcontext); } funcctx = SRF_PERCALL_SETUP(); if (funcctx->call_cntr < funcctx->max_calls) { SPITupleTable *tuptable = (SPITupleTable *) funcctx->user_fctx; HeapTuple tuple; Datum result; tuple = tuptable->vals[funcctx->call_cntr]; result = TupleGetDatum(funcctx->slot, tuple); SRF_RETURN_NEXT(funcctx, result); } else { SPI_finish(); line_93: SRF_RETURN_DONE(funcctx); /** XXX: CRASH HERE **/} PG_RETURN_NULL(); }
"Eric B.Ridge" <ebr@tcdi.com> writes: > Like I said, everything > usually works without problems, but from time to time it crashes. If you rebuild with --enable-cassert, does the crash get more reproducible? (Personally I would never consider developing any backend C code without that turned on.) It sounds like a use-of- already-freed-memory bug, and if so --enable-cassert will help you find it by clobbering memory blocks immediately during pfree. > Below is the basic outline of my code. I think you've forgotten that SPI_connect() and SPI_finish change memory contexts. The stuff you allocated in your startup step goes away as soon as you SPI_finish, leaving dangling pointers that SRF_RETURN_DONE tries to clean up. I'm also pretty uncomfortable with the fact that you're returning out of your function while still connected to SPI. That would certainly cause problems for anything else trying to use SPI in the same query. You would probably be better off using the return-a-tuplestore mechanism that plpgsql uses for set-valued results. That way you could connect to SPI, issue your query, push the tuples into the tuplestore, disconnect from SPI, and be done with only one pass through the function. This would be a tad inefficient with a huge number of tuples of course ... but you could convert the operation to use a SPI cursor and thereby avoid duplicate storage of the tuples. We didn't have tuplestores at the time the SPI API was designed. I wonder whether we shouldn't extend SPI to have an option to return tuples into a caller-provided tuplestore, instead of a SPI_tuptable. regards, tom lane
On Apr 1, 2005, at 3:55 PM, Tom Lane wrote: > "Eric B.Ridge" <ebr@tcdi.com> writes: >> Like I said, everything >> usually works without problems, but from time to time it crashes. > > If you rebuild with --enable-cassert, does the crash get more > reproducible? Indeed. Every time. This is now the default for my development environment. > I'm also pretty uncomfortable with the fact that you're returning out > of your function while still connected to SPI. That would certainly > cause problems for anything else trying to use SPI in the same query. Ditto. I knew this while writing the code but didn't see any other way to handle it. The tuplestore stuff sounds like the right solution, but in the interests of providing a quick patch to my production environment does it makes sense to make a copy of the SPI_tuptable during the first-call of the SRF (allocated in the SRF's memory context of course)? I need to look into what plpgsql does with the tuplestore business but I suppose it knows how to spill to disk and such. In the end, that's what I'd want, but I think it'll take me more than an hour to write that code. Thanks for your time and help. eric
"Eric B. Ridge" <ebr@tcdi.com> writes: > The tuplestore stuff sounds like the right solution, but in the > interests of providing a quick patch to my production environment does > it makes sense to make a copy of the SPI_tuptable during the first-call > of the SRF (allocated in the SRF's memory context of course)? You could do that, but I don't believe there's any existing code that copies a whole SPI_tuptable, which means that pushing the tuples into a tuplestore would be about the same amount of new code. For a quick-patch solution it would probably suffice to NULL out those pointers you put in the SRF state immediately before you do SRF_RETURN_DONE. SPI is deleting the stuff fine, the problem is just the double free attempt from SRF_RETURN_DONE. regards, tom lane