Thread: SRF's + SPI

SRF's + SPI

From
Eric B.Ridge
Date:
With pg v7.4.7 I've written an SRF that uses SPI to return the results 
of a query.  It's one of those functions that works perfectly for me in 
development but randomly crashes in production.  Thus far I've been 
unable to reproduce the crash.  The problem is surely in my code.  And 
before I dig into debugging the hard stuff I would love a sanity check 
to verify that I'm using SPI_connect()/SPI_finish() correctly within 
the context of the SRF.

The backtrace from a core dump:
(gdb) bt
#0  0x0820ee29 in pfree ()
#1  0x08204a18 in end_MultiFuncCall ()
#2  0x4cee0f99 in my_src (fcinfo=0xbfffccd0) at foo.c:93
#3  0x0811058b in ExecMakeTableFunctionResult ()
#4  0x08750c78 in ?? ()
#5  0x00004000 in ?? ()
#6  0xbfffccac in ?? ()
#7  0xbfffcd88 in ?? ()

And the code at foo.c:93SRF_RETURN_DONE(funcctx);

I'm wondering if I'm using SPI incorrectly.  In the past I've had 
issues with incorrectly using SPI (esp with recursion), and I'm not 
entirely sure how it should work with SRF's.  Like I said, everything 
usually works without problems, but from time to time it crashes.  
Below is the basic outline of my code.

Any input will be greatly appreciated.

thanks!

eric

------------------------------
Datum my_srf (PG_FUNCTION_ARGS) {FuncCallContext    *funcctx;
        if (SRF_IS_FIRSTCALL()) {            MemoryContext oldcontext;    char *query;
    if(PG_ARGISNULL(0))        SRF_RETURN_DONE(NULL);  /* nothing to expand when arg[0] is null */
    funcctx = SRF_FIRSTCALL_INIT();
                oldcontext = 
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
    query = < function call to build a query string >;
    SPI_connect();    if (SPI_exec(query, 0) != SPI_OK_SELECT)        elog(ERROR, "unable to execute fulltext query");
    funcctx->slot = TupleDescGetSlot(SPI_tuptable->tupdesc);    funcctx->attinmeta =
TupleDescGetAttInMetadata(SPI_tuptable->tupdesc);   funcctx->user_fctx = SPI_tuptable;    funcctx->call_cntr = 0;
funcctx->max_calls= SPI_processed;
 
                 MemoryContextSwitchTo(oldcontext);        }
funcctx = SRF_PERCALL_SETUP();
        if (funcctx->call_cntr < funcctx->max_calls) {    SPITupleTable *tuptable = (SPITupleTable *)
funcctx->user_fctx;   HeapTuple tuple;    Datum result;
 
    tuple = tuptable->vals[funcctx->call_cntr];    result = TupleGetDatum(funcctx->slot, tuple);
    SRF_RETURN_NEXT(funcctx, result);        } else {    SPI_finish();
line_93:                SRF_RETURN_DONE(funcctx);  /** XXX: CRASH HERE **/}
PG_RETURN_NULL();
}



Re: SRF's + SPI

From
Tom Lane
Date:
"Eric B.Ridge" <ebr@tcdi.com> writes:
> Like I said, everything 
> usually works without problems, but from time to time it crashes.  

If you rebuild with --enable-cassert, does the crash get more
reproducible?  (Personally I would never consider developing any
backend C code without that turned on.)  It sounds like a use-of-
already-freed-memory bug, and if so --enable-cassert will help you
find it by clobbering memory blocks immediately during pfree.

> Below is the basic outline of my code.

I think you've forgotten that SPI_connect() and SPI_finish change
memory contexts.  The stuff you allocated in your startup step
goes away as soon as you SPI_finish, leaving dangling pointers that
SRF_RETURN_DONE tries to clean up.

I'm also pretty uncomfortable with the fact that you're returning out
of your function while still connected to SPI.  That would certainly
cause problems for anything else trying to use SPI in the same query.

You would probably be better off using the return-a-tuplestore mechanism
that plpgsql uses for set-valued results.  That way you could connect
to SPI, issue your query, push the tuples into the tuplestore,
disconnect from SPI, and be done with only one pass through the
function.  This would be a tad inefficient with a huge number of tuples
of course ... but you could convert the operation to use a SPI cursor
and thereby avoid duplicate storage of the tuples.

We didn't have tuplestores at the time the SPI API was designed.  I
wonder whether we shouldn't extend SPI to have an option to return
tuples into a caller-provided tuplestore, instead of a SPI_tuptable.
        regards, tom lane


Re: SRF's + SPI

From
Eric B.Ridge
Date:
On Apr 1, 2005, at 3:55 PM, Tom Lane wrote:

> "Eric B.Ridge" <ebr@tcdi.com> writes:
>> Like I said, everything
>> usually works without problems, but from time to time it crashes.
>
> If you rebuild with --enable-cassert, does the crash get more
> reproducible?

Indeed.  Every time.  This is now the default for my development 
environment.

> I'm also pretty uncomfortable with the fact that you're returning out
> of your function while still connected to SPI.  That would certainly
> cause problems for anything else trying to use SPI in the same query.

Ditto.  I knew this while writing the code but didn't see any other way 
to handle it.

The tuplestore stuff sounds like the right solution, but in the 
interests of providing a quick patch to my production environment does 
it makes sense to make a copy of the SPI_tuptable during the first-call 
of the SRF (allocated in the SRF's memory context of course)?

I need to look into what plpgsql does with the tuplestore business but 
I suppose it knows how to spill to disk and such.  In the end, that's 
what I'd want, but I think it'll take me more than an hour to write 
that code.

Thanks for your time and help.

eric



Re: SRF's + SPI

From
Tom Lane
Date:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> The tuplestore stuff sounds like the right solution, but in the 
> interests of providing a quick patch to my production environment does 
> it makes sense to make a copy of the SPI_tuptable during the first-call 
> of the SRF (allocated in the SRF's memory context of course)?

You could do that, but I don't believe there's any existing code that
copies a whole SPI_tuptable, which means that pushing the tuples into
a tuplestore would be about the same amount of new code.

For a quick-patch solution it would probably suffice to NULL out those
pointers you put in the SRF state immediately before you do
SRF_RETURN_DONE.  SPI is deleting the stuff fine, the problem is just
the double free attempt from SRF_RETURN_DONE.
        regards, tom lane