Thread: Unstable C Function

Unstable C Function

From
Ian Campbell
Date:

I'm running PG 9.5 on Win 10 64-bit. I'm compiling C under VS 2016.

I am forming a function that will evolve into a somewhat complex beast. To test out my initial efforts, the function accepts an array of int4 (this works fine and the code for processing it is not shown here). The function then grabs a few rows from a table, pushes the values into a FIFO, then pulls them out and renders the results. This approach is strategic to how the function will operate when complete.

The function works fine on first call, sometimes more, then either resets the connection or throws this on any further calls:

ERROR: cache lookup failed for type 0 SQL state: XX000

I noticed that removing references to the FIFO improves stability, but doesn't solve it. Here is my code as lean as I can get it for question purposes:

PG:

CREATE TABLE md_key
( id    serial NOT NULL PRIMARY KEY, pid   int4 NOT NULL, key   integer NOT NULL, vals  int4[]
);


CREATE OR REPLACE FUNCTION md_key_query(int4[]) RETURNS TABLE (   id int4,   vals int4[]) AS E'\RoctPG', --abreviated for question   'md_key_query' LANGUAGE c IMMUTABLE STRICT;


select * from md_key_query(array[1,2,3,4]::int4[])

C:

    PG_FUNCTION_INFO_V1(md_key_query);
   typedef struct   {       Datum              id;       Datum              vals;   } MdKeyNode;
   typedef struct fifoAry   {       MdKeyNode           nodes[32];       struct fifoAry     *next;       int32               readpos;       int32               writepos;   } FifoAry;
   typedef struct   {       FifoAry            *fifo;       FifoAry            *tail;       FifoAry            *head;       uint32              nodescount;       Datum              *retvals[2];       bool               *retnulls[2];   } CtxArgs;
   inline void push(CtxArgs *args, Datum id, Datum vals)   {       if (args->head->writepos == 32)           args->head = args->head->next = (FifoAry*)palloc0(sizeof(FifoAry));
       MdKeyNode          *node = &(args->head->nodes[args->head->writepos++]);       node->id = id;       node->vals = vals;       args->nodescount++;   }


inline MdKeyNode* pop(CtxArgs *args)
{
//  if (!args->nodescount)
//      return NULL;   if (args->tail->readpos == 32)       args->tail = args->tail->next;
   args->nodescount--;
   return &(args->tail->nodes[args->tail->readpos++]);
}

// use STRICT in the caller wrapper to ensure a null isn't passed in
PGMODULEEXPORT Datum md_key_query(PG_FUNCTION_ARGS)
{   uint32              i;   FuncCallContext    *funcctx;   HeapTuple           tuple;   MdKeyNode          *node;   CtxArgs            *args;
   if (SRF_IS_FIRSTCALL())   {       funcctx = SRF_FIRSTCALL_INIT();
       MemoryContext   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);       ArrayType      *a = PG_GETARG_ARRAYTYPE_P(0);       Datum          *in_datums;       bool           *in_nulls;       bool            fieldNull;       SPITupleTable  *tuptable = SPI_tuptable;       int32           ret;       uint32          proc;
       if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != TYPEFUNC_COMPOSITE)           ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("function returning record called in context that cannot accept type record")));
       deconstruct_array(a, INT4OID, 4, true, 'i', &in_datums, &in_nulls, &ret);
       if (!ret)           PG_RETURN_NULL();
       (SPI_connect();
       // initialize and set the cross-call structure       funcctx->user_fctx = args = (CtxArgs*)palloc0(sizeof(CtxArgs));       args->fifo = args->tail = args->head = (FifoAry*)palloc0(sizeof(FifoAry));       args->retvals = (Datum*)palloc(sizeof(Datum) * 2);       args->retnulls = (bool*)palloc0(sizeof(bool) * 2);
       BlessTupleDesc(funcctx->tuple_desc);

// do some work here
       // this is simply a test to see if this function is behaving as expected       ret = SPI_execute("select id, vals from public.md_key where vals is not null limit 64", true, 0);
       if (ret <= 0)           ereport(ERROR, (errcode(ERRCODE_SQL_ROUTINE_EXCEPTION), errmsg("could not execute SQL")));
       proc = SPI_processed;
       if (proc > 0)       {           TupleDesc       tupdesc = SPI_tuptable->tupdesc;           SPITupleTable  *tuptable = SPI_tuptable;
           for (i = 0; i < proc; i++)           {               tuple = tuptable->vals[i];               push(args, SPI_getbinval(tuple, tupdesc, 1, &fieldNull), SPI_getbinval(tuple, tupdesc, 2, &fieldNull));           }       }
       SPI_finish();       MemoryContextSwitchTo(oldcontext);   }
   funcctx = SRF_PERCALL_SETUP();   args = funcctx->user_fctx;
   if (args->nodescount > 0)   {       node = pop(args);       args->retvals[0] = node->id;       args->retvals[1] = node->vals;       tuple = heap_form_tuple(funcctx->tuple_desc, args->retvals, args->retnulls);       SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));   }   else   {       SRF_RETURN_DONE(funcctx);   }
}

Re: Unstable C Function

From
Tom Lane
Date:
Ian Campbell <ianrc72@gmail.com> writes:
> The function works fine on first call, sometimes more, then either resets
> the connection or throws this on any further calls:
> ERROR: cache lookup failed for type 0 SQL state: XX000

I think the core problem here is that you're dealing with
pass-by-reference results from the SPI_execute() --- specifically,
the int4[] "vals" values --- as if they were pass-by-value.  You're
just saving the Datums, which are only pointers, and expecting what
they point to to still be good when you get around to doing
heap_form_tuple with them.  But in reality they stopped being good
the moment you did SPI_finish().

The failures would be a lot less intermittent if you were testing in
a debug build (with CLOBBER_FREED_MEMORY defined).

The two basic approaches you could take that would work reliably are

1. Copy all the int4[] values into the multi_call_memory_ctx before
doing SPI_finish.

2. Instead of using multi-call mode, form all the result tuples during
a single call and return them in a tuplestore, so that all the work
is done before you call SPI_finish.  You wouldn't really need the FIFO
data structure if you did it that way.

There are some other things I could criticize here, like labeling the
function IMMUTABLE when its results depend on table contents, but
they probably aren't causing your crashes.

            regards, tom lane


Re: Unstable C Function

From
Ian Campbell
Date:
Thanks for personally replying, Tom. I appreciate it.

You are correct. In the interim, I found the following change solved the issue:

SPI_finish(); // move to here
SRF_RETURN_DONE(funcctx);

I'll look into tuplestore. Thanks for the hint. Fixed IMMUTABLE.

Regards
Ian Campbell

On Thu, Sep 22, 2016 at 9:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ian Campbell <ianrc72@gmail.com> writes:
> The function works fine on first call, sometimes more, then either resets
> the connection or throws this on any further calls:
> ERROR: cache lookup failed for type 0 SQL state: XX000

I think the core problem here is that you're dealing with
pass-by-reference results from the SPI_execute() --- specifically,
the int4[] "vals" values --- as if they were pass-by-value.  You're
just saving the Datums, which are only pointers, and expecting what
they point to to still be good when you get around to doing
heap_form_tuple with them.  But in reality they stopped being good
the moment you did SPI_finish().

The failures would be a lot less intermittent if you were testing in
a debug build (with CLOBBER_FREED_MEMORY defined).

The two basic approaches you could take that would work reliably are

1. Copy all the int4[] values into the multi_call_memory_ctx before
doing SPI_finish.

2. Instead of using multi-call mode, form all the result tuples during
a single call and return them in a tuplestore, so that all the work
is done before you call SPI_finish.  You wouldn't really need the FIFO
data structure if you did it that way.

There are some other things I could criticize here, like labeling the
function IMMUTABLE when its results depend on table contents, but
they probably aren't causing your crashes.

                        regards, tom lane

Re: Unstable C Function

From
Tom Lane
Date:
Ian Campbell <ianrc72@gmail.com> writes:
> Thanks for personally replying, Tom. I appreciate it.
> You are correct. In the interim, I found the following change solved the
> issue:

> SPI_finish(); // move to here
> SRF_RETURN_DONE(funcctx);

That might work under light usage, but the problem with it is you're
blocking any other function in the same query from using SPI, since
you're leaving your own connection active when returning.  Sooner
or later that's gonna be a problem.

            regards, tom lane


Re: Unstable C Function

From
Ian Campbell
Date:
I'm going to rewrite it to use your tuplestore suggestion.

OK, so SPI is only suitable for single-call functions, right? If another function in the query attempted to use SPI, I assume there would be a deadlock?

Regards, Ian

On Thu, Sep 22, 2016 at 7:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ian Campbell <ianrc72@gmail.com> writes:
> Thanks for personally replying, Tom. I appreciate it.
> You are correct. In the interim, I found the following change solved the
> issue:

> SPI_finish(); // move to here
> SRF_RETURN_DONE(funcctx);

That might work under light usage, but the problem with it is you're
blocking any other function in the same query from using SPI, since
you're leaving your own connection active when returning.  Sooner
or later that's gonna be a problem.

                        regards, tom lane

Re: Unstable C Function

From
Tom Lane
Date:
Ian Campbell <ianrc72@gmail.com> writes:
> OK, so SPI is only suitable for single-call functions, right?

It's usable by strictly nested functions, but a multi-call SRF is more
like a coroutine.

> If another
> function in the query attempted to use SPI, I assume there would be a
> deadlock?

No, the second arrival would get a SPI_ERROR_CONNECT failure from
SPI_connect when there's already an open connection.  For the nested-
calls case you can prevent that with SPI_push/SPI_pop around a call that
might wish to use SPI, but that fix doesn't work in a coroutine situation.

            regards, tom lane