Thread: Memory error in user-defined aggregation function

Memory error in user-defined aggregation function

From
Adriaan Joubert
Date:
Hi,

I've implemented an aggregation function to compute quartiles in C
borrowing liberally from orafce code. I uses this code in a windowing
context and it worked fine until today - and I'm not sure what
changed. This is on 9.1.2 and I have also tried it on 9.1.4.

What I have determined so far (by sprinkling a lot of elog's
throughout the code) is that it does not seem to be data specific,
although it seems to depend on the number of aggregations I do (up to
about 1250 seems to be fine, beyond that it chokes). I also
established that there does not seem to be a problem with the transfer
function, and the data is accumulated without any issues. The error I
see is in the call to first_quartile_final (listed below). The pointer
to the transfer data structure is not null, but accessing the field
mstate->nelems causes a segflt. So the transfer data structure pointer
is bogus.

I've recompiled postgres with debugging enabled and have connected to
the backend with gdb, but haven't had any joy in persuading gdb to
actually stop in the correct file so that I can step through. I'll
keep on trying to make some headway with that.

In the meantime I would appreciate any comments as to whether the
approach taken is the right one, and whether additional checks can be
inserted to avoid this segmentation faults.

Many thanks,

Adriaan


My transfer data structure is

typedef struct
{
  int len; /* allocated length */
  int nextlen; /* next allocated length */
  int nelems; /* number of valid entries */
  float8  *values;
} quartile_state;

On the first call to the aggregate function this data structure is
allocated as follows:

static quartile_state *
quartile_accummulate(quartile_state *mstate, float8 value,
MemoryContext aggcontext)
{
    MemoryContext oldcontext;

    if (mstate == NULL)
    {
        /* First call - initialize */
        oldcontext = MemoryContextSwitchTo(aggcontext);
        mstate = palloc(sizeof(quartile_state));
        mstate->len = 512;
        mstate->nextlen = 2 * 512;
        mstate->nelems = 0;
        mstate->values = palloc(mstate->len * sizeof(float8));
        MemoryContextSwitchTo(oldcontext);
    }
    else
    {
        if (mstate->nelems >= mstate->len)
        {
            int newlen = mstate->nextlen;

            oldcontext = MemoryContextSwitchTo(aggcontext);
            mstate->nextlen += mstate->len;
            mstate->len = newlen;
            mstate->values = repalloc(mstate->values, mstate->len * sizeof(float8));
            MemoryContextSwitchTo(oldcontext);
        }
    }

    mstate->values[mstate->nelems++] = value;

    return mstate;
}


And the transfer function itself is

PG_FUNCTION_INFO_V1(quartile_transfer);
Datum
quartile_transfer(PG_FUNCTION_ARGS) {
    MemoryContext    aggcontext;
    quartile_state *state = NULL;
    float8 elem;

    if (!AggCheckCallContext(fcinfo, &aggcontext))
    {
        elog(ERROR, "quartile_transform called in non-aggregate context");
    }

    state = PG_ARGISNULL(0) ? NULL : (quartile_state *) PG_GETARG_POINTER(0);
    if (PG_ARGISNULL(1))
        PG_RETURN_POINTER(state);

    elem = PG_GETARG_FLOAT8(1);

    state = quartile_accummulate(state, elem, aggcontext);

    PG_RETURN_POINTER(state);
}

The final function for the computation of the first quartile is

PG_FUNCTION_INFO_V1(first_quartile_final);
Datum
first_quartile_final(PG_FUNCTION_ARGS) {
    quartile_state *state = NULL;
    float8 result;

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();

    state = (quartile_state *) PG_GETARG_POINTER(0);

        /** HERE state->nelems causes a segflt */
    if (state->nelems<4)
        PG_RETURN_NULL();

    result = quartile_result(state, 0.25);

    PG_RETURN_FLOAT8(result);
}

Re: Memory error in user-defined aggregation function

From
Tom Lane
Date:
Adriaan Joubert <adriaan.joubert@gmail.com> writes:
> I've implemented an aggregation function to compute quartiles in C
> borrowing liberally from orafce code. I uses this code in a windowing
> context and it worked fine until today - and I'm not sure what
> changed. This is on 9.1.2 and I have also tried it on 9.1.4.

Hm, it doesn't look very different from what's done in e.g. array_agg.
You do have the transition datatype declared as "internal", no?

I notice that your transition function is sloppy about returning a null
pointer (as opposed to a SQL null) if both input arguments are null.
If all the aggregated values were nulls then the null pointer would
reach the final function and cause a crash similar to the one described.
But that's hardly "not data specific".

> I've recompiled postgres with debugging enabled and have connected to
> the backend with gdb, but haven't had any joy in persuading gdb to
> actually stop in the correct file so that I can step through. I'll
> keep on trying to make some headway with that.

I've found that gdb takes extra persuasion to notice shared libraries
that are loaded after it attaches to the process.  Usually the path
of least resistance is to ensure that the library is loaded before
you attach.  Use LOAD, or just CREATE OR REPLACE one of the functions
in your library.

            regards, tom lane

Re: Memory error in user-defined aggregation function

From
Adriaan Joubert
Date:
Hi,

Finally got this running under the debugger and figured out what is
going on. I had been under the impression that

         if (PG_ARGISNULL(0))
                 PG_RETURN_NULL();

         state = (quartile_state *) PG_GETARG_POINTER(0);

would ensure that state was never a null pointer. However this is not
the case, and an additional check for state==0x0 solved the problem.
Somewhat unexpected, I have to say.

I would still be interested in any ways in which this implementation
could be improved. It would be good if there were some model
implementations for this type of thing - without orafce to guide me I
would have had a hard time figuring any of this out from the docs. I'd
gladly make the quartile implementation available for this purpose if
there is interest.

Adriaan


On 7 August 2012 15:04, Adriaan Joubert <adriaan.joubert@gmail.com> wrote:
> Hi,
>
> I've implemented an aggregation function to compute quartiles in C
> borrowing liberally from orafce code. I uses this code in a windowing
> context and it worked fine until today - and I'm not sure what
> changed. This is on 9.1.2 and I have also tried it on 9.1.4.
>
> What I have determined so far (by sprinkling a lot of elog's
> throughout the code) is that it does not seem to be data specific,
> although it seems to depend on the number of aggregations I do (up to
> about 1250 seems to be fine, beyond that it chokes). I also
> established that there does not seem to be a problem with the transfer
> function, and the data is accumulated without any issues. The error I
> see is in the call to first_quartile_final (listed below). The pointer
> to the transfer data structure is not null, but accessing the field
> mstate->nelems causes a segflt. So the transfer data structure pointer
> is bogus.
>
> I've recompiled postgres with debugging enabled and have connected to
> the backend with gdb, but haven't had any joy in persuading gdb to
> actually stop in the correct file so that I can step through. I'll
> keep on trying to make some headway with that.
>
> In the meantime I would appreciate any comments as to whether the
> approach taken is the right one, and whether additional checks can be
> inserted to avoid this segmentation faults.
>
> Many thanks,
>
> Adriaan
>
>
> My transfer data structure is
>
> typedef struct
> {
>   int len; /* allocated length */
>   int nextlen; /* next allocated length */
>   int nelems; /* number of valid entries */
>   float8  *values;
> } quartile_state;
>
> On the first call to the aggregate function this data structure is
> allocated as follows:
>
> static quartile_state *
> quartile_accummulate(quartile_state *mstate, float8 value,
> MemoryContext aggcontext)
> {
>         MemoryContext oldcontext;
>
>         if (mstate == NULL)
>         {
>                 /* First call - initialize */
>                 oldcontext = MemoryContextSwitchTo(aggcontext);
>                 mstate = palloc(sizeof(quartile_state));
>                 mstate->len = 512;
>                 mstate->nextlen = 2 * 512;
>                 mstate->nelems = 0;
>                 mstate->values = palloc(mstate->len * sizeof(float8));
>                 MemoryContextSwitchTo(oldcontext);
>         }
>         else
>         {
>                 if (mstate->nelems >= mstate->len)
>                 {
>                         int newlen = mstate->nextlen;
>
>                         oldcontext = MemoryContextSwitchTo(aggcontext);
>                         mstate->nextlen += mstate->len;
>                         mstate->len = newlen;
>                         mstate->values = repalloc(mstate->values, mstate->len * sizeof(float8));
>                         MemoryContextSwitchTo(oldcontext);
>                 }
>         }
>
>         mstate->values[mstate->nelems++] = value;
>
>         return mstate;
> }
>
>
> And the transfer function itself is
>
> PG_FUNCTION_INFO_V1(quartile_transfer);
> Datum
> quartile_transfer(PG_FUNCTION_ARGS) {
>         MemoryContext   aggcontext;
>         quartile_state *state = NULL;
>         float8 elem;
>
>         if (!AggCheckCallContext(fcinfo, &aggcontext))
>         {
>                 elog(ERROR, "quartile_transform called in non-aggregate context");
>         }
>
>         state = PG_ARGISNULL(0) ? NULL : (quartile_state *) PG_GETARG_POINTER(0);
>         if (PG_ARGISNULL(1))
>                 PG_RETURN_POINTER(state);
>
>         elem = PG_GETARG_FLOAT8(1);
>
>         state = quartile_accummulate(state, elem, aggcontext);
>
>         PG_RETURN_POINTER(state);
> }
>
> The final function for the computation of the first quartile is
>
> PG_FUNCTION_INFO_V1(first_quartile_final);
> Datum
> first_quartile_final(PG_FUNCTION_ARGS) {
>         quartile_state *state = NULL;
>         float8 result;
>
>         if (PG_ARGISNULL(0))
>                 PG_RETURN_NULL();
>
>         state = (quartile_state *) PG_GETARG_POINTER(0);
>
>         /** HERE state->nelems causes a segflt */
>         if (state->nelems<4)
>                 PG_RETURN_NULL();
>
>         result = quartile_result(state, 0.25);
>
>         PG_RETURN_FLOAT8(result);
> }