Thread: Memory error in user-defined aggregation function
Hi, I've implemented an aggregation function to compute quartiles in C borrowing liberally from orafce code. I uses this code in a windowing context and it worked fine until today - and I'm not sure what changed. This is on 9.1.2 and I have also tried it on 9.1.4. What I have determined so far (by sprinkling a lot of elog's throughout the code) is that it does not seem to be data specific, although it seems to depend on the number of aggregations I do (up to about 1250 seems to be fine, beyond that it chokes). I also established that there does not seem to be a problem with the transfer function, and the data is accumulated without any issues. The error I see is in the call to first_quartile_final (listed below). The pointer to the transfer data structure is not null, but accessing the field mstate->nelems causes a segflt. So the transfer data structure pointer is bogus. I've recompiled postgres with debugging enabled and have connected to the backend with gdb, but haven't had any joy in persuading gdb to actually stop in the correct file so that I can step through. I'll keep on trying to make some headway with that. In the meantime I would appreciate any comments as to whether the approach taken is the right one, and whether additional checks can be inserted to avoid this segmentation faults. Many thanks, Adriaan My transfer data structure is typedef struct { int len; /* allocated length */ int nextlen; /* next allocated length */ int nelems; /* number of valid entries */ float8 *values; } quartile_state; On the first call to the aggregate function this data structure is allocated as follows: static quartile_state * quartile_accummulate(quartile_state *mstate, float8 value, MemoryContext aggcontext) { MemoryContext oldcontext; if (mstate == NULL) { /* First call - initialize */ oldcontext = MemoryContextSwitchTo(aggcontext); mstate = palloc(sizeof(quartile_state)); mstate->len = 512; mstate->nextlen = 2 * 512; mstate->nelems = 0; mstate->values = palloc(mstate->len * sizeof(float8)); MemoryContextSwitchTo(oldcontext); } else { if (mstate->nelems >= mstate->len) { int newlen = mstate->nextlen; oldcontext = MemoryContextSwitchTo(aggcontext); mstate->nextlen += mstate->len; mstate->len = newlen; mstate->values = repalloc(mstate->values, mstate->len * sizeof(float8)); MemoryContextSwitchTo(oldcontext); } } mstate->values[mstate->nelems++] = value; return mstate; } And the transfer function itself is PG_FUNCTION_INFO_V1(quartile_transfer); Datum quartile_transfer(PG_FUNCTION_ARGS) { MemoryContext aggcontext; quartile_state *state = NULL; float8 elem; if (!AggCheckCallContext(fcinfo, &aggcontext)) { elog(ERROR, "quartile_transform called in non-aggregate context"); } state = PG_ARGISNULL(0) ? NULL : (quartile_state *) PG_GETARG_POINTER(0); if (PG_ARGISNULL(1)) PG_RETURN_POINTER(state); elem = PG_GETARG_FLOAT8(1); state = quartile_accummulate(state, elem, aggcontext); PG_RETURN_POINTER(state); } The final function for the computation of the first quartile is PG_FUNCTION_INFO_V1(first_quartile_final); Datum first_quartile_final(PG_FUNCTION_ARGS) { quartile_state *state = NULL; float8 result; if (PG_ARGISNULL(0)) PG_RETURN_NULL(); state = (quartile_state *) PG_GETARG_POINTER(0); /** HERE state->nelems causes a segflt */ if (state->nelems<4) PG_RETURN_NULL(); result = quartile_result(state, 0.25); PG_RETURN_FLOAT8(result); }
Adriaan Joubert <adriaan.joubert@gmail.com> writes: > I've implemented an aggregation function to compute quartiles in C > borrowing liberally from orafce code. I uses this code in a windowing > context and it worked fine until today - and I'm not sure what > changed. This is on 9.1.2 and I have also tried it on 9.1.4. Hm, it doesn't look very different from what's done in e.g. array_agg. You do have the transition datatype declared as "internal", no? I notice that your transition function is sloppy about returning a null pointer (as opposed to a SQL null) if both input arguments are null. If all the aggregated values were nulls then the null pointer would reach the final function and cause a crash similar to the one described. But that's hardly "not data specific". > I've recompiled postgres with debugging enabled and have connected to > the backend with gdb, but haven't had any joy in persuading gdb to > actually stop in the correct file so that I can step through. I'll > keep on trying to make some headway with that. I've found that gdb takes extra persuasion to notice shared libraries that are loaded after it attaches to the process. Usually the path of least resistance is to ensure that the library is loaded before you attach. Use LOAD, or just CREATE OR REPLACE one of the functions in your library. regards, tom lane
Hi, Finally got this running under the debugger and figured out what is going on. I had been under the impression that if (PG_ARGISNULL(0)) PG_RETURN_NULL(); state = (quartile_state *) PG_GETARG_POINTER(0); would ensure that state was never a null pointer. However this is not the case, and an additional check for state==0x0 solved the problem. Somewhat unexpected, I have to say. I would still be interested in any ways in which this implementation could be improved. It would be good if there were some model implementations for this type of thing - without orafce to guide me I would have had a hard time figuring any of this out from the docs. I'd gladly make the quartile implementation available for this purpose if there is interest. Adriaan On 7 August 2012 15:04, Adriaan Joubert <adriaan.joubert@gmail.com> wrote: > Hi, > > I've implemented an aggregation function to compute quartiles in C > borrowing liberally from orafce code. I uses this code in a windowing > context and it worked fine until today - and I'm not sure what > changed. This is on 9.1.2 and I have also tried it on 9.1.4. > > What I have determined so far (by sprinkling a lot of elog's > throughout the code) is that it does not seem to be data specific, > although it seems to depend on the number of aggregations I do (up to > about 1250 seems to be fine, beyond that it chokes). I also > established that there does not seem to be a problem with the transfer > function, and the data is accumulated without any issues. The error I > see is in the call to first_quartile_final (listed below). The pointer > to the transfer data structure is not null, but accessing the field > mstate->nelems causes a segflt. So the transfer data structure pointer > is bogus. > > I've recompiled postgres with debugging enabled and have connected to > the backend with gdb, but haven't had any joy in persuading gdb to > actually stop in the correct file so that I can step through. I'll > keep on trying to make some headway with that. > > In the meantime I would appreciate any comments as to whether the > approach taken is the right one, and whether additional checks can be > inserted to avoid this segmentation faults. > > Many thanks, > > Adriaan > > > My transfer data structure is > > typedef struct > { > int len; /* allocated length */ > int nextlen; /* next allocated length */ > int nelems; /* number of valid entries */ > float8 *values; > } quartile_state; > > On the first call to the aggregate function this data structure is > allocated as follows: > > static quartile_state * > quartile_accummulate(quartile_state *mstate, float8 value, > MemoryContext aggcontext) > { > MemoryContext oldcontext; > > if (mstate == NULL) > { > /* First call - initialize */ > oldcontext = MemoryContextSwitchTo(aggcontext); > mstate = palloc(sizeof(quartile_state)); > mstate->len = 512; > mstate->nextlen = 2 * 512; > mstate->nelems = 0; > mstate->values = palloc(mstate->len * sizeof(float8)); > MemoryContextSwitchTo(oldcontext); > } > else > { > if (mstate->nelems >= mstate->len) > { > int newlen = mstate->nextlen; > > oldcontext = MemoryContextSwitchTo(aggcontext); > mstate->nextlen += mstate->len; > mstate->len = newlen; > mstate->values = repalloc(mstate->values, mstate->len * sizeof(float8)); > MemoryContextSwitchTo(oldcontext); > } > } > > mstate->values[mstate->nelems++] = value; > > return mstate; > } > > > And the transfer function itself is > > PG_FUNCTION_INFO_V1(quartile_transfer); > Datum > quartile_transfer(PG_FUNCTION_ARGS) { > MemoryContext aggcontext; > quartile_state *state = NULL; > float8 elem; > > if (!AggCheckCallContext(fcinfo, &aggcontext)) > { > elog(ERROR, "quartile_transform called in non-aggregate context"); > } > > state = PG_ARGISNULL(0) ? NULL : (quartile_state *) PG_GETARG_POINTER(0); > if (PG_ARGISNULL(1)) > PG_RETURN_POINTER(state); > > elem = PG_GETARG_FLOAT8(1); > > state = quartile_accummulate(state, elem, aggcontext); > > PG_RETURN_POINTER(state); > } > > The final function for the computation of the first quartile is > > PG_FUNCTION_INFO_V1(first_quartile_final); > Datum > first_quartile_final(PG_FUNCTION_ARGS) { > quartile_state *state = NULL; > float8 result; > > if (PG_ARGISNULL(0)) > PG_RETURN_NULL(); > > state = (quartile_state *) PG_GETARG_POINTER(0); > > /** HERE state->nelems causes a segflt */ > if (state->nelems<4) > PG_RETURN_NULL(); > > result = quartile_result(state, 0.25); > > PG_RETURN_FLOAT8(result); > }