Thread: Workaround for custom aggregate which would need "internal" as statetype
From: "Florian G. Pflug"
Hi,

I'm trying to write an aggregate collect_distinct(int8) which puts all distinct values into an array. My first try was defining an aggregate "collect" using array_append and running "select collect(distinct <field>) ..", but this is quite slow, probably because DISTINCT sorts the values instead of using a hash to filter out duplicates. Using Perl and a Perl hash was even slower, so I wrote two C functions (actually C++), which use an STL hash_set to filter out duplicates.

I initially defined my state-transition function as "collect_distinct(internal, int8) returns internal". The parameter marked internal was a pointer to an STL hash_set. But using this to define an aggregate failed, because internal seems to be forbidden as a state type.

I have now resorted to a crude hack to get this running: I changed "internal" to "int4", and just cast my pointer to an int4 before returning it, and back to a pointer after receiving it as an argument. This at least enabled me to do some benchmarking, and performance-wise things look good...

Before using this on a production system, I need to get rid of that hack, but I don't see how this could be done ATM... Maybe someone here could give me a hint how this could work...

greetings,
Florian Pflug
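The hash-based filtering described above can be sketched outside PostgreSQL as a standalone C++ model. This is not the poster's code: it uses std::unordered_set (the standardized successor of the pre-standard hash_set), and the names CollectDistinctState, collect_distinct_trans and collect_distinct_final are hypothetical stand-ins for the aggregate's state, transition function and final function. Each input row costs one expected-O(1) hash lookup, so no sort is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical transition state: a hash set for expected O(1) duplicate
// checks, plus the distinct values in the order they were first seen.
struct CollectDistinctState {
    std::unordered_set<std::int64_t> seen;
    std::vector<std::int64_t> values;
};

// Transition step, called once per input row; no sorting is involved.
void collect_distinct_trans(CollectDistinctState& state, std::int64_t value) {
    // insert() reports through .second whether the value was new.
    if (state.seen.insert(value).second) {
        state.values.push_back(value);
    }
}

// Final step: hand back the collected values (what would become the array).
std::vector<std::int64_t> collect_distinct_final(const CollectDistinctState& state) {
    assert(state.values.size() == state.seen.size());  // both containers agree
    return state.values;
}
```

Feeding 4, 2, 4, 8, 2 through the transition step and then calling the final step yields 4, 2, 8. Keeping first-seen order is a choice of this sketch; the original request only asks for the distinct values.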
"Florian G. Pflug" <fgp@phlo.org> writes:
> Using Perl and a Perl hash was even slower, so I wrote two C functions
> (actually C++), which use an STL hash_set to filter out duplicates.

This makes me fairly nervous, because what's going to ensure that the memory used by the hash_set is reclaimed? Particularly if the query errors out partway through?

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>
>> Using Perl and a Perl hash was even slower, so I wrote two C functions
>> (actually C++), which use an STL hash_set to filter out duplicates.
>
> This makes me fairly nervous, because what's going to ensure that the
> memory used by the hash_set is reclaimed? Particularly if the query
> errors out partway through?

hash_set can be told to use a user-defined allocator class, which in turn can use palloc/pfree with an appropriate memory context. I'm not really sure what the "appropriate context" is, as using CurrentMemoryContext leads to strange crashes. For now, I'm using the standard C++ allocator, because I figured it would make debugging easier.

Still, the question remains how I can sanely use a C++ object as the state of an aggregate...

greetings,
Florian Pflug
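The allocator idea can be sketched in standalone C++. MemoryArena below is a hypothetical stand-in for a PostgreSQL memory context, not the palloc API: it owns every block it hands out and frees them all at once in reset(). ArenaAllocator is a minimal C++11 allocator (also hypothetical) that routes the set's node and bucket allocations into that arena, and std::unordered_set stands in for the pre-standard hash_set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <new>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a memory context: owns every block it hands
// out and releases all of them in reset() or in its destructor.
class MemoryArena {
public:
    MemoryArena() = default;
    MemoryArena(const MemoryArena&) = delete;
    MemoryArena& operator=(const MemoryArena&) = delete;
    ~MemoryArena() { reset(); }

    void* alloc(std::size_t bytes) {
        // Grow bookkeeping first so push_back cannot throw after the block
        // is allocated (which would leak it).
        if (blocks_.size() == blocks_.capacity()) {
            blocks_.reserve(blocks_.empty() ? 16 : blocks_.size() * 2);
        }
        void* p = ::operator new(bytes);  // aligned for any fundamental type
        blocks_.push_back(p);
        return p;
    }

    // Bulk release of everything allocated since the last reset.
    void reset() noexcept {
        for (void* p : blocks_) ::operator delete(p);
        blocks_.clear();
    }

    std::size_t live_blocks() const noexcept { return blocks_.size(); }

private:
    std::vector<void*> blocks_;
};

// Minimal allocator that carves memory out of a MemoryArena.
template <class T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(MemoryArena* a) noexcept : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        assert(arena != nullptr);
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(arena->alloc(n * sizeof(T)));
    }

    // Individual frees are deferred: the arena reclaims everything on reset().
    void deallocate(T*, std::size_t) noexcept {}

    MemoryArena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using ArenaInt64Set = std::unordered_set<std::int64_t, std::hash<std::int64_t>,
                                         std::equal_to<std::int64_t>,
                                         ArenaAllocator<std::int64_t>>;
```

Because deallocate() does nothing and reset() frees everything, reclaiming the memory does not depend on the set's destructor ever running, which is the property Tom's question about queries that error out partway through is after. The flip side is that the set must never be touched once its arena has been reset, so the arena has to live at least as long as the aggregate's state.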
"Florian G. Pflug" <fgp@phlo.org> writes:
> hash_set can be told to use a user-defined allocator class, which in turn
> can use palloc/pfree, with an appropriate memory context. I'm not
> really sure what the "appropriate context" is, as using CurrentMemoryContext
> leads to strange crashes. For now, I'm using the standard C++ allocator,
> because I figured it should make debugging easier.

Yeah, the assumption is that anything allocated in CurrentMemoryContext other than the actual return value is just memory leakage, and it'll automatically get thrown away. You could probably use aggstate->aggcontext, which is accessible to aggregate functions since PG 8.1 (see the comments at the head of nodeAgg.c).

regards, tom lane
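The crashes come down to two memory lifetimes: during a transition call, CurrentMemoryContext is a short-lived context whose contents are thrown away as input rows are processed, while aggcontext lives for the whole aggregate evaluation. The standalone C++ model below illustrates that split; Context, collect_distinct_trans and run_aggregate are hypothetical names, not PostgreSQL's API. In the server itself, the transition function would reach aggcontext through the AggState handed to it (via fcinfo->context), as the nodeAgg.c comments Tom mentions describe; those comments are the authority on the version-specific details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical model of a memory context (not PostgreSQL's API): it owns
// the objects created in it and destroys all of them on reset().
struct Context {
    std::vector<std::shared_ptr<void>> owned;

    template <class T>
    T* make() {
        std::shared_ptr<T> obj = std::make_shared<T>();
        owned.push_back(obj);  // shared_ptr<void> still runs T's destructor
        return obj.get();
    }
    void reset() { owned.clear(); }
};

using DistinctSet = std::unordered_set<std::int64_t>;

// Transition step: `state` is passed back in on every call, so it must be
// created in the long-lived aggregate context, never in the per-call one.
DistinctSet* collect_distinct_trans(DistinctSet* state, std::int64_t value,
                                    Context& per_call, Context& agg_context) {
    assert(&per_call != &agg_context);
    if (state == nullptr) {
        state = agg_context.make<DistinctSet>();  // first row: create state
    }
    // Per-call memory is fine for scratch data that dies with this row.
    std::int64_t* scratch = per_call.make<std::int64_t>();
    *scratch = value;
    state->insert(*scratch);
    return state;
}

// Simplified driver: per-call memory is discarded after every input row,
// while agg_context lives until the whole aggregate is finished.
std::size_t run_aggregate(const std::vector<std::int64_t>& rows) {
    Context agg_context;
    Context per_call;
    DistinctSet* state = nullptr;
    for (std::int64_t v : rows) {
        state = collect_distinct_trans(state, v, per_call, agg_context);
        per_call.reset();
    }
    return state != nullptr ? state->size() : 0;
}
```

Had the state been created with per_call.make instead, the pointer returned from the first call would dangle as soon as per_call.reset() ran, which mirrors the "strange crashes" seen when the hash_set's allocator used CurrentMemoryContext.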