Hi
I'm trying to write an aggrecate collect_distinct(int8) which puts all
distinct values into an array. My first try was defining an aggregate
"collect" using array_append, and doing "select collect(distinct <field>) ..",
but this is quite slow - probably because distinct sorts the values, instead
of using a hash to filter out duplicates.
Using perl, and a perl-hash was even slower, so I wrote my to c-functions
(actualy c++), which use a STL hash_set to filter out duplicates.
I initially defined my state-transaction function as
"collect_distinct(internal, int8) returns internal". The parameter marked internal was
a pointer to a STL hash_set. But using this to define an aggregate failed, because
internal seems to be forbidden as a state-type.
I now resorted to a crude hack to get this running - I changed "internal" to
"int4", and just cast my pointer to a int4 before returning it, and after receiving
it as an argument. This at least enabled me to do some benchmarking, and performance-wise
things look good...
Before using this on a production system, I need to get rid of that hack, but I don't
see how this could be done ATM... Maybe someone here could give me a hint how this could
work...
greetings, Florian Pflug