Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 29, 2018 at 3:32 AM, Antonin Houska <ah@cybertec.at> wrote:
> > I think of a variant of this: implement an universal function that tests the
> > binary values for equality (besides the actual arguments, caller would have to
> > pass info like typlen, typalign, typbyval for each argument, and cache these
> > for repeated calls) and set pg_proc(oprcode) to 0 wherever this function is
> > sufficient. Thus the problematic cases like numeric, citext, etc. would be
> > detected by their non-zero oprcode.
>
> I don't think that's a good option, because we don't want int4eq to
> have to go through a code path that has branches to support varlenas
> and cstrings. Andres is busy trying to speed up expression evaluation
> by removing unnecessary code branches; adding new ones would be
> undoing that valuable work.
I spent some more time thinking about this. What about adding a new strategy
number for hash index operator classes, e.g. HTBinaryEqualStrategyNumber? For
most types both HTEqualStrategyNumber and HTBinaryEqualStrategyNumber strategy
would point to the same operator, but types like numeric would naturally have
them different.
Thus the pushed-down partial aggregation can only use the
HTBinaryEqualStrategyNumber's operator to compare grouping expressions. In the
initial version (until we have useful statistics for the binary values) we can
avoid the aggregation push-down if the grouping expression output type has the
two strategies implemented using different functions because, as you noted
upthread, grouping based on binary equality can result in excessive number of
groups.
One open question is whether the binary equality operator needs a separate
operator class or not. If an opclass cares only about the binary equality, its
hash function(s) can be a lot simpler.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com