Have you benchmarked this change beyond the results in the first message of this thread?
While reviewing the patch more closely, I noticed
that compute_distinct_stats() is only used for types that have an
equality operator (=) but no ordering operator (<). In practice, most
common scalar types go through compute_scalar_stats() instead.
That makes me wonder how often this optimization would actually trigger
in real workloads. Since compute_scalar_stats() is the more common path,
there's a chance that the hash-table-based improvement in
compute_distinct_stats() may not provide a noticeable overall benefit.
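
To illustrate the point, here is a simplified model of that selection as
I understand it. It is not the actual PostgreSQL source: the enum and
choose_stats_path() are hypothetical names used only for illustration,
and the real code checks operator OIDs rather than booleans.

```c
#include <stdbool.h>

/* Hypothetical labels for the three statistics routines. */
typedef enum
{
    STATS_TRIVIAL,   /* stands for compute_trivial_stats() */
    STATS_DISTINCT,  /* stands for compute_distinct_stats() */
    STATS_SCALAR     /* stands for compute_scalar_stats() */
} StatsPath;

/*
 * Simplified model of the dispatch: a type with both "=" and "<" takes
 * the scalar path, a type with only "=" takes the distinct path, and
 * everything else falls back to the trivial path.
 */
static StatsPath
choose_stats_path(bool has_eq, bool has_lt)
{
    if (has_eq && has_lt)
        return STATS_SCALAR;
    if (has_eq)
        return STATS_DISTINCT;
    return STATS_TRIVIAL;
}
```

Under this model, the patched code is reached only in the middle branch,
so a benchmark would need a column whose type has = but no <.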
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/