On Wed, Dec 11, 2019 at 7:36 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On Tue, Dec 10, 2019 at 02:35:16PM -0300, Alvaro Herrera wrote:
> >On 2019-Dec-10, Tomas Vondra wrote:
> >> It's probably still a good idea to do this, although I wonder if we
> >> might start doing this only when actually running out of bits (and use
> >> the current algorithm by default). But I have no idea if that would be
> >> any cheaper, and it would add complexity.
Yeah, I understand your desire to minimise the change here. One thing
we could do to get exactly the same algorithm as today, right up until
we run out of bits and begin stealing bucket bits, would be to rotate
instead of shifting:
- *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
+ *batchno = rotate(hashvalue, hashtable->log2_nbuckets) & (nbatch - 1);
Attached. Thoughts? At a glance, the definition of rotate() that I
used seems to be recognised by my compiler, so that the generated code
differs only by a SHR instruction changing to a ROR instruction on my
system.
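
In case it helps to see it, rotate() is the usual masked-shift idiom
(this is just a sketch; the exact definition is in the attached patch):

#include <stdint.h>

/*
 * Right-rotate using the masked-shift idiom that GCC and Clang
 * recognise and compile to a single ROR instruction.  The masks also
 * avoid undefined behaviour when n == 0.
 */
static inline uint32_t
rotate(uint32_t x, unsigned n)
{
    return (x >> (n & 31)) | (x << ((32 - n) & 31));
}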
The bit-reversing thing has one nice advantage: you are allowed to
increase nbuckets later. But then I thought of another way to achieve
that: take batchno from the low end, and bucketno from the high end
(sketched below). I'm not pursuing that, based on the suspicion that
what we all want here is the minimal possible change.
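
Purely to illustrate that alternative (made-up function name, not
proposed code), the split would look something like:

#include <stdint.h>

/*
 * Hypothetical low/high split: batchno from the low bits, bucketno
 * from the high bits, so a later increase in nbuckets just consumes
 * more high bits without moving any tuple to a different batch.
 * Assumes 0 < log2_nbuckets < 32.
 */
static inline void
get_bucket_and_batch(uint32_t hashvalue, int log2_nbuckets,
                     uint32_t nbatch,
                     uint32_t *bucketno, uint32_t *batchno)
{
    *batchno = hashvalue & (nbatch - 1);
    *bucketno = hashvalue >> (32 - log2_nbuckets);
}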
> >I'm under the impression that this patch is proposed for backpatching,
> >and that in master we can simply increase the hash width.
>
> Possibly, although AFAIK we don't have that patch yet, and I don't
> recall any measurements (so it might have overhead too, not sure). But
> it's true additional complexity might complicate backpatching.
What I'd like to experiment with in the new year is a code
specialisation scheme that further develops what I did with
ExecHashJoinImpl(). This would increase the executable size by
including a bunch of different variants of the hash join code, but
avoid adding new branches (and also remove some existing branches) in
the hot paths. We'd still write one "template" version of the
algorithm that contains if statements for the points of variation, and
then we'd call it from a load of wrapper functions with constant
arguments and forced inlining. For example, there might be a variant
for { non-parallel, 64 bit hashes, no skew optimisation }. A more
extreme version of the idea inlines the actual hashing functions for a
handful of common cases like text and integer keys, but that gets into
combinatorial explosion territory, among other problems. I have some
early prototype code along these lines but it's not good enough yet;
more soon. I should probably think about Andres's proposal to merge
Hash and Hash Join nodes first.
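
To make the shape of the specialisation idea concrete, here's a toy
version (all names invented, nothing like the real prototype):

#include <stdbool.h>

/*
 * The "template": one copy of the algorithm, with if statements at
 * the points of variation.  Forced inlining plus constant arguments
 * in the wrappers let the compiler delete the untaken branches.
 */
static inline __attribute__((always_inline)) void
hash_join_template(bool parallel, bool hash64, bool use_skew)
{
    if (parallel)
    {
        /* parallel-aware path */
    }
    if (hash64)
    {
        /* 64 bit hash path */
    }
    if (use_skew)
    {
        /* skew optimisation path */
    }
    /* ... common hot path ... */
}

/* One wrapper per interesting combination of constants. */
static void
hash_join_serial_64_noskew(void)
{
    hash_join_template(false, true, false);
}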