On Mon, Nov 11, 2019 at 10:38 AM James Coleman <jtc331@gmail.com> wrote:
> But now we're at the end of my understanding of how hash tables and
> joins are implemented in PG; is there a wiki page or design that might
> give me some current design description of how the buckets and batches
> work with the hash so I can keep following along?

We have something like the so-called "Grace" hash join (with the
"hybrid" refinement, which is irrelevant for this discussion):
https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
Our word "batch" just means partition. Most descriptions talk about
using two different hash functions for partition and bucket, but our
implementation uses a single hash function, taking some bits of the
hash value to choose the bucket and other bits to choose the batch.
That happens in ExecHashGetBucketAndBatch():
https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeHash.c#L1872
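
To make the bit-splitting idea concrete, here is a minimal standalone
sketch. It is not the PostgreSQL code: the type HashSlot, the function
split_hash() and its parameters are hypothetical names made up for
illustration. It assumes the bucket count and batch count are both
powers of two, and it takes the low bits for the bucket and the next
higher bits for the batch. The real function may pick its bits
differently, so treat this only as a picture of the general technique.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which bucket and which batch a tuple lands in. */
typedef struct
{
    uint32_t bucketno;
    uint32_t batchno;
} HashSlot;

/*
 * Hypothetical helper: derive both numbers from one 32-bit hash value.
 * Assumes nbuckets = 2^log2_nbuckets and nbatch = 2^log2_nbatch.
 */
static HashSlot
split_hash(uint32_t hashvalue, int log2_nbuckets, int log2_nbatch)
{
    HashSlot slot;
    uint32_t nbuckets;
    uint32_t nbatch;

    /* Both fields together must fit in the 32-bit hash value. */
    assert(log2_nbuckets >= 0 && log2_nbatch >= 0);
    assert(log2_nbuckets + log2_nbatch < 32);

    nbuckets = (uint32_t) 1 << log2_nbuckets;
    nbatch = (uint32_t) 1 << log2_nbatch;

    /* The low log2_nbuckets bits select the bucket. */
    slot.bucketno = hashvalue & (nbuckets - 1);

    /* The next log2_nbatch bits select the batch; 0 if there is one batch. */
    slot.batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);

    return slot;
}
```

With 1024 buckets and 16 batches, split_hash(0xABCD1234, 10, 4) uses
bits 0..9 for the bucket and bits 10..13 for the batch. Because the
two fields come from disjoint bits, tuples that share a batch are
still spread across all the buckets.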