Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Jesse Zhang
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id CAGf+fX7Ayre-nH1JynJd3jnjEkA=bBjbfPXLKPfrw4Ethw5Cpw@mail.gmail.com
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi Tomas,

On Tue, Jun 23, 2020 at 3:24 PM Tomas Vondra wrote:
>
> Now, a couple comments / questions about the code.
>
>
> nodeHash.c
> ----------
>
>
> 1) MultiExecPrivateHash says this
>
>    /*
>     * Not subject to skew optimization, so either insert normally
>     * or save to batch file if it belongs to another stripe
>     */
>
> I wonder what it means to "belong to another stripe". I understand what
> that means for batches, which are identified by batchno computed from
> the hash value. But I thought "stripes" are just work_mem-sized pieces
> of a batch, so I don't quite understand this. Especially when the code
> does not actually check "which stripe" the row belongs to.

I have to concur that "stripe" did inspire a RAID vibe when I heard it,
but it seemed to be a better name than the one it replaces.

> 3) I'm a bit puzzled about this formula in ExecHashIncreaseNumBatches
>
>    childbatch = (1U << (my_log2(hashtable->nbatch) - 1)) | hashtable->curbatch;
>
> and also about this comment
>
>    /*
>     * TODO: what to do about tuples that don't go to the child
>     * batch or stay in the current batch? (this is why we are
>     * counting tuples to child and curbatch with two diff
>     * variables in case the tuples go to a batch that isn't the
>     * child)
>     */
>    if (batchno == childbatch)
>      childbatch_outgoing_tuples++;
>
> I thought each old batch is split into two new ones, and the tuples
> either stay in the current one, or are moved to the new one - which I
> presume is the childbatch, although I haven't tried to decode that
> formula. So where else could the tuple go, as the comment tried to
> suggest?

True, every old batch is split into two new ones, if you only consider
tuples coming from the batch file that _still belong in there_. But the
old batch file can also hold tuples that belong to a future batch. For
example, if the current nbatch = 8 and we want to expand to nbatch = 16,
(old) batch 1 will split into (new) batch 1 and batch 9, but its file can
already contain tuples that need to go into (current) batches 3, 5, and
7 (soon-to-be batches 11, 13, and 15).

Cheers,
Jesse


