Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date	August 20, 2020 02:56:20
Msg-id	CAApHDvpd9bdsiH5CZSiEANUoHshOEkLJ92npbWKG7sT0CLSKCw@mail.gmail.com
In response to	Re: Hybrid Hash/Nested Loop joins and caching results from subplans (Alvaro Herrera <alvherre@2ndquadrant.com>)
List	pgsql-hackers

On Thu, 20 Aug 2020 at 10:58, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On the performance aspect, I wonder what the overhead is, particularly
> considering Tom's point of making these nodes more expensive for cases
> with no caching.

It's likely small. I've not written any code, only thought about it,
and I think the check would be something like
if (node->tuplecache != NULL). I imagine that in simple cases the
branch predictor would learn the outcome fairly quickly and then
predict it with 100% accuracy. But it's perhaps possible that some other
branch shares the same slot in the branch predictor and causes some
conflicting predictions. The size of the branch predictor cache is
limited, of course.  Certainly introducing new branches that
mispredict and cause a pipeline stall during execution would be a very
bad thing for performance.  I'm unsure what would happen if there are,
say, two Nested Loops, one with caching = on and one with caching = off,
where the number of tuples each processes before control passes to the
other is highly variable.  I'm not sure a branch predictor would handle
that well, given that both nodes execute the branch at the same address
but with different outcomes.
However, if the predictor were to hash in the stack pointer too, then
that might not be a problem. Perhaps someone with a better
understanding of modern branch predictors can share their insight
there.
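
To make the shape of that check concrete, here is a minimal sketch in C. It is not code from any patch: the struct names, the fixed-size cache, and scan_inner() are all hypothetical stand-ins, and only the if (node->tuplecache != NULL) test comes from the text above. It shows a node with caching on and a node with caching off both passing through the same branch instruction, which is the situation the paragraph above is unsure about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are PostgreSQL names. */
#define CACHE_SLOTS 8

typedef struct TupleCacheSketch
{
    int   valid[CACHE_SLOTS];   /* 1 if the slot holds a cached result */
    int   key[CACHE_SLOTS];     /* key the cached result belongs to */
    long  value[CACHE_SLOTS];   /* cached inner-side result */
    long  hits;                 /* number of lookups served from the cache */
} TupleCacheSketch;

typedef struct NestLoopSketch
{
    TupleCacheSketch *tuplecache;   /* NULL when caching is off */
    long  inner_scans;              /* how often the inner side was scanned */
} NestLoopSketch;

/* Pretend inner scan: expensive in a real executor, trivial here. */
static long
scan_inner(NestLoopSketch *node, int key)
{
    node->inner_scans++;
    return (long) key * 10;
}

/* Fetch the inner result for one outer key. */
static long
fetch_inner(NestLoopSketch *node, int key)
{
    assert(node != NULL);

    /*
     * The one extra branch. Every node pays for it, cached or not, and all
     * nodes share this single branch address.
     */
    if (node->tuplecache != NULL)
    {
        TupleCacheSketch *cache = node->tuplecache;
        unsigned    slot = (unsigned) key % CACHE_SLOTS;

        if (cache->valid[slot] && cache->key[slot] == key)
        {
            cache->hits++;
            return cache->value[slot];
        }

        /* Miss: scan once and remember the result. */
        cache->valid[slot] = 1;
        cache->key[slot] = key;
        cache->value[slot] = scan_inner(node, key);
        return cache->value[slot];
    }

    /* Caching off: the only added cost is the test above. */
    return scan_inner(node, key);
}
```

If calls on a cached node and an uncached node are interleaved, the outcome of that one test alternates, so a predictor keyed only on the branch address would see a mixed history. Whether that costs anything measurable would need benchmarking on real hardware.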

> And also, as the JIT saga continues, aren't we going
> to get plan trees recompiled too, at which point it won't matter much?

I was thinking batch execution would be our solution to the node
overhead problem.  We'll get there one day... we just need to finish
with the infinite other optimisations there are to do first.

David
