Home > mailing lists

Re: A performance regression issue with Memoize - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: A performance regression issue with Memoize
Date	July 29, 2025 11:52:51
Msg-id	a627301a-a5dc-4836-a8ea-69598dd55a1f@gmail.com Whole thread Raw
In response to	Re: A performance regression issue with Memoize (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

Tree view

On 29/7/2025 06:30, David Rowley wrote:
> On Tue, 29 Jul 2025 at 16:01, Richard Guo <guofenglinux@gmail.com> wrote:
>> Another possible direction is to support runtime plan correction or
>> feedback loops.  We've always followed a "plan-first, execute-next"
>> approach so far.  But perhaps we could extend that by monitoring plan
>> execution and triggering re-optimization when the executor detects
>> that actual result sizes or other runtime statistics differ
>> significantly from the estimates.  In recent years, there have been
>> more and more papers and research on adaptive query processing.  It
>> might be worth considering how PostgreSQL could support such
>> techniques in the future.
> 
> I've recently noticed that some databases are doing things like this.
> [1] is an example. For the record, I've absolutely no insight into
> what's going on there aside from what's mentioned in the public
> documentation. In any case, I don't think that idea is new on us as
> there's been discussion before about having some sort of hybrid hash
> join / nested loop before in regards to trying to fix the issue with
> extra large batches during hash joins.
We are constantly struggling with multi-level join trees, where 
estimations usually quickly end up with a 'rows=1' estimation and switch 
to Nested Loop on the upper levels, causing troubles during execution. 
Hence, we have attempted to play with the HybridJoin idea for a couple 
of years.
Designing a new execution node seems painful and hard to support. So, we 
have chosen an alternative way of preparing two separate subquery trees 
for the suspicious (an empirics inevitably involved) join and passing 
them to the executor. Implementation is simple - it is stored in 
CustomScan subpaths/subplans. It appears to be an Append, but our Custom 
node has the flexibility to choose any of the subtrees and switch to an 
alternative if no tuples are given to the upper level. You may find some 
demonstration of the approach here [1].
Right now it works with NL -> HJ switching, but it is not hard to add 
MergeJoin as well.
I think similar approach could be discussed for use in the core.

> 
> If we were to adopt something similar, I believe we'd need to have
> some sort of estimate on the certainty of the statistics we're working
> with, otherwise, how would you know when and when not to use the
> adaptive method?  There's also the PathKey issue when switching
> algorithms. For example, nested loops preserve the outer path's order,
> but multi-batch hash joins do not. That may not be an issue when
> switching a method that's more strict in terms of row output order,
> but it could be when going the other way. That means you don't get the
> full flexibility to adapt the plan as you'd get from having the
> planner choose the new plan in the first place.
Yep, PathKey issue really exists and we just build Sort->HJ alternative 
path if costs say that sorted NL dominates.
> 
> For the record, I 100% agree that there will always be cases where
> statistics are just unable to represent what is discovered at
> run-time, so having some sort of ability to adapt at run-time seems
> like a natural progression on the evolutionary chain. I just don't
> know if it's the best or best next step to make. I suspect we might be
> skipping a few steps from what we have now if we went there in the
> near future. We don't yet have extended statistics for joins yet, for
> example.
We ended up with a solution when we calculated a 'turning number of 
tuples' in the outer at the optimisation stage. If the outer provides 
even one more tuple, we immediately switch to the more conservative 
version of the plan at runtime.

[1] 

https://www.pgevents.ca/events/pgconfdev2025/schedule/session/346-switching-between-query-plans-in-real-time-switch-join/

-- 
regards, Andrei Lepikhov

pgsql-hackers by date:

From: Daniel Gustafsson
Date: 29 July 2025, 11:08:56
Subject: Re: delimiter inconsistency in generate-wait_event_types.pl

From: Stepan Neretin
Date: 29 July 2025, 11:56:04
Subject: Making WAL archiving faster — multi-file support and async ideas

Re: A performance regression issue with Memoize - Mailing list pgsql-hackers

Previous

Next