Re: A performance regression issue with Memoize - Mailing list pgsql-hackers

From Richard Guo
Subject Re: A performance regression issue with Memoize
Date
Msg-id CAMbWs49vw1abBV3twM5+11mzT39K1eQZE84qAJGnWqE5gF7KhQ@mail.gmail.com
In response to Re: A performance regression issue with Memoize  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: A performance regression issue with Memoize
List pgsql-hackers
On Tue, Jul 29, 2025 at 6:30 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On the whole, I don't really see this as a flaw in the Memoize code.
> We've plenty of other cases in the planner that produce inferior plans
> due to lack of enough detail or accuracy of table statistics, so I'm
> not planning on rushing to look into a fix. I will keep it in mind,
> however.

Yeah, I agree that this issue isn't limited to the Memoize node; it's
a more general problem.  The optimizer can sometimes choose plans that
are suboptimal by orders of magnitude due to inaccurate statistics.
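To make the Memoize case concrete, here is a minimal sketch of how a misestimated number of distinct lookup keys skews an expected cache hit ratio. The formula and the names `estimated_hit_ratio` and `expected_inner_rescans` are hypothetical and simplified for illustration. This is not PostgreSQL's costing code, although the planner's Memoize costing does depend on an estimate of the number of distinct cache keys.

```python
def estimated_hit_ratio(calls: float, ndistinct: float, cache_entries: float) -> float:
    """Simplified expected hit ratio for a Memoize-style parameterized cache.

    Illustrative assumptions: each distinct key misses once on its first
    lookup, and only `cache_entries` keys can be held in the cache at a time.
    """
    if calls <= 0:
        return 0.0
    ndistinct = min(max(ndistinct, 1.0), calls)                   # clamp to [1, calls]
    repeat_fraction = (calls - ndistinct) / calls                  # lookups that repeat a key
    fit_fraction = cache_entries / max(ndistinct, cache_entries)   # share of keys that fit
    return repeat_fraction * fit_fraction


def expected_inner_rescans(calls: float, ndistinct: float, cache_entries: float) -> float:
    """Cache misses, i.e. how many times the inner side must actually be executed."""
    return calls * (1.0 - estimated_hit_ratio(calls, ndistinct, cache_entries))


calls, cache_entries = 10_000, 1_000
# What the planner would expect if statistics claimed only 10 distinct keys.
planned = expected_inner_rescans(calls, ndistinct=10, cache_entries=cache_entries)
# What happens if every outer row actually carries a different key.
actual = expected_inner_rescans(calls, ndistinct=10_000, cache_entries=cache_entries)
print(f"planned ~{planned:.0f} rescans, actual ~{actual:.0f} rescans")
# prints "planned ~10 rescans, actual ~10000 rescans"
```

In this toy model a single wrong ndistinct estimate turns an expected ~99.9% hit ratio into 0%, a gap of three orders of magnitude in inner executions, which is exactly the kind of error that makes a Memoize plan look far cheaper than it is.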

One way to address this is to improve the accuracy of statistics,
but that can be very difficult or even impossible in some cases,
especially when dealing with attribute correlations or highly skewed
data distributions.
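A small illustration of why attribute correlations are hard: if the planner multiplies per-column selectivities, it implicitly assumes the columns are independent. The table below is hypothetical (column `b` always equals column `a`), and the helper `selectivity` exists only for this sketch.

```python
def selectivity(rows, predicate) -> float:
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Hypothetical table: 10,000 rows, 100 distinct values per column,
# and column b fully determined by column a (perfect correlation).
rows = [(i % 100, i % 100) for i in range(10_000)]

sel_a = selectivity(rows, lambda r: r[0] == 7)  # 100 / 10,000
sel_b = selectivity(rows, lambda r: r[1] == 7)  # 100 / 10,000

# Multiplying per-column selectivities assumes the columns are independent.
independent_estimate = sel_a * sel_b * len(rows)
actual_rows = selectivity(rows, lambda r: r[0] == 7 and r[1] == 7) * len(rows)

print(f"estimated ~{independent_estimate:.0f} row(s), actual {actual_rows:.0f} rows")
# prints "estimated ~1 row(s), actual 100 rows"
```

The independence estimate is off by a factor of 100 even though each per-column statistic is exact. PostgreSQL's extended statistics (`CREATE STATISTICS ... (dependencies)`) can correct this particular single-table case, but they do not cover every pattern, for example correlations across joined tables.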

Another possible direction is to support runtime plan correction or
feedback loops.  We've always followed a "plan-first, execute-next"
approach so far.  But perhaps we could extend that by monitoring plan
execution and triggering re-optimization when the executor detects
that actual result sizes or other runtime statistics differ
significantly from the estimates.  In recent years, there has been a
growing body of research on adaptive query processing.  It
might be worth considering how PostgreSQL could support such
techniques in the future.
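A minimal sketch of the kind of runtime check such a feedback loop might use, assuming a per-node monitor that compares the planner's row estimate with the rows produced so far. `CardinalityMonitor`, `q_error`, and the threshold of 10 are hypothetical illustrations, not an existing PostgreSQL facility; the symmetric ratio is the "q-error" commonly used in cardinality-estimation research.

```python
from dataclasses import dataclass


def q_error(estimated: float, actual: float) -> float:
    """Symmetric misestimation factor; both values are clamped to at least 1 row."""
    est = max(estimated, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


@dataclass
class CardinalityMonitor:
    """Hypothetical per-node monitor: planner estimate vs. rows produced so far."""
    estimated_rows: float
    threshold: float = 10.0  # arbitrary illustrative trigger factor
    actual_rows: int = 0

    def observe(self, n: int = 1) -> bool:
        """Record n more output rows. Returns True as soon as the running count
        exceeds threshold * estimate, so an underestimate is caught early."""
        self.actual_rows += n
        return self.actual_rows > self.threshold * max(self.estimated_rows, 1.0)

    def finished(self) -> bool:
        """Call when the node is exhausted. Returns True if the final count is off
        by more than threshold in either direction; an overestimate can only be
        confirmed once the node has produced all of its rows."""
        return q_error(self.estimated_rows, self.actual_rows) > self.threshold


monitor = CardinalityMonitor(estimated_rows=10)
for i in range(1, 1_000_001):
    if monitor.observe():
        print(f"re-optimize after {i} rows")  # prints "re-optimize after 101 rows"
        break
```

The asymmetry is the main design point: a running row count is only a lower bound on the final cardinality, so underestimates can trigger re-optimization mid-execution, while overestimates are known only at completion, when re-planning is less useful.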

Thanks
Richard


