Re: Planning performance problem (67626.278ms) - Mailing list pgsql-performance

From David Rowley
Subject Re: Planning performance problem (67626.278ms)
Date
Msg-id CAApHDvohGOZSAeFvx1mnkDTV3EzdcurAWdYJSqp29xu58FPh_w@mail.gmail.com
Whole thread Raw
In response to Re: Planning performance problem (67626.278ms)  (Manuel Weitzman <manuelweitzman@gmail.com>)
List pgsql-performance
On Thu, 1 Jul 2021 at 08:56, Manuel Weitzman <manuelweitzman@gmail.com> wrote:
> For each of these RestrictInfos there *could* be one cache miss on
> cached_scansel() that *could* force the planner to compute
> get_actual_variable_range() for the same variable (a.a) over and over,
> as mergejoinscansel() always computes the selectivity for the
> intervals that require actual extremal values. In practice this
> re-computing of the variable range seems to happen a lot.

Recently, for some other topic, I was thinking about if we were ever
to have the executor give up on a plan because something did not play
out the way the planner expected it to, that if the executor to ever
be given that ability to throw the plan back at the planner with some
hints about where it went wrong, then I wondered what exactly these
hints would look like.

I guessed these would be a list of some rough guidelines saying that x
JOIN y ON x.a=y.b produces at least Z rows. Then the planner would
find that when estimating the size of the join between x and y and
take the maximum of two values.  That would need to be designed in
such a way that the planner could quickly consult the feedback, e.g
hashtable lookup on Oid.

Anyway, I don't really have any clearly thought-through plans for that
idea as it would be a large project that would need a lot of changes
before it could be even thought about seriously. However, it did cause
me to think about that again when reading this thread as it seems
similar. You've learned the actual variable range through actual
execution, so it does not seem too unreasonable that information might
get stored in a similar place, if that place were to exist.

I'm not saying that's what needs to happen here. It's more just food
for thought.  The caching you need does seem more specific to Oid than
relid.

David



pgsql-performance by date:

Previous
From: Manuel Weitzman
Date:
Subject: Re: Planning performance problem (67626.278ms)
Next
From: Tomas Vondra
Date:
Subject: Re: slow performance with cursor