Re: Planner picks n² query plan when available - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Planner picks n² query plan when available
Date
Msg-id 600982.1732294465@sss.pgh.pa.us
Whole thread Raw
In response to Re: Planner picks n² query plan when available  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bessou@hotmail.fr> wrote:
>> It looks like this could be improved/fixed by either/all of:
>>
>> 1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of
alwayssearching linearly 

> IIUC, hashed "= ANY()" expressions were already implemented with
> commit 50e17ad2 (released in PG14) - look for
> EEOP_HASHED_SCALARARRAYOP in expression handling code.

I checked into that and verified that this test case isn't reaching
EEOP_HASHED_SCALARARRAYOP, because that commit only addressed the
scenario where the array argument of =ANY is a constant.  (A
post-const-folding constant, but still a constant.)

To apply that optimization here, where the array is an output of a
subplan, we'd need some mechanism whereby ExecEvalHashedScalarArrayOp
could find out when the array has changed underneath it.  That's
probably possible (by somehow extending the chgParam signaling logic),
but the details aren't obvious.  The planner's decision about when
to apply hashing would become much less obvious, too.

>> 2. Fixing planner assuming that all arrays are of size 10, using instead actual or estimated sizes.

> IIUC, this was also already implemented with commit 9391f715 (released in PG17).

That's not helpful here either, since the arrays are built on-the-fly
rather than being stored in table columns (where stats could be
collected on them).  You could imagine installing bespoke logic for
array_agg() that looks at the estimated rowcount for the sub-select,
perhaps.

The bottom line here is that improving these queries would take a
significant amount of work, and they just aren't very compelling
examples that seem to justify such effort.  Folding the output of
a subquery into an array is largely an anti-pattern in SQL in the
first place.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: Offsets of `struct Port` are no longer constant
Next
From: Robert Haas
Date:
Subject: Re: Replace current implementations in crypt() and gen_salt() to OpenSSL