Thread: Planner picks n² query plan when available

Planner picks n² query plan when available

From: Toto guyoyg
Offending O(n²) query:

```sql
SELECT id FROM indexed_table WHERE indexed_value = ANY (ARRAY[1,2,...])
```

I'm not posting this on the `pgsql-performance` mailing list because this is about fixing the issue, not working around it.
I'm not posting this on the `pgsql-bugs` mailing list because documentation states that performance issues are not bugs.
Please let me know if I'm not posting to the appropriate place.

From what I could investigate so far, it looks like:

1. When performing a bitmap heap scan, if recheck is activated, that step is O(n²): for each of the ~n candidate rows, the recheck searches the n-element array linearly.
2. Hash indexes only point to entries that match the hash; the values themselves aren't stored in the index, and there may be hash collisions. A bitmap index scan on a Hash index therefore forces the parent bitmap heap scan to perform a recheck, and that recheck leads to the O(n²) of point 1.
3. When a hash index is available, Postgres tends to prefer it over a BTree index, ignoring the two properties above (a quick way to observe this follows this list).
4. The EXPLAIN ANALYZE output shows that the planner always supposes that arrays are of size 10, instead of using the estimated sizes of the subqueries they are created from, or the actual size provided as an argument. This may be what causes 3.
5. If using a BTree index but work_mem is not sufficient for the heap read to be exact (as in `Heap Blocks: exact=510 lossy=43738` in the output of EXPLAIN (ANALYZE, BUFFERS)), recheck is also activated, again causing O(n²).
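
A quick way to observe point 3 (a sketch against the reproducer tables defined further down; plan choice may of course vary with statistics and configuration):

```sql
-- Sketch: check which index the planner prefers. Assumes the
-- tmp_benchmark table and both of its indexes from the reproducer
-- below.
EXPLAIN SELECT * FROM tmp_benchmark
WHERE value = ANY (ARRAY['42', '43', '44']);
-- Here this picks tmp_benchmark_benchmark_value_hash; after
-- DROP INDEX tmp_benchmark_benchmark_value_hash, the plan falls back
-- to the BTree index backing the UNIQUE constraint.
```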

It looks like this could be improved/fixed by either/all of:

1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of always searching linearly. (A SQL-level illustration of the intended effect follows this list.)
2. Fixing the planner's assumption that all arrays are of size 10, using actual or estimated sizes instead, and taking that into account when estimating the recheck condition's cost. (This wouldn't solve the recheck's performance by itself, but it would at least make the planner aware that lossy heap scans and Hash index bitmap heap scans are n times more expensive than exact BTree bitmap heap scans, and hopefully make it avoid picking them.)
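
As a SQL-level illustration of the effect the first proposal is after (a hand rewrite with the same hashing behavior, not the executor change itself; it assumes rewriting through `unnest()` is acceptable here):

```sql
-- Sketch: unnest() turns the array back into a relation, so the
-- planner can hash it (e.g. build a hash table on the 50k values)
-- instead of the executor searching the array linearly once per
-- candidate row. Uses the lookup_test table from the reproducer below.
SELECT t.id
FROM tmp_benchmark t
JOIN unnest(ARRAY(SELECT value FROM lookup_test)) AS u(value)
  ON t.value = u.value;
```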

Is my understanding correct?

Thanks,

PG Version: PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit

Reproducer:

```sql
BEGIN;
CREATE TABLE tmp_benchmark(
    id Int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    value text NOT NULL
);
-- Insert 1M values
INSERT INTO tmp_benchmark(value) SELECT DISTINCT (floor(random() * 1000000000000)::int8)::text FROM generate_series(1,1000000);

ALTER TABLE tmp_benchmark ADD CONSTRAINT tmp_benchmark_unique_value_and_corresponding_index UNIQUE(value);

CREATE INDEX tmp_benchmark_benchmark_value_hash ON tmp_benchmark USING HASH (value);

ANALYZE tmp_benchmark;

-- Randomly sample 50k of those values
CREATE TEMPORARY TABLE lookup_test ON COMMIT DROP AS
WITH sub AS (SELECT value FROM tmp_benchmark LIMIT 10000000)
SELECT value FROM sub ORDER BY RANDOM() LIMIT 50000;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (SELECT value FROM lookup_test);
/* Gives:
Nested Loop  (cost=864.00..2376.52 rows=43520 width=20) (actual time=6.890..51.473 rows=50000 loops=1)
  Buffers: shared hit=102085, local hit=320
  ->  HashAggregate  (cost=864.00..866.00 rows=200 width=32) (actual time=6.879..11.140 rows=50000 loops=1)
        Group Key: lookup_test.value
        Batches: 1  Memory Usage: 3617kB
        Buffers: local hit=320
        ->  Seq Scan on lookup_test  (cost=0.00..755.20 rows=43520 width=32) (actual time=0.003..1.702 rows=50000 loops=1)
              Buffers: local hit=320
  ->  Index Scan using tmp_benchmark_benchmark_value_hash on tmp_benchmark  (cost=0.00..7.88 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=50000)
        Index Cond: (value = lookup_test.value)
        Rows Removed by Index Recheck: 0
        Buffers: shared hit=102085
Planning:
  Buffers: shared hit=18 read=1
Planning Time: 0.196 ms
Execution Time: 52.520 ms
 */
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (CAST((SELECT array_agg(value) FROM lookup_test) AS text[]));
/* Gives:
Bitmap Heap Scan on tmp_benchmark  (cost=904.09..943.13 rows=10 width=20) (actual time=22.804..13931.788 rows=50000 loops=1)
  Recheck Cond: (value = ANY ($0))
  Rows Removed by Index Recheck: 15
  Heap Blocks: exact=6369
  Buffers: shared hit=68885, local hit=320
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=864.00..864.01 rows=1 width=32) (actual time=4.784..4.785 rows=1 loops=1)
          Buffers: local hit=320
          ->  Seq Scan on lookup_test  (cost=0.00..755.20 rows=43520 width=32) (actual time=0.007..2.040 rows=50000 loops=1)
                Buffers: local hit=320
  ->  Bitmap Index Scan on tmp_benchmark_benchmark_value_hash  (cost=0.00..40.08 rows=10 width=0) (actual time=22.166..22.166 rows=50015 loops=1)
        Index Cond: (value = ANY ($0))
        Buffers: shared hit=62516, local hit=320
Planning:
  Buffers: shared hit=9
Planning Time: 0.112 ms
Execution Time: 13933.270 ms
 */

-- Randomly sample 100k of those values

CREATE TEMPORARY TABLE lookup_test_2 ON COMMIT DROP AS
WITH sub AS (SELECT value FROM tmp_benchmark LIMIT 10000000)
SELECT value FROM sub ORDER BY RANDOM() LIMIT 100000;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (SELECT value FROM lookup_test_2);
/* Gives:
Nested Loop  (cost=1555.20..3025.00 rows=78336 width=20) (actual time=15.661..104.241 rows=100000 loops=1)
  Buffers: shared hit=204153, local hit=576, temp read=20 written=44
  ->  HashAggregate  (cost=1555.20..1557.20 rows=200 width=32) (actual time=15.653..29.110 rows=100000 loops=1)
        Group Key: lookup_test_2.value
        Batches: 5  Memory Usage: 6977kB  Disk Usage: 232kB
        Buffers: local hit=576, temp read=20 written=44
        ->  Seq Scan on lookup_test_2  (cost=0.00..1359.36 rows=78336 width=32) (actual time=0.007..3.692 rows=100000 loops=1)
              Buffers: local hit=576
  ->  Index Scan using tmp_benchmark_benchmark_value_hash on tmp_benchmark  (cost=0.00..7.88 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=100000)
        Index Cond: (value = lookup_test_2.value)
        Rows Removed by Index Recheck: 0
        Buffers: shared hit=204153
Planning:
  Buffers: shared hit=4
Planning Time: 0.146 ms
Execution Time: 106.359 ms
 */
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (CAST((SELECT array_agg(value) FROM lookup_test_2) AS text[]));
/* Gives:
Bitmap Heap Scan on tmp_benchmark  (cost=1595.29..1634.33 rows=10 width=20) (actual time=45.983..55338.490 rows=100000 loops=1)
  Recheck Cond: (value = ANY ($0))
  Rows Removed by Index Recheck: 21
  Heap Blocks: exact=6370
  Buffers: shared hit=129063 read=2197, local hit=576
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=1555.20..1555.21 rows=1 width=32) (actual time=8.863..8.864 rows=1 loops=1)
          Buffers: local hit=576
          ->  Seq Scan on lookup_test_2  (cost=0.00..1359.36 rows=78336 width=32) (actual time=0.007..3.696 rows=100000 loops=1)
                Buffers: local hit=576
  ->  Bitmap Index Scan on tmp_benchmark_benchmark_value_hash  (cost=0.00..40.08 rows=10 width=0) (actual time=45.436..45.436 rows=100025 loops=1)
        Index Cond: (value = ANY ($0))
        Buffers: shared hit=124890, local hit=576
Planning Time: 0.089 ms
Execution Time: 55342.836 ms
 */

-- Note how with 2x the number of values, execution time is 4x the above. Monitoring CPU while this query runs shows 100% CPU, unlike with the subselect-based variant.

DROP INDEX tmp_benchmark_benchmark_value_hash;
-- Once the hash index is removed, even with the same 100k sample and a bitmap heap scan plan, the query no longer goes n²

-- On the same 100k sample:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (CAST((SELECT array_agg(value) FROM lookup_test_2) AS text[]));
/* Gives:
Bitmap Heap Scan on tmp_benchmark  (cost=1599.54..1638.58 rows=10 width=20) (actual time=234.701..256.963 rows=100000 loops=1)
  Recheck Cond: (value = ANY ($0))
  Heap Blocks: exact=6370
  Buffers: shared hit=296148 read=10225 written=767, local hit=576
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=1555.20..1555.21 rows=1 width=32) (actual time=9.425..9.426 rows=1 loops=1)
          Buffers: local hit=576
          ->  Seq Scan on lookup_test_2  (cost=0.00..1359.36 rows=78336 width=32) (actual time=0.007..4.001 rows=100000 loops=1)
                Buffers: local hit=576
  ->  Bitmap Index Scan on tmp_benchmark_unique_value_and_corresponding_index  (cost=0.00..44.33 rows=10 width=0) (actual time=234.264..234.264 rows=100000 loops=1)
        Index Cond: (value = ANY ($0))
        Buffers: shared hit=296148 read=3855 written=763, local hit=576
Planning:
  Buffers: shared hit=8 read=3
Planning Time: 0.169 ms
Execution Time: 259.087 ms
*/

-- So despite doing a bitmap heap scan on the same 100k-element array as above, this one seemingly doesn't go n².

-- With less work_mem, so that even the BTree index plan performs a recheck:
SET LOCAL work_mem TO '64kB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tmp_benchmark WHERE value = ANY (CAST((SELECT array_agg(value) FROM lookup_test_2) AS text[]));
/* Gives:
Bitmap Heap Scan on tmp_benchmark  (cost=1599.54..1638.58 rows=10 width=20) (actual time=248.154..974679.087 rows=100000 loops=1)
  Recheck Cond: (value = ANY ($0))
  Rows Removed by Index Recheck: 802166
  Heap Blocks: exact=683 lossy=5687
  Buffers: shared hit=300000 read=6370, local hit=576
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=1555.20..1555.21 rows=1 width=32) (actual time=9.274..9.276 rows=1 loops=1)
          Buffers: local hit=576
          ->  Seq Scan on lookup_test_2  (cost=0.00..1359.36 rows=78336 width=32) (actual time=0.009..3.944 rows=100000 loops=1)
                Buffers: local hit=576
  ->  Bitmap Index Scan on tmp_benchmark_unique_value_and_corresponding_index  (cost=0.00..44.33 rows=10 width=0) (actual time=234.699..234.699 rows=100000 loops=1)
        Index Cond: (value = ANY ($0))
        Buffers: shared hit=300000, local hit=576
Planning Time: 0.099 ms
Execution Time: 974687.222 ms
 */
-- => It's clear that the recheck is n² when performed

ROLLBACK;
```

Re: Planner picks n² query plan when available

From: Matthias van de Meent
On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bessou@hotmail.fr> wrote:
>
> Offending O(n²) query:

I disagree with the O(n^2) claims. The number of live matched rows in
a single table's bitmap scan may be anywhere from 0 (leading to O(1)
complexity in the rescan) to 970_662_608_670 (= 226 tuples per page *
(2^32 - 1) pages), so such a rescan's complexity - given O(n)
complexity for always choosing to use a linear scan on the array to
find matches - would better be parameterized as O(n*m), for m live
tuples in the generated bitmap, or even O(n*m + k), as the bitmap may
contain k dead tuples which can also take some measurable amount of
effort to scan.
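
Restated compactly (my paraphrase of the bound above, same symbols):

```latex
T_{\mathrm{recheck}} \in O(n \cdot m + k), \quad
n = \text{array length},\;
m = \text{live tuples in the bitmap},\;
k = \text{dead tuples in the bitmap}
```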

> 4. The EXPLAIN ANALYZE output shows that the planner always supposes that arrays are of size 10, instead of using the estimated sizes of the subqueries they are created from, or the actual size provided as an argument.

Did you recheck this on a recently released version of PostgreSQL?
IIRC this was updated in PG17, and with `= ANY (ARRAY[...])`
expressions you will get the expected number of rows for that array
expression (assuming accurate statistics on your table).
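
That claim is easy to recheck with a constant array (a sketch; it assumes the reproducer's tmp_benchmark table with up-to-date statistics, and exact numbers will differ):

```sql
-- Sketch: on PG 17, the row estimate for a constant array should
-- track the array's actual length (3 here) rather than a default
-- of 10, given accurate table statistics.
EXPLAIN SELECT * FROM tmp_benchmark
WHERE value = ANY (ARRAY['a', 'b', 'c']);
```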

> It looks like this could be improved/fixed by either/all of:
>
> 1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of always searching linearly

IIUC, hashed "= ANY()" expressions were already implemented with
commit 50e17ad2 (released in PG14) - look for
EEOP_HASHED_SCALARARRAYOP in expression handling code.

> 2. Fixing the planner's assumption that all arrays are of size 10, using actual or estimated sizes instead.

IIUC, this was also already implemented with commit 9391f715 (released in PG17).

> Taking that into account when estimating the recheck condition's cost. (This wouldn't solve the recheck's performance by itself, but it would at least make the planner aware that lossy heap scans and Hash index bitmap heap scans are n times more expensive than exact BTree bitmap heap scans, and hopefully make it avoid picking them.)
>
> Is my understanding correct?

I believe your understanding may be quite out of date. Not all planner
or executor features and optimizations are explicitly present in the
output of EXPLAIN, and the examples all indicate you may be working
with an outdated view of Postgres' capabilities.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Planner picks n² query plan when available

From: Tom Lane
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bessou@hotmail.fr> wrote:
>>
>> Offending O(n²) query:

> I disagree with the O(n^2) claims.

I think these cases actually are O(n^2).  But I'm finding it hard
to care.  What we have here is a straightforward way to write a
query versus a much-less-straightforward way --- indeed, a way
that isn't even syntactically legal per the SQL standard.
The straightforward way is already well optimized, and no reason has
been given why the much-less-straightforward way should be considered
preferable.  So I'm not seeing why we should put our finite
development resources into optimizing the much-less-straightforward
way.

>> 4. The EXPLAIN ANALYZE output shows that the planner always supposes that arrays are of size 10, instead of using the estimated sizes of the subqueries they are created from, or the actual size provided as an argument.

It's a little disingenuous to complain about bad estimates with
this test methodology: the test tables are never vacuumed or
analyzed.  And since they're temporary, there's no hope of
autovacuum curing that oversight.  It's not clear that having
done that would improve anything in this particular case,
but it's certainly not helping.

            regards, tom lane



RE: Planner picks n² query plan when available

From: Toto guyoyg
Thanks for your answers.

> What we have here is a straightforward way to write a query versus a much-less-straightforward way [...] So I'm not seeing why we should put our finite development resources into optimizing the much-less-straightforward way.

Ah, I should have explained this: this was meant as a pure-SQL reproducer for n² query plans with:

```sql
SELECT id FROM indexed_table WHERE indexed_value = ANY ($1)
```

where `$1` is an array bind parameter.

I recall having seen the same O(n²) issue with this, even after PG 14 (50e17ad2), so I thought the SQL version being slow would be the same effect...
However, I have just attempted a reproducer for the `$1` variant (writing the corresponding application code...), and couldn't reproduce the inefficiency.
I also thought I saw that even `= ANY(ARRAY[1,2])` would lose the size to `10`, so I assumed the same issue would happen with `$1` (array), but I tried to reproduce that as well and couldn't, so I must have been looking at a different planner node. 😵‍💫

Considering this seems to not affect `= ANY ($1)`, this indeed makes this issue significantly less important than I first thought. I'll let you know if I manage to reproduce this again but that seems unlikely at this point.

> It's a little disingenuous to complain about bad estimates with this test methodology: the test tables are never vacuumed or analyzed.

Apologies, I had originally put ANALYZEs for the temp tables as well, but then removed them to minimize the example, as they didn't change the plans or the resulting estimate of `10`, and the row estimates for the subselect were still of the correct order of magnitude. I shouldn't have done that, as it cost you time wondering about this.

Just for completeness (but I wouldn't have created this thread just for this):

While this is an example of a scenario where `= ANY(array)` performs badly and `= ANY(subquery)` performs well, I have also found occurrences where `= ANY (ARRAY(subquery))` lets the planner push filters further down and performs an order of magnitude better as well. So if this were improved, we might be able to say "just always use `= ANY(array)`", which would be simpler.

> IIUC, hashed "= ANY()" expressions were already implemented

Thanks for pointing to the particular commit.

I can see [here](https://github.com/postgres/postgres/blob/4c4aaa19a6fed39e0eb0247625331c3df34d8211/src/backend/optimizer/util/clauses.c#L2278) that there is a requirement that the 2nd argument (the array) contain only Consts.

Since the hashed path already kicks in at 9 elements, it looks like, if this weren't a constraint, even the hardcoded estimate of 10 in the subquery case would be handled with reasonable efficiency.
I'm wondering what creates this constraint.

It looks like solving this may improve performance for the case where `= ANY(ARRAY(subquery))` allows the planner to push filters further down but the subquery produces many results.

(Also, it looks like in that case the executor could even consult a "would I know how to do this?" flag set by the planner, and decide whether to actually do it by checking the actual length of the array at that point, rather than relying on a decision made from estimates.)

> Did you recheck [that the planner always supposes that arrays are of size 10] on a recently released version of PostgreSQL? IIRC this was updated in PG17 (9391f715)

I have just run the reproducer script on PG 17.1 and observe the same thing: wrong row estimate, and O(n²) plan getting picked.
Writing it as `ARRAY(SELECT value ..)` produces the same results as writing it as `(SELECT array_agg(value) ..)` or `ARRAY(VALUES ..)`.

Simpler reproducer for loss of array size when building arrays from subselects (PG 17.1):

```sql
EXPLAIN ANALYZE SELECT UNNEST(ARRAY(VALUES (1), (2)));
/* Gives:
ProjectSet  (cost=0.03..0.09 rows=10 width=4) (actual time=0.009..0.011 rows=2 loops=1)
  InitPlan 1
    ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
  ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.052 ms
Execution Time: 0.026 ms
*/
```
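
For contrast, a constant array on the same instance does get an exact estimate, presumably via unnest's planner support function (my assumption):

```sql
-- Sketch: with a Const array, the ProjectSet row estimate is expected
-- to match the actual array length (rows=2), unlike the
-- subselect-built array above, which falls back to 10.
EXPLAIN SELECT UNNEST(ARRAY[1, 2]);
```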

Should I open a separate "bug" thread for this particular issue given [the commit you mentioned](https://github.com/postgres/postgres/commit/9391f71523b6e57f1194d9f6543bc7948c16411b), or is that expected behavior?

(My understanding is that this wouldn't solve the performance problem of the original query given the const array limitation but this may improve other cases.)

Thanks again for your time!


Re: Planner picks n² query plan when available

From: Tom Lane
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bessou@hotmail.fr> wrote:
>> It looks like this could be improved/fixed by either/all of:
>>
>> 1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of always searching linearly

> IIUC, hashed "= ANY()" expressions were already implemented with
> commit 50e17ad2 (released in PG14) - look for
> EEOP_HASHED_SCALARARRAYOP in expression handling code.

I checked into that and verified that this test case isn't reaching
EEOP_HASHED_SCALARARRAYOP, because that commit only addressed the
scenario where the array argument of =ANY is a constant.  (A
post-const-folding constant, but still a constant.)

To apply that optimization here, where the array is an output of a
subplan, we'd need some mechanism whereby ExecEvalHashedScalarArrayOp
could find out when the array has changed underneath it.  That's
probably possible (by somehow extending the chgParam signaling logic),
but the details aren't obvious.  The planner's decision about when
to apply hashing would become much less obvious, too.

>> 2. Fixing the planner's assumption that all arrays are of size 10, using actual or estimated sizes instead.

> IIUC, this was also already implemented with commit 9391f715 (released in PG17).

That's not helpful here either, since the arrays are built on-the-fly
rather than being stored in table columns (where stats could be
collected on them).  You could imagine installing bespoke logic for
array_agg() that looks at the estimated rowcount for the sub-select,
perhaps.

The bottom line here is that improving these queries would take a
significant amount of work, and they just aren't very compelling
examples that seem to justify such effort.  Folding the output of
a subquery into an array is largely an anti-pattern in SQL in the
first place.

            regards, tom lane



Re: Planner picks n² query plan when available

From: Tom Lane
Toto guyoyg <thomas.bessou@hotmail.fr> writes:
>> What we have here is a straightforward way to write a query versus a much-less-straightforward way [...] So I'm not seeing why we should put our finite development resources into optimizing the much-less-straightforward way.

> Ah, I should have explained this: this was meant as a pure-SQL reproducer for n² query plans with:

> ```sql
> SELECT id FROM indexed_table WHERE indexed_value = ANY ($1)
> ```
> where `$1` is an array bind parameter.

Ah, well, you should have said that was what you wanted.  While the
existing EEOP_HASHED_SCALARARRAYOP logic only deals with a Const array
value, it seems to me that we could trivially let it use external
Params too.  The case you presented would require being able to cope
with intra-query changes of the array value, but a Param coming from
outside the query doesn't entail that.

> However, I have just attempted a reproducer for the `$1` variant (writing the corresponding application code...), and couldn't reproduce the inefficiency.

Depending on what you tested, you might have only seen the behavior
with a "custom plan" where the Param is actually replaced with a
Const.  It would go bad again if the plan changed to generic.
I see that cost_qual_eval_walker does charge differently for
hashed than un-hashed ScalarArrayOp, so getting the planner to
incorrectly opt for a generic plan might require a bad estimate
of the array size, but I'm sure that's still possible.
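
One way to exercise that generic-plan path from plain SQL, for anyone testing this (a sketch; plan_cache_mode has been available since PG 12):

```sql
-- Sketch: force the generic plan so $1 stays a Param instead of being
-- folded to a Const by a custom plan.
PREPARE lookup(text[]) AS
  SELECT id FROM tmp_benchmark WHERE value = ANY ($1);
SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE lookup(ARRAY['1', '2', '3']);
```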

> I also thought I saw that even `= ANY(ARRAY[1,2])` would lose the size to `10`, so I assumed the same issue would happen with `$1` (array), but I tried to reproduce that as well and couldn't, so I must have been looking at a different planner node.

Or old code ... as Matthias mentioned, we improved that not so long
ago.

            regards, tom lane