Thread: intarray planning/execution problem with multiple operators
I've found an interesting performance problem in the intarray extension module. It doesn't seem to be version-dependent; I've verified it in 9.4.4 and 9.6devel.
If I do a query that involves both an = op and a @@ op ANDed together, it gives a high cost estimate, which is justified by the slow runtime. If I omit the @@ it gives a low estimate, also justified. If I trick it into thinking it cannot use the index to satisfy the @@ op, then it gives a low estimate and low runtime, applying the @@ in the filter step and only the fast = in the bitmap index scan.
Now it could use the index for the fast = operation and apply the @@ in the recheck, rather than the filter. But I guess it doesn't think of that, despite knowing that applying the @@ via the index is slow.
So it seems to be missing a trick someplace, but I don't know if it is reasonable to expect it to find that trick, or how easy or hard it would be to implement.
The proposed selectivity estimate improvement patch for @@ does not fix this (and evaluating that patch was how I found this issue.)
Set up:
create table foobar as
select (
    select array_agg(floor(sqrt(random()*10000000 + g.y/1000000 + f.x/10000000))::int)
    from generate_series(1,10) g(y)
) foo
from generate_series(1,10000000) f(x);
create index on foobar using gin(foo gin__int_ops);
analyze;
You can knock an order of magnitude off the table size and should still be able to see the problem.
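For example, the smaller setup would look something like this (the only change from the setup above is the outer generate_series bound):

create table foobar as
select (
    select array_agg(floor(sqrt(random()*10000000 + g.y/1000000 + f.x/10000000))::int)
    from generate_series(1,10) g(y)
) foo
from generate_series(1,1000000) f(x);  -- 1 million rows instead of 10 million

create index on foobar using gin(foo gin__int_ops);
analyze;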
example:
explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo @@ '!1'::query_int;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Seq Scan on foobar (cost=0.00..263637.00 rows=1 width=61) (actual time=9717.957..9717.957 rows=0 loops=1)
Filter: ((foo @@ '!1'::query_int) AND (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[]))
Rows Removed by Filter: 10000000
Buffers: shared hit=64 read=113573
I/O Timings: read=361.402
Planning time: 0.101 ms
Execution time: 9717.991 ms
(7 rows)
explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foobar (cost=112.01..116.02 rows=1 width=61) (actual time=0.027..0.027 rows=0 loops=1)
Recheck Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
-> Bitmap Index Scan on foobar_foo_idx (cost=0.00..112.01 rows=1 width=0) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
Planning time: 0.097 ms
Execution time: 0.061 ms
If I trick it into thinking the @@ operator cannot be used in the index, then it gets really fast again:
explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo||'{}' @@ '!1'::query_int;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foobar (cost=112.01..116.03 rows=1 width=61) (actual time=0.082..0.082 rows=0 loops=1)
Recheck Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Filter: ((foo || '{}'::integer[]) @@ '!1'::query_int)
Buffers: shared hit=21
-> Bitmap Index Scan on foobar_foo_idx (cost=0.00..112.01 rows=1 width=0) (actual time=0.080..0.080 rows=0 loops=1)
Index Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
Planning time: 0.139 ms
Execution time: 0.129 ms
Cheers,
Jeff
On Sunday 12 July 2015 23:12:41 Jeff Janes wrote:
> I've found an interesting performance problem in the intarray extension
> module. It doesn't seem to be version-dependent; I've verified it in 9.4.4
> and 9.6devel.

Hello. Take a look at my patch; maybe it solves this problem.
https://commitfest.postgresql.org/5/253/

--
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Jeff Janes <jeff.janes@gmail.com> writes:
> I've found an interesting performance problem in the intarray extension
> module. It doesn't seem to be version-dependent; I've verified it in 9.4.4
> and 9.6devel.

Seems like this isn't specifically an issue for intarray, but rather one with the core GIN code not being smart about the combination of selective and unselective index conditions. In particular, it seems like the smart thing for GIN to do with this example is just ignore the @@ condition altogether so far as the index search goes, and mark all the results as needing recheck so that the @@ operator gets applied as a filter.

You could also complain about the core planner not considering leaving some potentially-indexable quals out of the actual index condition, but TBH I don't see that that would be a useful use of planner cycles. In almost every case it would mean that if there are K quals potentially usable with a given index, we'd cost out 2^K-1 index paths and immediately reject all but the use-em-all alternative. That's a lot of cycles to spend to handle a corner case. It's always been assumed to be the index AM's problem to optimize use of the index quals it's handed, and I don't see why that shouldn't be true here.

> The proposed selectivity estimate improvement patch for @@ does not fix
> this (and evaluating that patch was how I found this issue.)

Nah, it wouldn't: the cost estimates are correct, in the sense that gincostestimate realizes that GIN will be horribly stupid about this.

        regards, tom lane
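To put a rough number on the 2^K-1 point: with, say, K = 3 quals on foo that could each use the GIN index (for instance =, @@, and &&), the planner would have to build and cost 2^3 - 1 = 7 candidate index conditions just to reject all but the one that uses every qual. A throwaway query sketching that enumeration (illustrative only, not anything from the thread):

-- Enumerate the 2^3 - 1 = 7 non-empty subsets of three hypothetical quals;
-- under the scheme Tom describes, each subset would be a separate index
-- path the planner would have to cost.
select bits,
       array_remove(array[
           case when (bits & 1) > 0 then 'foo = ...'  end,
           case when (bits & 2) > 0 then 'foo @@ ...' end,
           case when (bits & 4) > 0 then 'foo && ...' end],
           null::text) as candidate_index_condition
from generate_series(1, 7) as t(bits);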