Thread: intarray planning/execution problem with multiple operators

intarray planning/execution problem with multiple operators

From
Jeff Janes
Date:
I've found an interesting performance problem in the intarray extension module.  It doesn't seem to be version dependent; I've verified it in 9.4.4 and 9.6devel.

If I do a query that involves both an = op and a @@ op ANDed together, it gives a high cost estimate, which is justified by the slow runtime.  If I omit the @@ it gives a low estimate, also justified.  If I trick it into thinking it cannot use the index to satisfy the @@ op, then it gives a low estimate and low runtime, applying the @@ in the filter step and only the fast = in the bitmap index scan.  

Now it could use the index for the fast = operation and apply the @@ in the recheck, rather than the filter.  But I guess it doesn't think of that, despite knowing that applying the @@ as an index operation is slow.

So it seems to be missing a trick someplace, but I don't know if it's reasonable to expect it to find that trick, or how easy/hard it would be to implement.

The proposed selectivity estimate improvement patch for @@ does not fix this (and evaluating that patch was how I found this issue).


Set up:

create table foobar as 
select (
  select 
    array_agg(floor(sqrt(random()*10000000+g.y/1000000+f.x/10000000))::int) 
    from generate_series(1,10) g(y) 
  ) foo
  from generate_series(1,10000000) f(x);
create index on foobar using gin(foo gin__int_ops);
analyze;

You can knock an order of magnitude off from the table size and should still be able to see the problem.
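
For instance, a sketch of the same setup at one tenth the size (same expression, only the outer generate_series bound changed; the table name foobar_small is just for illustration):

create table foobar_small as
select (
  select
    array_agg(floor(sqrt(random()*10000000+g.y/1000000+f.x/10000000))::int)
    from generate_series(1,10) g(y)
  ) foo
  from generate_series(1,1000000) f(x);
create index on foobar_small using gin(foo gin__int_ops);
analyze;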

example:

explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo @@ '!1'::query_int;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on foobar  (cost=0.00..263637.00 rows=1 width=61) (actual time=9717.957..9717.957 rows=0 loops=1)
   Filter: ((foo @@ '!1'::query_int) AND (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[]))
   Rows Removed by Filter: 10000000
   Buffers: shared hit=64 read=113573
   I/O Timings: read=361.402
 Planning time: 0.101 ms
 Execution time: 9717.991 ms
(7 rows)
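
To see how the index-based alternative gets costed here (the plan the planner is rejecting in favor of the seq scan), one can temporarily disable seqscans, e.g.:

set enable_seqscan = off;
explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo @@ '!1'::query_int;
reset enable_seqscan;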


explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' ;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foobar  (cost=112.01..116.02 rows=1 width=61) (actual time=0.027..0.027 rows=0 loops=1)
   Recheck Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
   Buffers: shared hit=21
   ->  Bitmap Index Scan on foobar_foo_idx  (cost=0.00..112.01 rows=1 width=0) (actual time=0.023..0.023 rows=0 loops=1)
         Index Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
         Buffers: shared hit=21
 Planning time: 0.097 ms
 Execution time: 0.061 ms

If I trick it into thinking the @@ operator cannot be used in the index, then it gets really fast again:

explain(analyze, buffers) select * from foobar where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo||'{}' @@ '!1'::query_int;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foobar  (cost=112.01..116.03 rows=1 width=61) (actual time=0.082..0.082 rows=0 loops=1)
   Recheck Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
   Filter: ((foo || '{}'::integer[]) @@ '!1'::query_int)
   Buffers: shared hit=21
   ->  Bitmap Index Scan on foobar_foo_idx  (cost=0.00..112.01 rows=1 width=0) (actual time=0.080..0.080 rows=0 loops=1)
         Index Cond: (foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
         Buffers: shared hit=21
 Planning time: 0.139 ms
 Execution time: 0.129 ms
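
Another way to keep the @@ out of the index condition in these versions, where a CTE is an optimization fence, is to do the = lookup inside a CTE and apply the @@ on top of it; a sketch (the CTE name is arbitrary):

with candidates as (
  select * from foobar
  where foo = '{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'
)
select * from candidates where foo @@ '!1'::query_int;

Since 9.4/9.6 won't push quals into a CTE, the bitmap index scan inside it sees only the fast = condition, and the @@ is evaluated afterwards as a plain filter on the CTE scan.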

Cheers,

Jeff

Re: intarray planning/execution problem with multiple operators

From
Uriy Zhuravlev
Date:
On Sunday 12 July 2015 23:12:41 Jeff Janes wrote:
> I've found an interesting performance problem in the intarray extension
> module.  It doesn't seem to be version dependent; I've verified it in 9.4.4
> and 9.6devel.

Hello.
Take a look at my patch. Maybe it solves this problem.
https://commitfest.postgresql.org/5/253/

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: intarray planning/execution problem with multiple operators

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> I've found an interesting performance problem in the intarray extension
> module.  It doesn't seem to be version dependent; I've verified it in 9.4.4
> and 9.6devel.

Seems like this isn't specifically an issue for intarray, but rather one
with the core GIN code not being smart about the combination of selective
and unselective index conditions.  In particular, it seems like the smart
thing for GIN to do with this example is just ignore the @@ condition
altogether so far as the index search goes, and mark all the results as
needing recheck so that the @@ operator gets applied as a filter.

You could also complain about the core planner not considering leaving
some potentially-indexable quals out of the actual index condition,
but TBH I don't see that that would be a useful use of planner cycles.
In almost every case it would mean that if there are K quals potentially
usable with a given index, we'd cost out 2^K-1 index paths and immediately
reject all but the use-em-all alternative.  That's a lot of cycles to
spend to handle a corner case.  It's always been assumed to be the index
AM's problem to optimize use of the index quals it's handed, and I don't
see why that shouldn't be true here.

> The proposed selectivity estimate improvement patch for @@ does not fix
> this (and evaluating that patch was how I found this issue.)

Nah, it wouldn't: the cost estimates are correct, in the sense that
gincostestimate realizes that GIN will be horribly stupid about this.

        regards, tom lane