Thread: [GENERAL] Fwd: Planner oversight for GIN indices?

[GENERAL] Fwd: Planner oversight for GIN indices?

From
Kurt Kartaltepe
Date:
On postgres 9.6 and 9.5 I have tested a structure like this

create table test (data text);
create index on test using gin(data gin_trgm_ops);
insert into test select md5(num::text) from generate_series(0,
1000000) as A(num);
analyze test;

explain select * from test where data like '%a%';
explain select * from test where data like '%abc%';
explain select * from test where data like '%abc%' and data like
'%a%'; -- Incorrect plan chosen

The final query will erroneously (in my opinion) attempt an index scan
for both clauses, on my machine this is marginally better than a
sequential scan for both clauses. However the correct and
significantly faster plan in a handful of cases including this one is
produced by this workaround found with help from #postgresql.

explain select * from test where data like '%abc%' and (data||'') like
'%a%'; -- Proper plan chosen

This causes the planner consider and then correctly pick the plan that
does an index scan for the GIN accelerated '%abc%' restriction and
then do filter on the remaining results for the '%a%' restriction.

I feel like this is potentially a question more for pgsql-hackers but
the mailing list suggests asking elsewhere before posting there and
this wasnt quite a "bug". A quick uninformed peek at the planner code
makes me think this isn't exactly trivial but from the "simplicity" of
the workaround id hope it is possible. This seems like an issue that
would affect all inverse indices or more generally any index where
multiple clauses against the same index might have different
performance characteristics that could be determined at plan time (so
only for constant restrictions).

--Kurt Kartaltepe


Re: [GENERAL] Fwd: Planner oversight for GIN indices?

From
Tom Lane
Date:
Kurt Kartaltepe <kkartaltepe@gmail.com> writes:
> ... I feel like this is potentially a question more for pgsql-hackers but
> the mailing list suggests asking elsewhere before posting there and
> this wasnt quite a "bug". A quick uninformed peek at the planner code
> makes me think this isn't exactly trivial but from the "simplicity" of
> the workaround id hope it is possible. This seems like an issue that
> would affect all inverse indices or more generally any index where
> multiple clauses against the same index might have different
> performance characteristics that could be determined at plan time (so
> only for constant restrictions).

Yeah, the planner's traditional behavior here is just to throw every
potentially indexable clause into the indexqual list and let the index AM
sort things out at runtime.  This is demonstrably the best thing to do
for btree, where the AM itself can identify redundant or contradictory
scan clauses with 100% accuracy at scan start.  I think that GIN might
be the only case where including seriously-unselective quals is really
a big loser --- the cause of that being that it might have to iterate
over some very long posting lists.  We did some work recently to make
GIN smarter about combinations of long and short posting lists, but it
seems that it's still not smart enough to cover this case completely.

I'm not sure offhand whether it's better to make the planner try to
identify indexable clauses it should not hand to the index AM, or to
insist that the index AM ought to be smart enough to cope.  The difficulty
with the former approach is that the planner can't be counted on to do the
right thing if there are non-constant comparison values, plus if there are
a lot of potentially-indexable clauses it's not cheap to consider all
combinations.  The difficulty with the latter approach is that the index
AM might not have the necessary information either.  If the best it can do
is to look into the planner's statistics, that feels a bit duplicative.
(But it might still be a win, because we'd definitely have the actual
comparison value at runtime.)

Anyway, sorry, this is a research problem rather than something that's
easy to fix.

            regards, tom lane