Re: [GENERAL] Fwd: Planner oversight for GIN indices? - Mailing list pgsql-general

From Tom Lane
Subject Re: [GENERAL] Fwd: Planner oversight for GIN indices?
Date
Msg-id 26525.1501710466@sss.pgh.pa.us
Whole thread Raw
In response to [GENERAL] Fwd: Planner oversight for GIN indices?  (Kurt Kartaltepe <kkartaltepe@gmail.com>)
List pgsql-general
Kurt Kartaltepe <kkartaltepe@gmail.com> writes:
> ... I feel like this is potentially a question more for pgsql-hackers but
> the mailing list suggests asking elsewhere before posting there and
> this wasnt quite a "bug". A quick uninformed peek at the planner code
> makes me think this isn't exactly trivial but from the "simplicity" of
> the workaround id hope it is possible. This seems like an issue that
> would affect all inverse indices or more generally any index where
> multiple clauses against the same index might have different
> performance characteristics that could be determined at plan time (so
> only for constant restrictions).

Yeah, the planner's traditional behavior here is just to throw every
potentially indexable clause into the indexqual list and let the index AM
sort things out at runtime.  This is demonstrably the best thing to do
for btree, where the AM itself can identify redundant or contradictory
scan clauses with 100% accuracy at scan start.  I think that GIN might
be the only case where including seriously-unselective quals is really
a big loser --- the cause of that being that it might have to iterate
over some very long posting lists.  We did some work recently to make
GIN smarter about combinations of long and short posting lists, but it
seems that it's still not smart enough to cover this case completely.

I'm not sure offhand whether it's better to make the planner try to
identify indexable clauses it should not hand to the index AM, or to
insist that the index AM ought to be smart enough to cope.  The difficulty
with the former approach is that the planner can't be counted on to do the
right thing if there are non-constant comparison values, plus if there are
a lot of potentially-indexable clauses it's not cheap to consider all
combinations.  The difficulty with the latter approach is that the index
AM might not have the necessary information either.  If the best it can do
is to look into the planner's statistics, that feels a bit duplicative.
(But it might still be a win, because we'd definitely have the actual
comparison value at runtime.)

Anyway, sorry, this is a research problem rather than something that's
easy to fix.

            regards, tom lane


pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: [GENERAL] bidirectional mapping?
Next
From: Peter Koukoulis
Date:
Subject: [GENERAL] select md5 result set