Re: btree_gin and BETWEEN - Mailing list pgsql-hackers

From Tom Lane
Subject Re: btree_gin and BETWEEN
Date
Msg-id 6776.1435087214@sss.pgh.pa.us
Whole thread Raw
In response to btree_gin and BETWEEN  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Jeff Janes <jeff.janes@gmail.com> writes:
> If I use the btree_gin extension to build a gin index on a scalar value, it
> doesn't work well with BETWEEN queries.  It looks like it scans the whole
> index, with the part of the index between the endpoints getting scanned
> twice.  It is basically executed as if "col1 between x and y" were "col1
> between -inf and y and col1 between x and +inf".

> It puts the correct tuples into the bitmap, because whichever inequality is
> not being used to set the query endpoint currently is used as a filter
> instead.

> So I could just not build that index.  But I want it for other reasons, and
> the problem is that the planner thinks the index can implement the BETWEEN
> query efficiently.  So even if it has truly better options available, it
> switches to using a falsely attractive btree_gin index.

> create table foo as select random() as btree, random() as gin from
> generate_series(1,3000000);
> create index on foo using gin (gin);
> create index on foo using btree (btree);
> explain ( analyze, buffers) select count(*) from foo where btree between
> 0.001 and 0.00105;
> explain ( analyze, buffers) select count(*) from foo where gin between
> 0.001 and 0.00105;

> It would be nice if btree_gin supported BETWEEN and other range queries
> efficiently, or at least if the planner knew it couldn't support them
> efficiently.  But I don't see where to begin on either one of these tasks.
> Is either one of them plausible?

ISTM this is a bug/oversight in the core GIN code: it has multiple
GinScanEntry's for the same index column, but fails to realize that it
could start the index scans at the largest match value of any of those
GinScanEntry's.  This did not matter so much before the partial-match
feature went in; but with a partial match we might be talking about
scanning a large fraction of the index, and it's important to optimize
the start point if possible.  I think there is also a failure to notice
when we could stop the scans because one of a group of AND'ed entries
says no more matches are possible.

I'm not sure where the best place to hack in this consideration would
be, though.  Oleg, Teodor, any thoughts?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Should we back-patch SSL renegotiation fixes?
Next
From: Jim Nasby
Date:
Subject: Re: less log level for success dynamic background workers for 9.5