GIN indexscans versus equality selectivity estimation - Mailing list pgsql-hackers

From Tom Lane
Subject GIN indexscans versus equality selectivity estimation
Date
Msg-id 631.1294616287@sss.pgh.pa.us
Whole thread Raw
Responses Re: GIN indexscans versus equality selectivity estimation  (Josh Berkus <josh@agliodbs.com>)
Re: GIN indexscans versus equality selectivity estimation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Whilst fooling around with GIN over the past few days, I noticed the
following rather surprising behavior:

regression=# create table t1 (f1 int[]);
CREATE TABLE
regression=# insert into t1 values (array[42]);
INSERT 0 1
regression=# create index ti1 on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan to 0;
SET
regression=# explain select * from t1 where f1 = array[42];                             QUERY PLAN
        
 
-----------------------------------------------------------------------Seq Scan on t1
(cost=10000000000.00..10000000001.01rows=1 width=32)  Filter: (f1 = '{42}'::integer[])
 
(2 rows)

The system absolutely will not use the index in this state, even though
the GIN array opclass does support equality.  I dug around and
eventually figured out why:

1. We don't have any pg_stats statistics in this state; but we do know
reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row
with that information.  Without stats, eqsel() falls back to a
selectivity estimate of 1/numdistinct; and without stats,
get_variable_numdistinct estimates the number of distinct values as
equal to reltuples, if that's a small number.  The upshot is that we get
a selectivity estimate of exactly 1.0 for the equality condition.

2. In indxpath.c, we have the following comment and code about selecting
which paths to generate bitmap scan paths for:
    * ... Also, pick out the ones that might be useful    * as bitmap scans.  For that, we must discard indexes that
don'tsupport    * bitmap scans, and we also are only interested in paths that have some    * selectivity; we should
discardanything that was generated solely for    * ordering purposes.
 
       if (ipath->indexinfo->amhasgetbitmap &&           ipath->indexselectivity < 1.0 &&
!ScanDirectionIsBackward(ipath->indexscandir))          bitindexpaths = lappend(bitindexpaths, ipath);
 

Since GIN doesn't support plain indexscans, only bitmap scans, this
means that the planner simply won't use a GIN index unless the estimated
selectivity is less than 1.0.

There are several things we might choose to do about this:

1. Do nothing.  The issue seems quite unlikely to affect anyone in the
field, since in fact "use a seqscan" is probably the right answer
anytime reltuples = 1; and anyway using a GIN index for plain equality
is a corner case to begin with.  However, it could confuse people who
were doing testing (it confused me!).

2. Hack the selectivity estimators so that we don't get exactly 1.0
for this sort of situation.  It might be plausible for
get_variable_numdistinct to set a floor of 2 on its result, for example;
or we could hack eqsel() to bound the no-stats estimate to a bit less
than 1.

3. Change the indxpath.c code to have some different way of determining
whether a path is sensible to use as a bitmap scan.  We could for
instance make the test be "path has some indexquals and/or index is
partial".  Or maybe there should be an exception in the case where the
index type doesn't support plain indexscan.

I'm not really sold on any of these alternatives.  Thoughts?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Streaming base backups
Next
From: Tom Lane
Date:
Subject: Re: Compatibility GUC for serializable