Thread: BUG #4798: BitMapAnd never works with gin

BUG #4798: BitMapAnd never works with gin

From
"Alex"
Date:
The following bug has been logged online:

Bug reference:      4798
Logged by:          Alex
Email address:      alex@xdcom.org
PostgreSQL version: 8.3.6
Operating system:   rhel5
Description:        BitMapAnd never works with gin
Details:

CREATE TABLE foo
(
  id serial NOT NULL,
  name character varying(32),
  nick character varying(32),
  gender integer
)WITH (OIDS=FALSE);

CREATE INDEX name_idx
  ON foo
  USING gin(to_tsvector('english'::regconfig, name))
  WHERE gender = 1;

CREATE INDEX nick_idx
  ON foo
  USING gin(to_tsvector('english'::regconfig, nick))
  WHERE gender = 1;
-----------------------------------------

"select count(0) from foo where gender"
-----------------------------------------
100,000.

"select relname,relpages,relkind,reltuples from pg_class where relname ~
'name_idx';"
-----------------------------------------
 name_idx |      280 | i       |    100000

"SELECT count(id) FROM foo where gender = 1  and
"to_tsvector('english',name) @@ 'alex'"
-----------------------------------------
4000
"SELECT count(id) FROM foo where gender = 1  and
"to_tsvector('english',nick) @@ 'ali'"
-----------------------------------------
3000

"EXPLAIN SELECT id FROM foo where gender = 1  and
to_tsvector('english',name) @@ 'alex' and to_tsvector('english',nick) @@
'ali';"
-----------------------------------------
 Bitmap Heap Scan on foo  (cost=4.37..63.85 rows=1 width=4)
   Recheck Cond: ((to_tsvector('english'::regconfig, (name)::text) @@
'''alex'''::tsquery) AND (gender = 1))
   Filter: (to_tsvector('english'::regconfig, (nick)::text) @@
'''ali'''::tsquery)
   ->  Bitmap Index Scan on name_idx  (cost=0.00..4.37 rows=15 width=0)
         Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@
'''alex'''::tsquery)

The actual cost is exhausting.
I think the gepo should give BitMapAnd better.

Re: BUG #4798: BitMapAnd never works with gin

From
Tom Lane
Date:
"Alex" <alex@xdcom.org> writes:
> CREATE TABLE foo
> (
>   id serial NOT NULL,
>   name character varying(32),
>   nick character varying(32),
>   gender integer
> )WITH (OIDS=FALSE);

> CREATE INDEX name_idx
>   ON foo
>   USING gin(to_tsvector('english'::regconfig, name))
>   WHERE gender = 1;

> CREATE INDEX nick_idx
>   ON foo
>   USING gin(to_tsvector('english'::regconfig, nick))
>   WHERE gender = 1;

GIN is not relevant --- the problem is the WHERE clauses.  The planner
won't use these two indexes together in a BitmapAnd because they have
identical predicates.  Per comments in choose_bitmap_and:

     * We will only consider AND combinations in which no two indexes use the
     * same WHERE clause.  This is a bit of a kluge: it's needed because
     * costsize.c and clausesel.c aren't very smart about redundant clauses.
     * They will usually double-count the redundant clauses, producing a
     * too-small selectivity that makes a redundant AND step look like it
     * reduces the total cost.    Perhaps someday that code will be smarter and
     * we can remove this limitation.  (But note that this also defends
     * against flat-out duplicate input paths, which can happen because
     * best_inner_indexscan will find the same OR join clauses that
     * create_or_index_quals has pulled OR restriction clauses out of.)
     *
     * For the same reason, we reject AND combinations in which an index
     * predicate clause duplicates another clause.    Here we find it necessary
     * to be even stricter: we'll reject a partial index if any of its
     * predicate clauses are implied by the set of WHERE clauses and predicate
     * clauses used so far.  This covers cases such as a condition "x = 42"
     * used with a plain index, followed by a clauseless scan of a partial
     * index "WHERE x >= 40 AND x < 50".  The partial index has been accepted
     * only because "x = 42" was present, and so allowing it would partially
     * double-count selectivity.  (We could use predicate_implied_by on
     * regular qual clauses too, to have a more intelligent, but much more
     * expensive, check for redundancy --- but in most cases simple equality
     * seems to suffice.)

My advice is to drop one or both of the index WHERE clauses --- it's not
apparent that they're really good for much in an example like this.

            regards, tom lane